vbvarsel.custodian

Classes

UserDataHandler

Class object to represent user supplied data.

Module Contents

class vbvarsel.custodian.UserDataHandler

Class object to represent user supplied data.

ExperimentValues
normalise_data(data: pandas.DataFrame) -> numpy.ndarray

Returns a normalised 2-D array from a non-normalised input DataFrame.

Params
data: pd.DataFrame

DataFrame of unsorted, unshuffled and non-normalised data to be used.

Returns
array_normalised: np.ndarray

2-D array of normalised data
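
The page does not say which normalisation is applied, so the following is only a minimal sketch of the DataFrame-in, array-out contract, assuming column-wise z-score standardisation. The function name normalise_sketch is hypothetical and is not part of vbvarsel; the library's actual method may differ.

```python
import numpy as np
import pandas as pd


def normalise_sketch(data: pd.DataFrame) -> np.ndarray:
    """Hypothetical stand-in: z-score each column and return a 2-D array."""
    values = data.to_numpy(dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    std[std == 0] = 1.0
    return (values - mean) / std


# Columns are variables, rows are observations.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
array_normalised = normalise_sketch(df)
```

Under this assumption, each column of array_normalised has zero mean and unit standard deviation, and the array keeps the shape of the input.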

shuffle_normalised_data(normalised_data: numpy.ndarray) -> pandas.Series | numpy.ndarray

Shuffles a DataFrame of normalised data.

Params
normalised_data: pd.DataFrame

A normalised DataFrame of numerical data.

Returns
Tuple:
shuffled_data: pd.Series, shuffled_indices: np.ndarray

A tuple of shuffled data and their corresponding indices.
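
Returning the indices alongside the shuffled data lets any per-row information, such as labels, be reordered in the same way. A minimal sketch of that idea follows, assuming a row permutation drawn with NumPy. The name shuffle_sketch and its seed argument are hypothetical and are not part of the documented method.

```python
import numpy as np


def shuffle_sketch(normalised_data: np.ndarray, seed: int = 0):
    """Hypothetical stand-in: permute rows and return (data, indices)."""
    rng = np.random.default_rng(seed)
    # One index per observation (row), in random order.
    shuffled_indices = rng.permutation(normalised_data.shape[0])
    shuffled_data = normalised_data[shuffled_indices]
    return shuffled_data, shuffled_indices


normalised = np.arange(10, dtype=float).reshape(5, 2)
shuffled_data, shuffled_indices = shuffle_sketch(normalised)

# Applying the same indices to a label array keeps labels aligned with rows.
labels = np.array(["A", "B", "A", "C", "B"])
shuffled_labels = labels[shuffled_indices]
```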

load_data(data_source: str | os.PathLike, cols_to_ignore: list[str] = None, labels: str | list[str] = None, header: int = 0, index_col: bool = False) -> None

Loads data to be used in simulations, with the option to clean the data.

Params
data_source: str | os.PathLike

The file location of the spreadsheet, which must be in CSV format. IMPORTANT format note: columns should be variables and rows should be observations. Header rows and the index column will not be loaded. The CSV must contain only numerical values; columns holding non-numerical values may be present if they are listed in the cols_to_ignore parameter.

cols_to_ignore: list[str] (Optional) (Default: None)

Any columns which are irrelevant or non-numerical to be excluded from analysis.

labels: str | list[str] (Optional) (Default: None)

Labels used to calculate the ARI (Adjusted Rand Index) to check clustering accuracy. This parameter is optional but strongly encouraged. If a string is passed, it is taken to be the name of the column to use as labels. Alternatively, the labels may be passed separately as a list of strings. The label column can also be listed in the cols_to_ignore parameter, because labels are extracted before the ignored columns are dropped.

header: int (Optional) (Default: 0)

Row number to use as the header of the incoming DataFrame.

index_col: bool (Optional) (Default: False)

Whether to include an index column.

Returns

None
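
The order of operations described for labels and cols_to_ignore can be illustrated without calling vbvarsel. The sketch below uses pandas directly on an in-memory CSV and is an assumption about how the parameters interact, not the library's implementation. The CSV contents and the column names sample_id, group, x1 and x2 are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV: columns are variables, rows are observations.
csv_text = (
    "sample_id,group,x1,x2\n"
    "s1,A,0.1,1.5\n"
    "s2,B,0.4,1.1\n"
    "s3,A,0.3,1.9\n"
)

raw = pd.read_csv(io.StringIO(csv_text), header=0, index_col=False)

labels = "group"                         # name of the label column
cols_to_ignore = ["sample_id", "group"]  # the label column may be listed here too

# Labels are extracted first, so dropping the label column afterwards is safe.
label_values = raw[labels].tolist()
numeric = raw.drop(columns=cols_to_ignore)
```

After these steps, numeric holds only the numerical variables x1 and x2, while label_values keeps the group of each observation for a later ARI calculation.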