vbvarsel.custodian
Classes
Class object to represent user supplied data. |
Module Contents
- class vbvarsel.custodian.UserDataHandler[source]
Class object to represent user supplied data.
- normalise_data(data: pandas.DataFrame) numpy.ndarray [source]
Function that returns a normalised dataframe from a non-normalised input.
- Params
- data: pd.DataFrame
DataFrame of unsorted, unshuffled and non-normalised data to be used.
- Returns
- array_normalised: np.ndarray
2-D array of normalised data
- shuffle_normalised_data(normalised_data: numpy.ndarray) pandas.Series | numpy.ndarray [source]
Shuffles a DataFrame of normalised data.
- Params
- normalised_data: pd.DataFrame
A normalisd dataframe of numerical data.
- Returns
- Tuple:
- shuffled_data: pd.Series, shuffled_indices: np.ndarray
A tuple of shuffled data and their corresponding indices.
- load_data(data_source: str | os.PathLike, cols_to_ignore: list[str] = None, labels: str | list[str] = None, header: int = 0, index_col: bool = False) None [source]
Loads data to be be used in simulations with option to clean data.
- Params
- data_loc: str | os.Pathlike
The file location of the spreadsheet, must be in CSV format. IMPORTANT Format note: Columns should be variables, and rows should be observation. Header rows and index column will not be loaded. CSV must only have numerical values. Non-numerical values can be passed, via the cols_to_ignore parameter.
- cols_to_ignore: list[str] (Optional) (Default: None)
Any columns which are irrelevant or non-numerical to be excluded from analysis.
- labels: str | list[str] (Optional) (Default: None)
Labels to be used to calculate the ARI to check clustering accuracy. This parameter is optional but strongly encouraged. If a string value is passed, logic assumes that is the name of a column to be used as labels. Alternatively, a list of strings may be passed separately. Label column can be included in the cols_to_drop parameter, labels are extracted before the ignored columns are dropped.
- header: int (Optional) (Default: 0)
Parameter to determine the header of the incoming dataframe.
- index_col: bool (Optional) (Default: False)
Parameter to determine whether to include an index col.
- Returns
None