tidymut.utils.dataset_builders module

Helper functions used by tidymut.cleaners.basic_cleaners.convert_to_mutation_dataset_format(). Two input layouts are supported:

>>> # format 1: wild-type sequences given as rows whose mut_info is 'WT'
>>> df1 = pd.DataFrame({
...     'name': ['prot1', 'prot1', 'prot1', 'prot2', 'prot2'],
...     'mut_info': ['A0S,Q1D', 'C2D', 'WT', 'E0F', 'WT'],
...     'mut_seq': ['SDCDEF', 'AQDDEF', 'AQCDEF', 'FGHIGHK', 'EGHIGHK'],
...     'score': [1.5, 2.0, 0.0, 3.0, 0.0]
... })
>>>
>>> # format 2: wild-type sequence repeated in a 'sequence' column
>>> df2 = pd.DataFrame({
...     'name': ['prot1', 'prot1', 'prot2'],
...     'sequence': ['AKCDEF', 'AKCDEF', 'FEGHIS'],
...     'mut_info': ['A0K,C2D', 'K1P', 'E1F'],
...     'score': [1.5, 2.0, 3.0],
...     'mut_seq': ['KKDDEF', 'APCDEF', 'FFGHIS']
... })
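Both layouts encode mutations as comma-separated tokens of the form wild-type residue, position, mutant residue (e.g. A0S), with zero-based positions in these examples. The sketch below shows how such a string relates to the wild-type and mutated sequences. The helper apply_mutations is hypothetical and is not tidymut's own parser; the token shape is an assumption inferred from the example data.

```python
import re

# Assumed token shape, inferred from the examples: wild-type residue,
# position, mutant residue (e.g. "A0S"). Not tidymut's own parser.
_TOKEN = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(wild_type: str, mut_info: str, is_zero_based: bool = True) -> str:
    """Apply a comma-separated mutation string to a wild-type sequence.

    Hypothetical helper for illustration only.
    """
    if mut_info == "WT":  # wild-type row: nothing to apply
        return wild_type
    residues = list(wild_type)
    offset = 0 if is_zero_based else 1
    for token in mut_info.split(","):
        match = _TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation token: {token!r}")
        wt, pos, mut = match.group(1), int(match.group(2)) - offset, match.group(3)
        if not 0 <= pos < len(residues):
            raise ValueError(f"{token}: position out of range")
        # Check the token against the original wild-type residue.
        if wild_type[pos] != wt:
            raise ValueError(f"{token}: expected {wt} at index {pos}, found {wild_type[pos]}")
        residues[pos] = mut
    return "".join(residues)
```

For example, apply_mutations("AQCDEF", "A0S,Q1D") returns "SDCDEF", which matches the first Format 1 row. Validating each token against the wild-type residue catches mutation strings that disagree with the reference sequence.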

tidymut.utils.dataset_builders.convert_format_1(df: pd.DataFrame, name_column: str, mutation_column: str, mutated_sequence_column: str, score_column: str, include_wild_type: bool, mutation_set_prefix: str, is_zero_based: bool, additional_metadata: Dict[str, Any] | None, sequence_class: Type[ProteinSequence | DNASequence | RNASequence]) → Tuple[pd.DataFrame, Dict[str, str]]

Convert a Format 1 DataFrame (wild-type sequences supplied as rows whose mutation column is 'WT') to the mutation dataset format.
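In Format 1 the reference sequence for each protein has to be recovered from its WT row. A minimal pandas sketch of that step follows. The helper split_format_1 is hypothetical; it assumes exactly one WT row per protein, and whether convert_format_1 returns the same name-to-sequence mapping as its Dict[str, str] is also an assumption.

```python
from typing import Dict, Tuple

import pandas as pd


def split_format_1(
    df: pd.DataFrame,
    name_column: str = "name",
    mutation_column: str = "mut_info",
    mutated_sequence_column: str = "mut_seq",
) -> Tuple[pd.DataFrame, Dict[str, str]]:
    """Separate WT rows from mutant rows (hypothetical helper)."""
    is_wt = df[mutation_column] == "WT"
    wt_rows = df[is_wt]
    # Assumption: exactly one WT row per protein name.
    if wt_rows[name_column].duplicated().any():
        raise ValueError("more than one WT row for some protein")
    # Map each protein name to its wild-type sequence.
    reference = dict(zip(wt_rows[name_column], wt_rows[mutated_sequence_column]))
    mutants = df[~is_wt].reset_index(drop=True)
    missing = set(mutants[name_column]) - set(reference)
    if missing:
        raise ValueError(f"no WT row for: {sorted(missing)}")
    return mutants, reference
```

Called on the Format 1 frame from the module example, this sketch returns the three mutant rows and the mapping {'prot1': 'AQCDEF', 'prot2': 'EGHIGHK'}.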

tidymut.utils.dataset_builders.convert_format_2(df: pd.DataFrame, name_column: str, mutation_column: str, sequence_column: str, score_column: str, mutation_set_prefix: str, is_zero_based: bool, additional_metadata: Dict[str, Any] | None, sequence_class: Type[ProteinSequence | DNASequence | RNASequence]) → Tuple[pd.DataFrame, Dict[str, str]]

Convert a Format 2 DataFrame (wild-type sequence supplied in a separate sequence column on every row) to the mutation dataset format.
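In Format 2 the reference sequence is repeated on every row, so it can be read directly from the sequence column once each protein is confirmed to carry a single sequence. The sketch below illustrates this under that assumption. The helper split_format_2 is hypothetical and is not the internal logic of convert_format_2.

```python
from typing import Dict, Tuple

import pandas as pd


def split_format_2(
    df: pd.DataFrame,
    name_column: str = "name",
    sequence_column: str = "sequence",
) -> Tuple[pd.DataFrame, Dict[str, str]]:
    """Extract one wild-type sequence per protein (hypothetical helper)."""
    # Assumption: every row of one protein carries the same wild-type sequence.
    n_sequences = df.groupby(name_column)[sequence_column].nunique()
    conflicting = n_sequences[n_sequences > 1]
    if not conflicting.empty:
        raise ValueError(f"multiple wild-type sequences for: {list(conflicting.index)}")
    # Keep the first row per protein and map name -> sequence.
    reference = (
        df.drop_duplicates(subset=name_column)
        .set_index(name_column)[sequence_column]
        .to_dict()
    )
    # The per-row sequence column is redundant once the mapping exists.
    mutants = df.drop(columns=[sequence_column]).reset_index(drop=True)
    return mutants, reference
```

Called on df2 from the module example, this sketch returns {'prot1': 'AKCDEF', 'prot2': 'FEGHIS'} as the mapping. Rejecting proteins with conflicting sequences up front avoids silently picking one reference.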