tidymut.utils.cleaner_workers module

tidymut.utils.cleaner_workers.apply_single_mutation(row_data: Tuple, dataset_columns: Index, sequence_column: str, name_column: str, mutation_column: str, position_columns: Dict[str, str] | None, mutation_sep: str, is_zero_based: bool, sequence_class: Type[ProteinSequence | DNASequence | RNASequence]) → Tuple[str | None, str | None]

Apply mutations to a single sequence.

Parameters:
  • row_data (Tuple) – Row data from the dataset

  • dataset_columns (Index) – Column names of the dataset

  • sequence_column (str) – Column name containing sequences

  • name_column (str) – Column name containing protein identifiers

  • mutation_column (str) – Column name containing mutation information

  • position_columns (Optional[Dict[str, str]]) – Position column mapping for sequence extraction

  • mutation_sep (str) – Separator for splitting multiple mutations

  • is_zero_based (bool) – Whether mutation positions are zero-based

  • sequence_class (Type[Union[ProteinSequence, DNASequence, RNASequence]]) – Sequence class to use for mutation application

Returns:

(mutated_sequence, error_message). Exactly one element is set: the mutated sequence on success or the error message on failure; the other is None.

Return type:

Tuple[Optional[str], Optional[str]]
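The (sequence, error) return convention and the row-tuple-plus-column-names input can be illustrated with a small sketch. It is not tidymut's implementation: `apply_point_mutation_sketch` is a hypothetical stand-in, the mutation format (`A2G` style) is an assumption, a plain list replaces the pandas `Index`, and `name_column`, `position_columns`, and `sequence_class` are left out.

```python
from typing import Dict, Optional, Sequence, Tuple


def apply_point_mutation_sketch(
    row_data: Tuple,
    dataset_columns: Sequence[str],
    sequence_column: str,
    mutation_column: str,
    mutation_sep: str,
    is_zero_based: bool,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical stand-in, NOT tidymut's implementation."""
    # Pair each value in the row tuple with its column name.
    row: Dict[str, object] = dict(zip(dataset_columns, row_data))
    residues = list(str(row[sequence_column]))

    for mutation in str(row[mutation_column]).split(mutation_sep):
        mutation = mutation.strip()
        # Assumed format for this sketch: <wild type><position><mutant>, e.g. "A2G".
        if len(mutation) < 3 or not mutation[1:-1].isdigit():
            return None, f"cannot parse mutation {mutation!r}"
        wild_type, mutant = mutation[0], mutation[-1]
        position = int(mutation[1:-1])
        # Convert to a 0-based index when positions are given as 1-based.
        index = position if is_zero_based else position - 1
        if not 0 <= index < len(residues):
            return None, f"position {position} out of range in {mutation!r}"
        if residues[index] != wild_type:
            return None, f"expected {wild_type!r} at position {position}"
        residues[index] = mutant

    # Success: a sequence and no error message.
    return "".join(residues), None


columns = ["name", "sequence", "mut_info"]
ok = apply_point_mutation_sketch(
    ("P1", "MAKT", "A2G,T4S"), columns, "sequence", "mut_info", ",", False
)
bad = apply_point_mutation_sketch(
    ("P1", "MAKT", "K2G"), columns, "sequence", "mut_info", ",", False
)
```

Here `ok` carries a sequence and no error, while `bad` carries only an error message, so a caller can separate successes from failures by checking which element is None.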

tidymut.utils.cleaner_workers.infer_wt_sequence_grouped(group_data: Tuple[Any, pd.DataFrame], name_column: str, mutation_column: str, sequence_column: str, label_columns: List[str], wt_label: float, mutation_sep: str, is_zero_based: bool, handle_multiple_wt: Literal['error', 'separate', 'first'], sequence_class: Type[ProteinSequence | DNASequence | RNASequence], alphabet_class: Type[ProteinAlphabet | DNAAlphabet | RNAAlphabet]) → Tuple[List[Dict[str, Any]], str]

Process a single protein group and return a list of rows, including the wild-type (WT) row.

This is a module-level function that processes protein groups independently.

tidymut.utils.cleaner_workers.valid_single_mutation(args: Tuple) → Tuple[str | None, str | None]

Process a single mutation string.

Parameters:

args (Tuple) – (mut_info, format_mutations, mutation_sep, is_zero_based, cache)

Returns:

(formatted_mutation, error_message). Exactly one of the two is None.

Return type:

Tuple[Optional[str], Optional[str]]
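A worker that takes all of its inputs as one tuple fits a pool-style `map`, which passes a single argument per task. The sketch below shows that calling pattern only. `validate_sketch` is a hypothetical stand-in, not tidymut's function; it takes a shortened two-element tuple rather than the five-element one documented above, and its validation rule is an assumption. A thread pool is used just to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional, Tuple


def validate_sketch(args: Tuple) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical stand-in, NOT tidymut's implementation."""
    mut_info, mutation_sep = args  # unpack the single tuple argument
    parts = [part.strip().upper() for part in mut_info.split(mutation_sep)]
    for part in parts:
        # Assumed rule for this sketch: letter, digits, letter (e.g. "A2G").
        if len(part) < 3 or not part[1:-1].isdigit():
            return None, f"invalid mutation {part!r}"
    return mutation_sep.join(parts), None


raw = ["a2g,t4s", "A2G", "bogus"]
# Build one argument tuple per task.
tasks = [(mut_info, ",") for mut_info in raw]

with ThreadPoolExecutor(max_workers=2) as pool:
    # map() preserves input order, so results line up with `raw`.
    results = list(pool.map(validate_sketch, tasks))

# Split on which element of each pair is None.
formatted = [value for value, error in results if error is None]
errors = [error for value, error in results if error is not None]
```

Because every result is a pair with exactly one None, the two list comprehensions at the end partition the results without any extra status flag.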