Using a protein model

We use apps to load unaligned DNA sequences and to translate them into amino acids.

from cogent3 import get_app

loader = get_app("load_unaligned", format="fasta")
to_aa = get_app("translate_seqs")
process = loader + to_aa
seqs = process("data/SCA1-cds.fasta")
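To make the translation step concrete, here is a minimal pure-Python sketch of translating an in-frame coding sequence under the standard genetic code. It is an illustration only and makes no claim about how the translate_seqs app is implemented. The names BASES, AMINO_ACIDS, CODON_TABLE and translate are hypothetical, the input sequence is made up, and the sketch assumes unambiguous bases and stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        aa = CODON_TABLE[dna[i : i + 3]]
        if aa == "*":  # stop codon: end of the coding sequence
            break
        protein.append(aa)
    return "".join(protein)


# Made-up example sequence, not taken from data/SCA1-cds.fasta.
print(translate("ATGAAATCCTAA"))
```

In practice the app handles details this sketch ignores, such as alternative genetic codes and ambiguous characters.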

Protein alignment with default settings

Passing “protein” as the model selects the WG01 substitution model by default.

from cogent3 import get_app

aa_aligner = get_app("progressive_align", "protein")
aligned = aa_aligner(seqs)
aligned
               0
Human          MKSNQERSNECLPPKKREIPATSRSSEEKAPTLPSDNHRVEGTAWLPGNPGGRGHGGGRH
Chimp          ............................................................
Rat            ........................P.....TA......C...V....ST..S........
Mouse          ........................P.....TA......C...V....ST..I........
Mouse Lemur    ...............................A.......A..AP................
Macaque        ........................P......A............................

6 x 825 (truncated to 6 x 60) protein alignment

Specify a different distance measure for estimating the guide tree

Two distance measures are available for estimating the guide tree: percent and paralinear.

Note

An estimated guide tree has its branch lengths scaled so they are consistent with usage in a codon model.

aa_aligner = get_app("progressive_align", "protein", distance="paralinear")
aligned = aa_aligner(seqs)
aligned
               0
Human          MKSNQERSNECLPPKKREIPATSRSSEEKAPTLPSDNHRVEGTAWLPGNPGGRGHGGGRH
Rat            ........................P.....TA......C...V....ST..S........
Mouse          ........................P.....TA......C...V....ST..I........
Mouse Lemur    ...............................A.......A..AP................
Macaque        ........................P......A............................
Chimp          ............................................................

6 x 825 (truncated to 6 x 60) protein alignment
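As a rough illustration of the percent distance named above, the sketch below computes the proportion of differing columns between Human and Rat over the 60 columns displayed in the output. It assumes that "percent" means the proportion of sites that differ and that columns containing a gap are skipped. Both are assumptions for this sketch, not a description of cogent3's own calculation. The helper names expand_dots and p_distance are hypothetical. Because only the truncated display is used, the result is not the value used to build the guide tree.

```python
def expand_dots(reference: str, row: str) -> str:
    """Replace '.' in a displayed row with the matching reference character."""
    if len(reference) != len(row):
        raise ValueError("rows must have the same length")
    return "".join(r if c == "." else c for r, c in zip(reference, row))


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned columns that differ, skipping gap columns (assumed)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)


# Rows copied from the displayed alignment; '.' means "same as Human".
human = "MKSNQERSNECLPPKKREIPATSRSSEEKAPTLPSDNHRVEGTAWLPGNPGGRGHGGGRH"
rat_row = "........................P.....TA......C...V....ST..S........"

rat = expand_dots(human, rat_row)
print(f"Human vs Rat p-distance over displayed columns: {p_distance(human, rat):.4f}")
```

The paralinear measure is more involved, since it is based on the matrix of joint residue frequencies between the two sequences, so it is not sketched here.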

Alignment settings and file provenance are recorded in the info attribute

aligned.info
{'Refs': {},
 'source': 'data/SCA1-cds.fasta',
 'align_params': {'indel_length': 0.1,
  'indel_rate': 1e-10,
  'guide_tree': "(((Rat:0.004763355238688913,Mouse:0.011219581285708921):0.052856143725369786,Mouse_Lemur:0.03580862702845759):0.024351474041303396,(Macaque:0.0023127545121458606,(Chimp:0.008168683695808834,Human:0.00019740149152159842):0.0030743103943117536)'AUTOGENERATED_NAME_SD':1e-06);",
  'model': 'WG01',
  'lnL': -3208.5222197901767}}
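The guide tree is stored as a Newick string, so the recorded settings can be inspected without rerunning the alignment. The sketch below pulls the tip names out of the string shown above using a regular expression. This is a quick illustration under the assumption that tip names contain only letters, digits and underscores; a proper tree parser, such as the one cogent3 provides, is the more robust choice for real work.

```python
import re

# Guide tree string copied from the align_params record shown above.
guide_tree = "(((Rat:0.004763355238688913,Mouse:0.011219581285708921):0.052856143725369786,Mouse_Lemur:0.03580862702845759):0.024351474041303396,(Macaque:0.0023127545121458606,(Chimp:0.008168683695808834,Human:0.00019740149152159842):0.0030743103943117536)'AUTOGENERATED_NAME_SD':1e-06);"

# Tip names directly follow "(" or ","; the quoted internal node label
# follows ")" and is therefore not matched.
tip_names = re.findall(r"[(,]([A-Za-z_]\w*):", guide_tree)
print(tip_names)
```

Note that the tree uses Mouse_Lemur where the alignment display shows Mouse Lemur, since unquoted Newick names conventionally write spaces as underscores.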