RedLevel stands for redundancy level. GenBank and RefSeq assemblies each have a unique identifier, but the same assembly can be found in GenBank, in RefSeq, or in both. An assembly can also have multiple versions (for example, after minor changes to its annotation). Several levels were therefore defined to handle the redundancy caused by assembly versioning and by the GenBank/RefSeq database pair.
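As an illustration of how versioning and the GenBank/RefSeq pair create redundancy, the sketch below derives progressively coarser keys from NCBI assembly accessions. GenBank accessions start with `GCA_`, RefSeq accessions with `GCF_`, paired assemblies share the same nine-digit number, and the suffix after the dot is the version. The function names and the three key levels are hypothetical: they are not necessarily the four levels used in this report, nor how pcalf computes them.

```python
import re
from typing import Dict, Iterable

# NCBI assembly accessions: GCA_ = GenBank, GCF_ = RefSeq, then 9 digits and a version.
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def redundancy_keys(accession: str) -> Dict[str, str]:
    """Return illustrative (hypothetical) deduplication keys for one accession."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not an NCBI assembly accession: {accession!r}")
    prefix, number, _version = match.groups()
    return {
        "full": accession,                    # keeps every version in each database
        "versionless": f"{prefix}_{number}",  # collapses versions of one assembly
        "paired": number,                     # also collapses GenBank/RefSeq pairs
    }


def count_unique(accessions: Iterable[str], level: str) -> int:
    """Count distinct assemblies once redundancy is collapsed at the given level."""
    return len({redundancy_keys(acc)[level] for acc in accessions})


accessions = ["GCA_000001405.28", "GCA_000001405.29", "GCF_000001405.39"]
for level in ("full", "versionless", "paired"):
    print(level, count_unique(accessions, level))
# full 3
# versionless 2
# paired 1
```

Each coarser key merges entries that a finer key keeps apart, so the number of unique genomes can only decrease from one level to the next.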
There are four levels of redundancy:
This section shows the percentage of genome entries with or without ccyA at each redundancy level.
{{ metrics_fig }}
The chart below shows how the number of genomes at each redundancy level has grown over time.
{{ genome_over_time }}
The line plot below shows the total number of calcyanin sequences over time at the highest redundancy level. As a result, it may include duplicate sequences caused by GenBank/RefSeq redundancy and assembly versioning.
{{ sequence_over_time }}
The sunburst and treemap charts display the same data: the number of sequences per category, organized hierarchically from the N-ter type down to the date of analysis. Click an area to see the number of sequences in each of its subcategories.
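Each area of a sunburst or treemap corresponds to a path of categories, and its value is the number of sequences whose path starts with it. A minimal sketch of that aggregation, assuming each sequence is described by a hypothetical tuple (N-ter type, flag, date of analysis); the intermediate levels actually used in the charts may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Path = Tuple[str, ...]


def hierarchical_counts(paths: Iterable[Path]) -> Dict[Path, int]:
    """Count sequences for every prefix of their category path (one count per chart area)."""
    counts: Counter = Counter()
    for path in paths:
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return dict(counts)


# Hypothetical records: (N-ter type, flag, date of analysis)
records = [
    ("NterA", "FLAG1", "2023-01-10"),
    ("NterA", "FLAG1", "2023-02-14"),
    ("NterB", "FLAG2", "2023-01-10"),
]
counts = hierarchical_counts(records)
print(counts[("NterA",)], counts[("NterA", "FLAG1")], counts[("NterB",)])
# 2 2 1
```

Clicking an area in the chart amounts to looking at the counts of the paths that extend the selected prefix by one level.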
The decision tree below is used to classify sequences with a significant match against the GlyX3 HMM profile. Red and green edges indicate negative and positive answers, respectively. In short, for each sequence matching the GlyX3 HMM profile, we check the presence and order of each Glycine Zipper along the sequence, and we use a set of known N-ter sequences to infer the nature of its N-terminal extremity. Finally, each sequence is assigned a label according to its modular organization.
{{ decision_tree }}
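The reasoning described above (check which Glycine Zipper domains are present and in which order along the sequence, then combine this with the inferred N-ter) can be sketched as follows. The domain names (`Gly1`, `Gly2`, `Gly3`), the expected order and the label format are hypothetical placeholders, not the labels produced by the pipeline, and the N-ter inference step is assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical names and expected order of the Glycine Zipper domains.
EXPECTED_ORDER = ["Gly1", "Gly2", "Gly3"]


@dataclass
class ZipperHit:
    name: str   # which Glycine Zipper was detected
    start: int  # start position on the protein sequence


def classify(hits: List[ZipperHit], nter: Optional[str]) -> str:
    """Return a hypothetical label describing the modular organization of one sequence."""
    # Order of the domains along the sequence, from N-ter to C-ter.
    observed = [hit.name for hit in sorted(hits, key=lambda h: h.start)]
    if not observed:
        organization = "no-zipper"
    elif observed == EXPECTED_ORDER:
        organization = "complete"
    else:
        organization = "partial-or-rearranged"
    # The N-ter type is assumed to come from a separate comparison with known N-ter.
    nter_label = nter if nter is not None else "unknown-Nter"
    return f"{nter_label}|{organization}|{'-'.join(observed)}"


print(classify([ZipperHit("Gly2", 120), ZipperHit("Gly1", 60), ZipperHit("Gly3", 180)], "NterA"))
# NterA|complete|Gly1-Gly2-Gly3
print(classify([ZipperHit("Gly1", 60)], None))
# unknown-Nter|partial-or-rearranged|Gly1
```

Sorting the hits by start position is what turns "which domains were found" into "in which order they appear", which is the question the tree asks before assigning a label.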
This section is dedicated to the modular organization of calcyanin.
Sequences are grouped by N-ter type, regardless of their flag (see the Calcyanin classification section).
This view shows the length of each sequence and the position of its domains.
Use the input field above the list to filter entries by their main attributes.
Clicking an entry gives access to the protein sequence(s) attached to it, if any. Related information about the assembly and/or the sequence is shown at the end of the section.
Additionally, you can click the green icon to the right of a ccyA+ entry to add it to the cart.
The MULTIPLE flag indicates that several sequences from this genome had a hit against the GlyX3 HMM profile.
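The rule behind the MULTIPLE flag amounts to counting, for each genome, the distinct sequences with a hit. A minimal sketch, assuming hits are available as hypothetical (genome accession, sequence identifier) pairs; the function name is illustrative and not part of pcalf.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def multiple_flags(hits: Iterable[Tuple[str, str]]) -> Dict[str, bool]:
    """Map each genome to True when more than one distinct sequence hit the profile."""
    sequences_by_genome: Dict[str, Set[str]] = defaultdict(set)
    for genome, sequence_id in hits:
        # A set ignores repeated hits on the same protein.
        sequences_by_genome[genome].add(sequence_id)
    return {genome: len(seqs) > 1 for genome, seqs in sequences_by_genome.items()}


hits = [("genome_1", "prot_a"), ("genome_1", "prot_b"), ("genome_2", "prot_c"), ("genome_2", "prot_c")]
print(multiple_flags(hits))
# {'genome_1': True, 'genome_2': False}
```

Counting distinct sequence identifiers rather than raw hits keeps a single protein with several matching regions from being reported as MULTIPLE.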
The Browse data section contains all the data about the genomes processed by pcalf-annotate-workflow, from genomes to calcyanin features. You can filter the data by organism name, accession, sequence accession, flag or N-ter type using the search bars below; keyword order does not matter. Clicking an entry produces a detailed view of it, including genome metadata and sequence information, if any.
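Order-independent keyword filtering can be implemented by requiring every keyword to appear in at least one searchable field. The sketch below assumes case-insensitive substring matching and hypothetical field names; the actual matching rules of the search bars may differ.

```python
from typing import Dict, List

# Hypothetical field names for the searchable attributes listed above.
SEARCH_FIELDS = ("organism", "accession", "sequence_accession", "flag", "nter_type")


def matches(entry: Dict[str, str], query: str) -> bool:
    """True when every keyword occurs (case-insensitively) in at least one field."""
    fields = [entry.get(name, "").lower() for name in SEARCH_FIELDS]
    return all(any(keyword in field for field in fields) for keyword in query.lower().split())


def filter_entries(entries: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Keep the entries matching all keywords; keyword order does not matter."""
    return [entry for entry in entries if matches(entry, query)]


entries = [
    {"organism": "Example organism A", "accession": "GCA_000000001.1", "flag": "MULTIPLE", "nter_type": "NterA"},
    {"organism": "Example organism B", "accession": "GCF_000000002.1", "flag": "", "nter_type": "NterB"},
]
print(len(filter_entries(entries, "multiple organism")))
# 1
```

Because `all()` checks each keyword independently, "multiple organism" and "organism multiple" select exactly the same entries.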
The figure below describes the workflow used by pcalf to retrieve calcyanin sequences from a set of protein sequences or directly from genomes.