3.7. Regex patterns
This module is the central collection point for all regex patterns used in pymzML.
Note
Each regex still needs comment lines describing what it catches, ideally with examples.
Collection of regular expressions to catch spectrum XML-tags.
-
pymzml.regex_patterns.
CHROMATOGRAM_AND_SPECTRUM_PATTERN_WITH_ID
= re.compile('<\\s*(chromatogram|spectrum)\\s*(id=(\\".*?\\")|index=\\".*?\\")\\s(id=(\\".*?\\"))*\\s*.*\\sdefaultArrayLength=\\"[0-9]+\\">')
Regex to catch both chromatogram and spectrum opening tags, including their id.
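To illustrate what this pattern catches, the sketch below re-declares the expression from the listing (so it runs without pymzML installed) and applies it to two opening tags. The sample tag strings are made up for illustration and are not taken from a real mzML file.

```python
import re

# Re-declared from the listing so the snippet is self-contained.
CHROMATOGRAM_AND_SPECTRUM_PATTERN_WITH_ID = re.compile(
    '<\\s*(chromatogram|spectrum)\\s*(id=(\\".*?\\")|index=\\".*?\\")\\s(id=(\\".*?\\"))*\\s*.*\\sdefaultArrayLength=\\"[0-9]+\\">'
)

# Hypothetical opening tags (assumption: index comes before id).
spectrum_tag = '<spectrum index="0" id="scan=1" defaultArrayLength="1024">'
chromatogram_tag = '<chromatogram index="0" id="TIC" defaultArrayLength="100">'

spec_match = CHROMATOGRAM_AND_SPECTRUM_PATTERN_WITH_ID.search(spectrum_tag)
chrom_match = CHROMATOGRAM_AND_SPECTRUM_PATTERN_WITH_ID.search(chromatogram_tag)

# Group 1 is the tag name.
tag_kinds = (spec_match.group(1), chrom_match.group(1))
# Group 5 is the id value, still wrapped in its double quotes.
quoted_ids = (spec_match.group(5), chrom_match.group(5))
print(tag_kinds, quoted_ids)
```

Note that group 5 keeps the surrounding quotes, so a caller would probably want to strip them before using the id.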
-
pymzml.regex_patterns.
CHROMATOGRAM_ID_PATTERN
= re.compile('<chromatogram.*id="(.*?)".*?>')
Regex to catch chromatogram id patterns
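A minimal sketch of how this pattern might be applied, with the expression re-declared locally and a made-up chromatogram tag as input (the tag string is an assumption):

```python
import re

# Re-declared from the listing so the snippet is self-contained.
CHROMATOGRAM_ID_PATTERN = re.compile('<chromatogram.*id="(.*?)".*?>')

# Hypothetical opening tag, for illustration only.
tag = '<chromatogram index="0" id="TIC" defaultArrayLength="100">'

# Group 1 is the id value without the surrounding quotes.
chromatogram_id = CHROMATOGRAM_ID_PATTERN.search(tag).group(1)
print(chromatogram_id)
```

Because the leading `.*` is greedy, the last `id="` on the line is the one that gets captured.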
-
pymzml.regex_patterns.
CHROMATOGRAM_PATTERN
= re.compile('<chromatogram.*id="(.*?)".*?>')
Regex to catch chromatogram id patterns (same expression as CHROMATOGRAM_ID_PATTERN).
-
pymzml.regex_patterns.
FILE_ENCODING_PATTERN
= re.compile(b'encoding="(?P<encoding>[A-Za-z0-9-]*)"')
Regex to catch the XML file encoding
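As a rough sketch, the encoding can be read from an XML declaration like this. The pattern is re-declared locally and the header bytes are a made-up example:

```python
import re

# Re-declared from the listing so the snippet is self-contained.
FILE_ENCODING_PATTERN = re.compile(b'encoding="(?P<encoding>[A-Za-z0-9-]*)"')

# Hypothetical XML declaration; the pattern is a bytes pattern, so the input is bytes.
header = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

match = FILE_ENCODING_PATTERN.search(header)
encoding = match.group("encoding").decode("ascii")
print(encoding)

# The character class has no underscore, so such names are not caught.
underscore_match = FILE_ENCODING_PATTERN.search(b'encoding="Shift_JIS"')
```

The second search returns `None`, since `_` is outside `[A-Za-z0-9-]`.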
-
pymzml.regex_patterns.
MOBY_DICK_CHAPTER_PATTERN
= re.compile('CHAPTER ([0-9]+).*')
Regex to catch the Moby Dick chapter number, used in the index gzip writer example.
-
pymzml.regex_patterns.
SIM_INDEX_PATTERN
= re.compile(b'(?P<type>idRef=")(?P<nativeID>.*)">(?P<offset>[0-9]*)</offset>')
Regex pattern for SIM index
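The sketch below shows what the named groups of this pattern would hold for one index entry. The pattern is re-declared locally, and the sample offset line is an assumption made up for illustration:

```python
import re

# Re-declared from the listing so the snippet is self-contained.
SIM_INDEX_PATTERN = re.compile(
    b'(?P<type>idRef=")(?P<nativeID>.*)">(?P<offset>[0-9]*)</offset>'
)

# Hypothetical index entry (bytes, since the pattern is a bytes pattern).
index_line = b'<offset idRef="SIM SIC 651.5">330223452</offset>'

match = SIM_INDEX_PATTERN.search(index_line)
native_id = match.group("nativeID")                # free-form id, may contain spaces
offset = int(match.group("offset").decode("ascii"))  # byte offset as an int
print(native_id, offset)
```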
-
pymzml.regex_patterns.
SPECTRUM_CLOSE_PATTERN
= re.compile(b'</spectrum>')
Regex to catch spectrum XML close tags
-
pymzml.regex_patterns.
SPECTRUM_ID_PATTERN
= re.compile('[0-9]*$')
Simplified spectrum id regex. Greedily catches ints at the end of a line.
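A small sketch of the behaviour, using a re-declared copy of the pattern and two made-up id strings (both inputs are assumptions):

```python
import re

# Re-declared from the listing so the snippet is self-contained.
SPECTRUM_ID_PATTERN = re.compile('[0-9]*$')

# Hypothetical native id string ending in the scan number.
native_id = 'controllerType=0 controllerNumber=1 scan=42'

# search() is needed here; match() would anchor at the start of the string.
trailing = SPECTRUM_ID_PATTERN.search(native_id).group()
print(trailing)

# With no trailing digits the pattern still matches, but with an empty string.
no_digits = SPECTRUM_ID_PATTERN.search('TIC').group()
```

Since `[0-9]*` may match zero digits, a caller cannot rely on a failed match to detect a missing id and has to check for the empty string instead.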
-
pymzml.regex_patterns.
SPECTRUM_INDEX_PATTERN
= re.compile(b'(?P<type>(scan=|nativeID="))(?P<nativeID>[0-9]*)">"(?P<offset>[0-9]*)</offset>')
Regex pattern for the spectrum index. Works for obo format 1.1.0 until <last version checked>
- Catches:
- demo 1
- demo 2
-
pymzml.regex_patterns.
SPECTRUM_OPEN_PATTERN
= re.compile(b'<*spectrum[^>]*index="(?P<index>[0-9]+)" id="(?P<id>[^"]+)" defaultArrayLength="[0-9]+">')
Regex to catch the spectrum open XML tag with encoded array length
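The following sketch applies a locally re-declared copy of this pattern to a made-up spectrum tag; both sample tags are assumptions for illustration:

```python
import re

# Re-declared from the listing so the snippet is self-contained.
SPECTRUM_OPEN_PATTERN = re.compile(
    b'<*spectrum[^>]*index="(?P<index>[0-9]+)" id="(?P<id>[^"]+)" defaultArrayLength="[0-9]+">'
)

# Hypothetical opening tag with the attribute order the pattern expects.
tag = b'<spectrum index="3" id="scan=4" defaultArrayLength="1024">'

match = SPECTRUM_OPEN_PATTERN.search(tag)
index = int(match.group("index").decode("ascii"))
spectrum_id = match.group("id").decode("ascii")
print(index, spectrum_id)

# The attribute order is fixed: id before index is not caught.
reordered = b'<spectrum id="scan=4" index="3" defaultArrayLength="1024">'
reordered_match = SPECTRUM_OPEN_PATTERN.search(reordered)
```

The pattern requires `index`, `id` and `defaultArrayLength` in exactly this order, separated by single spaces, so the reordered tag yields `None`.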
-
pymzml.regex_patterns.
SPECTRUM_TAG_PATTERN
= re.compile('<spectrum.*?id="(?P<index>[^"]+)".*?>')
Regex to catch spectrum tag pattern
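A short sketch of what this pattern returns, again with the expression re-declared locally and a made-up tag as input:

```python
import re

# Re-declared from the listing so the snippet is self-contained.
SPECTRUM_TAG_PATTERN = re.compile('<spectrum.*?id="(?P<index>[^"]+)".*?>')

# Hypothetical opening tag, for illustration only.
tag = '<spectrum index="3" id="scan=4" defaultArrayLength="1024">'

# The named group is called "index", but it captures the value of the id attribute.
captured = SPECTRUM_TAG_PATTERN.search(tag).group("index")
print(captured)
```

Despite its name, the `index` group here holds the `id` attribute value, not the numeric `index` attribute.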