3.7. Regex patterns

This file hosts a centralized point for all regex patterns used in pymzML

Note

We need some comment lines for each regex, i.e. what does it catch .. maybe with examples and stuff ...

Collection of regular expressions to catch spectrum XML-tags.

pymzml.regex_patterns.CHROMATOGRAM_AND_SPECTRUM_PATTERN_WITH_ID = re.compile('<\\s*(chromatogram|spectrum)\\s*(id=(\\".*?\\")|index=\\".*?\\")\\s(id=(\\".*?\\"))*\\s*.*\\sdefaultArrayLength=\\"[0-9]+\\">')

Regex to catch combined chromatogram and spectrum patterns

pymzml.regex_patterns.CHROMATOGRAM_ID_PATTERN = re.compile('<chromatogram.*id="(.*?)".*?>')

Regex to catch chromatogram id patterns

pymzml.regex_patterns.CHROMATOGRAM_PATTERN = re.compile('<chromatogram.*id="(.*?)".*?>')

Regex to catch chromatogram id pattern (again ?)

pymzml.regex_patterns.FILE_ENCODING_PATTERN = re.compile(b'encoding="(?P<encoding>[A-Za-z0-9-]*)"')

Regex to catch xml file encoding

pymzml.regex_patterns.MOBY_DICK_CHAPTER_PATTERN = re.compile('CHAPTER ([0-9]+).*')

Regex to catch moby dick chapter number used in the index gezip writer example.

pymzml.regex_patterns.SIM_INDEX_PATTERN = re.compile(b'(?P<type>idRef=")(?P<nativeID>.*)">(?P<offset>[0-9]*)</offset>')

Regex pattern for SIM index

pymzml.regex_patterns.SPECTRUM_CLOSE_PATTERN = re.compile(b'</spectrum>')

Regex to catch spectrum xml close tags

pymzml.regex_patterns.SPECTRUM_ID_PATTERN = re.compile('[0-9]*$')

Simplified spectrum id regex. Greedly catches ints at the end of line

pymzml.regex_patterns.SPECTRUM_INDEX_PATTERN = re.compile(b'(?P<type>(scan=|nativeID="))(?P<nativeID>[0-9]*)">"(?P<offset>[0-9]*)</offset>')

Regex pattern for spectrum index works for obo format 1.1.0 until <last version checked>

Catches:
  1. demo 1
  2. demo 2
pymzml.regex_patterns.SPECTRUM_OPEN_PATTERN = re.compile(b'<*spectrum[^>]*index="(?P<index>[0-9]+)" id="(?P<id>[^"]+)" defaultArrayLength="[0-9]+">')

Regex to catch specturm open xml tag with encoded array length

pymzml.regex_patterns.SPECTRUM_TAG_PATTERN = re.compile('<spectrum.*?id="(?P<index>[^"]+)".*?>')

Regex to catch spectrum tag pattern