3.2. Spectrum and Chromatogram
The Spectrum class offers a Python object for mass spectrometry data.
The Spectrum object holds the basic information of the spectrum and offers
methods to interrogate its properties.
Data, i.e. mass-over-charge (m/z) and intensity values, are decoded on demand
and can be accessed via their properties, e.g. peaks.
The Spectrum class is used in the Reader class, where each spectrum is accessible as a Spectrum object.
Theoretical spectra can also be created using the setter functions.
For example, m/z values, intensities, and peaks can be set via the
corresponding properties: pymzml.spec.Spectrum.mz, pymzml.spec.Spectrum.i, and pymzml.spec.Spectrum.peaks.
Similar to the Spectrum class, the Chromatogram class allows interrogation of profile data (time, intensity) in a total ion chromatogram.
3.2.1. Spectrum
-
class pymzml.spec.Spectrum(element=None, measured_precision=5e-06)
Spectrum class which inherits from class pymzml.spec.MS_Spectrum.
Parameters: element (xml.etree.ElementTree.Element) – spectrum as XML element
Keyword Arguments: measured_precision (float) – measured precision given as a fraction, i.e. 5e-6 equals 5 ppm.
-
ID
Access the native id (last number in the id attribute) of the spectrum.
Returns: ID – native ID of the spectrum
Return type: str
-
TIC
Property to access the total ion current for this spectrum.
Returns: TIC – Total Ion Current of the spectrum.
Return type: float
-
estimated_noise_level(mode='median')
Calculates the noise threshold for the function remove_noise.
Different modes are available; the default is “median”.
Keyword Arguments: mode (str) – mode used to estimate the noise level. Default = “median” (other modes: “mean”, “mad”)
Returns: noise_level – estimated noise threshold
Return type: float
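The three modes named above can be pictured with a small standalone sketch. This is not pymzml's implementation: the helper name `estimate_noise_level` is hypothetical, and it assumes that “mad” stands for the median absolute deviation of the intensities.

```python
import statistics


def estimate_noise_level(intensities, mode="median"):
    """Hypothetical helper: return a noise threshold for a list of intensities."""
    if not intensities:
        return 0.0
    if mode == "median":
        return statistics.median(intensities)
    if mode == "mean":
        return statistics.mean(intensities)
    if mode == "mad":
        # Assumption: "mad" is the median absolute deviation around the median.
        center = statistics.median(intensities)
        return statistics.median(abs(i - center) for i in intensities)
    raise ValueError(f"unknown mode: {mode!r}")


# Toy intensities with one strong signal among low-level noise.
example_intensities = [1.0, 2.0, 2.0, 3.0, 100.0]
median_level = estimate_noise_level(example_intensities, "median")
mean_level = estimate_noise_level(example_intensities, "mean")
mad_level = estimate_noise_level(example_intensities, "mad")
```

In this toy data the mean is pulled up by the single strong peak, while the median and the median absolute deviation stay close to the noise floor.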
-
extreme_values(key)
Find extreme values, i.e. the minimal and maximal m/z or intensity.
Parameters: key (str) – m/z : “mz” or intensity : “i”
Returns: extrema – tuple of minimal and maximal m/z or intensity
Return type: tuple
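As a rough illustration of the key argument, the following sketch computes the same kind of tuple on a plain list of (m/z, intensity) tuples. The function `extreme_values_sketch` is a hypothetical stand-in and not part of pymzml.

```python
def extreme_values_sketch(peaks, key):
    """Hypothetical helper: return (min, max) of m/z ("mz") or intensity ("i")."""
    # Map the documented key names onto the tuple position.
    index = {"mz": 0, "i": 1}[key]
    values = [peak[index] for peak in peaks]
    return (min(values), max(values))


example_peaks = [(100.0, 5.0), (250.5, 120.0), (999.9, 42.0)]
mz_extrema = extreme_values_sketch(example_peaks, "mz")
i_extrema = extreme_values_sketch(example_peaks, "i")
```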
-
has_overlapping_peak(mz)
Checks if a spectrum has more than one peak for a given m/z value within the measured precision.
Parameters: mz (float) – m/z value which should be checked
Returns: Boolean – Returns True if a nearby peak is detected, otherwise False
Return type: bool
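The idea of a precision-dependent peak lookup can be sketched without pymzml. The helpers `peaks_near` and `has_overlapping_peak_sketch` below are hypothetical, and the sketch assumes the tolerance is the queried m/z multiplied by the precision; pymzml itself works on transformed integer m/z values instead.

```python
def peaks_near(peaks, mz, precision=5e-6):
    """Hypothetical helper: all (m/z, intensity) tuples within mz * precision."""
    tolerance = mz * precision
    return [(peak_mz, i) for peak_mz, i in peaks if abs(peak_mz - mz) <= tolerance]


def has_overlapping_peak_sketch(peaks, mz, precision=5e-6):
    """True if more than one peak lies within the tolerance window."""
    return len(peaks_near(peaks, mz, precision)) > 1


example_peaks = [(500.0, 10.0), (1016.5404, 19141.7), (1016.5450, 50.0)]

# At 20 ppm both peaks around 1016.54 fall into the window.
wide = has_overlapping_peak_sketch(example_peaks, 1016.5404, 20e-6)
# At 1 ppm only the exact peak remains.
narrow = has_overlapping_peak_sketch(example_peaks, 1016.5404, 1e-6)
```

The same window explains why a single query can return several peaks when the precision is loose.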
-
has_peak(mz2find)
Checks if a Spectrum has a certain peak. Requires an m/z value as input and returns a list of peaks if the m/z value is found in the spectrum; otherwise [] is returned. Every peak is a tuple of m/z and intensity.
Note
Multiple peaks may be found, depending on the defined precisions.
Parameters: mz2find (float) – m/z value which should be found
Returns: peaks – list of (m/z, i) tuples
Return type: list
Example:
>>> import pymzml
>>> example_file = 'tests/data/example.mzML'
>>> run = pymzml.run.Reader(
...     example_file,
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         peak_to_find = spectrum.has_peak(1016.5404)
...         print(peak_to_find)
[(1016.5404, 19141.735187697403)]
-
highest_peaks(n)
Function to retrieve the n highest centroided peaks of the spectrum.
Parameters: n (int) – number of highest peaks to return.
Returns: centroided peaks – list of (mz, i) tuples of the n highest peaks
Return type: list
Example:
>>> import pymzml
>>> run = pymzml.run.Reader(
...     "tests/data/example.mzML.gz",
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         if spectrum.ID == 1770:
...             for mz,i in spectrum.highest_peaks(5):
...                 print(mz, i)
-
i
Returns the list of the intensity values. If the intensity values are encoded, the function _decode() is used to decode the encoded data.
The i property can also be set, e.g. for theoretical data. However, it is recommended to use the peaks property to set mz and intensity tuples at the same time.
Returns: i – list of intensity values from the analyzed spectrum
Return type: list
-
measured_precision
Sets the measured and internal precision.
Returns: value – measured precision (e.g. 5e-6)
Return type: float
-
ms_level
Property to access the ms level.
Returns: ms_level
Return type: int
-
mz
Returns the list of m/z values. If the m/z values are encoded, the function _decode() is used to decode the encoded data.
The mz property can also be set, e.g. for theoretical data. However, it is recommended to use the peaks property to set mz and intensity tuples at the same time.
Returns: mz – list of m/z values of spectrum.
Return type: list
-
peaks(peak_type)
Decode and return a list of mz/i tuples.
Parameters: peak_type (str) – currently supported types are: raw, centroided and reprofiled
Returns: peaks – list or numpy array of mz/i tuples or arrays
Return type: list or ndarray
-
ppm2abs(value, ppm_value, direction=1, factor=1)
Returns the value plus (or minus, depending on direction) the error (measured precision) for this value.
Parameters:
- value (float) – m/z value
- ppm_value (int) – ppm value
Keyword Arguments:
- direction (int) – plus or minus the considered m/z value. The argument direction should be 1 or -1
- factor (int) – multiplication factor for the imprecision. The argument factor should be bigger than 0
Returns: imprecision – imprecision for the given value
Return type: float
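The arithmetic described above can be written out as a minimal sketch. It assumes that ppm_value is passed as a fraction in the same form as measured_precision (5e-6 for 5 ppm); the function `ppm_to_abs` is a hypothetical stand-in, not the library method.

```python
def ppm_to_abs(value, ppm_value, direction=1, factor=1):
    """Hypothetical helper: shift value by its relative error.

    Assumption: ppm_value is a fraction such as 5e-6 (5 ppm).
    direction is 1 (upper bound) or -1 (lower bound); factor widens the window.
    """
    return value + direction * (value * ppm_value * factor)


# 5 ppm window around m/z 1000: roughly 999.995 to 1000.005.
upper = ppm_to_abs(1000.0, 5e-6, direction=1)
lower = ppm_to_abs(1000.0, 5e-6, direction=-1)
# Doubling the factor doubles the width of the window.
upper_wide = ppm_to_abs(1000.0, 5e-6, direction=1, factor=2)
```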
-
reduce(mz_range=(None, None))
Remove all m/z values outside the given range.
Parameters: mz_range (tuple) – tuple of min, max values
Returns: peaks – list of (mz, i) tuples in the given range.
Return type: list
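A range filter of this kind might look like the sketch below. Two points are assumptions rather than documented behavior: a bound of None is treated as open, and both boundaries are inclusive. The function `reduce_peaks` is hypothetical.

```python
def reduce_peaks(peaks, mz_range=(None, None)):
    """Hypothetical helper: keep (m/z, intensity) tuples inside mz_range."""
    lower, upper = mz_range
    return [
        (mz, i)
        for mz, i in peaks
        # Assumption: None means "no limit" and bounds are inclusive.
        if (lower is None or mz >= lower) and (upper is None or mz <= upper)
    ]


example_peaks = [(100.0, 5.0), (250.5, 120.0), (999.9, 42.0)]
inside = reduce_peaks(example_peaks, (200.0, 500.0))
open_upper = reduce_peaks(example_peaks, (200.0, None))
```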
-
remove_noise(mode='median', noise_level=None)
Function to remove noise from peaks, centroided peaks and reprofiled peaks.
Keyword Arguments:
- mode (str) – define mode for removing noise. Default = “median” (other modes: “mean”, “mad”)
- noise_level (float) – noise threshold
Returns: reprofiled peaks – Returns a list with tuples of m/z-intensity pairs above the noise threshold
Return type: list
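The filtering step can be sketched on plain tuples. The sketch assumes that an explicit noise_level overrides the estimate and that only peaks strictly above the threshold are kept; it implements just the “median” estimate, and `remove_noise_sketch` is a hypothetical name.

```python
import statistics


def remove_noise_sketch(peaks, noise_level=None):
    """Hypothetical helper: drop peaks at or below the noise threshold."""
    if noise_level is None:
        # Assumption: fall back to the median intensity as the threshold.
        noise_level = statistics.median(i for _, i in peaks) if peaks else 0.0
    return [(mz, i) for mz, i in peaks if i > noise_level]


example_peaks = [
    (100.0, 1.0), (200.0, 2.0), (300.0, 2.0), (400.0, 3.0), (500.0, 100.0),
]
above_median = remove_noise_sketch(example_peaks)
above_fixed = remove_noise_sketch(example_peaks, noise_level=50.0)
```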
-
scan_time
Property to access the retention time in minutes.
Returns: scan_time
Return type: float
-
selected_precursors
Property to access the selected precursors of an MS2 spectrum. Returns m/z, intensity tuples of the selected precursor ions.
Returns: selected_precursors
Return type: list
-
set_peaks(peaks, peak_type)
Assign a custom peak array of type peak_type.
Parameters:
- peaks (list or ndarray) – list or array of mz/i values
- peak_type (str) – either raw, centroided or reprofiled
-
similarity_to(spec2, round_precision=0)
Compares two spectra and returns the cosine.
Parameters: spec2 (Spectrum) – another pymzml spectrum that is compared to the current spectrum.
Keyword Arguments: round_precision (int) – precision mzs are rounded to, i.e. round( mz, round_precision )
Returns: cosine – value between 0 and 1, i.e. the cosine between the two spectra.
Return type: float
Note
Spectra data is transformed into an n-dimensional vector, where m/z values are binned in bins of 10 m/z and the intensities are added up. Then the cosine is calculated between those two vectors. The more similar the spectra are, the closer the value is to 1.
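The binning and cosine calculation from the note can be sketched in plain Python. This is an illustration under stated assumptions, not the library code: the helper names are hypothetical, bins are formed by integer division of m/z by the bin width, and an empty spectrum yields 0.0.

```python
import math
from collections import defaultdict


def binned_vector(peaks, bin_width=10.0):
    """Sum intensities per m/z bin; returns {bin_index: summed_intensity}."""
    vector = defaultdict(float)
    for mz, intensity in peaks:
        vector[int(mz // bin_width)] += intensity
    return vector


def cosine_similarity(peaks_a, peaks_b, bin_width=10.0):
    """Cosine between the two binned intensity vectors."""
    vec_a = binned_vector(peaks_a, bin_width)
    vec_b = binned_vector(peaks_b, bin_width)
    dot = sum(value * vec_b.get(key, 0.0) for key, value in vec_a.items())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Assumption: empty spectra are treated as dissimilar.
    return dot / (norm_a * norm_b)


spec_a = [(100.2, 10.0), (250.7, 20.0)]
spec_b = [(101.9, 10.0), (255.0, 20.0)]  # same bins as spec_a
spec_c = [(500.0, 5.0)]                  # no shared bins

same_bins = cosine_similarity(spec_a, spec_b)
disjoint = cosine_similarity(spec_a, spec_c)
```

Because intensities are non-negative, the result stays between 0 and 1. Peaks that differ slightly in m/z but fall into the same bin still count as matching, which is the trade-off of coarse binning.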
-
t_mz_set
Creates a set of integers out of transformed m/z values (including all values in the defined imprecision). This is used to accelerate the has_peak function and similar.
Returns: t_mz_set – set of transformed m/z values
Return type: set
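Why an integer set speeds up peak lookups can be shown with a small sketch. Everything here is an assumption for illustration: the scaling factor of 1000 is arbitrary (pymzml derives its internal precision from the measured precision), and the helper names are hypothetical.

```python
def to_internal(value, internal_precision=1000):
    """Scale an m/z value and round it to an integer key."""
    return int(round(value * internal_precision))


def build_transformed_set(mz_values, precision=5e-6, internal_precision=1000):
    """Collect every integer key inside each peak's error window."""
    transformed = set()
    for mz in mz_values:
        low = to_internal(mz - mz * precision, internal_precision)
        high = to_internal(mz + mz * precision, internal_precision)
        transformed.update(range(low, high + 1))
    return transformed


lookup = build_transformed_set([1000.0])

# Membership tests are constant-time set lookups instead of a scan over all peaks.
hit = to_internal(1000.003) in lookup
miss = to_internal(1000.02) in lookup
```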
-
transform_mz(value)
pymzml uses an internal precision for different tasks. This precision depends on the measured precision and is calculated when spec.Spectrum.measured_precision is invoked. transform_mz can be used to transform m/z values into the internal standard.
Parameters: value (float) – m/z value
Returns: transformed value – m/z value transformed to the internal standard. This value can be used to probe internal dictionaries, lists or sets, e.g. pymzml.spec.Spectrum.t_mz_set()
Return type: float
Example
>>> import pymzml
>>> run = pymzml.run.Reader(
...     "test.mzML.gz" ,
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>>
>>> for spectrum in run:
...     if spectrum.ms_level == 2:
...         peak_to_find = spectrum.has_deconvoluted_peak(
...             1044.5804
...         )
...         print(peak_to_find)
[(1044.5596, 3809.4356300564586)]
-
transformed_mz_with_error
Returns transformed m/z values with error.
Returns: tmz values – Transformed m/z values in dictionary {m/z_with_error : [(m/z,intensity), ...], ...}
Return type: dict
-
transformed_peaks
m/z value is multiplied by the internal precision.
Returns: Transformed peaks – Returns a list of peaks (tuples of mz and intensity). Float m/z values are adjusted by the internal precision to integers.
Return type: list
-
3.2.2. Chromatogram
-
class pymzml.spec.Chromatogram(element, measured_precision=5e-06, param=None)
Class for Chromatogram access and handling.
-
peaks
Returns the list of peaks of the chromatogram as tuples (time, intensity).
Returns: peaks – list of (time, intensity) tuples
Return type: list
Example:
>>> import pymzml
>>> run = pymzml.run.Reader(
...     "spectra.mzMl.gz",
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for entry in run:
...     if isinstance(entry, pymzml.spec.Chromatogram):
...         for time, intensity in entry.peaks:
...             print(time, intensity)
Note
The peaks property can also be set, e.g. for theoretical data. It requires a list of time/intensity tuples.
-
profile
Returns the list of peaks of the chromatogram as tuples (time, intensity).
Returns: peaks – list of (time, i) tuples
Return type: list
Example:
>>> import pymzml
>>> run = pymzml.run.Reader(
...     "spectra.mzMl.gz",
...     MS_precisions = {
...         1 : 5e-6,
...         2 : 20e-6
...     }
... )
>>> for entry in run:
...     if isinstance(entry, pymzml.spec.Chromatogram):
...         for time, intensity in entry.peaks:
...             print(time, intensity)
Note
The peaks property can also be set, e.g. for theoretical data. It requires a list of time/intensity tuples.
-
time
Returns the list of time values. If the time values are encoded, the function _decode() is used to decode the encoded data.
The time property can also be set, e.g. for theoretical data. However, it is recommended to use the profile property to set time and intensity tuples at the same time.
Returns: time – list of time values from the analyzed chromatogram
Return type: list
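The relation between the separate time and intensity lists and the list of (time, intensity) tuples can be illustrated with two hypothetical helpers. They operate on plain Python lists and do not touch pymzml objects.

```python
def split_profile(profile):
    """Hypothetical helper: split (time, intensity) tuples into two lists."""
    if not profile:
        return [], []
    times, intensities = zip(*profile)
    return list(times), list(intensities)


def join_profile(times, intensities):
    """Hypothetical helper: pair time and intensity values into tuples."""
    if len(times) != len(intensities):
        raise ValueError("time and intensity lists must have the same length")
    return list(zip(times, intensities))


example_profile = [(0.5, 100.0), (1.0, 250.0), (1.5, 80.0)]
times, intensities = split_profile(example_profile)
rebuilt = join_profile(times, intensities)
```

Setting both values together as tuples avoids the length mismatch that `join_profile` has to guard against.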
-
3.2.3. MS_Spectrum
-
class pymzml.spec.MS_Spectrum
General spectrum class for data handling.
-
get_element_by_name(name)
Get element from the original tree by its unit name.
Parameters: name (str) – unit name of the mzml element.
Keyword Arguments: obo_version (str, optional) – obo version number.
-
get_element_by_path(hooks)
Find elements in spectrum by its path.
Parameters: hooks (list) – list of parent elements for the target element.
Returns: elements – list of XML objects found in the path
Return type: list
Example
To access cvParam in scanWindow tag:
>>> spec.get_element_by_path(['scanList', 'scan', 'scanWindowList',
...     'scanWindow', 'cvParam'])
-
measured_precision
Set the measured and internal precision.
Returns: value – measured precision (e.g. 5e-6)
Return type: float
-
precursors
List the precursor information of this spectrum, if available.
Returns: precursor – list of precursor ids for this spectrum.
Return type: list
-
to_string(encoding='latin-1', method='xml')
Return string representation of the xml element the spectrum was initialized with.
Keyword Arguments:
- encoding (str) – text encoding of the returned string. Default is latin-1.
- method (str) – text format of the returned string. Default is xml, alternatives are html and text.
Returns: element – xml string representation of the spectrum.
Return type: str
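The encoding and method arguments mirror those of the standard library serializer, so their effect can be sketched with xml.etree.ElementTree alone. The helper `element_to_string` is hypothetical and only assumes that serialization behaves like ElementTree.tostring; the XML snippet is a made-up fragment, not real mzML.

```python
import xml.etree.ElementTree as ET


def element_to_string(element, encoding="latin-1", method="xml"):
    """Hypothetical helper: serialize an element and decode the bytes to str."""
    return ET.tostring(element, encoding=encoding, method=method).decode(encoding)


# Made-up fragment standing in for a spectrum element.
element = ET.fromstring(
    '<spectrum id="scan=1"><cvParam name="ms level" value="2"/>demo</spectrum>'
)

as_xml = element_to_string(element)                  # full markup
as_text = element_to_string(element, method="text")  # text content only
```

With method set to xml the markup and attributes are preserved, while text keeps only the character data between the tags.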
-