Filters as used in msaccess, msconvert, and other ProteoWizard tools
Note:
Each filter entry listed below has the form
filtername filterargs
where filtername is the literal name of the filter and filterargs is a list of arguments that you replace with actual values.
For example, the "index" filter as described below would be used like this in the msconvert program:
--filter "index 1-15"
Filters are applied sequentially in the order that you list them, and the sequence order can make a large difference in your output. In particular, the peakPicking filter must be first in line if you wish to use the vendor-supplied centroiding algorithms since these use the vendor DLLs, which only operate on raw untransformed data.
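Because filter order matters, a wrapper script can check the order before launching msconvert. The following Python sketch is only an illustration: the helper name build_msconvert_args and the input file name are hypothetical, and the only msconvert option it relies on is the --filter option shown above.

```python
def build_msconvert_args(input_path, filters):
    """Return an msconvert argument list with one --filter option per filter.

    Raises ValueError if a vendor peakPicking filter is present but is not
    the first filter, since vendor centroiding needs the raw data.
    """
    for position, spec in enumerate(filters):
        words = spec.split()
        is_vendor_picking = (
            len(words) >= 2 and words[0] == "peakPicking" and words[1] == "vendor"
        )
        if is_vendor_picking and position != 0:
            raise ValueError("peakPicking vendor must be the first filter")

    args = ["msconvert", input_path]
    for spec in filters:
        # Filters are passed in list order, which is the order they are applied.
        args += ["--filter", spec]
    return args


args = build_msconvert_args(
    "example.raw",  # hypothetical input file name
    ["peakPicking vendor msLevel=1-", "msLevel 2", "index 1-15"],
)
print(args)
```

The resulting list can be handed to subprocess.run once the input path points at a real file.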
Many filters take 'int_set' arguments. An int_set is a set of integers written as a list of intervals of the form [a,b] or a[-][b].
For example '[0,3]' and '0-3' both mean 'the set of integers from 0 to 3 inclusive'.
'1-' means 'the set of integers from 1 to the largest allowable number'.
'9' is also an integer set, equivalent to '[9,9]'.
'[0,2] 5-7' is the set '0 1 2 5 6 7'.
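The int_set notation above can be illustrated with a small parser. This is a minimal Python sketch, not the parser used by the tools: the names parse_int_set, contains and MAX_INT are hypothetical, and the value chosen for "the largest allowable number" is an assumption.

```python
import re

# Upper bound used for open-ended intervals such as "1-". The real limit is
# whatever the tool treats as the largest allowable number (assumption here).
MAX_INT = 2**31 - 1

# Matches either "[a,b]" or one of "a", "a-", "a-b".
_TOKEN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*\]|(\d+)(-)?(\d+)?")


def parse_int_set(text):
    """Parse an int_set string into a list of inclusive (low, high) intervals."""
    intervals = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse int_set at position {pos}: {text!r}")
        if match.group(1) is not None:      # "[a,b]" form
            low, high = int(match.group(1)), int(match.group(2))
        else:                               # "a", "a-" or "a-b" form
            low = int(match.group(3))
            if match.group(4) is None:
                high = low                  # single integer
            elif match.group(5) is None:
                high = MAX_INT              # open-ended interval
            else:
                high = int(match.group(5))
        if low > high:
            raise ValueError(f"empty interval in int_set: {match.group(0)!r}")
        intervals.append((low, high))
        pos = match.end()
    return intervals


def contains(intervals, value):
    """Return True if value falls inside any of the parsed intervals."""
    return any(low <= value <= high for low, high in intervals)


members = [v for v in range(10) if contains(parse_int_set("[0,2] 5-7"), v)]
print(members)
```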
The Filters
index <index_value_set>
Selects spectra by index. An index value is the 0-based numerical order in which the spectrum appears in the input.
<index_value_set> is an int_set of indexes.
msLevel <mslevels>
This filter selects only spectra with the indicated <mslevels>, expressed as an int_set.
chargeState <charge_states>
This filter keeps spectra that match the listed charge state(s), expressed as an int_set. Both known/single and possible/multiple charge states are tested. Use 0 to include spectra with no charge state at all.
precursorRecalculation
This filter recalculates the precursor m/z and charge for MS2 spectra. It looks at the prior MS1 scan to better infer the parent mass. It only works on orbitrap and FT data, although it does not use any 3rd party (vendor DLL) code. Since the code was written, Thermo has improved its own estimation in response, so this filter is less critical than it used to be (though it can still be useful).
mzRefiner input1.pepXML input2.mzid [msLevels=<1->] [thresholdScore=<CV_Score_Name>] [thresholdValue=<floatset>] [thresholdStep=<float>] [maxSteps=<count>]
This filter recalculates the m/z and charges, adjusting precursors for MS2 spectra and spectra masses for MS1 spectra. It uses an ident file with a threshold field and value to calculate the error and will then choose a shifting mechanism to correct masses throughout the file. It only works on orbitrap, FT, and TOF data. It is designed to work on mzML files created by msconvert from a single dataset (single run), and with an identification file created using that mzML file. It does not use any 3rd party (vendor DLL) code. Recommended Score and thresholds: MS-GF:SpecEValue,-1e-10 (<1e-10); MyriMatch:MVH,35- (>35); xcorr,3- (>3)
lockmassRefiner mz=<real> mzNegIons=<real (mz)> tol=<real (1.0 Daltons)>
For Waters data, adjusts m/z values according to the specified lockmass m/z and tolerance. Distinct m/z value for negative ions is optional and defaults to the given mz value. For other data, currently does nothing.
precursorRefine
This filter recalculates the precursor m/z and charge for MS2 spectra. It looks at the prior MS1 scan to better infer the parent mass. It only works on orbitrap, FT, and TOF data. It does not use any 3rd party (vendor DLL) code.
peakPicking [<PickerType> [snr=<minimum signal-to-noise ratio>] [peakSpace=<minimum peak spacing>] [msLevel=<ms_levels>]]
This filter performs centroiding on spectra with the selected <ms_levels>, expressed as an int_set. The value for <PickerType> must be "cwt" or "vendor": when <PickerType> = "vendor", vendor (Windows DLL) code is used if available.
IMPORTANT NOTE: since this filter operates on the raw data through the vendor DLLs, IT MUST BE THE FIRST FILTER IN ANY LIST OF FILTERS when "vendor" is used.
The other option for PickerType is "cwt", which uses ProteoWizard's wavelet-based algorithm for performing peak-picking with a wavelet-space signal-to-noise ratio of <signal-to-noise ratio>.
Defaults:
<PickerType> is a low-quality (non-vendor) local maxima algorithm
<signal-to-noise ratio> = 1.0
<minimum peak spacing> = 0.1
<ms_levels> = 1-
scanNumber <scan_numbers>
This filter selects spectra by scan number. Depending on the input data type, scan number and spectrum index are not always the same thing - scan numbers are not always contiguous, and are usually 1-based.
<scan_numbers> is an int_set of scan numbers to be kept.
scanEvent <scan_event_set>
This filter selects spectra by scan event. For example, to include all scan events except scan event 5, use filter "scanEvent 1-4 6-". A "scan event" is a preset scan configuration: a user-defined scan configuration that specifies the instrumental settings in which a spectrum is acquired. An instrument may cycle through a list of preset scan configurations to acquire data. This is a more generic term for the Thermo "scan event", which is defined in the Thermo Xcalibur glossary as: "a mass spectrometer scan that is defined by choosing the necessary scan parameter settings. Multiple scan events can be defined for each segment of time.".
scanTime <scan_time_range>
This filter selects only spectra within a given time range.
<scan_time_range> is a time range, specified in seconds. For example, to select only spectra within the second minute of the run, use "scanTime [60,119.99]".
sortByScanTime
This filter reorders spectra, sorting them by ascending scan start time.
stripIT
This filter rejects ion trap data spectra with MS level 1.
metadataFixer
This filter is used to add or replace a spectrum's TIC/BPI metadata, usually after peakPicking where the change from profile to centroided data may make the TIC and BPI values inconsistent with the revised scan data. The filter traverses the m/z and intensity arrays to find the sum and max. For example, in msconvert it can be used as: --filter "peakPicking true 1-" --filter metadataFixer. It can also be used without peak picking for some strange results. Certainly adding up all the samples of profile data to get the TIC is just wrong, but we do it anyway.
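The sum-and-max traversal can be sketched in a few lines of Python. The function name recompute_tic_bpi and the returned dictionary keys are hypothetical and chosen only for illustration.

```python
def recompute_tic_bpi(mz_array, intensity_array):
    """Recompute TIC (sum of intensities) and BPI (maximum intensity)."""
    if len(mz_array) != len(intensity_array):
        raise ValueError("m/z and intensity arrays must have the same length")
    if not intensity_array:
        # Empty spectrum: nothing to sum and no base peak.
        return {"tic": 0.0, "bpi": 0.0, "base_peak_mz": None}
    # Index of the most intense data point (the base peak).
    base_index = max(range(len(intensity_array)), key=intensity_array.__getitem__)
    return {
        "tic": float(sum(intensity_array)),
        "bpi": float(intensity_array[base_index]),
        "base_peak_mz": mz_array[base_index],
    }


summary = recompute_tic_bpi([100.1, 100.6, 100.8], [1000.0, 1030.0, 1020.0])
print(summary)
```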
titleMaker <format_string>
This filter adds or replaces spectrum titles according to specified <format_string>. You can use it, for example, to customize the TITLE line in MGF output in msconvert. The following keywords are recognized:
<RunId> - prints the spectrum's Run id - for example, "Data.d" from "C:/Agilent/Data.d/AcqData/mspeak.bin"
<Index> - prints the spectrum's index
<Id> - prints the spectrum's nativeID
<SourcePath> - prints the path of the spectrum's source data
<ScanNumber> - if the nativeID can be represented as a single number, prints that number, else index+1
<ActivationType> - for the first precursor, prints the spectrum's "dissociation method" value
<IsolationMz> - for the first precursor, prints the spectrum's "isolation target m/z" value
<PrecursorSpectrumId> - prints the nativeID of the spectrum of the first precursor
<SelectedIonMz> - prints the m/z value of the first selected ion of the first precursor
<ChargeState> - prints the charge state for the first selected ion of the first precursor
<SpectrumType> - prints the spectrum type
<ScanStartTimeInSeconds> - prints the spectrum's first scan's start time, in seconds
<ScanStartTimeInMinutes> - prints the spectrum's first scan's start time, in minutes
<BasePeakMz> - prints the spectrum's base peak m/z
<BasePeakIntensity> - prints the spectrum's base peak intensity
<TotalIonCurrent> - prints the spectrum's total ion current
<MsLevel> - prints the spectrum's MS level
For example, to create a TITLE line in msconvert MGF output with the "name.first_scan.last_scan.charge" style (eg. "mydata.145.145.2"), use --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>"
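The keyword substitution can be pictured with the following Python sketch. It only mimics the behaviour described above: the function make_title is hypothetical, and leaving unknown keywords unchanged is an assumption rather than documented behaviour.

```python
import re


def make_title(format_string, values):
    """Replace each <Keyword> in format_string with its value from values.

    Keywords missing from values are left as-is (an assumption).
    """
    def substitute(match):
        keyword = match.group(1)
        if keyword in values:
            return str(values[keyword])
        return match.group(0)

    return re.sub(r"<(\w+)>", substitute, format_string)


title = make_title(
    "<RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>",
    {"RunId": "mydata", "ScanNumber": 145, "ChargeState": 2},
)
print(title)  # mydata.145.145.2
```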
threshold <type> <threshold> <orientation> [<mslevels>]
This filter keeps data whose values meet various threshold criteria.
<type> must be one of:
count - keep the n=<threshold> [most|least] intense data points, where n is an integer. Any data points with the same intensity as the nth [most|least] intense data point are removed.
count-after-ties - like "count", except that any data points with the same intensity as the nth [most|least] data point are retained.
absolute - keep data whose absolute intensity is [more|less] than <threshold>
bpi-relative - keep data whose intensity is [more|less] than <threshold> percent of the base peak intensity. Percentage is expressed as a number between 0 and 1, for example 75 percent is "0.75".
tic-relative - keep data whose individual intensities are [more|less] than <threshold> percent of the total ion current for the scan. Again, percentage is expressed as a number between 0 and 1.
tic-cutoff - keep the [most|least] intense data points up to <threshold> percent of the total ion current. That is, the TIC of the retained points is <threshold> percent (expressed as a number between 0 and 1) of the original TIC.
<orientation> must be one of:
most-intense (keep m/z-intensity pairs above the threshold)
least-intense (keep m/z-intensity pairs below the threshold)
<mslevels> is an optional int_set of MS levels - if provided, only scans with those MS levels will be filtered, and others left untouched.
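The difference between "count" and "count-after-ties" is easiest to see in code. The Python sketch below covers only the most-intense orientation and reflects one reading of the descriptions above: ties are assumed to matter only when they straddle the nth position, and the strict comparison in the bpi-relative helper follows the wording "more than". All function names are hypothetical.

```python
def keep_most_intense_count(peaks, n, after_ties=False):
    """Keep the n most intense (mz, intensity) peaks.

    When peaks beyond the nth share the nth peak's intensity, "count"
    (after_ties=False) drops every peak at that intensity, while
    "count-after-ties" (after_ties=True) keeps all of them.
    """
    if n <= 0:
        return []
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    if len(ranked) <= n:
        return sorted(ranked)
    cutoff = ranked[n - 1][1]
    tie_at_boundary = ranked[n][1] == cutoff
    if not tie_at_boundary:
        kept = ranked[:n]
    elif after_ties:
        kept = [p for p in ranked if p[1] >= cutoff]
    else:
        kept = [p for p in ranked if p[1] > cutoff]
    return sorted(kept)  # restore m/z order


def keep_bpi_relative(peaks, fraction):
    """Keep peaks more intense than fraction (0 to 1) of the base peak."""
    if not peaks:
        return []
    base_peak_intensity = max(intensity for _, intensity in peaks)
    return [p for p in peaks if p[1] > fraction * base_peak_intensity]


peaks = [(100.0, 5.0), (101.0, 9.0), (102.0, 5.0), (103.0, 7.0), (104.0, 1.0)]
print(keep_most_intense_count(peaks, 3))                   # "count"
print(keep_most_intense_count(peaks, 3, after_ties=True))  # "count-after-ties"
print(keep_bpi_relative(peaks, 0.75))                      # "bpi-relative"
```

With n = 3 the third and fourth most intense points tie at 5.0, so "count" returns two points and "count-after-ties" returns four.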
mzWindow <mzrange>
Keeps m/z-intensity pairs whose m/z values fall within the specified range.
<mzrange> is formatted as [mzLow,mzHigh]. For example, in msconvert to retain data in the m/z range 100.1 to 307.5, use --filter "mzWindow [100.1,307.5]".
mzPrecursors <precursor_mz_list>
Retains spectra with precursor m/z values found in the <precursor_mz_list>. For example, in msconvert to retain only spectra with precursor m/z values of 123.4 and 567.8 you would use --filter "mzPrecursors [123.4,567.8]". Note that this filter will drop MS1 scans unless you include 0.0 in the list of precursor values.
defaultArrayLength <peak_count_range>
Keeps only spectra with peak counts within <peak_count_range>, expressed as an int_set. (In mzML the peak list length is expressed as "defaultArrayLength", hence the name.) For example, to include only spectra with 100 or more peaks, you would use filter "defaultArrayLength 100-".
zeroSamples <mode> [<MS_levels>]
This filter deals with zero values in spectra - either removing them, or adding them where they are missing.
<mode> is either removeExtra or addMissing[=<flankingZeroCount>] .
<MS_levels> is optional, when provided (as an int_set) the filter is applied only to spectra with those MS levels.
When <mode> is "removeExtra", consecutive zero intensity peaks are removed from spectra. For example, a peak list
"100.1,1000 100.2,0 100.3,0 100.4,0 100.5,0 100.6,1030"
would become
"100.1,1000 100.2,0 100.5,0 100.6,1030"
and a peak list
"100.1,0 100.2,0 100.3,0 100.4,0 100.5,0 100.6,1030 100.7,0 100.8,1020 100.9,0 101.0,0"
would become
"100.5,0 100.6,1030 100.7,0 100.8,1020 100.9,0"
When <mode> is "addMissing", each spectrum's sample rate is automatically determined (the rate can change but only gradually) and flanking zeros are inserted around non-zero data points. The optional [=<flankingZeroCount>] value can be used to limit the number of flanking zeros, otherwise the spectrum is completely populated between nonzero points. For example, to make sure spectra have at least 5 flanking zeros around runs of nonzero points, use filter "zeroSamples addMissing=5".
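The two "removeExtra" examples above are consistent with a simple rule: a zero-intensity point is kept only if one of its immediate neighbours is non-zero. The Python sketch below implements that rule as inferred from the examples. It is not taken from the filter's source, and the helper names are hypothetical.

```python
def parse_peaks(text):
    """Turn 'mz,intensity mz,intensity ...' into a list of (mz, intensity)."""
    return [tuple(float(x) for x in pair.split(",")) for pair in text.split()]


def remove_extra_zeros(peaks):
    """Drop zero-intensity points that have no non-zero neighbour."""
    kept = []
    for i, (mz, intensity) in enumerate(peaks):
        if intensity != 0:
            kept.append((mz, intensity))
            continue
        prev_nonzero = i > 0 and peaks[i - 1][1] != 0
        next_nonzero = i + 1 < len(peaks) and peaks[i + 1][1] != 0
        if prev_nonzero or next_nonzero:
            # Flanking zero: keep it so the peak shape stays anchored.
            kept.append((mz, intensity))
    return kept


first = remove_extra_zeros(parse_peaks(
    "100.1,1000 100.2,0 100.3,0 100.4,0 100.5,0 100.6,1030"))
second = remove_extra_zeros(parse_peaks(
    "100.1,0 100.2,0 100.3,0 100.4,0 100.5,0 100.6,1030 "
    "100.7,0 100.8,1020 100.9,0 101.0,0"))
print(first)
print(second)
```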
mzPresent <tolerance> <type> <threshold> <orientation> <mz_list> [<include_or_exclude>]
This filter is similar to the "threshold" filter, with a few more options.
<tolerance> is specified as a number and units (PPM or MZ). For example, "5 PPM" or "2.1 MZ".
<type>, <threshold>, and <orientation> operate as in the "threshold" filter (see above).
<mz_list> is a list of mz values of the form [mz1,mz2, ... mzn] (for example, "[100, 300, 405.6]"). Data points within <tolerance> of any of these values will be kept.
<include_or_exclude> is optional and has value "include" (the default) or "exclude". If "exclude" is used the filter drops data points that match the various criteria instead of keeping them.
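The tolerance matching and the include/exclude switch can be sketched as follows. This Python example leaves out the <type>, <threshold> and <orientation> part entirely, and it assumes that a PPM tolerance is computed relative to the listed m/z value, which the text above does not specify. The function names are hypothetical.

```python
def within_tolerance(mz, target, tolerance, unit):
    """Return True if mz lies within tolerance of target (unit: MZ or PPM)."""
    if unit == "MZ":
        window = tolerance
    elif unit == "PPM":
        # Assumption: parts-per-million of the listed (target) m/z value.
        window = target * tolerance / 1e6
    else:
        raise ValueError(f"unknown tolerance unit: {unit!r}")
    return abs(mz - target) <= window


def select_by_mz_list(peaks, mz_list, tolerance, unit, mode="include"):
    """Keep (include) or drop (exclude) peaks that match any listed m/z."""
    if mode not in ("include", "exclude"):
        raise ValueError(f"unknown mode: {mode!r}")
    keep_matches = mode == "include"
    return [
        peak for peak in peaks
        if any(within_tolerance(peak[0], t, tolerance, unit) for t in mz_list)
        == keep_matches
    ]


peaks = [(100.0004, 10.0), (100.01, 20.0), (300.0, 5.0)]
print(select_by_mz_list(peaks, [100, 300], 5, "PPM"))
print(select_by_mz_list(peaks, [100, 300], 5, "PPM", mode="exclude"))
print(select_by_mz_list(peaks, [100, 300], 0.02, "MZ"))
```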
scanSumming [precursorTol=<precursor tolerance>] [scanTimeTol=<scan time tolerance>]
This filter sums MS2 sub-scans whose precursors are within <precursor tolerance> (default: 0.05 Th.) and <scan time tolerance> (default: 10 secs.). Its use is intended for some Waters DDA data, where sub-scans should be summed together to increase the SNR. This filter has only been tested for Waters data.
MS2Denoise [<peaks_in_window> [<window_width_Da> [multicharge_fragment_relaxation]]]
Noise peak removal for spectra with precursor ions.
<peaks_in_window> - the number of peaks to select in window, default is 6.
<window_width_Da> - the width of the window in Da, default is 30.
<multicharge_fragment_relaxation> - if "true" (the default), allows more data below multiply charged precursors.
The filter first removes any m/z values above the precursor mass minus the mass of glycine.
It then removes any m/z values within .5 Da of the unfragmented precursor mass.
Finally it retains only the <peaks_in_window> most intense ions within a sliding window of <window_width_Da> Da.
If <multicharge_fragment_relaxation> is true, allows more peaks at lower mass (i.e. below precursor).
If <window_width_Da> is set to 0, the window size defaults to the highest observed mass in the spectrum (thus leaving only <peaks_in_window> ions in the output spectrum).
Reference: "When less can yield more - Computational preprocessing of MS/MS spectra for peptide identification", Bernhard Y. Renard, Marc Kirchner, Flavio Monigatti, Alexander R. Ivanov, Juri Rappsilber, Dominic Winter, Judith A. J. Steen, Fred A. Hamprecht and Hanno Steen Proteomics, 9, 4978-4984, 2009.
MS2Deisotope [hi_res [mzTol=<mzTol>]] [Poisson [minCharge=<minCharge>] [maxCharge=<maxCharge>]]
Deisotopes MS2 spectra using the Markey method or a Poisson model.
For the Markey method, hi_res sets high resolution mode to "false" (the default) or "true".
<mzTol> sets the m/z tolerance. It defaults to 0.01 in high resolution mode, otherwise it defaults to 0.5.
Poisson activates a Poisson model based on the relative intensity distribution.
<minCharge> (default: 1) and <maxCharge> (default: 3) define the charge search range within the Poisson deisotoper.
ETDFilter [<removePrecursor> [<removeChargeReduced> [<removeNeutralLoss> [<blanketRemoval> [<matchingTolerance> ]]]]]
Filters ETD MSn spectrum data points, removing unreacted precursors, charge-reduced precursors, and neutral losses.
<removePrecursor> - specify "true" to remove unreacted precursor (default is "false")
<removeChargeReduced> - specify "true" to remove charge reduced precursor (default is "false")
<removeNeutralLoss> - specify "true" to remove neutral loss species from charge reduced precursor (default is "false")
<matchingTolerance> - specify matching tolerance in MZ or PPM (examples: "3.1 MZ" (the default) or "2.2 PPM")
chargeStatePredictor [overrideExistingCharge=<true|false (false)>] [maxMultipleCharge=<int (3)>] [minMultipleCharge=<int (2)>] [singleChargeFractionTIC=<real (0.9)>] [maxKnownCharge=<int (0)>] [makeMS2=<true|false (false)>]
Predicts MSn spectrum precursors to be singly or multiply charged depending on the ratio of intensity above and below the precursor m/z, or optionally using the "makeMS2" algorithm.
<overrideExistingCharge> : always override existing charge information (default: "false")
<maxMultipleCharge> (default 3) and <minMultipleCharge> (default 2): range of values to add to the spectrum's existing "MS_possible_charge_state" values. If these are the same values, the spectrum's MS_possible_charge_state values are removed and replaced with this single value.
<singleChargeFractionTIC> : a percentage expressed as a value between 0 and 1 (the default is 0.9, or 90 percent). This is the value used as the previously mentioned ratio of intensity above and below the precursor m/z.
<maxKnownCharge> (default is 0, meaning no maximum): the maximum charge allowed for "known" charges even if override existing charge is false. This allows overriding junk charge calls like +15 peptides.
<makeMS2> : default is "false", when set to "true" the "makeMS2" algorithm is used instead of the one described above.
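The default (non-makeMS2) prediction can be pictured with the Python sketch below. It is an interpretation of the description above, not the tool's code: it assumes the "ratio" is the fraction of the total ion current found at or below the precursor m/z, and that a spectrum at or above the singleChargeFractionTIC threshold is called singly charged. The function name and its return convention are hypothetical.

```python
def predict_possible_charges(
    peaks,
    precursor_mz,
    single_charge_fraction_tic=0.9,
    min_multiple_charge=2,
    max_multiple_charge=3,
):
    """Return the possible precursor charge states for one MSn spectrum.

    peaks is a list of (mz, intensity) tuples.
    """
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        return []  # no signal, nothing to predict
    # Assumption: intensity at or below the precursor m/z counts as "below".
    below = sum(intensity for mz, intensity in peaks if mz <= precursor_mz)
    if below / total >= single_charge_fraction_tic:
        return [1]
    return list(range(min_multiple_charge, max_multiple_charge + 1))


mostly_below = [(200.0, 50.0), (400.0, 45.0), (700.0, 5.0)]
spread_out = [(200.0, 50.0), (700.0, 30.0), (900.0, 20.0)]
print(predict_possible_charges(mostly_below, precursor_mz=500.0))
print(predict_possible_charges(spread_out, precursor_mz=500.0))
```

In the first spectrum 95 percent of the intensity lies below the precursor m/z, so it is called singly charged. In the second only 50 percent does, so the multiple charge range is returned.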
activation <precursor_activation_type>
Keeps only spectra whose precursors have the specified activation type. It doesn't affect non-MS spectra, and doesn't affect MS1 spectra. Use it to create output files containing only ETD or CID MSn data where both activation modes have been interleaved within a given input vendor data file (eg: Thermo's Decision Tree acquisition mode).
<precursor_activation_type> is any one of: ETD CID SA HCD BIRD ECD IRMPD PD PSD PQD SID or SORI.
analyzer <analyzer>
This filter keeps only spectra with the indicated mass analyzer type.
<analyzer> is any one of "quad" "orbi" "FT" "IT" or "TOF".
Sometimes people use the terms FT and Orbi interchangeably, which is OK because there are no hybrid FT+Orbi instruments - so this filter does too.
analyzerType <analyzer>
This is deprecated syntax for filtering by mass analyzer type.
<analyzer> can be "FTMS" or "ITMS".
polarity <polarity>
Keeps only spectra with scans of the selected <polarity>.
<polarity> is any one of "positive" "negative" "+" or "-".