Which proteins are hidden behind mass spectrometry spectra?

Over the past few years, our tools have become increasingly technical: our analytical capacity keeps evolving, and the volume of data we generate has risen sharply.

This evolution particularly concerns the mass spectrometers used in proteomic studies to identify, quantify and characterise the proteins found in organisms. In these studies, analysing mixtures of unknown proteins produces series of masses that make up experimental spectra. Interpreting an experimental spectrum in terms of peptides consists of comparing it with a set of ideal spectra (also called theoretical spectra), predicted from the expected fragmentation of proteins deduced from genomic databases. Unfortunately, the software programmes currently used to interpret spectra are not sophisticated enough to interpret more than about 25% of the spectra generated by an experiment.
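To make this comparison concrete, here is a minimal Python sketch. It is not the algorithm of any of the programmes discussed here. It assumes that candidate peptides have already been obtained by in silico digestion of database proteins, considers only singly charged b and y fragment ions, and uses the simplest possible score: the number of shared peaks within a fixed tolerance. The function names, the 0.02 Da tolerance and the example peptides are illustrative choices.

```python
from bisect import bisect_left

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def theoretical_spectrum(peptide):
    """Sorted m/z values of the singly charged b and y ions of a peptide."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(residues)
    ions = []
    prefix = 0.0
    for mass in residues[:-1]:  # one cleavage between each pair of residues
        prefix += mass
        ions.append(prefix + PROTON)                  # b ion (N-terminal fragment)
        ions.append(total - prefix + WATER + PROTON)  # y ion (C-terminal fragment)
    return sorted(ions)


def shared_peaks(experimental, theoretical, tol=0.02):
    """Number of theoretical peaks matched by an experimental peak within tol (Da)."""
    exp = sorted(experimental)
    matched = 0
    for mz in theoretical:
        i = bisect_left(exp, mz - tol)  # first experimental peak >= mz - tol
        if i < len(exp) and exp[i] <= mz + tol:
            matched += 1
    return matched


def best_candidate(experimental, candidates, tol=0.02):
    """Candidate peptide whose theoretical spectrum shares the most peaks."""
    return max(candidates,
               key=lambda p: shared_peaks(experimental, theoretical_spectrum(p), tol))


observed = theoretical_spectrum("PEPTIDE") + [250.1, 612.3]  # two arbitrary noise peaks
print(best_candidate(observed, ["GLYCINE", "PEPTIDE"]))  # PEPTIDE
```

Real search engines also restrict candidates by precursor mass, pretreat the experimental peaks and use more elaborate score functions.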

Many parameters affect the behaviour of these algorithms, including the pretreatment of experimental spectra, the definition of the set of theoretical spectra compared with the experimental spectra, and the score function that computes the similarity between spectra. To help the scientific community better understand these parameters, we interpreted the same experimental datasets several times with four software programmes, including the most commonly used ones [1]. We showed that most of these programmes give the same peptide interpretation for each spectrum when the set of theoretical spectra is strictly identical. In contrast, each programme has its own score function for ranking spectrum identifications from the most to the least reliable. Because identifications are accepted down this ranking until the error threshold is reached, the ranking can strongly affect the results, depending on the threshold chosen.
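The text does not say how the error threshold is defined. Assuming it is a false discovery rate (FDR) estimated with the common target-decoy strategy, where matches to peptides from a decoy database indicate how many accepted identifications are likely wrong, the following sketch shows how two score functions applied to the same matches can retain different spectra. The `Match` record, both score functions and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    spectrum: str       # experimental spectrum identifier
    matched: int        # fragment peaks shared with the best candidate peptide
    n_theoretical: int  # peaks in that candidate's theoretical spectrum
    is_decoy: bool      # candidate drawn from a decoy (e.g. reversed) database


def count_score(m):
    return m.matched  # raw shared-peak count


def fraction_score(m):
    return m.matched / m.n_theoretical  # shared peaks normalised by peptide size


def accepted(matches, score, fdr):
    """Target spectra kept when the ranking is cut at the estimated FDR threshold."""
    ranked = sorted(matches, key=score, reverse=True)  # most to least reliable
    targets = decoys = cutoff = 0
    for rank, m in enumerate(ranked, start=1):
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest cut whose decoy/target ratio stays within the threshold.
        if targets and decoys / targets <= fdr:
            cutoff = rank
    return {m.spectrum for m in ranked[:cutoff] if not m.is_decoy}


matches = [
    Match("s1", 10, 12, False),
    Match("s2", 14, 30, False),
    Match("s3", 12, 20, True),
    Match("s4", 6, 6, False),
    Match("s5", 8, 10, False),
]
for fdr in (0.20, 0.25):
    print(fdr, sorted(accepted(matches, count_score, fdr)),
          sorted(accepted(matches, fraction_score, fdr)))
```

With these toy values, a threshold of 0.20 keeps only s2 under the raw count but s1, s4 and s5 under the normalised score, whereas at 0.25 both rankings keep the same four target spectra.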

A good understanding of algorithms and their limitations is also a prerequisite for designing new approaches that overcome their drawbacks. In addition to making the results of this comparison of four software programmes available to the scientific community, we proposed innovative algorithms [2] able to compare a large set of experimental spectra (several tens of thousands) with several hundred thousand theoretical spectra in just a few minutes. This new approach identifies large sets of peptides carrying post-translational modifications that traditional software programmes cannot detect. The development of a software programme implementing this new method is now being finalised.
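The SpecTrees data structure (David, Fertin and Tessier, 2016) is not reproduced here. As a much simpler illustration of comparing spectra without an a priori list of modifications, the sketch below indexes theoretical peaks in fixed-width bins, counts shared peaks between an experimental spectrum and every theoretical spectrum without filtering on precursor mass, and then reads a candidate modification mass from the precursor mass difference. The binning scheme, function names and toy spectra are assumptions made for illustration.

```python
from collections import defaultdict


def discretise(peaks, width):
    """Map m/z values to integer bins of the given width (Da)."""
    return {round(mz / width) for mz in peaks}


def build_index(theoretical, width=0.02):
    """Inverted index: bin -> names of the theoretical spectra with a peak in it."""
    index = defaultdict(set)
    for name, peaks in theoretical.items():
        for key in discretise(peaks, width):
            index[key].add(name)
    return index


def shared_peak_counts(index, experimental, width=0.02):
    """Shared-peak count with every indexed spectrum, with no precursor-mass filter."""
    counts = defaultdict(int)
    for key in discretise(experimental, width):
        for name in index.get(key, ()):
            counts[name] += 1
    return dict(counts)


def explained_with_shift(experimental, theoretical, delta, tol=0.02):
    """Experimental peaks matching a theoretical peak directly or after removing delta."""
    return sum(
        any(abs(mz - t) <= tol or abs(mz - delta - t) <= tol for t in theoretical)
        for mz in experimental
    )


# Toy data (hypothetical): "pepA" carries an unknown +80 Da modification that
# shifts its last two fragments in the experimental spectrum.
theoretical = {"pepA": [100.0, 200.0, 300.0, 400.0],
               "pepB": [150.0, 250.0, 350.0, 450.0]}
theo_precursor = {"pepA": 500.0, "pepB": 600.0}
experimental, exp_precursor = [100.0, 200.0, 380.0, 480.0], 580.0

counts = shared_peak_counts(build_index(theoretical), experimental)
best = max(counts, key=counts.get)            # candidate with the most shared peaks
delta = exp_precursor - theo_precursor[best]  # candidate modification mass
print(best, delta, explained_with_shift(experimental, theoretical[best], delta))
```

The index is built once and reused for every experimental spectrum, which keeps an all-against-all comparison cheap. Here pepA is selected with two shared peaks, the precursor difference suggests an 80 Da shift, and once that shift is allowed all four experimental peaks are explained. Peaks falling near a bin boundary would need extra handling in a real implementation.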

Publications

[1] Tessier, D., Lollier, V., Larré, C., and Rogniaux, H. (2016). Origin of Disagreements in Tandem Mass Spectra Interpretation by Search Engines. J Proteome Res 15, 3481-3488. http://dx.doi.org/10.1021/acs.jproteome.6b00024

[2] David, M., Fertin, G., and Tessier, D. (2016). SpecTrees: An efficient without a priori data structure for MS/MS spectra identification. In International Workshop on Algorithms in Bioinformatics, pages 65-76. Springer. https://link.springer.com/chapter/10.1007/978-3-319-43681-4_6

Modification date: 17 July 2019 | Publication date: 12 July 2017 | Redactor: V Rampon