Data and Presentations

Below are some papers that you should read. Some of them describe tools that no longer exist, but the concepts they introduce are the foundation for most subsequent research.

General Introduction

  • Mass spectrometry-based proteomics (pubmed)
  • How do shotgun proteomics algorithms identify proteins (pubmed)
  • Computational mass spectrometry-based proteomics (pubmed)

Peptide/Spectrum Matching

Foundational Papers

  • Sequest and database matching (link). Good explanation of matching candidate sequences to a spectrum.
  • Dancik paper (pubmed). This paper puts forth basic concepts for understanding and exploring spectra: the offset frequency function, self-convolution, spectra represented as graphs, and de novo sequencing.
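
To make "matching candidate sequences to a spectrum" concrete, here is a minimal sketch in Python. It is not Sequest's scoring (Sequest uses a cross-correlation); it simply counts how many theoretical b- and y-ions of each candidate fall near an observed peak. The function names, the 0.02 m/z tolerance, and the reduced amino acid table are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) for a few amino acids; extend as needed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal prefix
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal suffix
    return ions


def shared_peak_count(peptide, observed_mzs, tol=0.02):
    """Number of theoretical fragments with an observed peak within tol (m/z)."""
    return sum(
        any(abs(theo - obs) <= tol for obs in observed_mzs)
        for theo in fragment_mzs(peptide)
    )


def best_match(candidates, observed_mzs, tol=0.02):
    """Candidate sequence that explains the most observed peaks."""
    return max(candidates, key=lambda p: shared_peak_count(p, observed_mzs, tol))
```

Calling `best_match(["GAVLK", "GLVAK"], observed_mzs)` returns whichever candidate shares more fragment masses with the spectrum. Real search engines also filter candidates by precursor mass, use peak intensities, and handle modifications and higher charge states.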

Error Models

Foundational Papers

  • PeptideProphet (pubmed) - Distinguishing true from false matches in a rigorous and statistically justified way.
  • Decoy Databases (pubmed) - An abstraction of the concepts from the PeptideProphet paper. This method has become popular because it is easy to implement and conceptually clear.
  • Comprehensive review (pubmed)
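
The decoy idea is simple enough to sketch in a few lines of Python. The sketch below assumes a concatenated target/decoy search with equally sized databases, so that the number of decoy hits above a score threshold estimates the number of false target hits. The function names and the `(score, is_decoy)` tuple layout are hypothetical, and published estimators differ in detail.

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR among target matches scoring at or above threshold.

    psms is a list of (score, is_decoy) tuples; higher scores are better.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted targets: report 1.0 if only decoys pass, else 0.0.
        return 1.0 if decoys else 0.0
    return decoys / targets


def score_cutoff(psms, max_fdr=0.01):
    """Lowest score threshold whose estimated FDR is within max_fdr, or None."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best
```

For example, with matches scored 10, 9, 8 (targets), 7 (decoy), 6 (target), and 5 (decoy), `score_cutoff(psms, 0.01)` returns 8, because accepting anything lower lets a decoy through.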

Protein Identification

Foundational Papers

  • The Protein Inference Problem (pubmed) - This paper describes why bottom-up proteomics has difficulty identifying proteins unambiguously.
  • Parsimony (pubmed) - One of the frequently used criteria to help roll up peptides into proteins.
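
One common way to think about parsimony is as a set cover problem: find a small set of proteins that explains every identified peptide. The Python sketch below uses a greedy approximation, which is an assumption made for illustration and not necessarily the algorithm in the paper above. The function name and the input layout (protein name mapped to a set of peptide sequences) are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedy set cover: pick proteins until every peptide is explained.

    protein_to_peptides maps a protein name to the set of identified
    peptides it contains. Ties are broken alphabetically by protein name.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Choose the protein that explains the most still-unexplained peptides.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen
```

Given `{"P1": {"a", "b", "c"}, "P2": {"b", "c"}, "P3": {"c", "d"}}`, the sketch returns `["P1", "P3"]`: P2 is dropped because its peptides are already explained by P1. Note that a greedy cover is not guaranteed to be minimal, and that proteins with identical peptide sets cannot be told apart by this criterion.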


Quantification

Foundational Papers

  • Matching MS1 features across datasets (pubmed) - This paper describes matching MS1 features by accurate mass and retention time. The popular MaxQuant match-between-runs feature and all other related techniques are reimplementations of this original method.
  • Isobaric labeling (pubmed) - This paper describes how to multiplex samples from different experiments into a single run using isobaric labels.
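
Matching MS1 features by accurate mass and retention time can be sketched as a tolerance search in both dimensions. The Python below is a simplified illustration, not the method from the paper: it assumes retention times are already aligned between runs, and the function name, the `(mass, retention_time)` tuple layout, and the default tolerances (10 ppm, 0.5 minutes) are all assumptions.

```python
def match_features(run_a, run_b, ppm_tol=10.0, rt_tol=0.5):
    """Pair features across two runs by mass and retention time.

    Each run is a list of (mass, retention_time) tuples. Returns a list of
    (index_in_a, index_in_b) pairs. When several features in run_b fall
    within tolerance, the one closest in retention time is chosen.
    """
    matches = []
    for i, (mass_a, rt_a) in enumerate(run_a):
        best = None  # (index in run_b, retention time difference)
        for j, (mass_b, rt_b) in enumerate(run_b):
            ppm = abs(mass_a - mass_b) / mass_a * 1e6
            drt = abs(rt_a - rt_b)
            if ppm <= ppm_tol and drt <= rt_tol:
                if best is None or drt < best[1]:
                    best = (j, drt)
        if best is not None:
            matches.append((i, best[0]))
    return matches
```

A feature at mass 1000.0 and 20.0 minutes would match one at 1000.005 and 20.2 minutes (5 ppm, 0.2 minutes apart) but not one at 1000.5. This sketch does not prevent two features in the first run from claiming the same feature in the second, which a real implementation would need to resolve.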