1
Protein Analysis by Shotgun Proteomics
Yu Gao¹ and John R. Yates III²
¹College of Pharmacy, University of Illinois at Chicago, Chicago, IL, USA
²Department of Molecular Medicine, Scripps Research, La Jolla, CA, USA
1.1 Introduction
1.1.1 Terminology
In mass-spectrometry-based protein analysis, there are two major strategies: the top-down method and the bottom-up method [1, 2]. The terms "top" and "bottom" refer to the complexity of the analyte, namely the more complex "protein" and the less complex "peptide." In top-down protein analysis, the intact protein is analyzed directly by a mass spectrometer: mass information and fragment ions are generated from the intact protein ions and are then used for protein identification and characterization. In contrast, the bottom-up method starts by digesting the protein into peptides, either chemically or enzymatically. The resulting peptides are then analyzed by a tandem mass spectrometer, and their molecular weights and fragmentation information are matched back to the original protein or protein mixture. When a mixture of proteins is analyzed by a bottom-up method, the approach is also called shotgun proteomics, owing to its similarity to shotgun genome sequencing.
1.1.2 Power of Shotgun Proteomics
In a typical shotgun proteomics experiment performed on a modern instrument, one should expect to identify anywhere from 1000 to 10 000 proteins from a mammalian cell lysate [3, 4]. In comparison, a typical top-down experiment identifies hundreds to about a thousand proteins from a similar sample, and it requires extensive fractionation to simplify the protein mixtures entering the mass spectrometer [5-7]. Intact proteins are highly heterogeneous in molecular weight, charge state, hydrophobicity, and molecular structure (shape); therefore, it is hard to find conditions that give optimal separation, fragmentation, and detection for all proteins present in the sample.
1.1.3 Advantage of Shotgun Proteomics
Compared with intact proteins, peptides are a much more uniform class of analyte, with a narrow range of molecular weights and charge states. Because most digested peptides are denatured, they also have a more uniform shape [8, 9]. Starting with peptides instead of intact proteins therefore offers several advantages over the top-down method: more robust liquid chromatography (LC) separation, more uniform electrospray ionization, more complete fragmentation in tandem mass spectrometry (MS/MS), and easier interpretation of the simpler fragmentation patterns. Owing to these advantages, the bottom-up/shotgun method has become the predominant strategy for protein analysis over the past two decades. However, these advantages also come with nontrivial challenges in sample preparation, peptide separation, data acquisition, and informatics [10-12]. This chapter discusses typical procedures of a shotgun proteomics experiment and some recent advances in addressing these challenges.
1.2 Overview of Shotgun Proteomics
A typical shotgun proteomics experiment consists of three main steps: (i) sample preparation, (ii) mass spectrometry data acquisition, and (iii) data processing. The sample preparation step transforms the biological sample into a peptide mixture. The data acquisition step obtains MS/MS data from the peptide mixture. The final data processing step performs statistical and mathematical analyses to determine the identity and quantity of peptides and proteins (Figure 1.1).
Figure 1.1 Typical workflow of a bottom-up proteomics experiment. Proteins are first extracted from biological samples and then digested into peptides. An LC-MS/MS system is typically used to separate and fragment the peptides. The acquired mass spectra are then matched to peptide sequences using a database search algorithm, and the identified peptides are inferred back to proteins.
In the sample preparation step, a protein mixture is first obtained by separating protein from nonprotein contents of a biological sample such as a cell lysate or serum. The protein mixture is then chemically modified: reduction breaks the Cys-Cys disulfide bonds, and alkylation caps the resulting free thiols so that the bonds cannot re-form, leaving the proteins linearized. A protease such as trypsin is then added to digest the proteins into peptides. After digestion, the peptide mixture is often loaded onto a C18 column and washed to remove nonpeptide contents (salts, buffers, chaotropes, etc.).
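The trypsin digestion step can be illustrated with an in silico digest. The sketch below assumes the simplified textbook specificity rule (cleave after Lys or Arg unless the next residue is Pro) plus an optional number of missed cleavages; real digests deviate from this rule. The function name `tryptic_digest` and the toy sequence are hypothetical.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 0, min_length: int = 1) -> list[str]:
    """In silico trypsin digest using the simplified 'after K/R, not before P' rule."""
    # Zero-width split points: right after K or R, unless the next residue is P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` additional adjacent fragments.
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            peptide = "".join(fragments[start:end + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Toy sequence: the R-P bond inside EDLRPTWK is not cleaved.
    print(tryptic_digest("GASPKEDLRPTWKCAR"))
    # ['GASPK', 'EDLRPTWK', 'CAR']
```

Allowing one or two missed cleavages is a common way to tolerate incomplete digestion, at the cost of a larger candidate list.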
Once the sample is digested and cleaned, an LC-MS system is used to fractionate the peptides, which increases the amount of MS/MS data obtained from the mixture. Because digesting a protein mixture produces a highly complex peptide mixture, various types of separation have been used, alone or in combination, to better resolve it, including reversed-phase (RP), strong cation exchange (SCX), size-exclusion (SEC), and hydrophilic interaction liquid chromatography (HILIC), as well as affinity purification. In general, the final separation before the peptides enter electrospray ionization is reversed-phase LC, because this method removes salts and other small-molecule interferents. The separated peptides are then ionized and introduced into the mass spectrometer. In this step, the peptide mixture is first separated in time by LC and then separated in space by electric fields according to mass-to-charge ratio (m/z). This separation cascade provides enough resolution to separate hundreds of thousands of peptide species within hours.
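Because the mass spectrometer separates ions by m/z rather than by mass alone, the same peptide appears at different m/z values depending on its charge state z. A minimal sketch, assuming standard monoisotopic residue masses and, as a labeled assumption, carbamidomethylation of every Cys (the modification left by iodoacetamide, a common alkylating agent not named in this chapter), computes m/z = (M + z × 1.007276)/z, where M is the neutral monoisotopic peptide mass and 1.007276 Da is the proton mass. Function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # added once per peptide for the free termini
PROTON = 1.007276            # mass of one proton
CARBAMIDOMETHYL = 57.021464  # assumption: every Cys alkylated with iodoacetamide


def peptide_mass(seq: str, alkylated_cys: bool = True) -> float:
    """Neutral monoisotopic peptide mass, optionally with carbamidomethyl-Cys."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in seq)
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * seq.count("C")
    return mass


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    mass = peptide_mass("CAR")  # hypothetical tryptic peptide
    for z in (1, 2, 3):
        print(f"z={z}: m/z = {mz(mass, z):.4f}")
```

Each added charge roughly divides the m/z by the charge count, which is why multiply charged peptides fall into a compact m/z window.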
In the final data processing step, the data obtained for each detected peptide species, including MS (intact peptide mass) and MS/MS (fragment ion masses) data, are analyzed by algorithms that search sequence databases to match each spectrum to a peptide from the original protein sequences. If desired, the data can also be further analyzed for quantitation by either "labeled" or "label-free" methods.
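Database searching can be illustrated with a deliberately simplified matcher. The sketch assumes the observed precursor has already been converted to a neutral monoisotopic mass, filters candidate peptides by a parts-per-million (ppm) precursor tolerance, and then scores the survivors by counting singly charged b and y fragment ions that fall within an absolute fragment tolerance of an observed peak. This shared-peak count is an illustrative stand-in, not the scoring function of SEQUEST or any other published search engine; the tolerances, function names, and peptides are assumptions, and modifications such as Cys alkylation are ignored.

```python
# Monoisotopic residue masses (Da); Cys left unmodified for simplicity.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def neutral_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return WATER + sum(RESIDUE_MASS[aa] for aa in seq)


def by_ions(seq: str) -> list[float]:
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    ions = []
    for i in range(1, len(seq)):
        prefix, suffix = seq[:i], seq[i:]
        ions.append(sum(RESIDUE_MASS[aa] for aa in prefix) + PROTON)          # b ion
        ions.append(sum(RESIDUE_MASS[aa] for aa in suffix) + WATER + PROTON)  # y ion
    return ions


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def search(precursor_mass, peaks, candidates, precursor_ppm=10.0, fragment_tol=0.02):
    """Rank candidates passing the precursor filter by matched fragment count."""
    hits = []
    for pep in candidates:
        if abs(ppm_error(precursor_mass, neutral_mass(pep))) > precursor_ppm:
            continue  # precursor mass incompatible with this peptide
        matched = sum(
            1 for ion in by_ions(pep)
            if any(abs(ion - peak) <= fragment_tol for peak in peaks)
        )
        hits.append((pep, matched))
    # Highest shared-peak count first; ties keep the candidate order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Simulated spectrum: the fragment ions of EDLRPTWK itself (hypothetical data).
    spectrum = by_ions("EDLRPTWK")
    print(search(neutral_mass("EDLRPTWK"), spectrum, ["DELRPTWK", "EDLRPTWK", "GASPK"]))
    # [('EDLRPTWK', 14), ('DELRPTWK', 12)]
```

The example shows why fragmentation data matter: DELRPTWK has exactly the same precursor mass as EDLRPTWK, so only the fragment ions distinguish them, while GASPK is rejected by the precursor filter. In a real search, the candidate list comes from digesting every protein in the sequence database.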
1.3 Sample Preparation
1.3.1 Protein Separation
1.3.1.1 Overview
To analyze proteins from a complex biological sample, the proteins often need to be separated from interfering small molecules and nucleic acids. This is usually done by nonspecific protein extraction such as protein precipitation or centrifugation [13-16]. Commonly used reagent/solvent systems for protein precipitation include trichloroacetic acid (TCA)/water, chloroform/methanol, acetone, and phenol/ammonium acetate/methanol. These methods effectively separate proteins from other molecules such as salts, lipids, detergents (often introduced during lysis), DNA/RNA, and even the aqueous buffer, so that the proteins are purified and concentrated for further processing. Centrifugation methods such as sucrose density gradient centrifugation are also useful for this purpose, but because of their lower throughput and efficiency, they are often used in combination with protein precipitation to isolate proteins from specific cell organelles.
1.3.1.2 2D-Gel Approach
Two-dimensional polyacrylamide gel electrophoresis (2D PAGE) is a robust, orthogonal approach widely applied for the simultaneous separation and fractionation of complex protein mixtures recovered from biological samples for proteomic analysis [17, 18]. The method can separate several thousand proteins in a single gel on the basis of their isoelectric point (first dimension) and molecular mass (second dimension). It was once one of the most widely used methods for protein separation and has been applied in studies of proteins and protein complexes [19]. Once separation is achieved by 2D PAGE, a protein spot or band can be visualized and then excised. Coomassie brilliant blue and silver staining are commonly used for protein visualization. Coomassie brilliant blue is generally preferred over silver staining because it is a reversible stain and compatible with MS analysis [20]. Despite its greater sensitivity, silver staining may give disappointing results because of its limited MS compatibility and the nonlinear relationship between stain intensity and protein amount [21]. After visualization, the gel spots are digested with trypsin, and the proteins are identified either by peptide mass fingerprinting using matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) or by peptide sequencing using LC-MS. Although 2D electrophoresis is associated with the beginnings of proteomics and is still used for various purposes, large-scale proteomics now relies on advanced separation and mass spectrometry technologies for protein identification; LC-MS/MS, which offers superior separations, has largely taken over from 2D gel-based methods.
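The first dimension of 2D PAGE separates proteins by isoelectric point (pI), the pH at which a protein carries no net charge. An approximate pI can be computed from sequence by summing Henderson-Hasselbalch charge terms for the ionizable groups and bisecting for the pH at which the sum is zero. The pKa values below are one illustrative set (an assumption; published scales differ), and the estimate ignores post-translational modifications and structural effects, so the result is only approximate. Function names are hypothetical.

```python
# Illustrative pKa values (assumption: published scales differ).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    # Positive groups: fraction protonated = 1 / (1 + 10^(pH - pKa)).
    pos = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    for aa in "KRH":
        pos += seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
    # Negative groups: fraction deprotonated = 1 / (1 + 10^(pKa - pH)).
    neg = 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in "DECY":
        neg += seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: pI lies at higher pH
        else:
            high = mid  # zero or negative: pI lies at lower pH
    return (low + high) / 2


if __name__ == "__main__":
    for peptide in ("DDDD", "GASPK", "KKKK"):  # toy sequences
        print(peptide, round(isoelectric_point(peptide), 2))
```

Bisection is a safe choice here because the net charge falls monotonically from positive at pH 0 to negative at pH 14, so exactly one root lies in the interval.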
1.3.1.3 Separation of Membrane Protein
Membrane proteins are integral parts of the nucleus and cell membranes. They are permanently anchored to the outer surface of the membrane or embedded into the lipid bilayer and are actively involved in many crucial cell...