Data Analysis Principles of De Novo Protein Sequencing

Author: Prime Jones

As noted in The Principle of Protein De Novo Sequencing, de novo sequencing of proteins rests on the fact that peptides break in regular, predictable ways when fragmentation is induced in MS/MS.

Depending on where the peptide backbone breaks, different types of fragment ions are generated. Fragments that retain the N-terminus are a-, b-, and c-type ions, and fragments that retain the C-terminus are x-, y-, and z-type ions. Taking the N-terminal ions as an example, the a-type ion is produced by cleavage of the C-C bond between the alpha carbon and the carbonyl (C=O) carbon of the first amino acid; the b-type ion is produced by cleavage of the C-N (peptide) bond between the first and second amino acids; and the c-type ion is produced by cleavage of the N-C bond of the second amino acid (as shown in Figure 2). Cleavage at different positions along the backbone therefore yields a series of a, b, and c ions. To distinguish ions of the same type that arise from different cleavage sites, each ion is labeled with a subscript number that gives the cleavage position, that is, the number of residues the fragment contains. As shown in Figure 2, a1 and a3 are both a-type ions, produced by cleavage at different sites.
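To make the nomenclature concrete, the Python sketch below computes the singly charged a, b, and c ion masses for a short peptide. It assumes standard monoisotopic residue masses, an unmodified N-terminus, and charge 1+; the function name n_terminal_ions and the peptide GAS are hypothetical examples, and x, y, and z ions are left out for brevity.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # added charge for singly charged ions
CO = 27.994915     # an a ion is the matching b ion minus CO
NH3 = 17.026549    # a c ion is the matching b ion plus NH3


def n_terminal_ions(peptide):
    """Return singly charged a, b and c ion masses; list index k holds ion k+1."""
    ions = {"a": [], "b": [], "c": []}
    prefix = 0.0
    # Stop before the last residue: cleaving after it would not split the peptide.
    for residue in peptide[:-1]:
        prefix += RESIDUE_MASS[residue]
        b = prefix + PROTON
        ions["a"].append(round(b - CO, 4))
        ions["b"].append(round(b, 4))
        ions["c"].append(round(b + NH3, 4))
    return ions


print(n_terminal_ions("GAS"))
```

The list index k holds ion k+1, so ions["a"][0] is a1 and ions["a"][1] is a2. For GAS, a1 is about 30.0338, b1 about 58.0287, and c1 about 75.0553.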

From these fragmentation rules, each ion series has its own characteristic features. The b-type ion is of particular interest because it is formed by cleavage exactly between two amino acid residues, at the peptide bond. As a result, the mass difference between two adjacent b ions equals the residue mass of one amino acid, -NH-CHR-CO-, that is, the backbone unit C2H2NO plus the side chain R (as shown in Figure 3). Because different amino acids have side chains of different masses, once the b-ion peaks have been identified among the many peaks in the MS/MS spectrum, the mass difference between two adjacent b ions gives the mass of the R group, and the R group in turn identifies the amino acid. Likewise, if an amino acid residue carries a post-translational modification, the b-ion mass difference exceeds the unmodified residue mass by the mass of the modification, so that mass can be calculated and candidate post-translational modifications can be inferred.
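The relationship can be written as an equation. Here b_i denotes the mass of the singly charged b ion containing the first i residues, m_res is the residue mass, and the numbers are standard monoisotopic masses; serine and phosphorylation are chosen only as an illustrative example and are not taken from Figure 3.

```latex
\begin{aligned}
b_{i+1} - b_i &= m_{\mathrm{res}}(\mathrm{aa}_{i+1})
  = m(\mathrm{C_2H_2NO}) + m(\mathrm{R}_{i+1})
  \approx 56.01364\ \mathrm{Da} + m(\mathrm{R}_{i+1}) \\
\Delta = 87.03203\ \mathrm{Da} &\;\Rightarrow\; m(\mathrm{R}) \approx 31.01839\ \mathrm{Da}\ (\mathrm{CH_2OH}) \;\Rightarrow\; \text{serine} \\
\Delta = 166.99836\ \mathrm{Da} &= 87.03203 + 79.96633\ \mathrm{Da} \;\Rightarrow\; \text{phosphoserine}
\end{aligned}
```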

In collision-induced fragmentation, most peptide ions break at only one site. Therefore, once the pattern of b-type ions has been found in the MS/MS (secondary) spectrum, the mass difference between each pair of adjacent b ions is the residue mass of one amino acid, and this information can be used to find the amino acid whose mass matches (as shown in Figure 4).
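Reading a b-ion ladder in this way can be sketched in Python. This is a minimal sketch, not the procedure of any particular tool: the function read_b_ladder, the 0.02 Da tolerance, and the short modification list are assumptions chosen for illustration, and the sketch does not check which residues can actually carry a given modification.

```python
# Standard monoisotopic residue masses (Da). I and L share one mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # singly charged b ions carry one extra proton
# A few common modification mass shifts (Da); illustrative, not exhaustive.
MOD_SHIFT = {"Phospho": 79.96633, "Oxidation": 15.99491, "Acetyl": 42.01057}


def read_b_ladder(b_masses, tol=0.02):
    """Translate a consecutive b-ion ladder into per-position residue calls."""
    # Prepend the proton so the first difference (b1 minus H+) gives residue 1.
    ladder = [PROTON] + sorted(b_masses)
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        delta = heavier - lighter
        # First try unmodified residues; equal-mass ties such as I/L are all reported.
        hits = sorted({aa for aa, m in RESIDUE_MASS.items() if abs(delta - m) <= tol})
        if not hits:
            # Then try every residue plus every listed modification.
            hits = sorted(
                f"{aa}+{mod}"
                for aa, m in RESIDUE_MASS.items()
                for mod, shift in MOD_SHIFT.items()
                if abs(delta - (m + shift)) <= tol
            )
        # Report unexplained gaps with their mass so they can be checked by hand.
        calls.append("/".join(hits) if hits else f"?({delta:.3f})")
    return calls


print(read_b_ladder([58.0287, 225.0271, 338.1112]))
```

For this example ladder the call returns ['G', 'S+Phospho', 'I/L']: the second gap is explained only by a phosphorylated serine, and the third cannot separate leucine from isoleucine because the two have the same mass.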

However, identifying the b-ion peaks among the numerous ion peaks in an MS/MS spectrum involves complex modeling and probability estimation, so it is impractical to pick out the b-ion pattern by hand. Many de novo sequencing software tools, such as PEAKS and NovoHMM, can perform this step. Different tools use different principles to estimate the b-ion series, but their output is always a set of inferred candidates that must be compared against the measured spectra, often over several iterations, before the sequence can be determined accurately.
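PEAKS and NovoHMM use their own, more sophisticated scoring models, which are not reproduced here. As a toy illustration of the underlying idea only, the sketch below uses a simple spectrum-graph search (a technique chosen for illustration, not the method of either tool): peaks are nodes, two peaks are connected when their mass difference matches a residue mass, and the longest chain starting from the N-terminus is taken as the b series. The function names, the 0.02 Da tolerance, and the peak list are hypothetical, and real spectra also contain y ions, multiple charge states, and missing peaks that this sketch ignores.

```python
# Monoisotopic residue masses (Da). "L" stands for both L and I,
# which have identical masses and cannot be told apart this way.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # a singly charged b ion carries one extra proton


def match_residue(delta, tol):
    """Return the first residue whose mass is within tol of delta, else None."""
    for aa, mass in RESIDUE_MASS.items():
        if abs(delta - mass) <= tol:
            return aa
    return None


def longest_b_series(peaks, tol=0.02):
    """Toy spectrum-graph search for the longest b-ion ladder in a peak list."""
    # Node 0 is a virtual N-terminal start at the proton mass.
    nodes = [PROTON] + sorted(peaks)
    best_len = [0] * len(nodes)   # residues in the best chain ending at each node
    back = [None] * len(nodes)    # (previous node, residue) for traceback
    reachable = [True] + [False] * (len(nodes) - 1)

    # Nodes are sorted by mass, so every node i < j is final when j is processed.
    for j in range(1, len(nodes)):
        for i in range(j):
            if not reachable[i]:
                continue
            aa = match_residue(nodes[j] - nodes[i], tol)
            if aa is not None and best_len[i] + 1 > best_len[j]:
                best_len[j] = best_len[i] + 1
                back[j] = (i, aa)
                reachable[j] = True

    # Walk back from the node that ends the longest chain.
    node = max(range(len(nodes)), key=lambda k: best_len[k])
    residues = []
    while back[node] is not None:
        node, aa = back[node]
        residues.append(aa)
    return "".join(reversed(residues))


peaks = [58.0287, 129.0658, 216.0979, 313.1506, 100.0, 175.3, 250.5]
print(longest_b_series(peaks))
```

On this made-up peak list the three noise peaks do not connect to the chain, and the function returns GASP. Note that the second peak is also exactly one Q residue above the start, because G + A has the same mass as Q; the search resolves this only by preferring the longer chain, which is one reason real tools score candidates against the whole measured spectrum.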