Genetic and Proteomic Sequence Analysis for the Theoretical Prediction of O-Glycosylation Sites in Proteins
O-glycosylation is one of the most essential and ubiquitous post-translational modifications of proteins, in which oligosaccharides are conjugated to the polypeptide backbone at specific sites. Proteins involved in this modification are synthesized as encoded by their genetic information, and the function of a glycoprotein is determined by its O-glycosylated sites and the conjugated glycans. The identification of O-glycosylation sites has advanced rapidly in recent years through both experimental and theoretical studies. In the present study, we adopted a method that predicts O-glycosylation sites by integrating genetic and proteomic sequence information. Data for the analysis were taken from the O-GlycBase v6.0, EMBL-EBI and UniProtKB/Swiss-Prot databases. Prediction was carried out on the collected datasets following a jack-knife procedure. As no consensus motif has been identified for O-glycosylation, the preferences of amino acids and codons within a window spanning positions -3 to +3 around the glycosylated sites were computed. The analysis reveals that specific codons and amino acids are preferred around the glycosylated sites. A prediction program was developed to identify glycosylation sites based on these codon and amino acid preferences. The sensitivity, specificity and accuracy of our prediction method are 91%, 78% and 86%, respectively. To assess our prediction, we compared it with some of the publicly available online predictors; in this comparison, our method showed high predictive performance. In addition to amino acids, the preference for certain codons around the glycosylated sites might play a role in the glycosylability of glycoproteins, and the methodology could be extended to study other such modifications in proteins to gain better insights.
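The window-based preference analysis and the reported performance measures can be illustrated with a minimal sketch. The abstract does not specify the actual scoring scheme or data format, so everything here is an assumption made for illustration: the function names `window_counts` and `performance`, the 0-based site indices, the requirement that the coding sequence is aligned codon-by-codon with the protein, and the toy sequences. The performance function uses the standard definitions of sensitivity, specificity and accuracy; it is not taken from the authors' program.

```python
from collections import Counter

# Offsets -3..+3 relative to the glycosylated residue (offset 0 is the site itself).
WINDOW = range(-3, 4)


def window_counts(protein, cds, sites):
    """Count amino acids and codons at each window offset around the given sites.

    protein: amino-acid sequence (one letter per residue)
    cds:     coding sequence, assumed to hold exactly one codon per residue
    sites:   0-based indices of glycosylated residues (hypothetical input format)
    """
    if len(cds) < 3 * len(protein):
        raise ValueError("coding sequence is shorter than 3 nt per residue")
    aa_counts = {k: Counter() for k in WINDOW}
    codon_counts = {k: Counter() for k in WINDOW}
    for site in sites:
        for k in WINDOW:
            pos = site + k
            # Skip offsets that fall outside the sequence ends.
            if 0 <= pos < len(protein):
                aa_counts[k][protein[pos]] += 1
                codon_counts[k][cds[3 * pos:3 * pos + 3]] += 1
    return aa_counts, codon_counts


def performance(tp, fn, tn, fp):
    """Standard sensitivity, specificity and accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Toy example (made-up sequence, not from any database):
# residues M P S T P A encoded by ATG CCA TCA ACT CCT GCA, with sites at S and T.
aa, codons = window_counts("MPSTPA", "ATGCCATCAACTCCTGCA", [2, 3])
```

Counts collected this way over a set of known glycosylated sites could be compared with the same counts around non-glycosylated serine and threonine residues to estimate position-specific preferences. How those preferences are combined into a prediction score is not stated in the abstract and is left open here.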