- To compute the various physical and chemical parameters of a protein.
- To perform primary structure analysis of proteins.
- To introduce protein analysis software that is available through the ExPASy server.
Proteins are among the fundamental components of all living cells and carry out a wide range of functions in every organism. DNA replication, catalysis of metabolic reactions, and transport of molecules from one location to another all depend on proteins.
The building blocks of proteins are amino acids. Each amino acid contains an amine (-NH2) group, a carboxylic acid (-COOH) group, and a side chain (R group) that is specific to that amino acid. Twenty standard amino acids occur in proteins, and they differ in their R groups. In proteins, amino acids are linked to each other by peptide bonds. A peptide bond is a covalent bond formed when the carboxyl group of one amino acid reacts with the amino group of another amino acid, releasing one molecule of water.
Proteins differ from one another in their structure, primarily in their sequence of amino acids. Protein structure is described at four levels of organization: primary, secondary, tertiary, and quaternary. The primary structure is the linear sequence of amino acids in the polypeptide chain. The secondary structure arises from hydrogen bonding between the backbone amide groups, either within one chain or between chains; alpha helices and beta sheets are the two most important secondary structures. The tertiary structure is the three-dimensional structure of a single polypeptide chain. The quaternary structure is formed by the association of several polypeptide chains.
A pictorial representation of the primary, secondary, tertiary, and quaternary structures is shown in Figure 1.
Image source: http://en.wikipedia.org
Figure 1: Representation of primary, secondary, tertiary and quaternary structure of proteins
Several tools for analyzing a protein sequence are available through the ExPASy server. ExPASy is the SIB Bioinformatics Resource Portal. It provides access to scientific databases and software tools in many areas of the life sciences, including proteomics, genomics, phylogeny, systems biology, population genetics, and transcriptomics. ProtParam is one of the protein analysis tools available on the ExPASy server and can be accessed through the link http://www.expasy.org/tools/protparam.html. It calculates various physicochemical parameters of a given protein. The protein sequence is the only input needed to calculate these parameters. The protein can be specified either as a UniProtKB/Swiss-Prot or UniProtKB/TrEMBL accession number or ID, or as a sequence of amino acids.
The parameters computed by ProtParam are molecular weight, amino acid composition, extinction coefficient, estimated half-life, theoretical pI, grand average of hydropathicity (GRAVY), aliphatic index, and instability index.
In ProtParam, the molecular weight of a protein is calculated by adding the average isotopic masses of the amino acids in the protein and the average isotopic mass of one water molecule.
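The molecular weight calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not ProtParam's own code: the function name `molecular_weight` is hypothetical, and the residue masses are commonly tabulated average values (amino acid mass minus one water) that may differ from ProtParam's internal table in the last decimal places.

```python
# Average residue masses in daltons (free amino acid minus one water).
# Assumption: commonly tabulated values, not copied from ProtParam itself.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # average mass of one water molecule


def molecular_weight(sequence: str) -> float:
    """Sum the average residue masses and add one water for the chain termini."""
    residues = sequence.upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS


# Glycylglycine: two glycine residues plus one water.
print(round(molecular_weight("GG"), 2))  # 132.12
```

The sketch raises a `KeyError` for any letter outside the twenty standard amino acids, which is a deliberate simplification.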
The extinction coefficient indicates how much light a protein absorbs at a given wavelength. It has been shown (Gill, S.C. and von Hippel, P.H., 1989) that the molar extinction coefficient of a protein can be calculated from its amino acid composition. From the molar extinction coefficients of tyrosine, tryptophan, and cystine at a given wavelength (cysteine does not absorb appreciably at wavelengths >260 nm, while cystine does), the extinction coefficient of the native protein in water can be computed using the following equation:
E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine)
where, for proteins in water measured at 280 nm, Ext(Tyr) = 1490, Ext(Trp) = 5500, and Ext(Cystine) = 125 (in M^-1 cm^-1).
The absorbance (optical density) of a 1 g/l protein solution can be calculated using the following formula:
Absorb(Prot) = E(Prot) / Molecular_weight
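The two formulas above can be written as a short Python sketch. The function names are hypothetical, and the way cystines are counted is an assumption of this sketch: the sequence alone does not say which cysteines are disulfide-bonded, so the code either pairs all cysteines into cystines or treats them all as reduced.

```python
# Molar extinction coefficients at 280 nm in water (values from the text).
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(sequence: str, assume_cystines: bool = True) -> int:
    """E(Prot) from Tyr, Trp and cystine counts.

    Assumption: if assume_cystines is True, every pair of Cys residues is
    counted as one cystine; otherwise all Cys are treated as reduced.
    """
    residues = sequence.upper()
    n_cystine = residues.count("C") // 2 if assume_cystines else 0
    return (residues.count("Y") * EXT_TYR
            + residues.count("W") * EXT_TRP
            + n_cystine * EXT_CYSTINE)


def absorbance(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorb(Prot) = E(Prot) / Molecular_weight."""
    return ext_coefficient / molecular_weight


# One Tyr, one Trp and one cystine formed from two Cys residues.
print(extinction_coefficient("YWCC"))  # 7115
```

The molecular weight is passed in as a plain number, so it can come from any source, including the ProtParam output itself.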
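The theoretical pI (isoelectric point) is the pH at which the protein carries no net charge. It is calculated from the pKa values of the amino acids in the protein. The pKa value of an amino acid depends on its side chain, and it plays an important role in determining the pH-dependent characteristics of a protein.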
The half-life is a prediction of the time it takes for half of the amount of protein in a cell to disappear after its synthesis in the cell.
Grand average of hydropathicity (GRAVY)
The GRAVY value for a protein or a peptide is calculated by adding the hydropathy values (Kyte, J. and Doolittle, R.F., 1982) of all amino acid residues and dividing by the number of residues in the sequence. A more positive score indicates greater hydrophobicity.
The aliphatic index of a protein is defined as the relative volume occupied by the aliphatic side chains of alanine, valine, isoleucine, and leucine. It is calculated according to the following formula (Ikai, A.J., 1980):
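As an illustration of the GRAVY calculation, the following Python sketch averages the Kyte and Doolittle hydropathy values over a sequence. The function name `gravy` is hypothetical, and the sketch assumes the sequence contains only the twenty standard one-letter codes.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must contain at least one residue")
    return sum(HYDROPATHY[aa] for aa in residues) / len(residues)


# Alanine (1.8) and isoleucine (4.5) are both hydrophobic.
print(round(gravy("AI"), 2))  # 3.15
```

A sequence rich in charged residues such as Lys, Arg, Asp, and Glu would give a negative value under the same calculation.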
Aliphatic index = X(Ala) + a * X(Val) + b * ( X(Ile) + X(Leu) )
where X(Ala), X(Val), X(Ile), and X(Leu) are the mole percentages (100 × mole fraction) of alanine, valine, isoleucine, and leucine.
The coefficients ‘a’ and ‘b’ are the volumes of the valine side chain (a = 2.9) and of the Leu/Ile side chains (b = 3.9) relative to the side chain of alanine.
The instability index provides an estimate of the stability of a protein in a test tube. A protein whose instability index is smaller than 40 is predicted to be stable; a value above 40 predicts that the protein may be unstable.
This experiment uses the ProtParam tool, available through the ExPASy server, the SIB Bioinformatics Resource Portal.
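The aliphatic index formula can be checked by hand with a small Python sketch. The function name `aliphatic_index` is hypothetical. Mole percent is taken here as 100 times the count of a residue divided by the sequence length.

```python
A_COEFF = 2.9  # relative volume of the Val side chain (a)
B_COEFF = 3.9  # relative volume of the Leu/Ile side chains (b)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index = X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu))."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must contain at least one residue")

    def mole_percent(aa: str) -> float:
        # 100 x mole fraction of one residue type in the sequence
        return 100.0 * residues.count(aa) / len(residues)

    return (mole_percent("A")
            + A_COEFF * mole_percent("V")
            + B_COEFF * (mole_percent("I") + mole_percent("L")))


# Each of Ala, Val, Ile and Leu makes up 25 mole percent:
# 25 + 2.9 * 25 + 3.9 * (25 + 25) = 292.5
print(round(aliphatic_index("AVIL"), 1))  # 292.5
```

A sequence with no Ala, Val, Ile, or Leu gives an aliphatic index of 0 under this formula.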