Retrieving structural data of a protein using PDB database






This exercise shows how to retrieve the structural data of a protein from the Protein Data Bank (PDB). Open a browser and go to the PDB home page (Figure 1).

For information on accessing the database, please go to the Simulator tab.

Figure 1: Home page of PDB


Here one can search for a protein by typing its name or PDB ID in the search box (Figure 2). Every structure in the PDB has a unique ID made of four alphanumeric characters, the first of which is a digit between 1 and 9. For example, the PDB ID ‘1HE8’ identifies a structure containing PI3 kinase. There are also options to search by ligand, sequence, macromolecule, author, or all categories. The PDB format is a standard file format for protein structure files.

Figure 2. Search box in PDB
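The ID rule described above (four alphanumeric characters, the first a digit from 1 to 9) can be checked before a value is typed into the search box. The following is a minimal sketch; the function name is_classic_pdb_id is chosen here for illustration and is not part of any PDB tool.

```python
import re

# Four-character PDB ID: one digit from 1 to 9, then three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_classic_pdb_id(text: str) -> bool:
    """Return True if text looks like a four-character PDB ID such as 1HE8."""
    return PDB_ID_PATTERN.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ["1HE8", "1he8", "0ABC", "EGFR", "12345"]:
        print(candidate, is_classic_pdb_id(candidate))
```

A string that fails this check, such as a protein name, should be entered as a name or keyword search rather than as an ID.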


When a protein name is entered in the text box, the results show data such as Molecule Name, PDB text, Structural Domains, Ontology Terms, etc. Clicking on any of the results displays the detailed information for that molecule.

If the query returns no information, the advanced search (Figure 3) can be used. A query is a request used to retrieve information from a database.


Figure 3. Advanced search option in PDB


Advanced search (Figure 4) adds more search criteria to one's query. It can also search for proteins that are similar to the query already provided.


Figure 4.  Advanced search interface


The search type can be changed to Quick Search, ID(s) and Keywords, Structure Annotation, or Deposition. Click on ‘Result Count’ to get the total number of PDB entries related to the query. Clicking on the ‘Add Search Criteria’ option adds an extra box for another query type, as shown below (Figure 5).


Figure 5. Add Search criteria option
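The idea of stacking several criteria into one query can also be expressed as data. The sketch below builds a JSON request body in the style of the RCSB Search API; the endpoint URL, the "full_text" and "text" service names, and the attribute "exptl.method" are assumptions taken from that API and are not part of this exercise, so they should be checked against the current RCSB documentation before use. The code only builds and prints the query; it does not contact the server.

```python
import json

# Assumed endpoint of the RCSB Search API (not named in this exercise).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def full_text_node(phrase: str) -> dict:
    """One criterion: a free-text search, similar to the basic search box."""
    return {
        "type": "terminal",
        "service": "full_text",
        "parameters": {"value": phrase},
    }


def attribute_node(attribute: str, operator: str, value: str) -> dict:
    """One criterion: a condition on a single annotated attribute."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": attribute,
            "operator": operator,
            "value": value,
        },
    }


def build_query(nodes: list, logical_operator: str = "and") -> dict:
    """Combine criteria, much like adding extra boxes in the advanced search."""
    return {
        "query": {
            "type": "group",
            "logical_operator": logical_operator,
            "nodes": nodes,
        },
        "return_type": "entry",
    }


if __name__ == "__main__":
    query = build_query([
        full_text_node("EGFR kinase domain"),
        # Assumed attribute name and value for the experimental method.
        attribute_node("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
    ])
    print(json.dumps(query, indent=2))
```

Each call to full_text_node or attribute_node plays the role of one search box, and build_query joins them with a logical AND by default.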


Here is an example of retrieving the structural data of the EGFR kinase domain (Figure 6).


Figure 6. Example to retrieve the structural data of EGFR kinase domain



The following screenshots are from the result page. Under Literature, one can see the journal, PubMed ID, primary citation of related structures, and the abstract of the research publication (Figure 8). By clicking on 'Download Primary Citation', the user can also download the primary citation for Mendeley or EndNote.


Figure 8: Literature


Macromolecules describes the protein's Classification, Molecule, Total Structure Weight, Chains, Length, Organism, etc. In this case the classification is Transferase/Transferase Inhibitor (Figure 9).


Figure 9: Macromolecules


Small Molecules shows the ligands that bind to the protein of interest. For each ligand, the chains, name, formula, and interactions with the protein are available (Figure 10).


Figure 10: Small Molecules


Figure 11 shows the structure of the ligand.


Figure 11: Structure of ligand


'Experimental Data & Validation' shows the data acquisition technique, such as X-ray diffraction or NMR, that was used to obtain the structural data of a particular protein. The structure is validated using different measures, such as the Ramachandran plot and the R value, which are shown under 'Structure Validation' (Figure 12).


Figure 12: Experimental Data & Validation
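As a rough illustration of the R value mentioned above: in crystallography it is commonly defined as the sum of the absolute differences between observed and calculated structure factor amplitudes, divided by the sum of the observed amplitudes, so lower values mean better agreement between model and data. The sketch below assumes this standard definition; the amplitudes in the usage example are made-up numbers, not data from any PDB entry.

```python
def r_value(f_observed, f_calculated):
    """Crystallographic R value: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_observed) != len(f_calculated):
        raise ValueError("both lists must have the same number of reflections")
    denominator = sum(abs(f_obs) for f_obs in f_observed)
    if denominator == 0:
        raise ValueError("observed amplitudes must not all be zero")
    numerator = sum(
        abs(abs(f_obs) - abs(f_calc))
        for f_obs, f_calc in zip(f_observed, f_calculated)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Made-up amplitudes, for illustration only.
    observed = [10.0, 20.0]
    calculated = [9.0, 22.0]
    print(round(r_value(observed, calculated), 3))
```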


Entry History gives details such as the deposition date, release date, author details, etc. (Figure 13).


Figure 13: Entry History
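The same kind of entry details can, in principle, be read from a machine-readable record. The sketch below assumes the RCSB Data API, which is not covered in this exercise: the URL pattern and the field names (rcsb_accession_info, audit_author, exptl) are assumptions to verify against the current RCSB documentation. The record in the usage example is a hand-written placeholder with invented values, not real PDB data.

```python
def entry_url(pdb_id: str) -> str:
    """Assumed RCSB Data API address for one entry."""
    return f"https://data.rcsb.org/rest/v1/core/entry/{pdb_id.strip().upper()}"


def summarize_entry(record: dict) -> dict:
    """Pick out entry-history style fields from a decoded JSON record."""
    accession = record.get("rcsb_accession_info", {})
    return {
        "deposited": accession.get("deposit_date"),
        "released": accession.get("initial_release_date"),
        "authors": [a.get("name") for a in record.get("audit_author", [])],
        "methods": [e.get("method") for e in record.get("exptl", [])],
    }


if __name__ == "__main__":
    # Placeholder record with invented values, for illustration only.
    placeholder = {
        "rcsb_accession_info": {
            "deposit_date": "2000-01-01T00:00:00Z",
            "initial_release_date": "2000-06-01T00:00:00Z",
        },
        "audit_author": [{"name": "Author, A."}, {"name": "Author, B."}],
        "exptl": [{"method": "X-RAY DIFFRACTION"}],
    }
    print(entry_url("1he8"))
    print(summarize_entry(placeholder))
```

Missing fields come back as None or an empty list, so the function does not fail on a record that lacks some of the assumed keys.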


Click on the “View in 3D” option to see the three-dimensional structure of the protein (Figure 14).


Figure 14: View in 3D 


The 3D structure of the protein is then displayed (Figure 15). Here, one can move, rotate, or zoom the structure to get a clear view.

Figure 15: The 3D structure of protein 


Go to ‘Download Files’ to download the FASTA sequence, PDB file, mmCIF file, etc. (Figure 16).


Figure 16: Download Files
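The files offered under ‘Download Files’ can also be fetched by a script. The sketch below builds download addresses from a PDB ID; the URL patterns on files.rcsb.org and www.rcsb.org are assumptions based on how the RCSB site currently serves files and may change. The helper fetch_text is hypothetical and is defined but not called, so running the example does not need network access.

```python
import urllib.request


def download_urls(pdb_id: str) -> dict:
    """Build assumed download addresses for one PDB entry."""
    pdb_id = pdb_id.strip().upper()
    return {
        "fasta": f"https://www.rcsb.org/fasta/entry/{pdb_id}",
        "pdb": f"https://files.rcsb.org/download/{pdb_id}.pdb",
        "mmcif": f"https://files.rcsb.org/download/{pdb_id}.cif",
    }


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a text file; needs network access (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for kind, url in download_urls("1he8").items():
        print(kind, url)
```

To actually save a file, one could call fetch_text on one of these addresses and write the returned string to disk.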


A FASTA file contains the protein sequence in plain text. FASTA is one of the simplest sequence formats: each record starts with a ‘>’ symbol followed by the sequence ID and other comments on a header line, and the sequence itself, written in single-letter codes, follows on the next lines. The FASTA sequence of the EGFR kinase domain is shown in Figure 17.


Figure 17: FASTA sequence of EGFR kinase domain
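Because the FASTA layout is so simple (a ‘>’ header line followed by sequence lines), it can be read with a few lines of code. The following is a minimal sketch of such a reader; the record in the usage example is a made-up sequence with a hypothetical header, not the EGFR entry.

```python
def parse_fasta(text: str) -> list:
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = ">EXAMPLE_1|Chain A|hypothetical protein\nMKTAYIAKQR\nQISFVKSHFS\n"
    for name, sequence in parse_fasta(example):
        print(name, len(sequence), sequence)
```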


The PDB file stores the 3D structure of the protein as atomic coordinates. The mmCIF (macromolecular Crystallographic Information File) format holds the same structural data together with the crystallographic information about the molecule. The PDBML/XML file describes the structure of the protein in XML format. XML (Extensible Markup Language) is a general-purpose text format for storing structured data; modern word processor, spreadsheet, and presentation files are also built on it. The biological assembly shows the functional form of the macromolecule. Figure 18 shows the biological assembly of the EGFR kinase domain.


Figure 18: Biological assembly of EGFR kinase domain
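To make the PDB file format more concrete: atomic coordinates are stored in ATOM and HETATM lines with fixed column positions. The sketch below reads a few of those columns, assuming the standard PDB column layout (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-54). The sample line is made up for illustration and is not taken from any entry.

```python
def parse_atom_line(line: str) -> dict:
    """Read selected fixed-width fields from one ATOM or HETATM line."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue_name": line[17:20].strip(),
        "chain_id": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def read_atoms(text: str) -> list:
    """Parse every ATOM/HETATM line in the text of a PDB file."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


if __name__ == "__main__":
    # Made-up coordinate line, for illustration only.
    sample = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
    print(parse_atom_line(sample))
```

Slicing by column position, rather than splitting on spaces, matters because neighbouring fields in a PDB file may touch each other without any separating space.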



This experiment uses: PDB database, http://www.rcsb.org



Copyright @ 2020 Under the NME ICT initiative of MHRD

 Powered by AmritaVirtual Lab Collaborative Platform [ Ver 00.13. ]