For more information on installing the software and other prerequisites, refer to the Simulator tab.
The first step is to obtain the target sequence from NCBI. Run a BLAST search with this sequence and note the first four proteins with the highest identity. To download the structures of these sequences, search for them in the PDB. Next, convert the target sequence into PIR format, which Modeller can read. A PIR record starts with a “>” sign, followed by a two-letter code that describes the sequence type, a semicolon, and the database ID code. The next line gives a textual description of the sequence. The following lines contain the sequence itself, which ends with an asterisk (“*”).
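The PIR layout described above can be sketched in a few lines of plain Python. This is only an illustration, not part of the experiment's scripts: the helper names `to_pir` and `parse_pir` are hypothetical, the ten-field description line follows the convention Modeller expects for a sequence without a structure, and the residues used are a short illustrative fragment rather than the full query sequence.

```python
def to_pir(code, sequence, seq_type="P1", width=75):
    """Format one sequence as a Modeller-style PIR record (hypothetical helper)."""
    # Header: ">" + two-letter type code + ";" + identifier.
    header = f">{seq_type};{code}"
    # Description line: ten colon-separated fields; for a plain sequence
    # most of them stay empty.
    description = f"sequence:{code}:::::::0.00: 0.00"
    # Sequence body, terminated by "*" and wrapped for readability.
    body = sequence + "*"
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([header, description] + lines) + "\n"


def parse_pir(text):
    """Read back a single PIR record written by to_pir()."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3 or not lines[0].startswith(">") or ";" not in lines[0]:
        raise ValueError("not a PIR record")
    seq_type, code = lines[0][1:].split(";", 1)
    residues = "".join(lines[2:])
    if not residues.endswith("*"):
        raise ValueError("PIR sequence must end with '*'")
    return {
        "type": seq_type,
        "code": code,
        "description": lines[1],
        "sequence": residues[:-1],
    }


# Illustrative fragment only; replace with the real query sequence.
record = to_pir("qseq1", "GIVEQCCTSICSLYQLENYCN")
print(record)
```

Writing the record to a file such as `qseq1.ali` then only requires `open("qseq1.ali", "w").write(record)`.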
The insulin query sequence in PIR format is given below.
Save the sequence in a file with the “.ali” extension. The “profile.build()” command searches for similar sequences that have known structures. The following script can be used for this.
Figure 1: Script 1
The script executes as follows. First, it initializes the environment by creating a new ‘environ’ object for the modeling run. Any name can be used for this object (‘env’ is used here). The next step creates a ‘sequence_db’ object (‘sdb’), which can hold a large database of protein sequences. The third step reads a text-format file into the sdb database. The sequences are in the file "pdb_95.pir", which can be downloaded using the link given in the simulator (go to Tutorial -> Basic Modeling and download the zip file). This text file holds non-redundant PDB sequences at 95% sequence identity and is also in PIR format. The next step writes the sequences read in the earlier step to a machine-specific binary file. Then a new ‘alignment’ object is created. It reads the query sequence from the PIR file (‘qseq1.ali’ in this case) and converts it into a profile, ‘prf’. A profile holds similar information to an alignment but is better suited to sequence database searching. The next step searches the sequence database with the ‘prf’ profile, and the matching sequences from ‘sdb’ are added to the profile. Finally, the script writes out the profile of the query sequence and its homologs.
In this experiment, the file is saved as “script1.py”. The command used to run this script in Modeller is “mod9.11 script1.py”. If a different version of Modeller is installed, change the command accordingly. After execution, Modeller always produces a log file; in this experiment, the output of “script1.py” is written to a new file named “script1.log”. Modeller also writes the profile to a text file, here named “script1.prf”. Selected lines from this file are shown below.
Figure 2: Selected lines from the script1.prf file
The second column lists the PDB sequences that were compared with the target. The most important columns in this file are the second, tenth, eleventh and twelfth. The eleventh column gives the percentage sequence identity between the insulin sequence and the PDB sequence, normalized by the alignment length, which is shown in the tenth column. Provided the alignment is not short, a sequence identity above 25% indicates a possible template. The twelfth column gives the e-value, which measures the significance of the alignment. Here, about 20 PDB sequences have an e-value of zero, which indicates that they are significantly similar to the query sequence. A suitable template for the query has to be chosen from among these sequences. The alignment.compare_structures() command can be used to compare the similarity between the candidate templates. The program is shown in script2 (Figure 3), which is a “.py” file.
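Before moving on to the structure comparison in Script 2, the column-based screening described above can be sketched in plain Python. This is a minimal illustration and not part of Modeller: the function names are hypothetical, the sample lines are invented placeholders rather than real script1.prf output, and the minimum alignment length and e-value cutoff are assumed values that should be tuned for the protein at hand. Only the 25% identity rule comes from the text.

```python
def parse_prf_hits(text):
    """Extract code, alignment length, % identity and e-value from .prf lines."""
    hits = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and header comments
        cols = stripped.split()
        if len(cols) < 12:
            continue  # not a data row
        hits.append({
            "code": cols[1],              # 2nd column: sequence code
            "align_len": int(cols[9]),    # 10th column: alignment length
            "identity": float(cols[10]),  # 11th column: % sequence identity
            "evalue": float(cols[11]),    # 12th column: e-value
        })
    return hits


def candidate_templates(hits, min_identity=25.0, min_align_len=30,
                        max_evalue=1e-3):
    """Keep hits that are long enough, similar enough and significant.

    min_align_len and max_evalue are assumptions made for this sketch.
    """
    return [
        h["code"] for h in hits
        if h["align_len"] >= min_align_len
        and h["identity"] > min_identity
        and h["evalue"] <= max_evalue
    ]


# Invented rows in the same column layout; codes are placeholders.
SAMPLE_PRF = """\
# Number of sequences:      4
    1 qseq1   S   0  110    1  110    0    0    0    0.   0.0
    2 tmplA   X   1   51    1   51    1   51   51  100.   0.0
    3 tmplB   X   1  120   20   60    5   45   40   22.   0.31E-02
    4 tmplC   X   1   30   90  110    2   22   20   60.   0.5
"""

print(candidate_templates(parse_prf_hits(SAMPLE_PRF)))
```

The query itself appears as the first row with an alignment length of zero, so the length filter also removes it from the candidate list.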
Figure 3: Script 2
Here, an empty alignment object named ‘aln’ is created first. A Python ‘for’ loop then instructs Modeller to read each PDB file. For this to work, all the PDB files must be placed in the same directory as the script. They can be downloaded from the PDB. The model_segment argument is used to read only one chain from each PDB file. The append_model method adds each structure to the alignment. The malign command then computes a multiple sequence alignment to improve this alignment. The malign3d command uses the current alignment as the starting point for an iterative least-squares superposition of the 3D structures. The compare_structures command then compares the structures according to the alignment produced by malign3d. It calculates the deviations between atomic positions and distances, the differences in main-chain and side-chain dihedral angles, the percentage sequence identities, and so on. The id_table command writes a file of pairwise sequence distances. This file can be used as input to the dendrogram command, which calculates a clustering tree from those distances and helps in visualizing the differences among the templates. Selected lines from the script2.log file are given below.
The next step is to align the query with the template.
The Modeller command “align2d()” can be used for this. It aligns the query sequence with the template structure and, when constructing the alignment, takes structural information from the template into account. The following script aligns the target sequence in qseq1.ali with the template structure in the PDB file “tseq1.pdb”.
Figure 5: Script 3
Here, an ‘environ’ object (used as input to the later commands), an empty alignment ‘aln’ and a protein model ‘mdl’ are created. The chain A segment of the tseq1 PDB structure file is read into ‘mdl’. The append_model() command transfers the PDB sequence of this model to the alignment and assigns it the name "tseq1" (align_codes). The append() command then adds the query sequence from the file qseq1.ali to the alignment. The “align2d()” command aligns the two sequences. The alignment is then written in two formats: “qseq1-tseq1.ali” in PIR format and “qseq1-tseq1.pap” in PAP format. Modeller uses the PIR file for model building, while the PAP file is intended for visual inspection. In “qseq1-tseq1.pap”, all identical positions are marked with “*”. A sample file is shown below.
Figure 6: qseq1-tseq1.pap
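The “*” markers in the PAP file are easy to reproduce for any pair of aligned sequences. The sketch below is a plain-Python illustration, not Modeller code: the function names are hypothetical, the two aligned strings are invented, and gaps are assumed to be written as “-”.

```python
def identity_markers(seq_a, seq_b):
    """Return a marker line with '*' under every identical, non-gap position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(seq_a, seq_b)
    )


def percent_identity(seq_a, seq_b):
    """Identical positions as a percentage of positions aligned without gaps."""
    marks = identity_markers(seq_a, seq_b)
    aligned = sum(1 for a, b in zip(seq_a, seq_b) if a != "-" and b != "-")
    return 100.0 * marks.count("*") / aligned if aligned else 0.0


# Invented example alignment ("-" is a gap).
template = "GIVEQ-CT"
query = "GLVEQACT"
print(template)
print(query)
print(identity_markers(template, query))
print(round(percent_identity(template, query), 1))
```

Normalizing by the number of gap-free aligned positions is one of several possible conventions, so the value may differ slightly from the identity that other tools report.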
The fourth step is model building.
After the target-template alignment has been constructed, Modeller automatically calculates a 3D model of the target using its automodel class. Script4 produces five similar insulin models based on the template structure and the alignment in “qseq1-tseq1.ali”.
Figure 7: script4
The first line loads the automodel class and prepares it for use. An automodel object, ‘a’, is then created and the parameters that guide the model-building procedure are set. “alnfile” names the file that holds the target-template alignment in PIR format. “knowns” defines the known template structures in “alnfile” (“qseq1-tseq1.ali”). “sequence” defines the name of the target sequence in “alnfile”. “assess_methods” requests the assessment scores. “starting_model” and “ending_model” define the number of models to calculate. The “make” method in the last line calculates the models. One of the important output files is “script4.log”, which contains information about the input restraints used for modeling, errors, warnings, and so on.
Figure 8: script4.log
The log file contains a summary of all the models that were built. For each model, it lists the name of the file that holds the model coordinates in PDB format, together with the model's scores.
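Once the scores have been copied out of the summary, ranking the models takes only a few lines. The sketch below assumes the common selection rule of preferring the lowest DOPE score or the highest GA341 score. The file names and numbers are made-up placeholders rather than output from this experiment, and the dictionary keys are hypothetical names chosen for the illustration.

```python
# Placeholder data: names and scores are invented for illustration only.
models = [
    {"name": "model_1.pdb", "dope": -9500.0, "ga341": 0.92},
    {"name": "model_2.pdb", "dope": -9810.5, "ga341": 0.97},
    {"name": "model_3.pdb", "dope": -9320.2, "ga341": 1.00},
]


def best_by_dope(candidates):
    """DOPE is an energy-like score, so the lowest value is preferred."""
    return min(candidates, key=lambda m: m["dope"])


def best_by_ga341(candidates):
    """GA341 ranges from 0 to 1, and higher values are preferred."""
    return max(candidates, key=lambda m: m["ga341"])


print("Lowest DOPE:  ", best_by_dope(models)["name"])
print("Highest GA341:", best_by_ga341(models)["name"])
```

The two criteria do not always agree, as in the placeholder data above, so it is worth checking both before settling on a model.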
The final step is model evaluation.
There are different ways to select a good model from among the models built for the same target. One can pick the model with the lowest DOPE score or the highest GA341 score, both of which are reported in the script4.log file. The following file, script5.py, evaluates a given model with the DOPE potential.
Figure 9: script5
Here, the “complete_pdb” script is used to read a PDB file and prepare it for energy calculations. Modeller's energy functions operate on a selection of atoms, so a selection containing all atoms of the model is created. The “assess_dope” command then calculates the DOPE energy. It also writes an energy profile to a file (“qseq1.profile” in this case), which can be used as input to a graphing program.
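As a rough idea of how such a profile could be prepared for plotting, the sketch below reads per-residue energies from text and smooths them with a moving average. It rests on two assumptions that should be checked against the actual “qseq1.profile” file: comment lines start with “#”, and the energy is the last whitespace-separated column of each data line. The sample text is invented and the function names are hypothetical.

```python
def read_energy_profile(text):
    """Return the last-column value of every non-comment, non-blank line."""
    energies = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        energies.append(float(stripped.split()[-1]))
    return energies


def smooth(values, window=3):
    """Moving average; the window is shortened at both ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Invented profile text; the real file has more columns per line.
SAMPLE_PROFILE = """\
# hypothetical per-residue energy profile
1 G -0.020
2 I -0.035
3 V  0.012
4 E -0.041
"""

energies = read_energy_profile(SAMPLE_PROFILE)
worst = max(range(len(energies)), key=lambda i: energies[i])
print("Residue with the highest energy:", worst + 1)
print("Smoothed profile:", smooth(energies))
```

The smoothed list can be passed directly to any plotting tool, with the residue number on the x-axis. Peaks in the profile point to regions of the model that may need closer inspection.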
Figure 10 shows the modeled protein.
Figure 10: Model of insulin protein