Finding ORF of a Given Sequence

 

Objective

 

  • To give an overview of the Open Reading Frame (ORF) and its significance.
  • To learn how ORF Finder searches for open reading frames in a DNA sequence using the standard or alternative genetic codes.
  • To identify all the possible open reading frames in a sequence.
  • To study how to perform a BLAST search for a particular ORF.

 

Theory

 

 

DNA (deoxyribonucleic acid) is the genetic material that contains all the genetic information of a living organism. The information is stored as a genetic code using four bases: adenine (A), guanine (G), cytosine (C) and thymine (T). Each base is bonded to a sugar and a phosphate group to form a nucleotide. During transcription, DNA is transcribed to mRNA. A set of three nucleotides that codes for a particular amino acid during translation is called a codon. The region of a nucleotide sequence that starts with an initiation codon and ends with a stop codon is called an Open Reading Frame (ORF). Proteins are translated from ORFs. By analyzing an ORF we can predict the amino acids that might be produced during translation. ORF Finder is a program available on the NCBI website. It identifies all ORFs, or possible protein-coding regions, in the six reading frames of a sequence.

 


DNA (deoxyribonucleic acid) is the genetic material that contains the genetic information for development and helps maintain all the functions of a living organism. The information is stored as a genetic code using four different bases: adenine (A), guanine (G), cytosine (C) and thymine (T). Each base is bonded to a sugar and a phosphate group to form a nucleotide. Between the two strands of DNA, adenine always pairs with thymine and guanine pairs with cytosine. This base pairing gives the two strands a ladder-like structure, which is twisted into the double helix.

In its bases, RNA differs from DNA in one respect: it contains uracil (U) instead of thymine (T). mRNA (messenger RNA) is a type of RNA formed by transcription of DNA. During transcription, DNA is transcribed to mRNA in the nucleus, and the mRNA moves to the cytoplasm through the nuclear pores. There it is translated to protein with the help of ribosomes. The mRNA is read 3 nucleotides at a time, since a set of 3 nucleotides (referred to as a codon) codes for one amino acid.

The region of a nucleotide sequence that starts with an initiation codon and ends with a stop codon is called an Open Reading Frame (ORF). The initiation codon is the triplet that codes for the first amino acid in translation. In the standard genetic code, translation starts at the initiation codon ATG (AUG in the mRNA), which codes for the amino acid methionine. Translation stops when the ribosome reaches a stop codon. There are three stop codons: TAA ("ochre"), TAG ("amber") and TGA ("opal" or "umber"). Any of these codons can stop translation. The 4 nucleotides can form 64 triplets (4^3 = 64); 61 of them code for amino acids and 3 are stop codons. The protein is formed from the ORF.
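
The codon arithmetic above can be checked with a short Python sketch. It assumes the sequence is written as the DNA coding strand, so the mRNA form of a codon is obtained simply by replacing T with U. The names `all_codons` and `to_mrna` are illustrative and do not belong to any tool mentioned here.

```python
from itertools import product

BASES = "ACGT"                       # the four DNA bases
START_CODON = "ATG"                  # initiation codon (methionine)
STOP_CODONS = {"TAA", "TAG", "TGA"}  # ochre, amber, opal

def all_codons():
    """Return every possible triplet of the four bases."""
    return ["".join(triplet) for triplet in product(BASES, repeat=3)]

def to_mrna(dna):
    """Write a coding-strand DNA string as mRNA (T becomes U)."""
    return dna.replace("T", "U")

codons = all_codons()
sense_codons = [c for c in codons if c not in STOP_CODONS]
print(len(codons))           # 64
print(len(sense_codons))     # 61
print(to_mrna(START_CODON))  # AUG
```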

 

How to find ORF 

 

1. Consider a hypothetical sequence:

CGCTACGTCTTACGCTGGAGCTCTCATGGATCGGTTCGGTAGGGCTCGATCACATCGCTAGCCAT

 

2. Divide the sequence into 6 different reading frames (+1, +2, +3, -1, -2 and -3). The first reading frame is obtained by grouping the sequence into words of 3 nucleotides, starting from the first nucleotide.

 

FRAME +1:  CGC TAC GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT CGG TAG GGC TCG ATC ACA TCG CTA GCC AT

 

The second reading frame is formed by skipping the first nucleotide and then grouping the sequence into words of 3 nucleotides.

 

FRAME +2:  C GCT ACG TCT TAC GCT GGA GCT CTC ATG GAT CGG TTC GGT AGG GCT CGA TCA CAT CGC TAG CCA T

 

The third reading frame is formed by skipping the first 2 nucleotides and then grouping the sequence into words of 3 nucleotides.

 

FRAME +3:  CG CTA CGT CTT ACG CTG GAG CTC TCA TGG ATC GGT TCG GTA GGG CTC GAT CAC ATC GCT AGC CAT

 

The other 3 reading frames are read from the reverse complement of the sequence, so the reverse complement must be found first.

Complement :  GCGATGCAGAATGCGACCTCGAGAGTACCTAGCCAAGCCATCCCGAGCTAGTGTAGCGATCGGTA

Reverse complement:  ATGGCTAGCGATGTGATCGAGCCCTACCGAACCGATCCATGAGAGCTCCAGCGTAAGACGTAGCG
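
A minimal Python sketch of this step is given below. It assumes the sequence contains only the uppercase letters A, C, G and T; the function names are illustrative.

```python
# Map each base to its pairing partner: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Replace every base with its complementary base."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Complement the sequence, then read it in the opposite direction."""
    return complement(seq)[::-1]

SEQ = "CGCTACGTCTTACGCTGGAGCTCTCATGGATCGGTTCGGTAGGGCTCGATCACATCGCTAGCCAT"
print("Complement        :", complement(SEQ))
print("Reverse complement:", reverse_complement(SEQ))
```

Applying `reverse_complement` twice returns the original sequence, which is a quick way to check the result.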

 

Now the same process used for the +1, +2 and +3 frames is repeated on the reverse complement sequence to obtain the -1, -2 and -3 frames.

FRAME -1:  ATG GCT AGC GAT GTG ATC GAG CCC TAC CGA ACC GAT CCA TGA GAG CTC CAG CGT AAG ACG TAG CG

FRAME -2:  A TGG CTA GCG ATG TGA TCG AGC CCT ACC GAA CCG ATC CAT GAG AGC TCC AGC GTA AGA CGT AGC G

FRAME -3:  AT GGC TAG CGA TGT GAT CGA GCC CTA CCG AAC CGA TCC ATG AGA GCT CCA GCG TAA GAC GTA GCG
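
The six reading frames can be generated programmatically, as in the sketch below. It is an illustration under stated assumptions rather than the method used by any particular tool: the helper names are hypothetical, input is uppercase A/C/G/T, and only complete codons are kept, so the skipped leading nucleotides and any incomplete trailing group shown in the frames above are dropped.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement the sequence and reverse it."""
    return seq.translate(COMPLEMENT)[::-1]

def codons_in_frame(seq, offset):
    """Split seq into complete codons, skipping the first `offset` bases."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

def six_frames(seq):
    """Return a dict mapping '+1'..'+3' and '-1'..'-3' to codon lists."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons_in_frame(seq, offset)
        frames[f"-{offset + 1}"] = codons_in_frame(rc, offset)
    return frames

SEQ = "CGCTACGTCTTACGCTGGAGCTCTCATGGATCGGTTCGGTAGGGCTCGATCACATCGCTAGCCAT"
for name, codons in six_frames(SEQ).items():
    print(f"FRAME {name}: {' '.join(codons)}")
```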

 

3. Now mark the start codon and the stop codons in each reading frame. Below, start codons (ATG) are shown in [square brackets] and stop codons (TAA, TAG, TGA) in {curly braces}.

FRAME +1:  CGC TAC GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT CGG {TAG} GGC TCG ATC ACA TCG CTA GCC AT

FRAME +2:  C GCT ACG TCT TAC GCT GGA GCT CTC [ATG] GAT CGG TTC GGT AGG GCT CGA TCA CAT CGC {TAG} CCA T

FRAME +3:  CG CTA CGT CTT ACG CTG GAG CTC TCA TGG ATC GGT TCG GTA GGG CTC GAT CAC ATC GCT AGC CAT

FRAME -1:  [ATG] GCT AGC GAT GTG ATC GAG CCC TAC CGA ACC GAT CCA {TGA} GAG CTC CAG CGT AAG ACG {TAG} CG

FRAME -2:  A TGG CTA GCG [ATG] {TGA} TCG AGC CCT ACC GAA CCG ATC CAT GAG AGC TCC AGC GTA AGA CGT AGC G

FRAME -3:  AT GGC {TAG} CGA TGT GAT CGA GCC CTA CCG AAC CGA TCC [ATG] AGA GCT CCA GCG {TAA} GAC GTA GCG

 

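4. Identify the open reading frames (ORFs): each sequence stretch beginning with a start codon and ending at the first in-frame stop codon. Frames +1 and +3 contain no start codon. In frame -2 the start codon is immediately followed by the stop codon TGA, so that stretch would encode methionine alone and is not listed here.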

 

FRAME +2:  ATG GAT CGG TTC GGT AGG GCT CGA TCA CAT CGC TAG 

FRAME -1:  ATG GCT AGC GAT GTG ATC GAG CCC TAC CGA ACC GAT CCA TGA 

FRAME -3:  ATG AGA GCT CCA GCG TAA
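
The ORF search within a single reading frame can be sketched in Python as follows. This is a simple illustration, not the algorithm of any specific program: it assumes ATG is the only start codon, takes the first ATG after the previous stop, and ignores a stretch that has no in-frame stop codon. The function name `find_orfs` is hypothetical.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(codons):
    """Return each run of codons from a start codon to the next stop codon."""
    orfs = []
    current = None  # None while we are outside an ORF
    for codon in codons:
        if current is None:
            if codon == START_CODON:
                current = [codon]       # open a new ORF
        else:
            current.append(codon)
            if codon in STOP_CODONS:
                orfs.append(current)    # close the ORF at the stop codon
                current = None
    return orfs

# Complete codons of frame +2 from the worked example.
frame_plus2 = ("GCT ACG TCT TAC GCT GGA GCT CTC ATG GAT CGG TTC "
               "GGT AGG GCT CGA TCA CAT CGC TAG CCA").split()
for orf in find_orfs(frame_plus2):
    print(" ".join(orf))
```

Run on frame -2, the same function also reports the two-codon stretch ATG TGA, which is why practical tools usually apply a minimum ORF length.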

5. Using the amino acid table (Figure 1), translate each ORF codon by codon to find its peptide sequence.

 

Figure 1: Amino Acid Table

 

 

FRAME +2:  ATG GAT CGG TTC GGT AGG GCT CGA TCA CAT CGC TAG
           met asp arg phe gly arg ala arg ser his arg stop

FRAME -1:  ATG GCT AGC GAT GTG ATC GAG CCC TAC CGA ACC GAT CCA TGA
           met ala ser asp val ile glu pro tyr arg thr asp pro stop

FRAME -3:  ATG AGA GCT CCA GCG TAA
           met arg ala pro ala stop
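
The table lookup can also be done in code. The sketch below builds the standard genetic code from the conventional compact listing (bases in the order T, C, A, G) and prints three-letter lowercase names to match the notation used above. It assumes the standard code only; alternative genetic codes would need a different table. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product("TCAG", repeat=3), AMINO)}

# One-letter to three-letter amino acid names.
THREE_LETTER = {
    "A": "ala", "R": "arg", "N": "asn", "D": "asp", "C": "cys",
    "Q": "gln", "E": "glu", "G": "gly", "H": "his", "I": "ile",
    "L": "leu", "K": "lys", "M": "met", "F": "phe", "P": "pro",
    "S": "ser", "T": "thr", "W": "trp", "Y": "tyr", "V": "val",
    "*": "stop",
}

def translate(codons):
    """Translate a list of DNA codons into three-letter amino acid names."""
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in codons]

orf = "ATG AGA GCT CCA GCG TAA".split()   # ORF found in frame -3
print(" ".join(translate(orf)))           # met arg ala pro ala stop
```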

 

By analyzing an ORF we can predict the amino acids that would be produced during translation. Predicting the correct ORF of a newly sequenced gene is an important step. Knowing the ORF also helps in designing the primers required for experiments such as PCR and sequencing.

 

 

 ORF Finder:

 

ORF Finder is a program available on the NCBI website. It identifies all open reading frames, or possible protein-coding regions, in a sequence. It displays 6 horizontal bars, each corresponding to one of the possible reading frames. Each strand of the DNA can be read in 3 possible reading frames, so every DNA sequence has 6 possible reading frames in total: +1, +2 and +3 on the forward strand, and -1, -2 and -3 on the reverse strand. The resulting amino acid sequences can be saved and searched against various protein databases using BLAST to find similar sequences. The result displays the possible protein sequence and the length of the open reading frame, among other details.
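
To make the kind of output described here concrete, the sketch below scans all six frames of the example sequence and reports the frame, ORF length in nucleotides and the one-letter protein sequence. It is not NCBI's implementation. It assumes the standard genetic code, ATG as the only start codon, uppercase A/C/G/T input, and a hypothetical `min_aa` cutoff for the shortest ORF to report.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product("TCAG", repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement the sequence and reverse it."""
    return seq.translate(COMPLEMENT)[::-1]

def scan_orfs(seq, min_aa=2):
    """Return (frame, length_nt, protein) for every ATG...stop stretch.

    min_aa is an assumed cutoff: ORFs encoding fewer amino acids are skipped.
    """
    results = []
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = None  # None while we are outside an ORF
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if protein is None:
                    if codon == "ATG":
                        protein = "M"
                elif CODON_TABLE[codon] == "*":
                    if len(protein) >= min_aa:
                        length_nt = 3 * (len(protein) + 1)  # stop codon included
                        results.append((f"{sign}{offset + 1}", length_nt, protein))
                    protein = None
                else:
                    protein += CODON_TABLE[codon]
    return results

SEQ = "CGCTACGTCTTACGCTGGAGCTCTCATGGATCGGTTCGGTAGGGCTCGATCACATCGCTAGCCAT"
for frame, length_nt, protein in scan_orfs(SEQ):
    print(frame, length_nt, protein)
```

With the default cutoff this reports the three ORFs from the worked example (frames +2, -1 and -3); setting `min_aa=1` also reports the ATG TGA stretch in frame -2.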
 

 


 


Copyright @ 2024 Under the NME ICT initiative of MHRD
