- Retrieve the gene information of Arabidopsis thaliana for a particular condition using the TAIR database.
What is Arabidopsis thaliana?
Arabidopsis thaliana is a small flowering plant that belongs to the mustard family. Characteristics such as its small genome, very short life cycle and quick growth make it a good genetic model organism, and a large share of plant research is therefore conducted using this plant. The Arabidopsis Information Resource (TAIR) is a database that offers an easy way to get a wide range of information about Arabidopsis thaliana, from genes and genome sequence to seed stocks and research publications.
A genome can be defined as the total genetic information of an organism, carried on its chromosomes. The genome is made up of DNA. A chromosome is a long DNA molecule along which many genes are arranged. A gene is a segment of DNA (or of RNA in some viruses) that forms one unit of the genome and encodes a product, typically a protein. DNA (deoxyribonucleic acid) is the genetic material that contains all the information for the development and functioning of living organisms. RNA (ribonucleic acid) differs from DNA in one of its four bases, using uracil (U) instead of thymine (T); it also contains the sugar ribose instead of deoxyribose and is usually single-stranded. The genome of Arabidopsis thaliana was completely sequenced in the year 2000.
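The base difference between DNA and RNA mentioned above can be illustrated with a short Python sketch. The function name `transcribe` and the example sequence are illustrative assumptions and are not part of TAIR; the sketch simply rewrites a DNA coding-strand sequence in RNA letters.

```python
def transcribe(dna: str) -> str:
    """Return the RNA spelling of a DNA coding-strand sequence (T -> U).

    Hypothetical helper for illustration only.
    """
    dna = dna.upper()
    # Reject anything that is not one of the four DNA bases.
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(invalid)}")
    return dna.replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGCTTAA"))  # AUGGCUUAA
```

Only the letters change here; the chemical differences between the two molecules (sugar, strandedness) are not modelled.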
Since Arabidopsis thaliana has a small genome with five chromosomes, genetic manipulation is easier, which makes the plant useful for sequencing and genetic mapping. Sequencing here means determining the order of nucleotides in DNA, whereas genetic mapping is the process of making genome maps. A genome map is a graphical representation of the positions of genes on a chromosome. The plant has a very short life cycle, and its quick growth and small size make it suitable for large-scale cultivation in a comparatively short time. Arabidopsis thaliana is also widely used for DNA transfer (genetic transformation) experiments in plant biotechnology. These characteristics make it a good genetic model organism. Information about Arabidopsis thaliana is therefore very helpful to researchers, especially those studying other flowering plants.
Figure 1: Arabidopsis thaliana
(image src: http://commons.wikimedia.org/wiki/Arabidopsis_thaliana)
The Arabidopsis Information Resource (TAIR) is a database providing a wide range of information about Arabidopsis thaliana, which is widely used as a model organism for research. The TAIR database provides data on genes, gene products, gene expression, the complete genome sequence, genetic markers, gene structure, physical markers, clones, mutants, metabolism, seed stocks, genome maps, sequence polymorphisms, research publications and the research community.
A gene product is either a protein or an RNA (in non-protein-coding genes) that results from gene expression. Gene expression is the process by which the information in a gene is used to produce a functional protein or RNA. Complete genome sequencing is the process of determining an organism’s complete DNA sequence. A genetic marker is a DNA sequence or gene with a known chromosome location. The process of producing identical copies of organisms, cells or pieces of DNA is called cloning, and the outcome is called a clone. A sudden heritable change in the genetic material is called a mutation, and the resulting organism is called a mutant. All the chemical reactions in a living organism, which sustain life and account for growth, are referred to as metabolism. Genome maps are graphical representations of the locations of genes on a chromosome. A DNA sequence synthesized from an mRNA template is called complementary DNA (cDNA). An expressed sequence tag (EST) is a short section of a cDNA sequence. The insertion of extra base pairs into a DNA sequence is called an insertion mutation. Sequence polymorphism is the variation in sequence among individual organisms. For example, a variation at a single base pair in a DNA sequence is called a single nucleotide polymorphism (SNP).
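To make the idea of a single nucleotide polymorphism concrete, the following minimal Python sketch compares a reference sequence with a sample sequence and reports the positions where they differ. It assumes the two sequences are already aligned and of equal length; the function name `find_snps` and the example sequences are hypothetical and do not come from TAIR.

```python
from __future__ import annotations


def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List single-base differences as (1-based position, reference base, sample base).

    Assumes both sequences are pre-aligned and equally long, so insertions
    and deletions are out of scope for this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [(pos, ref, alt) for pos, (ref, alt) in enumerate(pairs, start=1) if ref != alt]


if __name__ == "__main__":
    # The fourth base differs: C in the reference, A in the sample.
    print(find_snps("ATGCCGTA", "ATGACGTA"))  # [(4, 'C', 'A')]
```

Real polymorphism data also include insertions and deletions, which require a proper sequence alignment before positions can be compared.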
Existing data in the database are continuously updated with new information so that the most current information is available. TAIR is located at the Carnegie Institution for Science Department of Plant Biology, Stanford, California, and is funded by the National Science Foundation. TAIR works in partnership with the Arabidopsis Biological Resource Center (ABRC), established at The Ohio State University. While TAIR provides the datasets, ABRC collects, reproduces, preserves and distributes seed and DNA resources for the research community. ABRC's stock search and ordering system is fully integrated with TAIR.
TAIR stores annotations of gene structure, gene and protein function, and metabolic pathways. Structural and functional annotation includes identifying gene locations and coding regions and predicting gene function. A metabolic pathway is a sequence of reactions that occurs in an organism for its development and survival.
TAIR provides information about an individual gene through several tools. The main tools are GBrowse, NBrowse, MapViewer, SeqViewer, the interaction viewer and the Synteny Viewer. Data on multiple genes can be obtained with the bulk download tools and FTP files. GBrowse, developed by Lincoln Stein, is used to search or browse the genes, cDNAs, ESTs, insertion mutants, SNPs and markers of Arabidopsis thaliana.
NBrowse is an interactive graphical browser for the Arabidopsis thaliana interaction network, which contains experimentally confirmed protein-protein interactions. MapViewer is a tool that gives information about different sequence maps and can also display that information graphically. SeqViewer is a genome browser: for a given short sequence or name, it searches the whole genome of Arabidopsis thaliana for possible hits and returns the results. It also gives information about the annotation, transcripts, clones and markers on the Arabidopsis sequence. The Synteny Viewer is a tool that compares and describes the syntenic regions between genomes.
A sample output of a gene file is shown in the following figure.
Figure 2: A sample TAIR output file
The record includes information such as the date of last modification, locus, gene model, description, map detail image and annotations. The "Date last modified" line shows when the record was last modified. A locus is the genomic sequence corresponding to a stretch of DNA that is transcribed into RNA. It is represented by an alphanumeric ID, e.g. AT2G03340. The gene model describes a gene product based on different sources such as mRNA sequencing, genetic characterization and computational prediction. The description gives details about the gene model. The "other names" line lists the other names for a gene model. The "Map Detail Image" line shows the chromosome image. The "Annotations" line describes the annotated part of the chromosome. The keywords line contains a set of keywords about the gene.
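A locus ID such as AT2G03340 can be split into its parts programmatically. The sketch below is only an illustration: the regular expression encodes the commonly described Arabidopsis locus naming convention ("AT", a chromosome code of 1 to 5, or C or M for the chloroplast and mitochondrial genomes, the letter "G" for gene, five digits, and an optional ".N" gene model suffix), which is an assumption and is not stated on this page. The names `AGI_PATTERN`, `AgiId` and `parse_agi` are hypothetical and are not part of any TAIR tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: AT + chromosome (1-5, C, M) + G + five digits + optional ".N" model suffix.
AGI_PATTERN = re.compile(r"^AT([1-5CM])G(\d{5})(?:\.(\d+))?$", re.IGNORECASE)


class AgiId(NamedTuple):
    chromosome: str        # "1" to "5", or "C" / "M"
    gene_number: str       # five-digit code, kept as text to preserve leading zeros
    model: Optional[int]   # gene model number, None for a bare locus ID


def parse_agi(identifier: str) -> AgiId:
    """Split a locus or gene model ID into chromosome, gene number and model."""
    match = AGI_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognised locus ID: {identifier!r}")
    chromosome, number, model = match.groups()
    return AgiId(chromosome.upper(), number, int(model) if model else None)


if __name__ == "__main__":
    print(parse_agi("AT2G03340"))    # locus on chromosome 2
    print(parse_agi("AT2G03340.1"))  # first gene model of that locus
```

Under this assumed convention, the digit after "AT" in AT2G03340 indicates that the locus lies on chromosome 2.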