Objective
To find annotations for genomes from different organisms.
Theory
Genome annotation:
A genome is the total genetic material of an organism; the human genome, for example, consists of the complete set of human genetic information. Genome annotation is the process of attaching biological information to sequences. It involves identifying portions of the genome, locating the gene elements within those portions, and attaching biological information to these elements.
Annotation can be further divided into areas such as structural annotation and functional annotation.
Structural annotation may include:
Functional annotation includes:
- Biological and Biochemical functions
- Gene regulation and gene interaction
ASAP (A Systematic Annotation Package) is a systematic annotation package for community analysis of genomes. It holds genome sequences, annotations, and experimental data for multiple organisms. As a database, it stores and distributes gene sequence data and gene expression data. Users can upload genome sequence data, annotations, and experimental data after a new study, and can also view and download existing annotations and experiments. The database provides information about eukaryotes and bacteria. From the ASAP login page, one can log in as a guest to view the publicly available sequence data. Registered users can also contribute information to the database. To become a registered user, check the privacy settings. ASAP supports three levels of users: public viewers (submitters), annotators, and curators.
Annotators can view annotations that have been reviewed and rejected by a curator, and can revise and resubmit them after correcting the errors.
Curators have all the functionality available to viewers and annotators. In addition, curators can edit or delete existing features and add new features to the genome. Details of the curators are available on the summary data page.
Submitters can submit or retrieve data.
Choosing a version of the genome sequence is very important; usually one would prefer to work with an up-to-date version.
ASAP can handle multiple sequences associated with different projects, such as short sequences, assembled contigs, ESTs, or complete genomes made up of multiple chromosomes and plasmids. The tools available in ASAP search for annotations across genomes, and the queries behind these searches return matching objects. Each feature of these objects corresponds to a nucleotide sequence within a genome.
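The idea that each feature corresponds to a nucleotide sequence within a genome can be illustrated with a small sketch. The `Feature` class, its field names, and the 1-based inclusive coordinate convention are assumptions made for this example; they are not taken from ASAP's actual schema.

```python
from dataclasses import dataclass

# Hypothetical names for illustration only; ASAP's internal schema is not shown here.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class Feature:
    name: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive (assumed convention)
    strand: str  # "+" or "-"


def feature_sequence(genome: str, feature: Feature) -> str:
    """Return the nucleotide sequence that a feature corresponds to."""
    if not (1 <= feature.start <= feature.end <= len(genome)):
        raise ValueError("feature coordinates fall outside the genome")
    seq = genome[feature.start - 1:feature.end]
    if feature.strand == "-":
        # Reverse-complement features annotated on the minus strand.
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


genome = "ATGAAACGCTAGGGTTTCAT"  # toy sequence, not real data
print(feature_sequence(genome, Feature("geneA", 1, 12, "+")))
print(feature_sequence(genome, Feature("geneB", 15, 20, "-")))
```

Storing coordinates rather than copies of the sequence means that the same feature record can be resolved against whichever genome version is chosen.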
HOME PAGE:
Figure 1: Home page of ASAP
Features of ASAP:
Improvement in comparative genomics:
ASAP provides information about genome sequences and the relationships between their annotated proteins. The BLAST algorithm is used to find the best orthologs. PHP scripts download the sequences, run the searches, find the best hits, and import and parse the results. Orthologs are added based on statistical or user-defined criteria such as E-value and percent identity. BLAST analysis can also predict false orthologs and has a high probability of missing real orthologs.
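As a rough illustration of selecting best hits by E-value and percent identity, the sketch below filters BLAST tabular output (the standard 12-column `-outfmt 6` layout) and keeps the highest-scoring hit per query. The function name, the cutoff values, and the sample rows are assumptions for this example; this is not ASAP's own script.

```python
import csv
import io


def best_hits(blast_tabular: str, max_evalue: float = 1e-10,
              min_identity: float = 60.0) -> dict:
    """Pick the highest-bitscore hit per query that passes both cutoffs."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity = float(row[2])    # column 3: percent identity
        evalue = float(row[10])     # column 11: E-value
        bitscore = float(row[11])   # column 12: bit score
        if evalue > max_evalue or identity < min_identity:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


# Made-up rows in outfmt 6 column order, for illustration only.
example = "\n".join([
    "geneA\torgB_001\t92.5\t300\t22\t0\t1\t300\t1\t300\t1e-150\t520",
    "geneA\torgB_007\t61.0\t280\t109\t2\t5\t284\t3\t282\t1e-40\t160",
    "geneB\torgB_002\t35.0\t150\t97\t3\t1\t150\t1\t150\t2e-5\t48",
])
print(best_hits(example))
```

With the default cutoffs only `geneA` keeps a candidate ortholog. Loosening the cutoffs admits the weak `geneB` hit as well, which shows how the choice of criteria trades missed orthologs against false ones.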
Whole genome multiple sequence alignment:
Mauve, a multiple sequence alignment tool, is used to construct multiple sequence alignments of the genome sequences uploaded to ASAP. When browsing annotations for a particular gene, a user can view the multiple alignment by launching a Java applet, which displays the genome alignment with the selected gene.
Handling experimental data:
ASAP serves as a repository for experimental data associated with the genomes in the database. There are two main ways to interact with experimental data in ASAP. The first is to select a genome and view the datasets from its experiments. The second is through the gene annotations: when the user looks up a particular gene that has related entries, they are shown as a list of experimental datasets on the gene annotation page.
BLAST:
ASAP processes BLAST output and displays it to users as they view or annotate features. BLAST searches are the most common way to identify gene functions easily. By providing DNA or protein sequences as BLAST queries, important features such as characterization, motif search results, structural characteristics, and gene expression data can be obtained. ASAP also launches a visualization tool to examine the position of the current feature in the genome. BLAST searches are run against the NCBI “nr” nucleotide database, GenPept, an MG1655-specific protein database, and the Pfam protein family search.