Phylogenetic Analysis using PHYLIP - Rooted trees (Theory) : Bioinformatics Virtual Lab II : Biotechnology and Biomedical Engineering : Amrita Vishwa Vidyapeetham Virtual Lab

Objective:

To find the evolutionary relationships between different organisms on a time scale, and to analyze the changes that occurred in the organisms, using PHYLIP.

Key words:

Phylogenetic analysis: The analysis of evolutionary relationships between different organisms. It helps to identify the changes that occurred in organisms during evolution.

Bootstrapping: A resampling method used to test how reliably a dataset supports the tree inferred from it.

Query: The input given by the user. It can be either a protein or a nucleotide sequence.

Rooted tree: A tree that has one special node, called the root, as its main node. A tree without a root is called a free (unrooted) tree.

Tree topology: The branching arrangement of a phylogenetic tree.

Theory:

PHYLIP is a complete phylogenetic analysis package developed by Joseph Felsenstein at the University of Washington. It is used to find the evolutionary relationships between different organisms. The methods available in the package include maximum parsimony, distance matrix, and likelihood methods. Data is presented to the programs in a plain text file, which the user prepares with a common text editor (a word processor can be used if the file is saved as plain text). Some sequence analysis programs, such as ClustalW, can write data files in PHYLIP format. Most of the programs look for an input file called "infile"; if they do not find it, they ask the user to type in the name of the data file. Before starting the computation, the program lets the user set options through a menu. Output is written to files with names like "outfile" and "outtree".

PHYLIP file format:

The first line of the input file gives the number of sequences (species) and the number of characters (nucleotides or amino acids) in each sequence.

Each sequence name is 10 characters long. Spaces can be added to the end of shorter names to pad them to this length.

Gaps are represented as ‘-’.

Missing data are represented as ‘?’.

Spaces within the aligned sequences are allowed, usually after every 10 bases.

Example:

4 1061

4 is the number of species taken for phylogenetic analysis.

1061 is the number of characters (sites) in each sequence.
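The layout described above can be sketched in a few lines of Python. This is only an illustration, assuming the sequential (non-interleaved) layout with each sequence on a single line; the function names `format_phylip` and `parse_phylip` are hypothetical and are not part of PHYLIP.

```python
def format_phylip(records):
    """Return sequential PHYLIP text for a list of (name, sequence) pairs."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_chars = lengths.pop()
    # Header line: number of species, then number of characters.
    lines = [f"{len(records)} {n_chars}"]
    for name, seq in records:
        if len(name) > 10:
            raise ValueError(f"name longer than 10 characters: {name!r}")
        # Pad the name with spaces to exactly 10 characters.
        lines.append(name.ljust(10) + seq)
    return "\n".join(lines) + "\n"


def parse_phylip(text):
    """Parse sequential PHYLIP text back into (name, sequence) pairs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seqs, n_chars = (int(x) for x in lines[0].split())
    records = []
    for ln in lines[1:1 + n_seqs]:
        name = ln[:10].strip()            # first 10 columns hold the name
        seq = ln[10:].replace(" ", "")    # spaces inside the data are ignored
        if len(seq) != n_chars:
            raise ValueError(f"{name}: expected {n_chars} characters, got {len(seq)}")
        records.append((name, seq))
    return records
```

Gap (‘-’) and missing-data (‘?’) symbols are passed through unchanged, since they are ordinary characters of the alignment.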

PHYLIP program:

The PHYLIP programs are run sequentially: the output of one program is used as the input of the next. The user therefore has to know the order in which to run them. Simple examples of running PHYLIP programs are given in the flowcharts below.

There are three different approaches to inferring a phylogeny from nucleotide or amino acid sequences:

Maximum parsimony method
Distance methods
Maximum likelihood methods

Maximum parsimony method: A character-based method that infers a phylogenetic tree by minimizing the total number of evolutionary steps, that is, the total tree length, for a given set of data. It is also referred to as a sequence-based tree reconstruction method.
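To make the idea of "number of evolutionary steps" concrete, here is a minimal sketch of Fitch's counting procedure for one fixed tree. It assumes a rooted binary tree written as nested 2-tuples of species names and an alignment stored as a dictionary; the function names are hypothetical. It only scores a given tree and does not search over trees, which is the harder part of what the PHYLIP parsimony programs do.

```python
def fitch_site(tree, states):
    """Return (possible_states, steps) for one site on a rooted binary tree.

    tree   : a species name (str) or a 2-tuple of subtrees
    states : dict mapping species name -> character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra step needed.
        return common, left_steps + right_steps
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_length(tree, alignment):
    """Total number of steps (tree length) summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total
```

Scoring two candidate trees for the same alignment and keeping the one with the smaller length is the parsimony criterion in its simplest form.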

Distance methods: Evolutionary distances are calculated between all pairs of operational taxonomic units (OTUs), and a tree is built in which the distances between the OTUs match these calculated distances as closely as possible.
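As an illustration of the first half of this approach, the sketch below computes pairwise distances for an alignment. It assumes the Jukes-Cantor correction, chosen here only because it is the simplest model; the function names are hypothetical, and sites containing ‘-’ or ‘?’ in either sequence are simply skipped.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites, ignoring sites with '-' or '?'."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-?" and y not in "-?"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a, b):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(a, b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Return {(name1, name2): distance} for every ordered pair of OTUs."""
    names = list(alignment)
    return {(m, n): jukes_cantor(alignment[m], alignment[n])
            for m in names for n in names}
```

The resulting matrix is the kind of input that a tree-building step then tries to fit with branch lengths.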

Maximum likelihood method: Given a model of sequence evolution, this method finds the tree that gives the highest likelihood of the observed data.
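The principle can be shown on the smallest possible case: two sequences joined by a single branch. The sketch below assumes the Jukes-Cantor model with equal base frequencies and finds the branch length that maximizes the likelihood by a simple grid search. The function names, the step size and the search range are illustrative assumptions; real maximum likelihood programs also search over tree topologies and use more general models.

```python
import math


def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of branch length d for two aligned sequences.

    n_same : number of sites where the two sequences agree
    n_diff : number of sites where they differ
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a site is unchanged
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    # Each site contributes (base frequency 1/4) * (transition probability).
    log_l = n_same * math.log(0.25 * p_same)
    if n_diff:
        log_l += n_diff * math.log(0.25 * p_diff)
    return log_l


def ml_distance(n_same, n_diff, step=0.001, max_d=3.0):
    """Grid search for the branch length with the highest likelihood."""
    candidates = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(candidates, key=lambda d: jc69_log_likelihood(d, n_same, n_diff))
```

For this two-sequence case the maximum sits at the Jukes-Cantor distance, which gives a convenient check on the search.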

Programs in the PHYLIP package:

The following programs are available in the PHYLIP package.

Dnapars: Estimates phylogenies from nucleic acid sequences using the parsimony method.

Dnamove: An interactive program for constructing phylogenies from nucleic acid sequences using the parsimony method.

Dnapenny: Finds the most parsimonious phylogenies for nucleic acid sequences using the branch-and-bound method.

Dnacomp: Estimates phylogenies from nucleic acid sequences by searching for the largest number of sites that could have uniquely evolved on the same tree.

Dnainvar: Computes phylogenetic invariants for nucleic acid sequences, which are used to test alternative tree topologies. The program also tabulates the frequencies of occurrence of the different nucleotide patterns.

Dnaml: Estimates phylogenies from nucleotide sequences by the maximum likelihood method, without assuming a molecular clock. A molecular clock is the assumption that changes accumulate at a constant rate over time, which allows the timing of evolutionary events to be calculated.

Dnamlk: Estimates phylogenies from nucleotide sequences by the maximum likelihood method, assuming a molecular clock.

Dnadist: Calculates the pairwise distances between nucleic acid sequences. It can also produce a table of percentage similarity among the sequences.

Seqboot: Reads a dataset and produces multiple datasets from it by bootstrap resampling. Most of the programs in the current version of the package can process multiple datasets, so Seqboot can be used together with the consensus tree program Consense.
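Bootstrap resampling of an alignment means drawing columns (sites) at random, with replacement, until the new dataset has as many columns as the original. The following sketch illustrates that idea only; it is not Seqboot's own code, and the function names and the use of a fixed random seed are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one resampled alignment with the same number of sites."""
    n_sites = len(next(iter(alignment.values())))
    # Draw column indices with replacement; the same column may repeat.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return a list of resampled alignments (reproducible for a given seed)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

A tree is then inferred from each replicate, and the proportion of replicates in which a group appears indicates how well the data support that group.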

Consense: Computes consensus trees by the majority-rule consensus tree method, which also allows one to easily find the strict consensus tree.

Protpars: Estimates phylogenies from protein sequences using the parsimony method.

Protdist: Computes distances between protein sequences using maximum likelihood estimates based on the PAM matrix, the JTT model, or the PMB model. It can also give the percentage similarity among the sequences.

Proml: Estimates phylogenies from amino acid sequences by the maximum likelihood method, without assuming a molecular clock. The program allows for different rates of change at known sites.

Promlk: Estimates phylogenies from amino acid sequences by the maximum likelihood method, assuming a molecular clock.

Restml: Estimates phylogenies by the maximum likelihood method from restriction site data. It does not allow for a difference in rate between transitions and transversions.

Restdist: Calculates distances from restriction site data or restriction fragment data.

Fitch: Estimates phylogenies from distance matrix data under the “additive tree model”. It uses the Fitch-Margoliash criterion and some related least squares criteria. It does not assume an evolutionary clock. The program is useful with distances computed from molecular sequences, restriction sites or fragments, and genetic distances calculated from gene frequencies.

Kitsch: Estimates phylogenies from distance matrix data under the “ultrametric model”, which is the same as the additive tree model except that an evolutionary clock is assumed. It is otherwise similar to Fitch.

Neighbor: Implements neighbor joining, a distance matrix method that produces an unrooted tree without assuming an evolutionary clock. The method is very fast and can handle large datasets.
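The speed of neighbor joining comes from its simple selection rule: at each step it joins the pair of taxa with the smallest value of Q(i, j) = (n - 2) d(i, j) - sum of d(i, k) - sum of d(j, k), where n is the current number of taxa. The sketch below shows only this selection step, assuming the distances are stored as a dictionary of dictionaries; the function names are hypothetical and this is not the Neighbor program's own code.

```python
def nj_q_matrix(names, dist):
    """Return {(a, b): Q value} for every unordered pair of taxa.

    names : list of taxon names
    dist  : dist[a][b] is the distance between taxa a and b
    """
    n = len(names)
    # Sum of distances from each taxon to all the others.
    totals = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    q = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            q[(a, b)] = (n - 2) * dist[a][b] - totals[a] - totals[b]
    return q


def pick_neighbors(names, dist):
    """Return the pair of taxa that neighbor joining would join first."""
    q = nj_q_matrix(names, dist)
    return min(q, key=q.get)
```

In the full method the chosen pair is replaced by a new internal node, the distance matrix is updated, and the step is repeated until the tree is complete.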

Dnadist: A distance matrix program that computes the distances between nucleic acid sequences. It can also give the percentage similarity among the sequences.

Protdist: Computes distances between protein sequences using the maximum likelihood method.

Restdist: Computes distances from restriction site data or restriction fragment data.

Drawgram: Plots rooted phylogenies, cladograms, and circular trees in a wide variety of formats.

Drawtree: Plots unrooted phylogenies; it is otherwise similar to Drawgram.
