## Objective:

To analyze the topological features of biological networks.

## Theory

The cell is the fundamental unit of life, in which many reactions take place. It contains the genetic material, DNA, which encodes proteins. Proteins carry out many important functions, including acting as enzymes, providing energy for metabolic activities, giving shape to the cell and enabling cell motility. They also play a very important role in cell signaling. These functions and processes can be better understood with the help of genome analysis. In recent years, complete genome sequences and post-genomic experimental data have become widely available. From this wealth of data, several molecular-level networks, such as gene regulatory networks and metabolic networks, have been constructed. Modeling these networks helps us understand the underlying phenomena in detail.

**How to represent these complex pathways or networks on a computer?**

From a mathematics or computer science perspective, a network is represented as a graph. A graph consists of nodes (also called vertices) that are linked to each other through edges.

A **network** is simply a collection of nodes connected via edges. The minimum information required to form a network is its connectivity rules, i.e., which node connects to which node. In biological terms, the nodes are typically genes or proteins, and the links represent relationships between them, such as functional interactions or similar expression patterns. A network is a collection of interactions containing pathways that are interlinked; each pathway is a subset of the network.
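Because the connectivity rules are all that is needed, a network can be stored as an edge list and turned into an adjacency list (each node mapped to the set of its neighbors). The sketch below assumes an undirected network; the function name `build_network` and the interaction data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def build_network(edges):
    """Build an undirected adjacency list (node -> set of neighbors) from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        # Undirected: record the link in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)

# Hypothetical protein-protein interactions, for illustration only.
interactions = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
network = build_network(interactions)
print(sorted(network["C"]))  # ['A', 'B', 'D']
```

The same dictionary-of-sets form is used in the other sketches on this page, since it makes "who are the neighbors of n?" a single lookup.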

Figure 1: Example of a complex biological network

**Why design or model a biological network?**

Modeling a biological network represents a biological scenario, such as a pathway, and can help bring out the hidden properties of that system. The model can predict the dynamic behavior of the network, and these predictions can be compared with experimental results. Such comparisons can then be used to correct the model and increase its accuracy.

Biological pathways are grouped into networks, which in turn are classified into metabolic networks, cell signaling networks, gene regulatory networks and protein-protein interaction networks. To study such complex processes, visualizing the entire process is essential. Modeling, reverse engineering and analysis of these complex macromolecular networks have therefore led computational biologists to develop specific tools such as Cytoscape, CellDesigner, E-Cell and J-Designer.

Cytoscape is an open-source platform for visualizing and analyzing complex networks, distributed under the GNU LGPL (Lesser General Public License). It is available for all major operating systems and is extensible through a plug-in architecture that supports different computational analyses. In Cytoscape, biological networks are organized as graphs in which nodes are connected to one another through edges (called interactions). The software becomes even more powerful when it is used together with large databases of genetic and protein-protein interactions.

### Analysis of Network

Once the network model is built, the first and most basic analysis is of its topology. Topological analysis reveals the structure of a network, which helps in uncovering the hidden mechanisms behind it.

Every node in the network can be analyzed using a Cytoscape plugin. Cytoscape allows external tools to be installed as plugins for the analysis of networks.

"Network Analyzer" is a Cytoscape plugin that computes node degree, clustering coefficient, number of self-loops and a variety of other parameters. It also computes edge parameters. "Network Analyzer" can analyze networks containing directed as well as undirected edges; the user must specify whether the network should be treated as directed or undirected. Using this plugin, we can analyze many different parameters of a network.

Network parameters are of two kinds: simple and complex.

**Simple network parameters** are single values that summarize statistics of the network.

**Complex network parameters** are distributions, shown as graphical representations (plots).

Network analysis reports the number of connected components, parameters related to shortest paths, neighborhood parameters and the clustering coefficient.

**Number of connected components** indicates the connectivity of the network; a lower number suggests stronger connectivity. Two nodes in an undirected network are connected if there is a path of edges between them. A connected component is a maximal set of nodes that are pairwise connected.
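Connected components can be found with a breadth-first search (BFS) started from every node not yet visited; each search collects exactly one component. This is a minimal sketch for an undirected network stored as a dictionary of neighbor sets, assuming every node (including isolated ones) appears as a key. The function name and sample network are hypothetical.

```python
from collections import deque

def connected_components(adjacency):
    """Return a list of connected components (sets of nodes) of an undirected graph."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

# Hypothetical network: a triangle A-B-C plus a separate pair D-E.
network = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "D": {"E"}, "E": {"D"},
}
print(len(connected_components(network)))  # 2
```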

**Shortest path related parameters:** Two nodes may be connected by multiple paths. The length of a path is the number of edges forming it. The shortest path length (distance) between two nodes is the minimum length among all paths connecting them. The average shortest path length, also called the characteristic path length, gives the expected distance between two connected nodes.
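In an unweighted network, BFS from a node yields its shortest path lengths to every reachable node. Averaging these over all pairs of distinct, connected nodes gives the characteristic path length. A minimal sketch, assuming an undirected, unweighted network; unreachable pairs are simply skipped, and the names used are hypothetical.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path lengths (number of edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def characteristic_path_length(adjacency):
    """Average shortest-path length over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        for target, d in bfs_distances(adjacency, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical path network A-B-C-D.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(round(characteristic_path_length(network), 3))  # 1.667
```

Here the six unordered pairs have distances 1, 2, 3, 1, 2, 1, so the average is 10 / 6. Counting ordered pairs doubles both the total and the count, which leaves the average unchanged.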

The network diameter is the maximum length of a shortest path between any two nodes. The network radius is the minimum among the non-zero eccentricities of the nodes in the network, where the eccentricity of a node n is the maximum non-infinite length of a shortest path between n and another node in the network.
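The definitions above can be computed directly from BFS distances: eccentricity is the largest finite distance from a node, the diameter is the largest eccentricity, and the radius is the smallest non-zero eccentricity (isolated nodes have eccentricity 0 and are excluded). A minimal sketch, assuming an undirected, unweighted network; the sample data and names are hypothetical.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def eccentricity(adjacency, node):
    """Maximum finite shortest-path length from `node` (0 for an isolated node)."""
    return max(bfs_distances(adjacency, node).values())

def diameter_and_radius(adjacency):
    ecc = [eccentricity(adjacency, n) for n in adjacency]
    diameter = max(ecc)
    # Radius: minimum among the non-zero eccentricities.
    non_zero = [e for e in ecc if e > 0]
    radius = min(non_zero) if non_zero else 0
    return diameter, radius

# Hypothetical network: path A-B-C-D plus an isolated node E.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
print(diameter_and_radius(network))  # (3, 2)
```

Without the non-zero rule, the isolated node E would force the radius to 0, which says nothing about the rest of the network.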

**Neighborhood parameters:** The neighborhood of a node is the set of its neighbors. The average number of neighbors indicates the average connectivity of a node in the network. Network density shows how densely the network is populated with edges; it ranges between 0 and 1.
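For an undirected network with N nodes and E edges, the density can be taken as the fraction of possible edges that are present, 2E / (N(N - 1)), which is the average number of neighbors divided by N - 1. This sketch assumes a simple undirected network (no self-loops or duplicate edges); the function names and data are hypothetical.

```python
def average_neighbors(adjacency):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)

def density(adjacency):
    """Fraction of possible edges that are present: 2E / (N(N-1)) for an undirected graph."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected edge appears in two neighbor sets.
    edges = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return 2 * edges / (n * (n - 1))

# Hypothetical network: triangle A-B-C plus the edge C-D.
network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(average_neighbors(network))  # 2.0
print(density(network))            # 0.6666666666666666
```

A fully connected network gives density 1 and a network with no edges gives 0, matching the 0 to 1 range stated above.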

**Clustering coefficient:** The clustering coefficient of a node is the ratio of the number of edges between the node's neighbors to the maximum number of edges that could possibly exist between them. This applies to both undirected and directed networks, and the value always lies between 0 and 1. The clustering coefficient of the whole network is the average of the clustering coefficients of all nodes in the network.
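For an undirected node n with k neighbors, at most k(k - 1)/2 edges can exist among those neighbors, so the ratio above becomes C_n = 2e_n / (k(k - 1)), where e_n is the number of edges actually present between the neighbors. The sketch below covers only the undirected case and assumes, as a labeled convention, that nodes with fewer than two neighbors get a coefficient of 0; names and data are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Undirected clustering coefficient C_n = 2 * e_n / (k * (k - 1))."""
    neighbors = adjacency[node] - {node}  # ignore a self-loop, if any
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: nodes with fewer than two neighbors get 0
    # e_n: number of edges that actually exist between pairs of neighbors.
    e = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * e / (k * (k - 1))

def network_clustering_coefficient(adjacency):
    """Average of the node clustering coefficients over all nodes."""
    return sum(clustering_coefficient(adjacency, n) for n in adjacency) / len(adjacency)

# Hypothetical network: triangle A-B-C plus the edge C-D.
network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(clustering_coefficient(network, "C"))                # 0.3333333333333333
print(round(network_clustering_coefficient(network), 4))  # 0.5833
```

Node C has three neighbors (3 possible neighbor pairs) but only A-B is linked, giving 1/3; A and B score 1, D scores 0, and the average is 7/12.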

Network analysis also gives an idea of various distributions, such as the degree distribution, neighborhood connectivity distribution and shortest path length distribution.

**Degree distribution:** The degree of a node is the number of edges linked to it; a self-loop counts as two edges toward the degree. In an undirected network, the degree distribution gives the number of nodes having each degree k. In a directed network, separate distributions are computed for the number of incoming edges (in-degree) and the number of outgoing edges (out-degree), in the same way as for undirected networks.
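Degree distributions can be tallied straight from an edge list. In the undirected case, a self-loop (u, u) increments u's degree twice, as described above; in the directed case, each edge u → v adds one to u's out-degree and one to v's in-degree. A minimal sketch; it assumes every node appears in at least one edge (isolated nodes would need to be added separately), and the names and data are hypothetical.

```python
from collections import Counter

def degree_distribution(edges):
    """Map degree k -> number of nodes with degree k (undirected; a self-loop adds 2)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # when u == v this is the second count for the self-loop
    return Counter(degree.values())

def in_out_degree_distributions(edges):
    """For directed edges u -> v, return (in-degree distribution, out-degree distribution)."""
    in_deg, out_deg = Counter(), Counter()
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    nodes = set(in_deg) | set(out_deg)
    # Counter returns 0 for missing keys, so nodes with no in- or out-edges get degree 0.
    return Counter(in_deg[n] for n in nodes), Counter(out_deg[n] for n in nodes)

# Hypothetical edges, including a self-loop on C.
edges = [("A", "B"), ("B", "C"), ("C", "C")]
print(dict(sorted(degree_distribution(edges).items())))  # {1: 1, 2: 1, 3: 1}
```

Read undirected, A has degree 1, B degree 2 and C degree 3 (one edge from B plus two for the self-loop).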

**Neighborhood connectivity distribution** gives, for each number of neighbors k, the average neighborhood connectivity of all nodes with k neighbors. The connectivity of a single node is the number of its neighbors, and the neighborhood connectivity of a node is the average connectivity of all its neighbors. If the network is analyzed as directed, nodes have separate in- and out-connectivities, and three types of neighborhood connectivity are computed for a node n:

1. Only in: average out-connectivity of all in-neighbors of n.

2. Only out: average in-connectivity of all out-neighbors of n.

3. In and out: average connectivity of all neighbors of n.
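For the undirected case, the distribution described above can be built by computing each node's neighborhood connectivity and averaging those values over all nodes that share the same number of neighbors. This is a minimal sketch only; it does not cover the three directed variants, it skips isolated nodes (whose neighborhood connectivity is undefined), and its names and data are hypothetical.

```python
from collections import defaultdict

def neighborhood_connectivity(adjacency, node):
    """Average number of neighbors of the neighbors of `node`."""
    nbrs = adjacency[node]
    return sum(len(adjacency[m]) for m in nbrs) / len(nbrs)

def neighborhood_connectivity_distribution(adjacency):
    """Map k -> average neighborhood connectivity of all nodes with k neighbors."""
    groups = defaultdict(list)
    for node, nbrs in adjacency.items():
        if nbrs:  # skip isolated nodes
            groups[len(nbrs)].append(neighborhood_connectivity(adjacency, node))
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}

# Hypothetical network: triangle A-B-C plus the edge C-D.
network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(neighborhood_connectivity_distribution(network))
# {1: 3.0, 2: 2.5, 3: 1.6666666666666667}
```

A decreasing trend like this one means well-connected nodes tend to link to poorly connected ones; with only four nodes, though, the example is purely illustrative.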

**Topological coefficient** is a measure of the extent to which a node shares neighbors with other nodes. The analyzer computes the topological coefficients of all nodes in the network that have more than one neighbor.
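The text does not give a formula, so this sketch assumes the definition commonly documented for NetworkAnalyzer: T_n = avg(J(n, m)) / k_n, where k_n is the number of neighbors of n, m ranges over the nodes that share at least one neighbor with n, and J(n, m) is the number of neighbors shared by n and m, plus one if n and m are directly linked. It covers undirected networks only. The function name, the data, and the choice to return 0.0 when no such m exists are hypothetical.

```python
def topological_coefficient(adjacency, node):
    """T_n = average J(n, m) / k_n, under the definition assumed above (undirected)."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return None  # only computed for nodes with more than one neighbor
    # Nodes (other than `node`) that share at least one neighbor with it.
    partners = {m for nb in neighbors for m in adjacency[nb]} - {node}
    if not partners:
        return 0.0  # assumption: no node shares a neighbor with `node`
    # J(n, m): shared neighbors, plus one if n and m are directly linked.
    shared = [len(neighbors & adjacency[m]) + (1 if m in neighbors else 0) for m in partners]
    return sum(shared) / len(shared) / k

# Hypothetical network: triangle A-B-C plus the edge C-D.
network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(round(topological_coefficient(network, "C"), 4))  # 0.6667
print(round(topological_coefficient(network, "A"), 4))  # 0.8333
```

Node D has a single neighbor, so, like the analyzer described above, the sketch does not compute a value for it.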

Network Analyzer also provides a "Fit line" feature that fits a line to the data points of some complex parameters using least-squares linear regression. Fitting a line helps identify dependencies between the x and y coordinates.
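Least-squares linear regression picks the slope b and intercept a of y = a + b·x that minimize the sum of squared vertical distances to the points: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and a = ȳ - b·x̄. The sketch below implements these closed-form formulas and also returns R², a common goodness-of-fit measure added here for illustration. It is not Network Analyzer's own code, and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope
    a = mean_y - b * mean_x        # intercept: the line passes through (mean_x, mean_y)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r_squared

# Hypothetical data points lying exactly on y = 1 + 2x.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0, 1.0)
```

The formula requires at least two distinct x values; otherwise sxx is zero and the slope is undefined.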