1. Go to the Simulator tab to access the GEO database.
Figure 1: Home page of GEO
2. A query can be made by entering keywords, such as a disease name, author name or dataset type, into the GEO DataSets or GEO Profiles query box. If the GEO accession number is known, it can be typed into the accession number query box in the GEO navigation box. Here the user constructs a query for autism.
Figure 2: Searching for autism
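The same keyword search can also be expressed programmatically. The sketch below only builds a request URL for the NCBI E-utilities ESearch service against the GEO DataSets database (db=gds); it does not send the request. The function name build_geo_search_url and the retmax default are illustrative assumptions, not part of this procedure.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term, retmax=20):
    """Return an ESearch URL that queries GEO DataSets (db=gds) for `term`.

    `retmax` limits how many record IDs the service would return.
    """
    params = {"db": "gds", "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Keyword taken from step 2 of this procedure.
    print(build_geo_search_url("autism"))
```

Opening the printed URL in a browser should return an XML list of matching record IDs, which roughly corresponds to the result list shown in the web interface.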
3. The user can apply limits to make the search more specific to the query. The Advanced Search option allows the user to perform more detailed and complex queries. It is also possible to save search records by logging into NCBI.
4. The result page lists, for each record, the organism name with its expression profile type, the Platform record number, the Series record number, the Dataset accession number, the ID number and options to download the data. It also provides links to PubMed, GEO Profiles and data from similar studies.
5. The Filter option on the right side of the result page filters the data by Dataset, Platform, Sample and Series records. The “Top Organisms” option restricts the results to the organism the user selects.
Figure 3: GEO Result Page
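The entry-type and organism filters from step 5 can also be written directly into the query text using search field tags. The following is a minimal sketch, assuming the [Entry Type] and [Organism] field tags of GEO DataSets; the helper name build_geo_term is hypothetical.

```python
def build_geo_term(keyword, entry_type=None, organism=None):
    """Combine a keyword with optional entry-type and organism restrictions.

    entry_type: for example "gds" (Dataset), "gse" (Series),
                "gpl" (Platform) or "gsm" (Sample).
    organism:   for example "Homo sapiens".
    """
    parts = [keyword]
    if entry_type:
        parts.append(f"{entry_type}[Entry Type]")
    if organism:
        # Quote the organism so a multi-word name is treated as one phrase.
        parts.append(f'"{organism}"[Organism]')
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_geo_term("autism", entry_type="gds", organism="Homo sapiens"))
```

The resulting string can be pasted into the GEO DataSets query box in place of the plain keyword.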
6. The information can be displayed in several alternative formats, which are selected from the drop-down options next to the Display button, as shown in Figure 4.
Figure 4: Different Display options
7. Selecting a result of interest from the list displays more detailed information about it. The GEO Dataset browser shows the Dataset title, summary, organism name, platform, citation, reference Series (the original Series on which the Dataset is based), sample count, value type and the date on which the original Series was made public.
8. The download options are displayed on the right side. The Dataset full SOFT file contains the dataset information with experimental variable subsets, expression value measurements and updated gene annotation for the Dataset platform. The Series family SOFT file contains the original submitter-supplied records, in plain text or tab-delimited format. The Series family MINiML file contains the complete original submitter-supplied records in XML format. The Annotation SOFT file contains the complete updated annotation for the Dataset platform.
9. The thumbnail cluster image on the right side shows the default cluster heatmap, a graphical tool for visualizing high-dimensional data.
Figure 5: Dataset Browser Result page
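To illustrate what a downloaded SOFT file from step 8 looks like, the sketch below reads a small SOFT-style text. It assumes the usual SOFT line conventions: lines starting with "^" name an entity, lines starting with "!" carry attributes, lines starting with "#" describe table columns, and the tab-delimited data table sits between "..._table_begin" and "..._table_end" markers. The sample content, including the accession GDS0000 and the identifiers in the table, is made up for illustration.

```python
def parse_soft(text):
    """Split SOFT-style text into (metadata dict, list of table rows)."""
    meta, rows = {}, []
    header = None
    in_table = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("!") and line.rstrip().endswith("_table_begin"):
            in_table = True
            continue
        if line.startswith("!") and line.rstrip().endswith("_table_end"):
            in_table = False
            continue
        if in_table:
            fields = line.split("\t")
            if header is None:
                header = fields  # first table line holds the column names
            else:
                rows.append(dict(zip(header, fields)))
        elif line[0] in "^!" and "=" in line:
            # Entity or attribute line; a repeated key overwrites the earlier one.
            key, value = line[1:].split("=", 1)
            meta[key.strip()] = value.strip()
        # Lines starting with "#" (column descriptions) are skipped here.
    return meta, rows


# Made-up sample, not a real GEO record.
SAMPLE = "\n".join([
    "^DATASET = GDS0000",
    "!dataset_title = Example title",
    "#ID_REF = Platform reference identifier",
    "!dataset_table_begin",
    "\t".join(["ID_REF", "IDENTIFIER", "GSM1", "GSM2"]),
    "\t".join(["probe_1", "GENE_A", "10.5", "11.0"]),
    "!dataset_table_end",
])

if __name__ == "__main__":
    meta, rows = parse_soft(SAMPLE)
    print(meta["dataset_title"], rows[0]["GSM1"])
```

Real SOFT files are much larger and may repeat attribute keys, so a production parser would need to collect repeated values rather than overwrite them.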
10. Several data analysis tools are available in the Dataset Browser. “Find genes” redirects the user to the corresponding GEO Profiles. “Find genes that are up/down for this condition” lets the user select genes that are differentially expressed across experimental subsets. It categorizes the results by experimental condition, such as genotype/variation or disease state. The user can select or deselect the check boxes according to preference.
Figure 6: Data Analysis Tool for “Find genes”
11. “Compare 2 sets of samples” runs a t-test to identify genes whose expression levels differ between two sets of samples (Group A and Group B). Step 1 selects the test and its significance level. Step 2 assigns samples to Group A or Group B. The colored blocks indicate the experimental variable subsets within the Dataset. Selecting a sample accession number displays that sample individually.
12. “Query group A Vs B” calculates the t-test score or the fold difference in means. Genes that satisfy the user-specified conditions are displayed in GEO Profiles.
Figure 7: Data Analysis Tool for “Compare 2 sets of samples”
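The comparison in steps 11 and 12 can be illustrated for a single gene. The sketch below computes a two-sample t statistic and a fold difference in means for Group A and Group B. It uses Welch's unequal-variance form of the t statistic, which is an assumption; the exact calculation used by the GEO tool may differ. The expression values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Two-sample t statistic using Welch's unequal-variance form."""
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def mean_fold_difference(group_a, group_b):
    """Ratio of the Group A mean to the Group B mean (untransformed values)."""
    return mean(group_a) / mean(group_b)


if __name__ == "__main__":
    # Invented expression values for one gene in each group.
    group_a = [8.0, 9.0, 10.0]
    group_b = [4.0, 5.0, 6.0]
    print(round(welch_t(group_a, group_b), 3))   # about 4.899
    print(mean_fold_difference(group_a, group_b))
```

A large absolute t statistic together with a fold difference far from 1 suggests that the gene is expressed differently in the two groups. For log-transformed values, a difference of means would be the more natural measure than a ratio.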
13. The “Cluster heatmap” is a graphical tool for mining, clustering and visualizing high-dimensional data.
Figure 8: Cluster heat map
14. “Experimental design and value distribution” displays a box plot depicting the distribution of expression values for each sample within a Dataset.
Figure 9: Data Analysis Tool for “Experimental design and value distribution”
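The box plot in step 14 summarizes each sample by a few statistics. As a rough sketch, the values a box plot draws for one sample (minimum, lower quartile, median, upper quartile, maximum) can be computed as follows. The quartile method chosen here is an assumption and need not match the one used by GEO, and the input values are invented.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


if __name__ == "__main__":
    # Invented expression values for a single sample.
    print(five_number_summary([1, 2, 3, 4, 5]))  # (1, 2.0, 3.0, 4.0, 5)
```

Comparing these summaries across samples shows whether the value distributions are similar, which is what the box plot makes visible at a glance.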