Improved annotation enriched with functional information:


To facilitate global gene expression analysis and to extract biological insights from the massive amounts of data generated by, e.g., Affymetrix GeneChips and other microarrays, we have initiated an effort to classify Arabidopsis genes by related function. KEGG and TAIR (including GO, AraCyc and the gene family databases) provide substantial information, and TIGR and MIPS offer systematic, updated Arabidopsis gene annotations. Even so, a community effort is still needed to provide more "biologically oriented" information, so that individual researchers can effectively analyze expression patterns across the whole plant genome using free or commercially available programs. For instance, numerous published research articles, reviews and websites contain rich and useful information, but it is impossible for a single person or a single lab to acquire all of it efficiently and use it effectively. Although this information could be collected and organized systematically to some extent by computer programs, its quality and accuracy will be highest if more researchers participate in contributing, evaluating and editing it as an open resource.

The ultimate goal of our small effort is to encourage plant scientists working on diverse areas, including the metabolic and regulatory pathways of Arabidopsis, to share their knowledge of the gene functions they have become familiar with through their own work. We believe that this organized information on Arabidopsis gene functions will be applicable to other plant species and useful for comparative plant genome analyses as more plant genome sequences are completed. The resulting table can be downloaded here. Detailed information for each gene can also be obtained by searching our Affymetrix GeneChip ATH1 Genome Array annotation database.


In addition, we have run BLAST searches of the Affymetrix ATH1 target and/or probe sequences from the NetAffx web site against the TIGR databases. The systematic strategy used to determine the identity of the NetAffx sequences is outlined below.


The strategy for determining the AGI name of the ATH1 probes:


A BLAST search was conducted with 22,746 GeneChip target and probe sequences using an E-value cutoff of 0.000001 (1e-6). Proper matches were established using the following criteria.

1. Using the Affymetrix target sequences as queries, we applied these criteria: (100% identity AND match length >= 50) OR (98% identity AND match length equal to the query length, i.e. the length of the target sequence). This step found the largest number of matches: 22132.

2. For those Affymetrix target sequences that had no match in step 1, we used these criteria: 98% identity, match length >= 50. Found 231 matches.

3. For those Affymetrix target sequences that had no match in step 1 or 2, we used these criteria: 98% identity, match length >= 30. Found 19 matches.

4. For those target sequences producing no matches using the above guidelines, the corresponding Affymetrix probe sequences were used as the query, with the following criteria: (100% identity AND match length >= 15) OR (98% identity AND match length equal to the query length, i.e. the length of the probe sequence). Found 222 matches.

5. For those Affymetrix probe sequences without matches in step 4, all 11 probes were used for the blast with the same criteria as above. Found 5 matches.
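
The stepwise criteria above can be sketched in Python as a cascade of filters. This is only an illustration, not the script that produced the results: the `Hit` record and its field names are hypothetical, and "98% identity" is assumed to mean "at least 98%". Steps 4 and 5 share one probe criterion and differ only in which probe sequences are submitted as queries.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

E_VALUE_CUTOFF = 1e-6  # the cutoff of 0.000001 stated in the text


@dataclass(frozen=True)
class Hit:
    """One BLAST hit (hypothetical record; field names are assumptions)."""
    subject: str        # name of the matched TIGR sequence
    identity: float     # percent identity, 0-100
    match_length: int   # length of the aligned region
    query_length: int   # length of the target or probe sequence used as query
    evalue: float


Criterion = Callable[[Hit], bool]


def full_length(hit: Hit) -> bool:
    # Assumption: "98% identity" is read as "at least 98%".
    return hit.identity >= 98.0 and hit.match_length == hit.query_length


def step1(hit: Hit) -> bool:
    return (hit.identity >= 100.0 and hit.match_length >= 50) or full_length(hit)


def step2(hit: Hit) -> bool:
    return hit.identity >= 98.0 and hit.match_length >= 50


def step3(hit: Hit) -> bool:
    return hit.identity >= 98.0 and hit.match_length >= 30


def probe_step(hit: Hit) -> bool:
    # Shared by steps 4 and 5, which use probe sequences as queries.
    return (hit.identity >= 100.0 and hit.match_length >= 15) or full_length(hit)


TARGET_STEPS: Sequence[Criterion] = (step1, step2, step3)
PROBE_STEPS: Sequence[Criterion] = (probe_step,)


def first_passing_step(
    hits: Iterable[Hit], steps: Sequence[Criterion]
) -> Optional[Tuple[int, List[Hit]]]:
    """Return (step number, accepted hits) for the first step that accepts
    any hit, or None when no step does."""
    usable = [h for h in hits if h.evalue <= E_VALUE_CUTOFF]
    for number, accept in enumerate(steps, start=1):
        accepted = [h for h in usable if accept(h)]
        if accepted:
            return number, accepted
    return None
```

A query whose target sequence returns `None` from `first_passing_step(hits, TARGET_STEPS)` would then be retried with its probe sequences and `PROBE_STEPS`.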

In total, the BLAST searches above produced 22132 + 231 + 19 + 222 + 5 = 22609 matches.

Comparison between the above result and Affymetrix's annotation revealed 139 genes with different AGI numbers. When the AGI name changes in TIGR's database are taken into account, however, a total of 133 genes have different AGI numbers.

The above blast result can be found in the following files:

tigr_annot_affy.all.xls: the annotation based on the above blast

comp_affy_tigr_annot.xls: the comparison between the Affymetrix annotation and the TIGR blast results. When a probe set has different AGI numbers between the Affymetrix annotation and the TIGR blast result, the AGI is tagged with "!!!-!!!".

affy_new_agi_vs_tigr.xls: the comparison between the Affymetrix annotation with *updated AGI names* and the TIGR blast results.
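
The comparison behind the last two files can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name, the probe-set-to-AGI dictionaries and the optional `renamed` table (old AGI name to updated AGI name) are all hypothetical, and the layout of the "!!!-!!!" tag in the spreadsheets is not reproduced.

```python
from typing import Dict, Mapping, Optional, Tuple


def differing_agi(
    affy: Mapping[str, str],
    tigr: Mapping[str, str],
    renamed: Optional[Mapping[str, str]] = None,
) -> Dict[str, Tuple[str, str]]:
    """Return probe sets whose AGI name differs between the two annotations.

    affy    -- probe set ID -> AGI name from the Affymetrix annotation
    tigr    -- probe set ID -> AGI name from the TIGR blast result
    renamed -- optional old AGI name -> updated AGI name (hypothetical table)
    """
    renamed = renamed or {}
    differences: Dict[str, Tuple[str, str]] = {}
    for probe_set, affy_agi in affy.items():
        tigr_agi = tigr.get(probe_set)
        if tigr_agi is None:
            continue  # no blast match for this probe set, nothing to compare
        current = renamed.get(affy_agi, affy_agi)
        # AGI names are compared case-insensitively (an assumption).
        if current.upper() != tigr_agi.upper():
            differences[probe_set] = (current, tigr_agi)
    return differences
```

Calling the function without `renamed` corresponds to the first comparison, and passing a rename table corresponds to the comparison with updated AGI names.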