Problems in Affymetrix GeneChip annotation

The sequences used on the 8K Affymetrix GeneChip are annotated using either the BAC accession number or the mRNA accession number from NCBI. For example, the annotation for probe set 12528_at is "AC007168 /FEATURE=mRNA-11 /GENE=T26C19.14 /LABEL= /PRODUCT= /STRAND=forward /DEFINITION=Arabidopsis thaliana chromosome II BAC T26C19 genomic sequence, complete sequence." Because AGI names give Arabidopsis genes a unique, standardized and more precise naming system, we have made an effort to assign an AGI name to each of the sequences represented on the 8K Affymetrix GeneChip, which has been widely used in the plant community.
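To illustrate how such an annotation string breaks down, the following is a minimal Python sketch that splits it into the accession number and its slash-delimited fields. The function name `parse_affy_annotation` is hypothetical, and the "/KEY=value" layout is assumed only from the single example quoted above; other annotation lines may not follow it.

```python
import re

def parse_affy_annotation(annotation: str) -> dict:
    """Split an annotation string into its accession and /KEY=value fields.

    Assumes the layout seen in the 12528_at example: an accession token
    followed by fields of the form "/KEY=value" (values may be empty).
    """
    # The first whitespace-delimited token is taken to be the accession number.
    accession, _, rest = annotation.strip().partition(" ")
    fields = {"ACCESSION": accession}
    # Each value runs lazily up to the next " /KEY=" or the end of the string.
    pattern = r"/([A-Z]+)=(.*?)(?=\s+/[A-Z]+=|$)"
    for match in re.finditer(pattern, rest, flags=re.S):
        fields[match.group(1)] = match.group(2).strip()
    return fields

example = ("AC007168 /FEATURE=mRNA-11 /GENE=T26C19.14 /LABEL= /PRODUCT= "
           "/STRAND=forward /DEFINITION=Arabidopsis thaliana chromosome II "
           "BAC T26C19 genomic sequence, complete sequence.")
parsed = parse_affy_annotation(example)
print(parsed["ACCESSION"], parsed["GENE"], parsed["STRAND"])
```

Note that the parsed fields carry a BAC accession and a clone-based gene label (T26C19.14), not an AGI name, which is why a sequence-based assignment is needed.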


We first ran BLAST searches of the Affymetrix sequences obtained from the NetAffx web site against the TIGR and MIPS databases. In some cases, problems made it difficult to identify the correct AGI name for an Affymetrix sequence. We identified three types of inconsistency between the annotation provided by Affymetrix (8247 probe sets) and the results of the BLAST searches of the Affymetrix target sequences against the TIGR database.

Type 1

Some of the protein predictions associated with the BAC accession numbers contain errors. For example, half of the protein sequence predicted from the NCBI BAC accession number for probe set 12518_at hits At4g17895 and the other half hits At4g17890 (both near 100%). To obtain the correct assignment, we carried out the BLAST search using the specific GeneChip target sequences provided by the NetAffx web site, rather than the accession numbers provided by Affymetrix for each probe set.

Type 2

The same DNA sequence has different AGI names in the TIGR and MIPS databases. For example, probe set 12528_at corresponds to At2g22200 in the TIGR database and to At2g22210 in the MIPS database. For consistency, whenever a discrepancy arose, we used the TIGR AGI name because it had been updated with cDNA information more recently.
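The rule described here can be sketched in Python as follows. This is only an illustration: the function name `resolve_agi_names` and the dictionary layout (probe set ID mapped to AGI name) are assumptions, not the format of any actual TIGR or MIPS output.

```python
def resolve_agi_names(tigr_hits: dict, mips_hits: dict) -> dict:
    """Assign one AGI name per probe set, preferring TIGR on disagreement.

    Both arguments are hypothetical mappings of probe set ID -> AGI name.
    """
    resolved = {}
    for probe_set, tigr_name in tigr_hits.items():
        mips_name = mips_hits.get(probe_set)
        resolved[probe_set] = {
            # The TIGR name is always the one retained.
            "agi": tigr_name,
            # Record the MIPS name so the discrepancy stays traceable.
            "mips": mips_name,
            "discrepancy": mips_name is not None and mips_name != tigr_name,
        }
    return resolved

tigr = {"12528_at": "At2g22200"}
mips = {"12528_at": "At2g22210"}
resolved = resolve_agi_names(tigr, mips)
print(resolved["12528_at"])
```

Keeping the MIPS name alongside the chosen TIGR name makes it possible to list all Type 2 cases afterwards without rerunning the searches.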

Type 3

The target sequence is too short to be unique and hits more than one gene. For example, the NetAffx sequence for 16940_at is only 101 bases long, and a BLAST search hits both At2g33110 and At2g33120, which are distinct genes. Because the coding sequences of the two genes are similar, mRNA from both should bind to the probe.
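Probe sets of this kind can be flagged automatically once the BLAST hits are tabulated. The sketch below assumes a hypothetical list of (probe set ID, AGI name) pairs that have already been filtered for hit quality; the function name `find_multi_gene_probe_sets` is likewise illustrative.

```python
from collections import defaultdict

def find_multi_gene_probe_sets(hits):
    """Return probe sets whose target sequence hits more than one AGI gene.

    `hits` is a hypothetical iterable of (probe_set_id, agi_name) pairs.
    """
    genes_by_probe = defaultdict(set)
    for probe_set, agi in hits:
        genes_by_probe[probe_set].add(agi)
    # Keep only probe sets with two or more distinct genes.
    return {
        probe_set: sorted(genes)
        for probe_set, genes in genes_by_probe.items()
        if len(genes) > 1
    }

hits = [
    ("16940_at", "At2g33110"),
    ("16940_at", "At2g33120"),
    ("12528_at", "At2g22200"),
]
ambiguous = find_multi_gene_probe_sets(hits)
print(ambiguous)
```

For such probe sets, the signal cannot be attributed to a single gene, so listing all matching AGI names is more informative than choosing one.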