Problems in Affymetrix GeneChip annotation
The sequences used on the 8K Affymetrix GeneChip
are annotated using either the BAC accession number or the
mRNA accession number from NCBI. For example, the annotation
for probe set 12528_at is "AC007168 /FEATURE=mRNA-11
/GENE=T26C19.14 /LABEL= /PRODUCT= /STRAND=forward /DEFINITION=Arabidopsis
thaliana chromosome II BAC T26C19 genomic sequence, complete
sequence." Since the assignment of AGI names to the Arabidopsis
genes has provided a unique naming system that is more precise
and standardized, we have made an effort to assign AGI names
to each of the sequences represented on the 8K Affymetrix
GeneChip, which has been widely used in the plant community.

We initially ran BLAST searches with the Affymetrix sequences
from the NetAffx web site against
the TIGR and MIPS databases. In some cases, problems
made it difficult to identify the correct AGI name for
the Affymetrix sequence. We identified three types of
inconsistency between the annotation provided by Affymetrix (8247
probe sets) and the results of BLAST searches with the
Affymetrix target sequences against the TIGR database.
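Comparing the Affymetrix annotation with BLAST results starts with
splitting each annotation line into its parts. The following is a
minimal Python sketch, assuming every line follows the layout of the
12528_at example quoted above (accession first, then /KEY=value fields
separated by spaces). The function name parse_affy_annotation is
hypothetical, and a DEFINITION text that itself contains " /" would
need more careful handling.

```python
def parse_affy_annotation(text):
    """Split an annotation line into (accession, {KEY: value}).

    Assumes the layout 'ACCESSION /KEY=value /KEY=value ...' seen in
    the 12528_at example; this is an illustration, not a full parser.
    """
    # Collapse line breaks and repeated spaces left over from wrapping.
    text = " ".join(text.split())
    parts = text.split(" /")
    accession = parts[0]
    fields = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        fields[key] = value.strip()  # empty fields such as LABEL stay ""
    return accession, fields


EXAMPLE = (
    "AC007168 /FEATURE=mRNA-11 /GENE=T26C19.14 /LABEL= /PRODUCT= "
    "/STRAND=forward /DEFINITION=Arabidopsis thaliana chromosome II "
    "BAC T26C19 genomic sequence, complete sequence."
)

accession, fields = parse_affy_annotation(EXAMPLE)
print(accession, fields["GENE"], fields["STRAND"])
```

For the example line this prints the accession AC007168, the BAC-based
gene label T26C19.14, and the strand "forward". Note that the GENE
field holds a BAC-based label, not an AGI name.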
Type 1
Some BAC accessions contain protein prediction
errors. For example, half of the protein
sequence predicted from the NCBI BAC accession number of probe
set 12518_at hits At4g17895 and the other half hits At4g17890
(both near 100%). To provide the correct information, we ran
the BLAST search with the specific GeneChip sequences, which
are provided by the NetAffx
web site, rather than with the accession numbers provided by
Affymetrix for each probe set.
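A split prediction of this kind can be recognized from where each hit
falls on the query. The Python sketch below is an illustration only.
It assumes the hits are available as (query_start, query_end, gene)
tuples, and it flags a query when two genes are hit by regions that
barely overlap. The function names, the max_overlap threshold, and the
coordinates in the example are all made up for illustration. They are
not values from our BLAST runs, and which half hits which gene is
chosen arbitrarily.

```python
def genes_by_query_region(hsps):
    """Return {gene: (lowest_start, highest_end)} on the query."""
    ranges = {}
    for q_start, q_end, gene in hsps:
        lo, hi = ranges.get(gene, (q_start, q_end))
        ranges[gene] = (min(lo, q_start), max(hi, q_end))
    return ranges


def is_split_prediction(hsps, max_overlap=10):
    """True if two genes are hit by largely separate query regions.

    max_overlap is a hypothetical tolerance, in residues, for how far
    the two regions may overlap and still count as separate.
    """
    ranges = sorted(genes_by_query_region(hsps).values())
    # After sorting by start, compare each region with the next one.
    for (lo1, hi1), (lo2, hi2) in zip(ranges, ranges[1:]):
        overlap = min(hi1, hi2) - max(lo1, lo2) + 1
        if overlap <= max_overlap:
            return True
    return False


# Invented coordinates: first half hits one gene, second half another.
example_hsps = [
    (1, 210, "At4g17895"),
    (205, 420, "At4g17890"),
]
print(is_split_prediction(example_hsps))
```

A query whose whole length hits two genes equally well is not flagged
here. That situation corresponds to the short, non-unique target
sequences described under Type 3.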
Type 2
The same DNA sequence can have different AGI
names in TIGR and MIPS. For example, probe set
12528_at corresponds to At2g22200 in the TIGR database and to
At2g22210 in the MIPS database. For consistency, when a
discrepancy arose, we used the TIGR AGI name because TIGR has
been updated with cDNA information more recently.
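The rule for Type 2 cases is simple enough to state in code. The
Python sketch below is hypothetical: the function name reconcile_agi
is invented for illustration. Falling back to the MIPS name when TIGR
has none is an assumption added for completeness and is not part of
the rule described above.

```python
def reconcile_agi(tigr_name, mips_name):
    """Return (chosen_name, discrepancy) for one probe set.

    Prefers the TIGR name when the two databases disagree. Using the
    MIPS name when TIGR has none is an assumption of this sketch.
    """
    if tigr_name and mips_name and tigr_name != mips_name:
        return tigr_name, True
    return tigr_name or mips_name, False


print(reconcile_agi("At2g22200", "At2g22210"))
```

For 12528_at this returns the TIGR name At2g22200 together with a flag
marking the discrepancy, so that such probe sets can be listed and
reviewed later.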
Type 3
The target sequence is too short to be unique
and hits more than one gene. For example, the NetAffx sequence
for 16940_at is only 101 bases long, and a BLAST search hits
both At2g33110 and At2g33120, which are distinct genes. Because
the coding sequences are so similar, mRNA from both genes
should bind to the probe.
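Type 3 cases can be found mechanically once the BLAST hits are
available as (probe set, gene) pairs. The Python sketch below assumes
the hits have already been filtered for significance. The function
name ambiguous_probe_sets and the input layout are hypothetical.

```python
from collections import defaultdict


def ambiguous_probe_sets(hits):
    """Return {probe_set: [genes]} for probe sets hitting >1 gene.

    hits: iterable of (probe_set, gene) pairs, assumed to contain
    only hits that already passed a significance filter.
    """
    genes = defaultdict(set)
    for probe_set, gene in hits:
        genes[probe_set].add(gene)
    return {p: sorted(g) for p, g in genes.items() if len(g) > 1}


example_hits = [
    ("16940_at", "At2g33110"),
    ("16940_at", "At2g33120"),
    ("12528_at", "At2g22200"),
]
print(ambiguous_probe_sets(example_hits))
```

With these pairs only 16940_at is reported, together with both of the
genes it hits. Expression values from such probe sets cannot be
assigned to a single gene.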