The Arabidopsis MAPK Cascade and Signal Transduction
Database (MAPKDB)
We have integrated genomic information
from public database resources and generated the Arabidopsis
MAPK Cascade and Signal Transduction Database - MAPKDB. The
database provides detailed descriptions of Arabidopsis genes
important for the regulation of plant signaling networks.
For example, the Arabidopsis genes covered in our database
perform essential functions in MAPK cascade signaling, two-component
signaling, calcium sensing and signaling, G-protein signaling,
and transcription. |
|
Detailed information for possible
members of each gene family was gathered by first using well-defined
sequences of each gene family for BLASTP/RPS-BLAST searches
against the NCBI nr database. The information from the SMART
database was obtained through CDD, a service provided by NCBI.
For gene family members that are not described in the SMART
database, we wrote a Perl program to fetch both the GenBank
report for each gene and the specific NCBI taxonomy reports for
genes in Arabidopsis thaliana. We then ran a blast search with each
protein sequence against the MIPS or TIGR database to obtain
the official AGI name of each gene in our database. We use
the AGI number as the reference key to pull together all relevant
NCBI information. |
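As an illustration of using the AGI number as a reference key, the following is a minimal Python sketch, not the Perl code used for MAPKDB. It assumes each source has already been reduced to a list of records carrying an "agi" field. The function names, field names, and the rule that the first source to supply a field wins are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# AGI locus identifiers look like AT1G01010: "AT", a chromosome code
# (1-5, C for chloroplast, M for mitochondrion), "G", then five digits.
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$")


def normalize_agi(name):
    """Return the upper-cased AGI locus, or None if it is not well formed."""
    candidate = name.strip().upper()
    # Drop a gene-model suffix such as ".1" so all models map to one locus.
    candidate = candidate.split(".")[0]
    return candidate if AGI_PATTERN.match(candidate) else None


def merge_by_agi(*sources):
    """Combine per-source record lists into one dict keyed by AGI locus.

    Each source is a list of dicts with an "agi" field (an assumption of
    this sketch). Records without a usable AGI key are skipped.
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            agi = normalize_agi(record.get("agi", ""))
            if agi is None:
                continue
            for field, value in record.items():
                if field != "agi":
                    # The first source to supply a field wins; later
                    # sources do not overwrite it.
                    merged[agi].setdefault(field, value)
    return dict(merged)
```

Calling `merge_by_agi(genbank_records, domain_records)` with two such lists would return one entry per locus that combines the fields of both. Normalizing case and stripping the gene-model suffix first means that "At1g01010.1" and "AT1G01010" end up under the same key.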
|
MAPKDB features eight services: |
|
1. Gene
category:
Organized genomic information on Arabidopsis MAPK
cascades, two-component signaling systems, calcium sensors,
G-proteins, and transcription factors. |
|
2. MAPK
Cascade Mutants:
Contains the list of available mutants of MAPK,
MAPKK and MAPKKK. |
|
3. Integrated
Gene Functional Information:
To facilitate global gene expression analysis and
extract useful information from massive amounts of data (e.g.,
generated by Affymetrix GeneChips and microarrays) to reveal
biological insights, we have begun an effort
to classify Arabidopsis genes based on related functions.
We have integrated the information collected from all
sources, including literature, websites and blast searches,
into a master gene functional list. The genes can also be
searched using AGI locus number, gene name or gene
family name. |
|
4. Affymetrix
search (25K):
To facilitate global gene expression analysis and
extract useful information from massive amounts of data (e.g.,
generated by Affymetrix GeneChips and microarrays) to reveal
biological insights, we have begun an effort to classify
Arabidopsis genes based on related functions. We have organized
the information collected from literature, websites and blast
searches into several tables. Furthermore, although Affymetrix,
in collaboration with TIGR, has provided better and more informative
annotation for its ATH1 Genome Array, that annotation was
done before TIGR's third major release (3.0) of the Arabidopsis
genome annotation. As a result, the Affymetrix annotation misses new
genes added since version 2.0 and other important updates. |
|
5. Affymetrix
search (8K):
Based on the currently available resources, we
have found incorrect or ambiguous annotation for a large number
of Arabidopsis genes on the Affymetrix GeneChip. Although
many papers on global gene expression profiling have been
published based on the existing annotation of the Arabidopsis
Affymetrix GeneChip, the problems of incorrect or ambiguous
annotation were not raised until NetAffx provided
the actual GeneChip sequences. Three types of annotation problems
are described in "Problems
in Affymetrix GeneChip annotation". To eliminate these problems,
we have written a Perl program to systematically gather the
correct information. A new list (generated through the multi-step
strategy described in "Strategy
for determining Affymetrix annotation") with matching
AGI names, GeneChip IDs and relevant information can now
be searched or downloaded through our database. An updated
GeneChip annotation generated independently using a different
approach can also be searched or downloaded from the Schroeder
lab web site. |
|
6. Related
links:
Many useful links are provided here, including
main databases, electronic journals, bioinformatics on-line
tools, T-DNA resources, tools for cloning, motif recognition
and prediction tools, and protein structure prediction tools.
|
|
7. Bioinformatics tools: |
|
Promoter Analysis and Search Tools: |
|
How to Identify New & Important Cis-Elements by Alex Kazberouk & Mike Zhang
|
We have written three types of Perl programs that are available upon request (chu@molbio.mgh.harvard.edu).
A. For systematic retrieval and organization
of specific information about T-DNA insertion lines from the
Salk Institute's SALK T-DNA Express web site. The program provides
a search interface that accepts multiple AGI names. |
|
This program parses
the search results into a tabular format, which includes
the T-DNA insertion clone name, the chromosome of the insertion,
the precise physical location (e.g., exon, intron or 300 bp
UTR region), and the direction of the insertion. |
|
B. For systematic searching and retrieval of
information about T-DNA insertion lines from TMRI/Syngenta.
|
|
The requesting program automatically
generates an HTML form with the query sequence, which can
then be used to submit a blast search request to the TMRI T-DNA
blast server.
The retrieval program parses the e-mail sent back by the
TMRI blast server. This program can parse and fetch information
generated by the blast search, including the T-DNA insertion
clone name, blast score, overlapping region between query
and target sequences, direction of the insertion, and the
precise insertion location (e.g., in exon, intron or UTR region).
|
|
C. For systematic gathering of public domain gene
information related to Affymetrix GeneChip annotation
|
|
Four Perl programs were written
to perform this function.
The first program performs an automatic blast
search of the target sequences downloaded from the
NetAffx web site against the TIGR nucleotide sequence database.
The results are filtered using a criterion of 97 percent identity
over the whole query length. We use this program to obtain
the AGI name from the TIGR database for each corresponding
Affymetrix GeneChip ID.
The second program uses the nucleotide sequences
fetched from the TIGR database to perform an automatic blast
search against the UniGene database downloaded from NCBI. This
program matches all available Arabidopsis mRNA/EST information
collected from NCBI to genes on the Affymetrix GeneChip.
The third program uses the TIGR protein sequences
to perform automatic blast searches against the SMART and
PFAM databases, and to fetch all related domain information
that matches the set E-value criteria. This blast search provides
new and insightful information about those genes on the Affymetrix
GeneChip with ambiguous or incorrect annotation in the TIGR
database.
The fourth program uses the TIGR protein sequences
to perform automatic blast searches against the NCBI GenPept
database, fetching proteins with a 100 percent match from NCBI.
Because the protein accession number corresponding to each
AGI name cannot be fetched directly by searching the NCBI
web page, this program provides a direct link between the
NCBI protein accession number and the AGI name from TIGR.
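The filtering step of the first program can be illustrated with a minimal Python sketch. This is not the original Perl code. It assumes the blast results are available in the standard 12-column tab-separated format (query ID, subject ID, percent identity, alignment length, and so on) and that query lengths are supplied separately. It also assumes that "97 percent identity based on whole query length" means identical positions divided by the full query length. The function and parameter names are hypothetical.

```python
import csv
import io


def best_hits(blast_tabular, query_lengths, min_identity=97.0):
    """Return {query_id: (subject_id, identity_over_query)} for passing hits.

    blast_tabular: text in the 12-column tab-separated blast format.
    query_lengths: dict mapping each query ID to its full length in bases;
                   a query missing from this dict raises KeyError.
    min_identity:  threshold, as percent identity over the whole query.
    """
    best = {}
    reader = csv.reader(io.StringIO(blast_tabular), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, subject_id = row[0], row[1]
        percent_identity = float(row[2])
        alignment_length = int(row[3])
        # Approximate count of identical positions in the alignment
        # (the alignment length includes any gap columns).
        identical = percent_identity / 100.0 * alignment_length
        # Rescale so that short but perfect local matches are penalized.
        identity_over_query = 100.0 * identical / query_lengths[query_id]
        if identity_over_query < min_identity:
            continue
        # Keep only the strongest passing hit for each query.
        if query_id not in best or identity_over_query > best[query_id][1]:
            best[query_id] = (subject_id, identity_over_query)
    return best
```

Dividing by the query length rather than the alignment length is the design choice that matters here. A 400-base alignment at 99 percent identity covers only about 79 percent of a 500-base target sequence and would be rejected, whereas a raw percent-identity cutoff would accept it.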
|
|
|