Database

The Arabidopsis MAPK Cascade and Signal Transduction Database (MAPKDB)

We have integrated genomic information from public database resources to create the Arabidopsis MAPK Cascade and Signal Transduction Database (MAPKDB). The database provides detailed descriptions of Arabidopsis genes important for the regulation of plant signaling networks. The genes covered perform essential functions in MAPK cascade signaling, two-component signaling, calcium sensing and signaling, G-protein signaling, and transcription.

 

Detailed information for possible members of each gene family was gathered by first using well-defined sequences of each gene family for BLASTP/RPS-BLAST searches against the NCBI nr database. The information from the SMART database was obtained through CDD, a service provided by NCBI. For gene family members that are not described in the SMART database, we wrote a Perl program to fetch both the GenBank report for each gene and specific NCBI taxonomy reports for genes in Arabidopsis thaliana. We then ran a BLAST search with each protein sequence against the MIPS or TIGR database to obtain the official AGI name of each gene in our database. We use the AGI number as the reference key to pull together all relevant NCBI information.
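As a rough illustration of using the AGI number as a reference key, the Python sketch below merges records from several sources under one normalized AGI locus ID. It is not the authors' Perl code. The function names, the source labels, and the record fields are hypothetical and chosen only for this example.

```python
import re
from collections import defaultdict

# AGI locus IDs have the form "At" + chromosome (1-5, C or M) + "g" + five digits.
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)


def normalize_agi(name):
    """Return the locus ID in one canonical spelling (e.g. 'At3g45640'), or None."""
    # Drop surrounding whitespace and any gene-model suffix such as '.1'.
    name = name.strip().split(".")[0]
    if not AGI_PATTERN.match(name):
        return None
    return "At" + name[2].upper() + "g" + name[4:]


def merge_by_agi(sources):
    """Merge per-source records under their AGI key.

    `sources` maps a source label (hypothetical, e.g. 'ncbi') to a list of
    (agi_name, info) pairs. Records without a valid AGI name are skipped.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for agi_name, info in records:
            key = normalize_agi(agi_name)
            if key is None:
                continue
            merged[key][source_name] = info
    return dict(merged)
```

Normalizing the key first means that spellings such as `AT3G45640`, `at3g45640`, and `At3g45640.1` all land in the same record.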

 

MAPKDB features eight services:

 

1. Gene category:

Organized genomic information on Arabidopsis MAPK cascades, two-component signaling systems, calcium sensors, G-proteins, and transcription factors.

 

2. MAPK Cascade Mutants:

Contains the list of available mutants of MAPK, MAPKK and MAPKKK.

 

3. Integrated Gene Functional Information:

To facilitate global gene expression analysis and to extract biological insights from massive amounts of data (e.g., generated by Affymetrix GeneChips and microarrays), we have initiated an effort to classify Arabidopsis genes based on related functions. We have integrated the information collected from all sources, including literature, websites and BLAST searches, into a master gene functional list. The genes can be searched by AGI locus number, gene name or gene family name.

 

4. Affymetrix search (25K):

To facilitate global gene expression analysis and to extract biological insights from massive amounts of data (e.g., generated by Affymetrix GeneChips and microarrays), we have initiated an effort to classify Arabidopsis genes based on related functions. We have organized the information collected from literature, websites and BLAST searches into several tables. Furthermore, although Affymetrix, in collaboration with TIGR, has provided better and more informative annotation for its ATH1 Genome Array, that annotation was done before TIGR's third major release (3.0) of the Arabidopsis genome annotation. As a result, the Affymetrix annotation misses new genes added since version 2.0 and other important updates.

 

5. Affymetrix search (8K):

Based on currently available resources, we have found incorrect or ambiguous annotation for a large number of Arabidopsis genes on the Affymetrix GeneChip. Although many papers on global gene expression profiling have been published based on the existing annotation of the Arabidopsis Affymetrix GeneChip, the problems of incorrect or ambiguous annotation were not raised until NetAffx provided the actual GeneChip sequences. Three types of annotation problems are described in "problems in Affymetrix GeneChip annotation". To eliminate these problems, we have written a Perl program to systematically gather the correct information. A new list of matching AGI names, GeneChip IDs and relevant information (generated through a multi-step strategy, as described in "the strategy for determining Affymetrix annotation") can now be searched or downloaded through our database. An updated GeneChip annotation, generated independently using a different approach, can also be searched or downloaded from the Schroeder lab web site.

 

6. Related links:

Many useful links are provided here, including main databases, electronic journals, bioinformatics on-line tools, T-DNA resources, tools for cloning, motif recognition and prediction tools, and protein structure prediction tools.

 

7. Bioinformatics tools:

 

Promoter Analysis and Search Tools:

 

How to Identify New & Important Cis-Elements by Alex Kazberouk & Mike Zhang

We have written three types of Perl programs that are available upon request (chu@molbio.mgh.harvard.edu).

A. For systematic retrieval and organization of specific information about T-DNA insertion lines from the Salk Institute's SALK T-DNA express web site. The program provides a search interface that accepts multiple AGI names.

 

This program parses the search results into a tabular format that includes the T-DNA insertion clone name, the chromosome of the insertion, the precise physical location (e.g., exon, intron or 300 bp UTR region), and the direction of the insertion.
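The location column of such a table could be derived roughly as in the Python sketch below. This is not the authors' Perl program. It assumes that exon coordinates for the gene are already known, and it treats the "300 bp UTR region" as a 300 bp window on either side of the outermost exons. The function names, the category labels, and the column order are hypothetical.

```python
def classify_insertion(position, exons, flank=300):
    """Classify an insertion position relative to one gene model.

    `exons` is a list of inclusive (start, end) pairs in chromosome
    coordinates. `flank` is the assumed size of the region checked on
    either side of the gene.
    """
    exons = sorted(exons)
    gene_start, gene_end = exons[0][0], exons[-1][1]
    for start, end in exons:
        if start <= position <= end:
            return "exon"
    if gene_start < position < gene_end:
        # Inside the gene span but not in any exon.
        return "intron"
    if gene_start - flank <= position < gene_start:
        return "flank_300bp"
    if gene_end < position <= gene_end + flank:
        return "flank_300bp"
    return "outside"


def table_row(clone, chromosome, position, exons, direction):
    """Format one insertion line as a tab-separated row."""
    location = classify_insertion(position, exons)
    return "\t".join([clone, str(chromosome), location, direction])
```

Whether a flanking hit lies upstream or downstream of the gene depends on the strand, which this sketch does not track.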

 

B. For systematic searching and retrieval of information about T-DNA insertion lines from TMRI/Syngenta.

 

The requesting program automatically generates an HTML form containing the query sequence, which can then be used to submit a BLAST search request to the TMRI T-DNA BLAST server.

The retrieval program parses the e-mail sent back by the TMRI BLAST server. It extracts the information generated by the BLAST search, including the T-DNA insertion clone name, BLAST score, overlapping region between query and target sequences, direction of the insertion, and the precise insertion location (e.g., in exon, intron or UTR region).

 

C. For systematic gathering of public domain gene information related to Affymetrix GeneChip annotation.

 

Four Perl programs were written to perform this function.

The first program automatically runs BLAST searches of the target sequences downloaded from the NetAffx web site against the TIGR nucleotide sequence database. The results are filtered using a criterion of 97 percent identity based on whole query length. We use this program to obtain the AGI name from the TIGR database for each corresponding Affymetrix GeneChip ID.
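One way to read the 97 percent criterion is that the number of identical bases in the alignment, divided by the full length of the query (not the alignment length), must be at least 0.97. The Python sketch below applies that reading. It is an assumption about the filter, not the authors' Perl code, and the function names and input layout are hypothetical.

```python
def passes_identity_filter(identical_bases, query_length, min_percent=97):
    """True if identical bases cover at least `min_percent` of the whole query."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    # Integer arithmetic avoids floating-point surprises at the threshold.
    return identical_bases * 100 >= min_percent * query_length


def assign_agi(hits, query_lengths, min_percent=97):
    """Pick one AGI name per GeneChip ID from pre-parsed BLAST hits.

    `hits` is an iterable of (chip_id, agi_name, identical_bases) tuples and
    `query_lengths` maps each chip_id to the length of its target sequence.
    Among hits that pass the filter, the one with the most identical bases wins.
    """
    best = {}
    for chip_id, agi_name, identical in hits:
        if not passes_identity_filter(identical, query_lengths[chip_id], min_percent):
            continue
        if chip_id not in best or identical > best[chip_id][1]:
            best[chip_id] = (agi_name, identical)
    return {chip_id: agi_name for chip_id, (agi_name, _) in best.items()}
```

Dividing by the query length rather than the alignment length means a short but perfect local alignment does not pass, which is what makes the filter suitable for assigning a whole target sequence to a gene.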

The second program uses the nucleotide sequences fetched from the TIGR database to run automatic BLAST searches against the UniGene database downloaded from NCBI. This program matches all available Arabidopsis mRNA/EST information collected from NCBI to genes on the Affymetrix GeneChip.

The third program uses the TIGR protein sequences to run automatic BLAST searches against the SMART and PFAM databases and fetches all related domain information that meets the set E-value criteria. This search provides new and insightful information about those genes on the Affymetrix GeneChip with ambiguous or incorrect annotation in the TIGR database.
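The E-value step can be pictured with the short Python sketch below, which keeps only domain hits at or below a cutoff and groups them by protein. The cutoff is left as a required parameter because the value used for MAPKDB is not stated here. The function name and the hit layout are hypothetical.

```python
from collections import defaultdict


def domains_by_protein(hits, max_evalue):
    """Group domain hits by protein, keeping hits with E-value <= max_evalue.

    `hits` is an iterable of (protein_id, domain_name, evalue) tuples that
    have already been parsed from the search output.
    """
    grouped = defaultdict(list)
    for protein_id, domain_name, evalue in hits:
        if evalue <= max_evalue:
            grouped[protein_id].append((domain_name, evalue))
    # List the most significant domain (smallest E-value) first.
    for domains in grouped.values():
        domains.sort(key=lambda item: item[1])
    return dict(grouped)
```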

The fourth program uses the TIGR protein sequences to run automatic BLAST searches against the NCBI GenPept database, fetching proteins with a 100 percent match from NCBI. Because the protein accession number corresponding to each AGI name cannot be fetched directly by searching the NCBI web pages, this program provides a direct link between the NCBI protein accession number and the AGI name from TIGR.
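When both sequence sets are available locally, a 100 percent match can also be found without BLAST by comparing sequences as strings. The Python sketch below uses that simpler stand-in technique, exact lookup in a dictionary, rather than the BLAST search described above. The function names and input dictionaries are hypothetical.

```python
from collections import defaultdict


def normalize_sequence(seq):
    """Remove whitespace, uppercase, and drop a trailing stop symbol '*'."""
    return "".join(seq.split()).upper().rstrip("*")


def link_agi_to_accessions(tigr_proteins, genpept_proteins):
    """Map each AGI name to the accessions whose sequence is identical.

    Both arguments are dictionaries of identifier -> protein sequence.
    AGI names without an identical sequence map to an empty list.
    """
    index = defaultdict(list)
    for accession, seq in genpept_proteins.items():
        index[normalize_sequence(seq)].append(accession)
    return {
        agi_name: sorted(index.get(normalize_sequence(seq), []))
        for agi_name, seq in tigr_proteins.items()
    }
```

Unlike a BLAST search, this lookup reports only full-length identical sequences, so a protein that matches perfectly over part of a longer entry would be missed.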