Schema for Exoniphy - Exoniphy Mouse/Rat/Human/Dog
  Database: mm9    Primary Table: exoniphy    Row Count: 172,859   Data last updated: 2007-09-05
Format description: A gene prediction with some additional info.
On download server: MariaDB table dump directory
field | example | SQL type | info | description
bin | 609 | smallint(5) unsigned | range | Indexing field to speed chromosome range queries.
name | chr8.56500001.9 | varchar(255) | values | Name of gene (usually transcript_id from GTF)
chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold
strand | - | char(1) | values | + or - for strand
txStart | 3206102 | int(10) unsigned | range | Transcription start position (or end position for minus strand item)
txEnd | 3207049 | int(10) unsigned | range | Transcription end position (or start position for minus strand item)
cdsStart | 3206102 | int(10) unsigned | range | Coding region start (or end position for minus strand item)
cdsEnd | 3207049 | int(10) unsigned | range | Coding region end (or start position for minus strand item)
exonCount | 1 | int(10) unsigned | range | Number of exons
exonStarts | 3206102, | longblob | | Exon start positions (or end positions for minus strand item)
exonEnds | 3207049, | longblob | | Exon end positions (or start positions for minus strand item)
id | 76974 | int(10) unsigned | range |
name2 | chr8.56500001.9 | varchar(255) | values | Alternate name (e.g. gene_id from GTF)
cdsStartStat | incmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS start annotation (none, unknown, incomplete, or complete)
cdsEndStat | cmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS end annotation (none, unknown, incomplete, or complete)
exonFrames | 1, | longblob | | Reading frame of the start of the CDS region of the exon, in the direction of transcription (0,1,2), or -1 if there is no CDS region.
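A row from the table dump can be read into a typed record along the lines below. This is a sketch that assumes the dump is tab-separated with the 16 columns in the schema order listed above, and that the longblob list columns (exonStarts, exonEnds, exonFrames) are comma-terminated as in the examples; the names `ExoniphyRow`, `parse_row` and `parse_int_list` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Allowed values of the cdsStartStat / cdsEndStat enum columns.
CDS_STATUS = ("none", "unk", "incmpl", "cmpl")


def parse_int_list(blob: str) -> List[int]:
    """Parse a comma-terminated list such as '3206102,' or '0,-1,' into ints."""
    return [int(x) for x in blob.split(",") if x != ""]


@dataclass
class ExoniphyRow:
    bin: int
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]
    id: int
    name2: str
    cds_start_stat: str
    cds_end_stat: str
    exon_frames: List[int]


def parse_row(line: str) -> ExoniphyRow:
    """Parse one tab-separated line, assuming columns appear in schema order."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 16:
        raise ValueError(f"expected 16 columns, got {len(f)}")
    row = ExoniphyRow(
        bin=int(f[0]), name=f[1], chrom=f[2], strand=f[3],
        tx_start=int(f[4]), tx_end=int(f[5]),
        cds_start=int(f[6]), cds_end=int(f[7]),
        exon_count=int(f[8]),
        exon_starts=parse_int_list(f[9]), exon_ends=parse_int_list(f[10]),
        id=int(f[11]), name2=f[12],
        cds_start_stat=f[13], cds_end_stat=f[14],
        exon_frames=parse_int_list(f[15]),
    )
    # Consistency checks implied by the schema.
    if row.strand not in ("+", "-"):
        raise ValueError(f"bad strand {row.strand!r}")
    for stat in (row.cds_start_stat, row.cds_end_stat):
        if stat not in CDS_STATUS:
            raise ValueError(f"bad CDS status {stat!r}")
    n = row.exon_count
    if not (len(row.exon_starts) == len(row.exon_ends) == len(row.exon_frames) == n):
        raise ValueError("exon list lengths do not match exonCount")
    return row
```

Applied to the first sample row, `parse_row` gives `exon_starts == [3206102]`, `exon_ends == [3207049]` and `exon_frames == [1]`. Validating list lengths against exonCount catches truncated or misaligned lines early.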

Sample Rows
bin | name | chrom | strand | txStart | txEnd | cdsStart | cdsEnd | exonCount | exonStarts | exonEnds | id | name2 | cdsStartStat | cdsEndStat | exonFrames
609 | chr8.56500001.9 | chr1 | - | 3206102 | 3207049 | 3206102 | 3207049 | 1 | 3206102, | 3207049, | 76974 | chr8.56500001.9 | incmpl | cmpl | 1,
611 | chr8.56400001.4 | chr1 | - | 3411782 | 3411982 | 3411782 | 3411982 | 1 | 3411782, | 3411982, | 76973 | chr8.56400001.4 | incmpl | incmpl | 2,
612 | chr8.56100001.13 | chr1 | - | 3660632 | 3661012 | 3660632 | 3661012 | 1 | 3660632, | 3661012, | 76972 | chr8.56100001.13 | incmpl | incmpl | 0,
612 | chr8.56100001.11 | chr1 | - | 3662763 | 3662775 | 3662763 | 3662775 | 1 | 3662763, | 3662775, | 76971 | chr8.56100001.11 | incmpl | incmpl | 1,
615 | chr8.55900001.9 | chr1 | - | 3997736 | 3997818 | 3997736 | 3997818 | 1 | 3997736, | 3997818, | 76970 | chr8.55900001.9 | incmpl | incmpl | 1,
615 | chr8.55900001.8 | chr1 | - | 4009150 | 4009184 | 4009150 | 4009184 | 1 | 4009150, | 4009184, | 76969 | chr8.55900001.8 | incmpl | incmpl | 0,
615 | chr8.55900001.7 | chr1 | - | 4014816 | 4014971 | 4014816 | 4014971 | 1 | 4014816, | 4014971, | 76968 | chr8.55900001.7 | incmpl | incmpl | 1,
615 | chr8.55900001.2 | chr1 | - | 4032114 | 4032188 | 4032114 | 4032188 | 1 | 4032114, | 4032188, | 76967 | chr8.55900001.2 | incmpl | incmpl | 0,
616 | chr8.55800001.16 | chr1 | - | 4082697 | 4082861 | 4082697 | 4082861 | 1 | 4082697, | 4082861, | 76966 | chr8.55800001.16 | incmpl | incmpl | 1,
616 | chr8.55800001.11 | chr1 | - | 4110095 | 4110154 | 4110095 | 4110154 | 1 | 4110095, | 4110154, | 76965 | chr8.55800001.11 | incmpl | incmpl | 2,
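The bin column can be reproduced with what is assumed here to be UCSC's standard binning scheme (five levels of bins, the finest 128 kb wide, each coarser level 8 times wider). The sketch below is written on that assumption, not taken from the browser's source, but it does reproduce every bin in the sample rows, for example 609 for txStart 3206102, txEnd 3207049. The function names are hypothetical.

```python
from typing import List

# Assumed standard UCSC bin layout: offsets of each level, finest (128 kb) first.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # finest bins span 2**17 bases
BIN_NEXT_SHIFT = 3    # each coarser level is 2**3 times wider


def bin_from_range(start: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} exceeds the standard 512 Mb bin scheme")


def overlapping_bins(start: int, end: int) -> List[int]:
    """Every bin, at every level, that could hold an item overlapping [start, end)."""
    bins: List[int] = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins
```

For a range query, the candidate bins can be combined with the usual overlap test, e.g. `chrom = 'chr1' AND bin IN (...) AND txStart < end AND txEnd > start`, so that only rows in those bins need to be examined. For chr1:3206102-3207049, `overlapping_bins` returns `[609, 76, 9, 1, 0]`.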

Note: all start coordinates in our database are 0-based, not 1-based.
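Converting to the 1-based positions shown in most genome displays is a matter of adding 1 to the start. The sketch below additionally assumes, as is the usual UCSC table convention, that end coordinates are exclusive (half-open intervals), so the end needs no change and a feature's length is simply end minus start. It also shows how the strand decides which end of txStart/txEnd is the 5' end, as the schema descriptions indicate. Function names are illustrative only.

```python
from typing import Tuple


def to_one_based(start0: int, end0: int) -> Tuple[int, int]:
    """Convert 0-based, half-open [start0, end0) to 1-based, closed [start0 + 1, end0]."""
    return start0 + 1, end0


def interval_length(start0: int, end0: int) -> int:
    """Number of bases in a 0-based, half-open interval."""
    return end0 - start0


def five_prime_end(tx_start: int, tx_end: int, strand: str) -> int:
    """1-based position of the transcript's 5' base, taking strand into account."""
    if strand == "+":
        return tx_start + 1  # lowest base in genome order
    if strand == "-":
        return tx_end        # highest base in genome order (exclusive 0-based end)
    raise ValueError(f"unknown strand {strand!r}")
```

For the first sample row (txStart 3206102, txEnd 3207049, minus strand) this gives the 1-based span chr1:3206103-3207049, a length of 947 bases, and a 5' end at 3207049.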

Exoniphy (exoniphy) Track Description

Description

The exoniphy program identifies evolutionarily conserved protein-coding exons in a multiple alignment using a phylogenetic hidden Markov model (phylo-HMM), a statistical model that simultaneously describes exon structure and exon evolution. This track shows exoniphy predictions for the human Mar. 2006 (hg18), mouse Feb. 2006 (mm8), rat Nov. 2004 (rn4), and dog May 2005 (canFam2) genomes, as aligned by the multiz program. For this track, only alignments on the "syntenic net" between human and each other species were considered.
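The phylo-HMM in exoniphy is far richer than anything that fits here, but the general idea of decoding an exon/non-exon labelling from per-column scores can be illustrated with a toy two-state HMM and the Viterbi algorithm, a standard way to find the most probable state path. Everything below is an assumption for illustration: the states, transition probabilities and per-column log-likelihoods are invented numbers standing in for the scores a phylogenetic model would assign to each alignment column under each state. This is not exoniphy's model, parameters or code.

```python
import math
from typing import Dict, List, Sequence

LogProbs = Dict[str, float]


def viterbi(
    states: Sequence[str],
    log_init: LogProbs,
    log_trans: Dict[str, LogProbs],
    col_loglik: Sequence[LogProbs],
) -> List[str]:
    """Return the most probable state path over a sequence of alignment columns.

    col_loglik[t][s] is the log-likelihood of column t under state s; in a
    phylo-HMM this would come from a phylogenetic model attached to state s.
    """
    # score[t][s]: best log-probability of any path ending in state s at column t.
    score: List[LogProbs] = [{s: log_init[s] + col_loglik[0][s] for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(col_loglik)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at column t.
            prev = max(states, key=lambda p: score[t - 1][p] + log_trans[p][s])
            score[t][s] = score[t - 1][prev] + log_trans[prev][s] + col_loglik[t][s]
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(col_loglik) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy parameters (invented for illustration; not exoniphy's model).
TOY_STATES = ["non", "exon"]
TOY_INIT = {"non": math.log(0.5), "exon": math.log(0.5)}
TOY_TRANS = {
    "non": {"non": math.log(0.9), "exon": math.log(0.1)},
    "exon": {"non": math.log(0.1), "exon": math.log(0.9)},
}
# Per-column log-likelihoods: the third to fifth columns score as exon-like.
TOY_COLUMNS = [
    {"non": -1.0, "exon": -4.0},
    {"non": -1.0, "exon": -4.0},
    {"non": -3.0, "exon": -1.0},
    {"non": -3.0, "exon": -1.0},
    {"non": -3.0, "exon": -1.0},
    {"non": -1.0, "exon": -4.0},
]

if __name__ == "__main__":
    print(viterbi(TOY_STATES, TOY_INIT, TOY_TRANS, TOY_COLUMNS))
```

With these toy numbers the decoded path is `['non', 'non', 'exon', 'exon', 'exon', 'non']`: the cost of switching into and out of the exon state is outweighed by the better column scores it buys. In a real phylo-HMM the per-column scores would themselves be computed from a phylogenetic tree and substitution model for each state.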

The predictions for mouse Feb. 2006 (mm8) were lifted to the mouse Jul. 2007 (mm9) genome.

Methods

For a description of exoniphy, see Siepel and Haussler (2004). Multiz is described in Blanchette et al. (2004). The alignment chaining methods behind the "syntenic net" are described in Kent et al. (2003).

Acknowledgments

Thanks to Brona Brejova of Cornell University for producing these predictions.

References

Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. PMID: 15060014; PMC: PMC383317

Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784

Siepel A, Haussler D. Computational identification of evolutionarily conserved exons. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology. 2004:177-86.