Schema for EvoFold - EvoFold Predictions of RNA Secondary Structure
  Database: hg19    Primary Table: evofold    Row Count: 47,510   Data last updated: 2010-12-13
Format description: Browser extensible data
field | example | SQL type | info | description
bin | 591 | smallint(6) | range | Indexing field to speed chromosome range queries.
chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold
chromStart | 896910 | int(10) unsigned | range | Start position in chromosome
chromEnd | 896935 | int(10) unsigned | range | End position in chromosome
name | 608_0_-_96 | varchar(255) | values | Name of item
score | 96 | int(10) unsigned | range | Score from 0-1000
strand | - | char(1) | values | + or -
size | 25 | int(10) unsigned | range | Size of element.
secStr | (((((.....(((...))).))))) | longblob | | Parentheses and '.'s which define the secondary structure
conf | 0.97,0.98,0.99,0.99,0.9,0.7... | longblob | | Confidence of secondary-structure annotation per position (0.0-1.0).
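
The secStr and conf fields can be read together position by position. The following is a minimal sketch, assuming secStr uses only '(', ')' and '.' and that conf holds one comma-separated value per secStr character, as in the first sample row. The function names parse_pairs and parse_conf are hypothetical helpers, not part of any UCSC tool.

```python
def parse_pairs(sec_str):
    """Return sorted (i, j) 0-based index pairs for matching parentheses."""
    stack = []   # indices of '(' still waiting for a partner
    pairs = []
    for i, ch in enumerate(sec_str):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def parse_conf(conf):
    """Split the comma-separated conf field into a list of floats."""
    return [float(x) for x in conf.split(",") if x]


# Values copied from the first sample row (name 608_0_-_96).
sec_str = "(((((.....(((...))).)))))"
conf = ("0.97,0.98,0.99,0.99,0.9,0.78,0.87,0.99,0.98,0.98,0.21,0.22,0.11,"
        "0.99,0.95,0.91,0.11,0.22,0.21,0.98,0.9,0.99,0.99,0.98,0.97")

pairs = parse_pairs(sec_str)
scores = parse_conf(conf)
print(len(pairs), len(scores) == len(sec_str))  # 8 True
```

In this row the two members of each predicted pair carry the same confidence value (for example, positions 0 and 24 are both 0.97), which is a quick sanity check when loading the table.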

Sample Rows
 
bin | chrom | chromStart | chromEnd | name | score | strand | size | secStr | conf
591 | chr1 | 896910 | 896935 | 608_0_-_96 | 96 | - | 25 | (((((.....(((...))).))))) | 0.97,0.98,0.99,0.99,0.9,0.78,0.87,0.99,0.98,0.98,0.21,0.22,0.11,0.99,0.95,0.91,0.11,0.22,0.21,0.98,0.9,0.99,0.99,0.98,0.97
591 | chr1 | 898554 | 898572 | 617_0_+_156 | 156 | + | 18 | ((((((......)))))) | 0.97,0.99,0.99,1.0,0.99,0.92,1.0,1.0,0.99,1.0,1.0,1.0,0.92,0.99,1.0,0.99,0.99,0.97
593 | chr1 | 1104402 | 1104450 | 1087_0_+_62 | 62 | + | 48 | ((((((((((..(((((((..........)))))))..)))))))))) | 0.92,0.98,0.99,1.0,1.0,1.0,0.99,0.98,0.94,0.77,0.97,0.9,0.34,0.79,0.91,0.91,0.89,0.8,0.37,0.98,0.99,0.99,0.99,0.98,1.0,1.0,1.0,0 ...
593 | chr1 | 1119456 | 1119474 | 1109_0_+_111 | 111 | + | 18 | ((((((......)))))) | 0.93,0.98,0.99,0.99,0.99,0.92,0.84,0.99,0.99,0.99,0.99,0.84,0.92,0.99,0.99,0.99,0.98,0.93
593 | chr1 | 1168595 | 1168619 | 1179_1_+_112 | 112 | + | 24 | ((((...((.........)))))) | 0.96,0.98,0.99,0.96,0.98,0.99,0.97,0.98,0.91,1.0,1.0,1.0,0.99,1.0,0.99,0.99,1.0,1.0,0.91,0.98,0.96,0.99,0.98,0.96
594 | chr1 | 1192414 | 1192452 | 1227_0_+_82 | 82 | + | 38 | ((((((((......................)))))))) | 0.98,1.0,1.0,1.0,1.0,0.99,0.99,0.93,0.94,0.94,0.98,0.98,0.95,0.93,1.0,0.98,0.97,0.97,0.94,0.99,1.0,0.98,0.98,0.94,0.97,0.85,0.94 ...
594 | chr1 | 1192453 | 1192494 | 1227_1_+_107 | 107 | + | 41 | ((.((..........(((((..........))))).)).)) | 0.98,1.0,1.0,0.99,0.96,0.52,0.59,0.98,1.0,0.98,1.0,1.0,0.59,0.58,0.99,0.89,0.92,0.92,0.92,0.89,0.79,0.98,0.99,1.0,0.98,0.98,1.0, ...
594 | chr1 | 1247254 | 1247292 | 1365_0_+_111 | 111 | + | 38 | (((((....(((((.............))))).))))) | 0.96,1.0,1.0,0.99,0.94,0.95,0.97,0.99,0.93,0.4,0.41,0.53,0.53,0.51,0.88,0.99,0.99,0.99,0.92,0.93,0.87,0.94,0.97,1.0,0.98,1.0,0.9 ...
594 | chr1 | 1249157 | 1249173 | 1382_0_-_119 | 119 | - | 16 | ((((((....)))))) | 0.96,0.98,0.98,0.98,0.95,0.42,0.99,0.99,0.99,0.99,0.42,0.95,0.98,0.98,0.98,0.96
594 | chr1 | 1254709 | 1254747 | 1391_0_+_147 | 147 | + | 38 | (((((((......((((((...))...))))))))))) | 0.98,1.0,1.0,1.0,0.98,0.96,0.91,0.66,0.99,0.77,0.76,0.83,0.87,0.37,0.39,0.38,0.21,0.18,0.18,0.87,0.88,0.88,0.18,0.18,0.74,0.79,0 ...
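
The bin values in the sample rows can be reproduced from chromStart and chromEnd. The sketch below assumes the standard UCSC binning scheme (five levels, with the smallest bins covering 128 kb); the constants and the function name bin_from_range are not stated on this page and are given here as an illustration only.

```python
# Assumed constants of the standard UCSC binning scheme (not stated on this page).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]  # finest level first
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 131072 bases
BIN_NEXT_SHIFT = 3    # each coarser level is 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin that fully contains the 0-based, half-open range."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles a boundary at this level; try the next coarser one.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for this scheme")


print(bin_from_range(896910, 896935))    # 591, as in the first sample row
print(bin_from_range(1104402, 1104450))  # 593, as in the third sample row
```

A range query can then restrict itself to the handful of bins that may overlap the region of interest instead of scanning every row on the chromosome.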

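Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are exclusive, so chromEnd - chromStart equals the element size (for example, 896935 - 896910 = 25 in the first sample row).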
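
As a small illustration of the coordinate convention, the sketch below converts a table row to a 1-based, closed position string. It assumes chromEnd is exclusive, which is consistent with size = chromEnd - chromStart in every sample row; the function names are hypothetical.

```python
def to_one_based(chrom, chrom_start, chrom_end):
    """Format a 0-based, half-open interval as a 1-based, closed position."""
    # Only the start shifts by one; the exclusive end already equals
    # the 1-based coordinate of the last base.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def element_size(chrom_start, chrom_end):
    """Length in bases of a 0-based, half-open interval."""
    return chrom_end - chrom_start


print(to_one_based("chr1", 896910, 896935))  # chr1:896911-896935
print(element_size(896910, 896935))          # 25
```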

EvoFold (evofold) Track Description
 

Description

This track shows RNA secondary structure predictions made with the EvoFold program, a comparative method that exploits the evolutionary signal of genomic multiple-sequence alignments for identifying conserved functional RNA structures.

Display Conventions and Configuration

Track elements are labeled using the convention ID_strand_score. When zoomed out beyond the base level, secondary structure prediction regions are indicated by blocks, with the stem-pairing regions shown in a darker shade than unpaired regions. Arrows indicate the predicted strand. When zoomed in to the base level, the specific secondary structure predictions are shown in parenthesis format. The confidence score for each position is indicated in grayscale, with darker shades corresponding to higher scores.
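
The ID_strand_score naming convention can be split programmatically. This is a minimal sketch, assuming (from the sample names such as 608_0_-_96) that the ID itself may contain underscores, so the name is split from the right. ElementName and parse_name are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class ElementName(NamedTuple):
    element_id: str
    strand: str
    score: int


def parse_name(name):
    """Split a name of the form ID_strand_score into its three parts."""
    # Split from the right because the ID may itself contain '_'.
    element_id, strand, score = name.rsplit("_", 2)
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r} in {name!r}")
    return ElementName(element_id, strand, int(score))


print(parse_name("608_0_-_96"))
# ElementName(element_id='608_0', strand='-', score=96)
```

In the sample rows the strand and score parsed this way match the separate strand and score columns.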

The details page for each track element shows the predicted secondary structure (labeled SS anno), together with details of the multiple species alignments at that location. Substitutions relative to the human sequence are color-coded according to their compatibility with the predicted secondary structure (see the color legend on the details page). Each prediction is assigned an overall score and a sequence of position-specific scores. The overall score measures evidence for any functional RNA structures in the given region, while the position-specific scores (0 - 9) measure the confidence of the base-specific annotations. Base-pairing positions are annotated with the same pair symbol. The offsets are provided to ease visual navigation of the alignment in terms of the human sequence. The offset is calculated (in units of ten) from the start position of the element on the positive strand or from the end position when on the negative strand.

The graphical display may be filtered to show only those track elements with scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page.
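
The same filter can be applied to rows downloaded from the table. The sketch below assumes each row has been loaded into a dictionary keyed by field name; filter_by_score is a hypothetical helper, and the three rows are taken from the sample data.

```python
def filter_by_score(rows, min_score):
    """Keep rows whose score meets or exceeds min_score."""
    return [row for row in rows if row["score"] >= min_score]


rows = [
    {"name": "608_0_-_96", "score": 96},
    {"name": "617_0_+_156", "score": 156},
    {"name": "1087_0_+_62", "score": 62},
]

kept = filter_by_score(rows, 96)
print([row["name"] for row in kept])  # ['608_0_-_96', '617_0_+_156']
```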

Methods

EvoFold makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. The predictions consist of both a specific RNA secondary structure and an overall score. The overall score is essentially a log-odds score between a phylo-SCFG modeling the constrained evolution of stem-pairing regions and one which only models unpaired regions.
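
Read as a formula, the overall score described above might be written roughly as follows. The symbols are assumptions introduced for illustration: x denotes the alignment segment, M_struct the phylo-SCFG that models stem-pairing regions, and M_unpaired the model with unpaired regions only. The logarithm base and any scaling to the 0-1000 score field are not specified here.

```latex
S(x) \;=\; \log \frac{P(x \mid M_{\text{struct}})}{P(x \mid M_{\text{unpaired}})}
```

Under this reading, a positive S(x) means the alignment is better explained by the model that allows conserved base pairing than by the unpaired-only model.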

The predictions for this track were based on the conserved elements of an 8-way vertebrate alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and Fugu assemblies. NOTE: These predictions were originally computed on the hg17 (May 2004) human assembly, from which the hg16 (July 2003), hg18 (May 2006), and hg19 (Feb 2009) predictions were lifted. As a result, the multiple alignments shown on the track details pages may differ from the 8-way alignments used for their prediction. Additionally, some weak predictions have been eliminated from the set displayed on hg18 and hg19. The hg17 prediction set corresponds exactly to the set analyzed in the EvoFold paper referenced below.

Credits

The EvoFold program and browser track were developed by Jakob Skou Pedersen of the UCSC Genome Bioinformatics Group, now at Aarhus University, Denmark.

The RNA secondary structure is rendered using the VARNA Java applet.

References

EvoFold

Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol. 2006 Apr;2(4):e33. PMID: 16628248; PMC: PMC1440920

Phylo-SCFGs

Knudsen B, Hein J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics. 1999 Jun;15(6):446-54. PMID: 10383470

Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 2004;32(16):4925-36. PMID: 15448187; PMC: PMC519121

PhastCons

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216