Schema for Hi Seq Depth - Regions of Exceptionally High Depth of Aligned Short Reads
  Database: hg19    Primary Table: hiSeqDepthTopPt1Pct    Row Count: 522   Data last updated: 2011-07-15
Format description: Browser extensible data
On download server: MariaDB table dump directory
field       example  SQL type              info    description
bin         589      smallint(5) unsigned  range   Indexing field to speed chromosome range queries.
chrom       chr1     varchar(255)          values  Reference sequence chromosome or scaffold
chromStart  569874   int(10) unsigned      range   Start position in chromosome
chromEnd    569954   int(10) unsigned      range   End position in chromosome
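
The bin field can be recomputed from chromStart and chromEnd. The following is a minimal sketch, assuming the standard UCSC binning scheme (five levels, smallest bins of 128 kb); the constants and the function name bin_from_range are assumptions for illustration and are not stated on this page.

```python
# Assumed constants of the standard UCSC binning scheme (not given on this page).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each coarser level is 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin that fully contains the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles two bins at this level; try the next coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range does not fit in the assumed 512 Mb scheme")


# Example values from the table above.
print(bin_from_range(569874, 569954))  # 589
```

Under these assumptions the example row (chromStart 569874, chromEnd 569954) yields bin 589, matching the example column.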

Sample Rows
 
bin   chrom  chromStart  chromEnd
589   chr1   569874      569954
1285  chr1   91852761    91853155
1510  chr1   121354274   121354349
1510  chr1   121355719   121355801
1510  chr1   121357977   121358058
1511  chr1   121478613   121478695
1511  chr1   121483235   121483314
1511  chr1   121483334   121483400
1511  chr1   121484109   121485434
1678  chr1   143278758   143278832

Note: all start coordinates in our database are 0-based, not 1-based.
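
As an illustration of the note above, the sketch below converts a table row to a 1-based position string and computes the region length. It assumes that end coordinates are exclusive (a 0-based, half-open interval, as in BED); the function names are hypothetical.

```python
def to_one_based(chrom, chrom_start, chrom_end):
    """Format a 0-based, half-open row as a 1-based, inclusive position string."""
    # Only the start shifts by one; the exclusive end equals the inclusive 1-based end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def region_length(chrom_start, chrom_end):
    """Number of bases covered by a 0-based, half-open interval."""
    return chrom_end - chrom_start


print(to_one_based("chr1", 569874, 569954))  # chr1:569875-569954
print(region_length(569874, 569954))         # 80
```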

Hi Seq Depth (hiSeqDepth) Track Description
 

Description

This track displays regions of the reference genome that have exceptionally high sequence depth, inferred from alignments of short-read sequences from the 1000 Genomes Project. These regions may be caused by collapsed repetitive sequences in the reference genome assembly; they also have high read depth in assays such as ChIP-seq, and may trigger false positive calls from peak-calling algorithms. Excluding these regions from analysis of short-read alignments should reduce such false positive calls.
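
One way to exclude these regions is to drop any called peak that overlaps a row of this table. The following is a minimal sketch, not a prescribed procedure: the peak list is made up for illustration, the function names are hypothetical, and all intervals are assumed to be 0-based and half-open.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def exclude_high_depth(peaks, high_depth_regions):
    """Return the peaks that do not overlap any high-depth region on the same chromosome."""
    by_chrom = {}
    for chrom, start, end in high_depth_regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end in peaks:
        blocked = any(overlaps(start, end, s, e) for s, e in by_chrom.get(chrom, []))
        if not blocked:
            kept.append((chrom, start, end))
    return kept


# Two rows from this table and three hypothetical peaks.
regions = [("chr1", 569874, 569954), ("chr1", 91852761, 91853155)]
peaks = [("chr1", 569900, 570100), ("chr1", 1000000, 1000200), ("chr2", 569900, 570100)]
print(exclude_high_depth(peaks, regions))
```

A linear scan per peak is adequate for a table of this size (522 rows); a sorted index or interval tree would be a reasonable substitute for larger region sets.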

Methods

Pickrell et al. downloaded sequencing reads for 57 Yoruba individuals from the 1000 Genomes Project's low-coverage pilot data, mapped them to the Mar. 2006 human genome assembly (NCBI36/hg18), computed the read depth for every base in the genome, and compiled a distribution of read depths. They then identified contiguous regions where read depth exceeded thresholds corresponding to the top 0.001, 0.005, 0.01, 0.05 and 0.1 of the per-base read depths (that is, the top 0.1%, 0.5%, 1%, 5% and 10%), merging regions that fall within 50 bases of each other. The regions are available for download from http://eqtl.uchicago.edu/Masking/ (see the readme file).
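
The thresholding and merging steps can be sketched on a toy depth vector as follows. This is an illustration under stated assumptions, not the authors' code: the threshold is taken as the largest depth outside the top fraction, "exceeded" is read as strictly greater, and "within 50 bases" is read as a gap of at most 50 bases between the end of one region and the start of the next. All function names are hypothetical.

```python
def depth_threshold(depths, top_fraction):
    """Largest depth value that is not among the top `top_fraction` of bases (assumed definition)."""
    ranked = sorted(depths, reverse=True)
    n_top = int(len(ranked) * top_fraction)
    return ranked[min(n_top, len(ranked) - 1)]


def runs_above(depths, threshold):
    """Contiguous 0-based, half-open runs where depth is strictly above the threshold."""
    regions, start = [], None
    for pos, depth in enumerate(depths):
        if depth > threshold and start is None:
            start = pos
        elif depth <= threshold and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(depths)))
    return regions


def merge_nearby(regions, max_gap=50):
    """Merge regions whose gap to the previous region is at most max_gap bases."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy per-base depths for a 20-base "genome"; a large top fraction and a small
# gap are used only so the effect is visible on such a short vector.
depths = [1, 1, 9, 9, 1, 8, 8, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1, 1, 1]
threshold = depth_threshold(depths, 0.25)
print(merge_nearby(runs_above(depths, threshold), max_gap=2))  # [(2, 7), (12, 13)]
```

With the default max_gap of 50, all three runs in the toy vector would merge into a single region.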

Credits

Thanks to Joseph Pickrell at the University of Chicago for these data.

References

Pickrell JK, Gaffney DJ, Gilad Y, Pritchard JK. False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics. 2011 Aug 1;27(15):2144-6. Epub 2011 Jun 19.