ENCODE DNase-seq Track Settings
 
ENCODE DNase-seq Peaks and Signal of Open Chromatin based on Uniform processing pipeline

Views: Peaks, Signal
Subtracks by lab (Duke, UW, UW-Duke) and cell line:
Cell Line
GM12878 
H1-hESC 
K562 
HeLa-S3 
HepG2 
HUVEC 
8988T 
A549 
AG04449 
AG04450 
AG09309 
AG09319 
AG10803 
AoAF 
AoSMC 
BE2 C 
BJ 
Caco-2 
CD20+ 
CD34+ Mobilized 
Chorion 
CLL 
CMK 
Fibrobl 
FibroP 
Gliobla 
GM06990 
GM12864 
GM12865 
GM12891 
GM12892 
GM18507 
GM19238 
GM19239 
GM19240 
H7-hESC 
H9ES 
HA-h 
HA-sp 
HAc 
HAEpiC 
HBMEC 
HCF 
HCFaa 
HCM 
HConF 
HCPEpiC 
HCT-116 
HEEpiC 
Hepatocytes 
HFF 
HFF-Myc 
HGF 
HIPEpiC 
HL-60 
HMEC 
HMF 
HMVEC-dAd 
HMVEC-dBl-Ad 
HMVEC-dBl-Neo 
HMVEC-dLy-Ad 
HMVEC-dLy-Neo 
HMVEC-dNeo 
HMVEC-LBl 
HMVEC-LLy 
HNPCEpiC 
HPAEC 
HPAF 
HPDE6-E6E7 
HPdLF 
HPF 
HRCEpiC 
HRE 
HRGEC 
HRPEpiC 
HSMM 
HSMM emb 
HSMMtube 
HTR8svn 
Huh-7 
Huh-7.5 
HVMF 
iPS 
Ishikawa 
Jurkat 
LNCaP 
MCF-7 
Medullo 
Melano 
Monocytes-CD14+ RO01746 
Myometr 
NB4 
NH-A 
NHDF-Ad 
NHDF-neo 
NHEK 
NHLF 
NT2-D1 
Osteobl 
PANC-1 
PanIsletD 
PanIslets 
pHTE 
PrEC 
ProgFib 
RPTEC 
RWPE1 
SAEC 
SK-N-MC 
SK-N-SH RA 
SkMC 
Stellate 
T-47D 
Th0 
Th1 
Th2 
Urothelia 
WERI-Rb-1 
WI-38 
Subtracks (view, tier, cell line, lab, track name):

  Peaks   1  GM12878  UW-Duke  GM12878 DNaseI FDR 1% Uniform Peak Calls from UW-Duke
  Signal  1  GM12878  Duke     GM12878 DNaseI signals from Duke
  Signal  1  GM12878  UW       GM12878 DNaseI signals from UW
  Peaks   1  K562     UW-Duke  K562 DNaseI FDR 1% Uniform Peak Calls from UW-Duke
  Signal  1  K562     Duke     K562 DNaseI signals from Duke
  Signal  1  K562     UW       K562 DNaseI signals from UW

Assembly: Human Feb. 2009 (GRCh37/hg19)


Note: ENCODE Project

Description

These data tracks represent open chromatin elements and signals based on DNase-seq data from the Duke University and University of Washington (UW) production groups. The tracks were generated by the ENCODE Analysis Working Group (AWG) using a uniform processing pipeline. The datasets are based on the January 2011 internal data freeze. They were used in all downstream analysis pipelines by members of the ENCODE Consortium and are one of the primary sources of data referenced in the ENCODE integrative analysis paper (ENCODE Project Consortium, 2012).

Methods

To reduce a source of variability between UW and Duke DNaseI processed datasets, the Stam Lab (UW) did the following with the Duke DNase-seq aligned reads, to more closely match UW processing:

  • Combined all replicates for a given cell type
  • Subsampled the result at a level of 30 million tags
  • Ran results through the Stam Lab hotspot pipeline
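
The first two steps above can be sketched as follows. This is a minimal illustration, not the Stam Lab code: the function name `pool_and_subsample`, the tuple representation of a tag, and the use of uniform random sampling without replacement are all assumptions made for the example.

```python
import random

def pool_and_subsample(replicates, target=30_000_000, seed=0):
    """Pool tags from all replicates of one cell type, then keep at most
    `target` tags chosen uniformly at random without replacement.

    Hypothetical helper: the sampling scheme used by the real pipeline
    is not described in this document.
    """
    pooled = [tag for rep in replicates for tag in rep]
    if len(pooled) <= target:
        return pooled  # fewer tags than the target: keep everything
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(pooled, target)

# Toy data: each tag is (chromosome, position, strand).
rep1 = [("chr1", 100, "+"), ("chr1", 250, "-"), ("chr2", 40, "+")]
rep2 = [("chr1", 120, "+"), ("chr2", 75, "-")]
subset = pool_and_subsample([rep1, rep2], target=4)
everything = pool_and_subsample([rep1, rep2], target=10)
```

With real data the default target of 30 million tags would apply; the toy call uses a target of 4 only to keep the example small.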

First, hotspots (i.e. broad, variable-sized regions of generalized chromatin accessibility) were identified using a relaxed threshold. FDR 1% peaks (narrowPeaks) were generated by first thresholding hotspots (using random simulation) at FDR 1%, and then locating local maxima of the tag density (150 bp window, sliding every 20 bp) within the hotspots. FDR 1% peaks were set to a fixed width of 150 bp.
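
The local-maximum step can be illustrated with a short sketch. It assumes a single hotspot on one chromosome, tags reduced to integer positions, and half-open windows; the names `window_density` and `call_peaks` are hypothetical, and the FDR thresholding by random simulation is not reproduced here.

```python
from bisect import bisect_left

def window_density(sorted_tags, start, width):
    """Count tags whose position falls in the half-open window [start, start + width)."""
    return bisect_left(sorted_tags, start + width) - bisect_left(sorted_tags, start)

def call_peaks(tags, hotspot, width=150, step=20):
    """Slide a `width`-bp window every `step` bp across one hotspot and
    report windows that are local maxima of the tag density.

    Each reported peak is (start, end, tag_count) and is exactly `width` bp wide.
    Simplification: a window counts as a maximum if it is strictly denser than
    its left neighbour and at least as dense as its right neighbour, so a run
    of equal densities is represented by its leftmost window.
    """
    sorted_tags = sorted(tags)
    hs_start, hs_end = hotspot
    starts = list(range(hs_start, max(hs_start + 1, hs_end - width + 1), step))
    dens = [window_density(sorted_tags, s, width) for s in starts]
    peaks = []
    for i, d in enumerate(dens):
        left = dens[i - 1] if i > 0 else -1
        right = dens[i + 1] if i + 1 < len(dens) else -1
        if d > 0 and d > left and d >= right:
            peaks.append((starts[i], starts[i] + width, d))
    return peaks

# Toy hotspot spanning positions 0-600 with two clusters of tags.
tags = [100, 110, 120, 130, 135, 140, 400, 410]
peaks = call_peaks(tags, hotspot=(0, 600))
```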

The combined calls are the union of the calls from both groups, except for the 14 cell types where both groups have data. For those cell types, a collapsed set of FDR 1% peaks was generated by taking a non-overlapping selection of the calls from both centers, giving preference to the peak with the higher z-score when calls overlapped. A collapsed set of hotspots for these cell types was generated by merging the calls from both centers (taking the union interval of overlapping intervals).
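
The two collapsing rules for cell types with data from both centers can be sketched as below. This is an illustration under stated assumptions: intervals are half-open and on a single chromosome, the function names `collapse_peaks` and `merge_hotspots` are hypothetical, and the greedy highest-z-score-first selection is one plausible reading of "giving preference to the peak with the higher z-score", not a confirmed description of the original implementation.

```python
def collapse_peaks(peaks_a, peaks_b):
    """Select a non-overlapping set of peaks from two centers.

    Peaks are (start, end, z_score). Candidates are visited from highest to
    lowest z-score, and a peak is kept only if it overlaps no peak already kept.
    """
    chosen = []
    for peak in sorted(peaks_a + peaks_b, key=lambda p: -p[2]):
        if all(peak[1] <= c[0] or peak[0] >= c[1] for c in chosen):
            chosen.append(peak)
    return sorted(chosen)

def merge_hotspots(hot_a, hot_b):
    """Merge hotspots from two centers, replacing each group of
    overlapping intervals with its union interval."""
    merged = []
    for start, end in sorted(hot_a + hot_b):
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: extend it to cover both.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Toy calls from the two centers for one cell type.
duke_peaks = [(100, 250, 5.0), (400, 550, 3.0)]
uw_peaks = [(180, 330, 7.5), (600, 750, 2.0)]
collapsed = collapse_peaks(duke_peaks, uw_peaks)

duke_hot = [(100, 300), (500, 650)]
uw_hot = [(250, 420), (700, 800)]
hotspots = merge_hotspots(duke_hot, uw_hot)
```

In the toy data the Duke peak at 100-250 overlaps the higher-scoring UW peak at 180-330 and is dropped, while the overlapping hotspots 100-300 and 250-420 become the single interval 100-420.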

Uniform signal was generated by processing the aligned reads using the align2rawsignal "Wiggler" software (see http://code.google.com/p/align2rawsignal for details and settings). The method accounts for the depth of sequencing, the mappability of the genome (based on read length and ambiguous bases) and different fragment length shifts for the different datasets being combined. It also differentiates between positions that show zero signal simply because they are unmappable and positions that are mappable but have no reads.
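
The distinction between unmappable positions and mappable positions without reads can be illustrated with a toy sketch. It does not reproduce align2rawsignal: the function `coverage_signal`, its parameters, the plain extension of each tag to a fixed fragment length, and the single `depth_scale` factor are all assumptions chosen to show the idea of reporting "no data" separately from a true zero.

```python
def coverage_signal(tag_starts, mappable, fragment_len, depth_scale=1.0):
    """Build a per-base signal for one region.

    tag_starts   : 0-based start positions of tags on the region
    mappable     : list of booleans, one per base (True = uniquely mappable)
    fragment_len : each tag is extended to this many bases (assumed fixed)
    depth_scale  : weight per tag, standing in for sequencing-depth normalization

    Returns a list with a float for mappable bases (0.0 means mappable but
    no reads) and None for unmappable bases (no data).
    """
    length = len(mappable)
    signal = [0.0] * length
    for start in tag_starts:
        for pos in range(max(0, start), min(length, start + fragment_len)):
            signal[pos] += depth_scale
    return [value if mappable[pos] else None for pos, value in enumerate(signal)]

# Toy region of 10 bases whose last two bases are unmappable.
mappable = [True] * 8 + [False] * 2
signal = coverage_signal([2], mappable, fragment_len=3, depth_scale=0.5)
```

Here `signal[0]` is a genuine zero, `signal[2]` carries the scaled coverage of the single tag, and `signal[9]` is `None` because the position cannot be assayed.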

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.

There is no restriction on the use of these specific tracks.

References

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012 Sep 6;489(7414):57-74.

Contact

Bob Thurman