Schema for HAIB Genotype - Genotype (CNV and SNP) by Illumina 1MDuo and CBS from ENCODE/HudsonAlpha
  Database: hg19    Primary Table: wgEncodeHaibGenotypeHuvecRegionsRep2    Row Count: 100   Data last updated: 2011-04-08
Format description: BED9+1 Floating point score
On download server: MariaDB table dump directory
field       example   SQL type           info    description
bin         1         smallint(6)        range   Indexing field to speed chromosome range queries.
chrom       chr1      varchar(255)       values  Reference sequence chromosome or scaffold
chromStart  20141     int(10) unsigned   range   Start position in chromosome
chromEnd    35097520  int(10) unsigned   range   End position in chromosome
name        normal    varchar(255)       values  Name of item
score       0         int(10) unsigned   range   Score from 0-1000
strand      +         char(1)            values  + or - or . for unknown
thickStart  20141     int(10) unsigned   range   Start of where display should be thick (start codon)
thickEnd    35097520  int(10) unsigned   range   End of where display should be thick (stop codon)
itemRgb     0         int(10) unsigned   range   Color value R,G,B
logR        -0.026    float              range   logR value

Sample Rows
 
bin   chrom  chromStart  chromEnd   name      score  strand  thickStart  thickEnd   itemRgb     logR
1     chr1   20141       35097520   normal    0      +       20141       35097520   0,0,0       -0.026
852   chr1   35107269    35108276   homo.del  0      +       35107269    35108276   255,0,0     -4.2636
0     chr10  88874       135235890  normal    0      +       88874       135235890  0,0,0       -0.0092
25    chr10  135242873   135393215  normal    0      +       135242873   135393215  0,0,0       0.1688
1618  chr10  135402200   135522754  normal    0      +       135402200   135522754  0,0,0       -0.0568
9     chr11  155190      7817538    normal    0      +       155190      7817538    0,0,0       -0.0192
644   chr11  7818471     7819143    het.del   0      +       7818471     7819143    255,153,51  -0.9355
1     chr11  7822222     48724035   normal    0      +       7822222     48724035   0,0,0       0
956   chr11  48733859    48737173   amp       0      +       48733859    48737173   0,0,255     0.6347
119   chr11  48747611    48924826   normal    0      +       48747611    48924826   0,0,0       -0.009
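
Rows of this table can be read into typed records with a few lines of Python. The following is a minimal sketch, not an official parser: it assumes the rows are tab-separated and in the column order of the schema above, and the names GenotypeRegion, parse_item_rgb, and parse_region are hypothetical. Because the schema declares itemRgb as an integer while the sample rows show it as R,G,B, the sketch accepts both forms and assumes an integer value is packed as 0xRRGGBB.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GenotypeRegion:
    """One row of the table (BED9+1 plus the leading bin column)."""
    bin_index: int
    chrom: str
    chrom_start: int  # 0-based start
    chrom_end: int
    name: str         # normal, amp, het.del or homo.del in the sample rows
    score: int
    strand: str
    thick_start: int
    thick_end: int
    item_rgb: Tuple[int, int, int]
    log_r: float      # mean log R ratio of the segment


def parse_item_rgb(field: str) -> Tuple[int, int, int]:
    """Accept 'R,G,B' text or a single integer (assumed packed as 0xRRGGBB)."""
    if "," in field:
        r, g, b = (int(part) for part in field.split(","))
        return (r, g, b)
    packed = int(field)
    return ((packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF)


def parse_region(line: str) -> GenotypeRegion:
    """Parse one tab-separated row (the tab delimiter is an assumption)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    return GenotypeRegion(
        bin_index=int(fields[0]),
        chrom=fields[1],
        chrom_start=int(fields[2]),
        chrom_end=int(fields[3]),
        name=fields[4],
        score=int(fields[5]),
        strand=fields[6],
        thick_start=int(fields[7]),
        thick_end=int(fields[8]),
        item_rgb=parse_item_rgb(fields[9]),
        log_r=float(fields[10]),
    )
```

For example, parse_region applied to the second sample row yields a record whose name is homo.del and whose log_r is -4.2636.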

Note: all start coordinates in our database are 0-based, not 1-based; intervals are half-open, so chromEnd is the first position after the item.
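
This page does not say how the bin field is derived. The sketch below assumes the table uses the standard UCSC binning scheme (five levels, from 128 kb bins up to a single 512 Mb bin) applied to a 0-based, half-open interval. The function name bin_from_range is hypothetical. Under that assumption it reproduces the bin values of the sample rows above.

```python
# Offset of the first bin at each level, from the finest level (128 kb bins)
# to the coarsest (one 512 Mb bin).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17  # 2**17 bases = 128 kb, the size of the finest bins
NEXT_SHIFT = 3    # each coarser level has bins 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains [start, end).

    start is 0-based and end is exclusive, matching chromStart/chromEnd.
    """
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval does not fit in the 512 Mb scheme")
```

A range query can then restrict its search to the bins that could overlap the region of interest instead of scanning every row for a chromosome.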

HAIB Genotype (wgEncodeHaibGenotype) Track Description
 

Description

This track is produced as part of the ENCODE project. The track displays copy number variation (CNV) as determined by the Illumina Human 1M-Duo Infinium HD BeadChip assay and circular binary segmentation (CBS). The Human 1M-Duo contains more than 1,100,000 tagSNP markers and a set of ~60,000 additional CNV-targeted markers. The median spacing between markers is 1.5 kb and the mean spacing is 2.4 kb. The B-allele frequency and genotyping single nucleotide polymorphism (SNP) data generated by the experiment are not displayed, but are available for download from the Downloads page.

Where applicable, biological replicates of each cell line are reported separately. Possible uses of the data include correction of copy number in peak-calling for ChIP-seq, transcriptome, DNase hypersensitivity, and methylation determinations.

Display Conventions and Configuration

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

The track displays regions of the genome where copy number variation has been assessed. CNV regions are colored by type:

  • blue = amplified
  • black = normal
  • orange = heterozygous deletion
  • red = homozygous deletion
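
The color key above can be expressed as a small lookup table. This is an illustrative sketch: the R,G,B triples and the name strings are taken from the sample rows on this page, while the names RGB_TO_CALL and call_from_rgb are hypothetical.

```python
# itemRgb triples and name values as they appear in the sample rows.
RGB_TO_CALL = {
    (0, 0, 255): "amp",        # blue = amplified
    (0, 0, 0): "normal",       # black = normal
    (255, 153, 51): "het.del", # orange = heterozygous deletion
    (255, 0, 0): "homo.del",   # red = homozygous deletion
}


def call_from_rgb(item_rgb: str) -> str:
    """Map an 'R,G,B' string to its CNV call, or 'unknown' if unlisted."""
    rgb = tuple(int(part) for part in item_rgb.split(","))
    return RGB_TO_CALL.get(rgb, "unknown")
```

Comparing call_from_rgb(itemRgb) with the name column is a quick consistency check on a downloaded file.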

The mean log R ratio for each region can be seen by clicking on the region, and it is also reported in the .bed file available for download. See Methods below for the significance of log R ratio values.

The Illumina 1M-Duo B-allele frequency data are available from the Supplemental Materials directory on the Downloads page. The file (wgEncodeHaibGenotypeBalleleSnp.txt) was generated using the standard Illumina protocol and contains the B-allele frequency for all cell types tested. The genotype calls for all cell types tested are also available for download (wgEncodeHaibGenotypeGtypeSnp.txt). Genotype calls with a GenCall value greater than 0.6 are considered significant.

Replicate Numbering

The replicate labeling in the genome browser view is a counter indicating the total number of replicates submitted (UCSC Rep). The producing lab has replicate numbers (Lab Rep) that correspond to their internal bio-replicate numbering. Where these two numbering systems conflict, both are listed in the long label of the specific track. When comparing data across tracks, the lab replicate number should be considered. In the downloads directory both replicate numbers are listed. The files are labeled with the lab replicate number.

Methods

Isolation of genomic DNA and hybridization

Cells were grown according to the approved ENCODE cell culture protocols by the Myers lab and by other ENCODE production groups. The production group is reported in the metadata. Genomic DNA was isolated using the DNeasy Blood and Tissue Kit (Qiagen). DNA concentration and quality were determined by fluorescence (Invitrogen Quant-iT dsDNA High Sensitivity Kit and Qubit Fluorometer), and 400 nanograms of each sample were hybridized to Illumina 1M-Duo DNA Analysis BeadChips.

Processing and Analysis

The genotypes from the 1M-Duo arrays were determined with BeadStudio using default settings and reported with the A/B genotype designation for each SNP. Primary QC for each sample was a call-rate cutoff of 0.95.

Copy number variation (CNV) analysis was performed with circular binary segmentation (DNAcopy) of the log R ratio values at each probe (Olshen et al., 2004). The parameters used were alpha=0.001, nperm=5000, sd.undo=1. The copy number segments are reported with the mean log R ratio for each chromosomal segment called by CBS. Mean log R ratios between approximately -0.2 and -1.5 can be considered heterozygous deletions, those below -1.5 homozygous deletions, and those above 0.2 amplifications. Primary QC for each sample was an SD of < 0.6.
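
The log R ratio thresholds above can be applied to a segment mean as follows. This is a sketch under stated assumptions, not the lab's calling code: the text gives the cutoffs as approximate and does not say how values exactly on a boundary are treated, so the inclusive and exclusive comparisons here are a guess, and labeling everything else as normal is inferred from the sample rows. The function name classify_log_r and its parameter names are hypothetical.

```python
def classify_log_r(
    log_r: float,
    homo_del_below: float = -1.5,
    het_del_at_or_below: float = -0.2,
    amp_above: float = 0.2,
) -> str:
    """Label a segment by its mean log R ratio using the stated cutoffs.

    Boundary handling is an assumption; the source gives approximate ranges.
    """
    if log_r < homo_del_below:
        return "homo.del"
    if log_r <= het_del_at_or_below:
        return "het.del"
    if log_r > amp_above:
        return "amp"
    return "normal"
```

Applied to the sample rows, the labels agree with the name column, for example classify_log_r(-0.9355) gives het.del and classify_log_r(0.6347) gives amp.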

Release Notes

This is release 2 of this track (Jan 2012). This is a correction release; there are no new experiments. The affected tracks are:

  • wgEncodeHaibGenotypeGm12878RegionsRep1 was replaced by wgEncodeHaibGenotypeGm12878RegionsRep1V2 because the original version contained mappings that ran off the end of the chromosome.
  • wgEncodeHaibGenotypeAstrocyRegionsRep1 was renamed to wgEncodeHaibGenotypeNhaDukeRegionsRep1 because Astrocytes and NH-A are the same cell line.

In addition to the above changes, the color values in the files have been corrected. This does not affect any data values.

This is the NCBI Build37 (hg19) release of this track. This release includes the 3 cell types previously released on NCBI Build36 (hg18), which were lifted to NCBI Build37 (hg19), and adds data for many more cell types. The track includes a single display for each cell type and reports the log R ratio in the .bed files. The B-allele frequency and SNP genotyping files are not displayed, but are available for download for the entire dataset from the Downloads page.

Credits

These data were produced by the labs of Dr. Richard Myers and Dr. Devin Absher at the HudsonAlpha Institute for Biotechnology.

Cells were grown by the Myers Lab and other ENCODE production groups.

Contact: Dr. Florencia Pauli.

References

Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004 Oct;5(4):557-72.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.