Schema for T2T Encode - T2T Encode Reanalysis

| field | example | description |
|---|---|---|
| chrom | chr1 | Reference sequence chromosome or scaffold |
| chromStart | 166184453 | Start position in chromosome |
| chromEnd | 166186583 | End position in chromosome |
| name | macs2/ENCSR987PNT.CHM13.v2.0_peak_3583 | Name of item. |
| score | 480 | Score (0-1000) |
| strand | | + or - for strand |
| field8 | 8.76059 | Undocumented field |
| field9 | 50.08292 | Undocumented field |
| field10 | 48.00477 | Undocumented field |
| field11 | 1081 | Undocumented field |
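
As an illustration of the schema above, here is a minimal sketch of reading one row into a typed record. It assumes the rows are tab-separated in the column order listed, and that an empty strand is written as "." (both are assumptions, not stated in the schema). The names `PeakRow` and `parse_peak_row` are hypothetical, and the types of field8 to field11 are guessed from the example values only.

```python
from dataclasses import dataclass


@dataclass
class PeakRow:
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    strand: str
    field8: float   # undocumented; example values are decimals
    field9: float   # undocumented; example values are decimals
    field10: float  # undocumented; example values are decimals
    field11: int    # undocumented; example values are integers


def parse_peak_row(line: str) -> PeakRow:
    """Parse one tab-separated row (assumed layout) into a PeakRow."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    return PeakRow(
        chrom=cols[0],
        chrom_start=int(cols[1]),
        chrom_end=int(cols[2]),
        name=cols[3],
        score=int(cols[4]),
        strand=cols[5],
        field8=float(cols[6]),
        field9=float(cols[7]),
        field10=float(cols[8]),
        field11=int(cols[9]),
    )


# Example row built from the values in the schema table; "." for strand is assumed.
example = "\t".join([
    "chr1", "166184453", "166186583",
    "macs2/ENCSR987PNT.CHM13.v2.0_peak_3583",
    "480", ".", "8.76059", "50.08292", "48.00477", "1081",
])
row = parse_peak_row(example)
peak_length = row.chrom_end - row.chrom_start
```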

| chrom | chromStart | chromEnd | name | score | strand | field8 | field9 | field10 | field11 |
|---|---|---|---|---|---|---|---|---|---|
| chr1 | 166184453 | 166186583 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3583 | 480 | | 8.76059 | 50.08292 | 48.00477 | 1081 |
| chr1 | 166221476 | 166222060 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3584 | 341 | | 6.66264 | 36.13898 | 34.15601 | 348 |
| chr1 | 166442132 | 166443724 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3585 | 397 | | 8.78833 | 41.82262 | 39.79898 | 950 |
| chr1 | 166482816 | 166483890 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3586 | 113 | | 4.59278 | 13.10051 | 11.35497 | 343 |
| chr1 | 166509773 | 166510692 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3587 | | | 2.47464 | 4.98192 | 3.41118 | 551 |
| chr1 | 166543079 | 166543451 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3588 | | | 2.52528 | 3.84355 | 2.31519 | 264 |
| chr1 | 166570775 | 166574333 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3589 | 1000 | | 12.64491 | 163.29765 | 160.74069 | 1008 |
| chr1 | 166657131 | 166657649 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3590 | | | 3.42491 | 8.05474 | 6.40131 | 330 |
| chr1 | 166967953 | 166969572 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3591 | 353 | | 7.96931 | 37.37810 | 35.38623 | 1108 |
| chr1 | 167012385 | 167016744 | macs2/ENCSR987PNT.CHM13.v2.0_peak_3592 | 1000 | | 25.24135 | 263.87695 | 261.03931 | 1248 |

Description

These tracks represent a reanalysis of ENCODE data against the T2T chm13 genome. All ChIP-seq experiments with paired-end data and read lengths of 100 bp or greater are included.

Track types include:

- Coverage pileups of mapped and filtered reads
- Enrichment of mapped reads relative to a control
- ChIP-seq peaks as called by MACS2
- ChIP-seq peaks as called by MACS2 in GRCh38 and lifted over to chm13

Methods

Prior to mapping, reads originating from a single library were combined. Reads were mapped as paired-end with Bowtie2 (v2.4.1) using the arguments "--no-discordant --no-mixed --very-sensitive --no-unal --omit-sec-seq --xeq --reorder". Alignments were filtered with SAMtools (v1.10) using the arguments "-F 1804 -f 2 -q 2" to remove unmapped or single-end mapped reads and those with a mapping quality score less than 2. PCR duplicates were identified and removed with the Picard tools "mark duplicates" command (v2.22.1) and the arguments "VALIDATION_STRINGENCY=LENIENT ASSUME_SORT_ORDER=queryname REMOVE_DUPLICATES=true".
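
To make the SAMtools filter concrete, the sketch below decodes the "-F 1804 -f 2 -q 2" arguments using the flag bits defined in the SAM specification. It is an illustration of the filtering logic only, not the pipeline's code; the function names are hypothetical.

```python
# SAM FLAG bits, as defined in the SAM format specification.
FLAG_NAMES = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}

EXCLUDE_FLAGS = 1804  # -F 1804: drop a read if ANY of these bits is set
REQUIRE_FLAGS = 2     # -f 2: keep a read only if ALL of these bits are set
MIN_MAPQ = 2          # -q 2: drop reads with mapping quality below 2


def decode_flags(value: int) -> list[str]:
    """List the names of the flag bits set in value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


def passes_filter(flag: int, mapq: int) -> bool:
    """Mirror the effect of 'samtools view -F 1804 -f 2 -q 2' on one read."""
    return (
        (flag & EXCLUDE_FLAGS) == 0
        and (flag & REQUIRE_FLAGS) == REQUIRE_FLAGS
        and mapq >= MIN_MAPQ
    )


excluded = decode_flags(EXCLUDE_FLAGS)
```

Here `excluded` lists the five conditions that 1804 covers: read unmapped, mate unmapped, not primary alignment, read fails quality checks, and PCR or optical duplicate. A properly paired read with flag 99 and mapping quality 30 passes, while the same read marked as a duplicate (flag 1123) does not.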

Alignments were then filtered for the presence of unique k-mers. Specifically, for each alignment, reference sequences aligned with template ends were compared to a database of minimum unique k-mer lengths. The size of the k-mers used in the k-mer filtering step depends on the length of the mapped reference sequence. Alignments were discarded if no unique k-mers occurred in either end of the read. The minimum unique k-mer length database was generated using scripts found here. Alignments from replicates were then pooled.
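
One way to picture the unique k-mer filter is sketched below. It rests on two assumptions that the text does not spell out: that the database can be viewed as an array giving, for each reference position, the length of the shortest unique k-mer starting there (0 if none), and that an alignment is kept when at least one of its two ends contains a unique k-mer. The names `has_unique_kmer` and `keep_alignment` are hypothetical.

```python
from typing import Sequence, Tuple


def has_unique_kmer(min_unique_len: Sequence[int], start: int, end: int) -> bool:
    """Return True if some unique k-mer lies entirely within [start, end).

    min_unique_len[pos] is assumed to hold the length of the shortest unique
    k-mer starting at pos, or 0 if no k-mer starting there is unique.
    """
    for pos in range(start, end):
        k = min_unique_len[pos]
        # The k-mer must fit inside the aligned interval, so the usable k
        # is bounded by the length of the mapped reference sequence.
        if k > 0 and pos + k <= end:
            return True
    return False


def keep_alignment(
    min_unique_len: Sequence[int],
    end1: Tuple[int, int],
    end2: Tuple[int, int],
) -> bool:
    """Keep the alignment if either template end covers a unique k-mer (assumed rule)."""
    return has_unique_kmer(min_unique_len, *end1) or has_unique_kmer(
        min_unique_len, *end2
    )


# Toy reference of 8 positions: unique k-mers start at positions 2 (k=5) and 3 (k=3).
toy_lengths = [0, 0, 5, 3, 0, 0, 0, 0]
kept = keep_alignment(toy_lengths, (0, 4), (2, 7))
discarded = not keep_alignment(toy_lengths, (0, 4), (4, 8))
```

In the toy example, the interval [0, 4) is too short to contain either unique k-mer, while [2, 7) fully contains the 5-mer starting at position 2.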

Bigwig coverage tracks were created using deepTools bamCoverage (v3.4.3) with a bin size of 1 bp and defaults for all other parameters. Enrichment tracks were created using deepTools bamCompare with a bin size of 50 bp, a pseudo-count of 1, and excluding bins with zero counts in both target and control tracks.
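
The per-bin enrichment calculation can be sketched as follows. This assumes the enrichment is a log2 ratio of target to control, which the text does not state, and it ignores any read-depth scaling that bamCompare applies before comparing samples. The function name `bin_enrichment` is hypothetical.

```python
import math
from typing import Optional


def bin_enrichment(
    target: float, control: float, pseudocount: float = 1.0
) -> Optional[float]:
    """Log2 enrichment for one 50 bp bin, or None if the bin is skipped.

    Bins with zero counts in both tracks are excluded, as described above.
    The pseudo-count keeps the ratio defined when only one count is zero.
    """
    if target == 0 and control == 0:
        return None
    return math.log2((target + pseudocount) / (control + pseudocount))


enriched = bin_enrichment(3, 1)   # (3 + 1) / (1 + 1) = 2
depleted = bin_enrichment(0, 1)   # (0 + 1) / (1 + 1) = 0.5
skipped = bin_enrichment(0, 0)    # excluded bin
```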

Peak calls were made using MACS2 (v2.2.7.1) with default parameters and estimated genome sizes 3.03e9 and 2.79e9 for chm13 and GRCh38, respectively. GRCh38 peak calls were lifted over to chm13 using the UCSC liftOver utility, the chain file created by the T2T consortium, and the parameter "-minMatch=0.2".
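
For orientation, the peak calling and liftOver steps might be invoked roughly as sketched below. Only the genome sizes and "-minMatch=0.2" come from the text; all file names are hypothetical placeholders, the use of a control sample in peak calling is an assumption, and the helper functions merely assemble argument lists rather than run anything.

```python
from typing import List, Optional

# Estimated genome sizes quoted in the Methods text.
GENOME_SIZES = {"chm13": "3.03e9", "GRCh38": "2.79e9"}


def macs2_command(
    treatment: str, name: str, genome: str, control: Optional[str] = None
) -> List[str]:
    """Build a 'macs2 callpeak' argument list with otherwise default parameters."""
    cmd = ["macs2", "callpeak", "-t", treatment, "-g", GENOME_SIZES[genome], "-n", name]
    if control is not None:
        cmd += ["-c", control]
    return cmd


def liftover_command(
    peaks: str, chain: str, lifted: str, unmapped: str, min_match: float = 0.2
) -> List[str]:
    """Build a UCSC liftOver argument list: oldFile map.chain newFile unMapped."""
    return ["liftOver", f"-minMatch={min_match}", peaks, chain, lifted, unmapped]


# Hypothetical file names, for illustration only.
call_cmd = macs2_command("target.bam", "example", "chm13", control="control.bam")
lift_cmd = liftover_command(
    "grch38_peaks.bed", "grch38_to_chm13.chain", "lifted.bed", "unmapped.bed"
)
```

Either list could be passed to `subprocess.run` once real input paths are substituted.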

Credits

Data were processed by Michael Sauria at Johns Hopkins University. For inquiries, please contact us at the following address: msauria@jhu.edu

References

Gershman A, Sauria MEG, Guitart X, Vollger MR, Hook PW, Hoyt SJ, Jain M, Shumate A, Razaghi R, Koren S, Altemose N, Caldas GV, Logsdon GA, Rhie A, Eichler EE, Schatz MC, O'Neill RJ, Phillippy AM, Miga KH, Timp W. Epigenetic patterns in a complete human genome. Science. 2022 Apr;376(6588):eabj5089. doi: 10.1126/science.abj5089. Epub 2022 Apr 1. PMID: 35357915.