Assembly: Human Dec. 2013 (GRCh38/hg38) Data last updated at UCSC: 2020-02-04 09:14:32
Track of human polytracts
Description
This track displays polytracts in the human reference genome.
A polytract is defined as a tract of tandem mono-nucleotide, di-nucleotide, or
tri-nucleotide repeats, termed MNR, DNR, and TNR, respectively. Each MNR spans
at least six units, and each DNR or TNR spans at least three units. An incomplete
terminal motif is included in the polytract, so the length of a DNR or TNR
is not necessarily a multiple of 2 or 3.
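The definition above can be illustrated with a minimal Python sketch based on regular expressions. It is not the authors' pipeline (which used R); the function name `find_polytracts` and the `MIN_UNITS` table are hypothetical, and the sketch makes a simplifying assumption: it searches for one motif in one phase on the forward strand only, whereas the track groups complementary motifs into a single species.

```python
import re

# Minimum number of repeated units per motif length, as stated in the Methods:
# 6 for MNR, 3 for DNR and TNR.
MIN_UNITS = {1: 6, 2: 3, 3: 3}

def find_polytracts(seq, motif):
    """Return (start, end, tract) for each tandem run of `motif` in `seq`.

    Coordinates are 0-based and end-exclusive. An incomplete trailing copy
    of the motif is included in the tract, as in the definition above.
    """
    k = len(motif)
    n = MIN_UNITS[k]
    # Proper prefixes of the motif, longest first, form the optional partial tail.
    partial = "|".join(re.escape(motif[:i]) for i in range(k - 1, 0, -1))
    tail = f"(?:{partial})?" if partial else ""
    pattern = re.compile(f"(?:{re.escape(motif)}){{{n},}}{tail}")
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]

print(find_polytracts("GGTATATATCC", "TA"))  # [(2, 9, 'TATATAT')]
```

In this toy sequence the TA tract is 7 nt long: three complete units plus a trailing T, which is why a DNR length need not be even.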
Display Conventions and Configuration
Four shades of gray are used to distinguish four classes of data: the three
clades of polytracts (MNR, DNR, and TNR) and the polytract break-point (hinge).
- Mono-Nucleotide Repeat (MNR)
- Di-Nucleotide Repeat (DNR)
- Tri-Nucleotide Repeat (TNR)
- Polytract break-point (hinge)
Methods
A polytract is defined as a tract of tandemly repeated mono-nucleotide,
di-nucleotide, or tri-nucleotide motifs, with any incomplete terminal motif
included, where the minimum number of repeated units is 6 for MNRs and 3 for
DNRs and TNRs. Polytracts were identified by matching a string pattern against
each chromosome sequence, with assistance from the R packages “stringr”
(https://CRAN.R-project.org/package=stringr) and “BSgenome.Hsapiens.UCSC.hg38”
(www.bioconductor.org; DOI: 10.18129/B9.bioc.BSgenome). To accommodate the
complementarity of purine-pyrimidine base pairs (A:T and G:C), complementary
motifs are grouped into a single species: MNRs comprise the A/T and C/G species;
DNRs comprise the TA, CT/GA, CA/GT, and GC species; and TNRs comprise the AAC,
AAG, AAT, ACC, GAC, ACT, CAG, AGG, ATC, and CGG species. The resulting polytract
dataset thus consists of two types of MNRs, four types of DNRs, and ten types of
TNRs (Table 1).
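The species counts (two, four, and ten) can be reproduced with a short Python sketch. It assumes that two motifs belong to the same species when one is a rotation of the other or of its reverse complement, and that motifs reducible to a shorter unit (such as AA or CCC) are excluded. This grouping rule is an interpretation of the text rather than a documented step of the authors' R pipeline, and the helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(motif):
    """Smallest string among all rotations of the motif and of its reverse complement."""
    revcomp = motif.translate(COMPLEMENT)[::-1]
    return min(m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m)))

def is_primitive(motif):
    """True unless the motif is a homopolymer longer than 1 nt.

    For motif lengths up to 3 this is the only way a motif can reduce
    to a repeat of a shorter unit.
    """
    return len(motif) == 1 or len(set(motif)) > 1

def motif_species(k):
    """Sorted canonical representatives of all motif species of length k."""
    motifs = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({canonical(m) for m in motifs if is_primitive(m)})

for k in (1, 2, 3):
    print(k, len(motif_species(k)))  # 1 2, then 2 4, then 3 10
```

The canonical labels chosen here (alphabetically smallest) differ from some labels used on this page, for example ACG rather than GAC, but they denote the same ten TNR species.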
Table 1. Summary statistics of three clades of polytracts in the human reference genome (hg38).

Polytract species   Tract number   Tract volume (nt)   Genome occupancy   Mean length (nt)   Median length (nt)
A/T                    7,119,220          55,290,931              1.79%                7.8                    6
C/G                      610,474           3,839,875              0.12%                6.3                    6
MNR total              7,729,694          59,130,806              1.91%                7.6                    6
TA                     2,764,278          19,948,282              0.65%                7.2                    6
CT/GA                  3,679,922          25,030,351              0.81%                6.8                    6
CA/GT                  3,371,036          25,125,048              0.81%                7.5                    6
GC                        71,160             476,554              0.02%                6.7                    6
DNR total              9,886,396          70,580,235              2.29%                7.1                    6
AAT                      353,551           3,828,512              0.12%               10.8                    9
ACC                      238,068           2,302,369              0.07%                9.7                    9
AAG                      183,579           1,859,914              0.06%               10.1                    9
AGG                      183,232           1,856,718              0.06%               10.1                    9
AAC                      146,444           1,720,479              0.06%               11.7                   10
CAG                      131,235           1,290,213              0.04%                9.8                    9
ATC                      123,051           1,223,622              0.04%                9.9                    9
ACT                       36,083             352,774              0.01%                9.8                    9
CGG                       21,508             238,039              0.01%               11.1                   10
GAC                        1,396              13,915              0.00%               10.0                    9
TNR total              1,418,147          14,686,555              0.48%               10.4                    9
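The columns of Table 1 are related by simple arithmetic, which the following Python sketch cross-checks. It assumes that each class total is the sum of its species rows and that mean length equals tract volume divided by tract number; both assumptions are inferred from the figures rather than stated in the source. The names `TABLE` and `class_summary` are hypothetical.

```python
# (tract number, tract volume in nt) per species, copied from Table 1.
TABLE = {
    "MNR": {"A/T": (7_119_220, 55_290_931), "C/G": (610_474, 3_839_875)},
    "DNR": {
        "TA": (2_764_278, 19_948_282), "CT/GA": (3_679_922, 25_030_351),
        "CA/GT": (3_371_036, 25_125_048), "GC": (71_160, 476_554),
    },
    "TNR": {
        "AAT": (353_551, 3_828_512), "ACC": (238_068, 2_302_369),
        "AAG": (183_579, 1_859_914), "AGG": (183_232, 1_856_718),
        "AAC": (146_444, 1_720_479), "CAG": (131_235, 1_290_213),
        "ATC": (123_051, 1_223_622), "ACT": (36_083, 352_774),
        "CGG": (21_508, 238_039), "GAC": (1_396, 13_915),
    },
}

def class_summary(species):
    """Return (total tract number, total volume in nt, mean length rounded to 0.1 nt)."""
    number = sum(n for n, _ in species.values())
    volume = sum(v for _, v in species.values())
    return number, volume, round(volume / number, 1)

for name, species in TABLE.items():
    print(name, class_summary(species))
# MNR (7729694, 59130806, 7.6)
# DNR (9886396, 70580235, 7.1)
# TNR (1418147, 14686555, 10.4)
```

The computed totals and mean lengths agree with the "total" rows of the table.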
Credits
Data were generated and processed in the Guo Bioinformatics Lab
at the UNM Comprehensive Cancer Center. For inquiries, please contact Dr. Hui Yu (huiyu1@salud.unm.edu).
References
Yu H, Zhao S, Ness S, Kang H, Sheng Q, Samuels DC, Oyebamiji O, Guo Y.
Non-canonical RNA-DNA differences and other human genomic features are
associated with very short tandem repeats. PLoS Comp Biol. 2020. In revision.