Schema for 1000 Genomes Trios - Thousand Genomes Project Family VCF Trios
  Database: hg38    Primary Table: tgpHG00702_SH089_CHS
VCF File Download: /gbdb/hg38/1000Genomes/trio/HG00702_SH089_CHS/HG00702_SH089_CHSTrio.chr1.vcf.gz
Format description: The fields of a Variant Call Format data line
field      description
chrom      An identifier from the reference genome
pos        The reference position, with the 1st base having position 1
id         Semicolon-separated list of unique identifiers where available
ref        Reference base(s)
alt        Comma-separated list of alternate non-reference alleles called on at least one of the samples
qual       Phred-scaled quality score for the assertion made in ALT, i.e., -10 log10 Prob(call in ALT is wrong)
filter     PASS if this position has passed all filters; otherwise, a semicolon-separated list of codes for filters that fail
info       Additional information encoded as a semicolon-separated series of short keys with optional comma-separated values
format     If genotype columns are specified in the header, a colon-separated list of short keys starting with GT
genotypes  If genotype columns are specified in the header, a tab-separated set of genotype column values; each value is a colon-separated list of values corresponding to keys in the format column
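
As a rough illustration of the column layout above, the following Python sketch splits one tab-separated VCF data line into the ten fields listed in the table. The function name parse_vcf_line and the example line are made up for illustration (the example position and genotypes are not taken from the trio file), and this is not the Genome Browser's own parser.

```python
def parse_vcf_line(line):
    """Split one VCF data line into the ten fields described in the table."""
    cols = line.rstrip("\n").split("\t")
    record = {
        "chrom": cols[0],
        "pos": int(cols[1]),        # 1-based reference position
        "id": cols[2],
        "ref": cols[3],
        "alt": cols[4].split(","),  # comma-separated alternate alleles
        "qual": cols[5],            # "." when no quality score is given
        "filter": cols[6],
        "info": cols[7],
    }
    # FORMAT keys are colon-separated; every genotype column uses the same key order.
    keys = cols[8].split(":") if len(cols) > 8 else []
    record["format"] = keys
    record["genotypes"] = [dict(zip(keys, col.split(":"))) for col in cols[9:]]
    return record


# Hypothetical data line with three genotype columns (one per trio member).
line = "1\t12345\t.\tC\tT\t.\tPASS\tVT=SNP;AN=6;AC=1\tGT\t0|0\t0|0\t0|1"
rec = parse_vcf_line(line)
print(rec["chrom"], rec["pos"], rec["alt"], [g["GT"] for g in rec["genotypes"]])
```

For real files, an established VCF library is usually a better choice than hand-rolled splitting; the sketch only makes the separators in each column explicit.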

Sample Rows
 
chrom  pos     id  ref  alt  qual  filter  info  format  genotypes
1      55299   .   C    T    .     PASS    VT=SNP;NS=150;DP=59922;AF=0.29;EAS_AF=0.35;EUR_AF=0.14;AFR_AF=0.39;AMR_AF=0.27;SAS_AF=0.24;AN=6;AC=1  GT  0|0  0|0  ...
1      55326   .   T    C    .     PASS    VT=SNP;NS=150;DP=57297;AF=0.05;EAS_AF=0.1;EUR_AF=0.01;AFR_AF=0.05;AMR_AF=0.01;SAS_AF=0.04;AN=6;AC=1  GT  0|0  0|0  ...
1      66507   .   T    A    .     PASS    VT=SNP;NS=150;DP=5589;AF=0.45;EAS_AF=0.6;EUR_AF=0.51;AFR_AF=0.27;AMR_AF=0.37;SAS_AF=0.5;AN=6;AC=6  GT  1|1  1|1  ...
1      69511   .   A    G    .     PASS    EX_TARGET;VT=SNP;NS=150;DP=1105161;AF=0.87;EAS_AF=1;EUR_AF=0.94;AFR_AF=0.65;AMR_AF=0.89;SAS_AF=0.96;AN=6;AC=6  GT  1|1  1|1  ...
1      84002   .   G    A    .     PASS    VT=SNP;NS=150;DP=51294;AF=0.18;EAS_AF=0.22;EUR_AF=0.19;AFR_AF=0.1;AMR_AF=0.17;SAS_AF=0.24;AN=6;AC=3  GT  1|1  1|0  ...
1      84006   .   G    A    .     PASS    VT=SNP;NS=150;DP=51783;AF=0.09;EAS_AF=0.1;EUR_AF=0.1;AFR_AF=0.03;AMR_AF=0.07;SAS_AF=0.18;AN=6;AC=1  GT  0|0  0|0  ...
1      98929   .   A    G    .     PASS    VT=SNP;NS=150;DP=50799;AF=0.1;EAS_AF=0.12;EUR_AF=0.05;AFR_AF=0.15;AMR_AF=0.05;SAS_AF=0.07;AN=6;AC=1  GT  1|0  0|0  ...
1      263722  .   C    G    .     PASS    VT=SNP;NS=150;DP=64761;AF=0.21;EAS_AF=0.14;EUR_AF=0.12;AFR_AF=0.44;AMR_AF=0.13;SAS_AF=0.12;AN=6;AC=1  GT  0|0  1|0  ...
1      275654  .   T    C    .     PASS    VT=SNP;NS=150;DP=32163;AF=0.14;EAS_AF=0.19;EUR_AF=0.13;AFR_AF=0.1;AMR_AF=0.09;SAS_AF=0.17;AN=6;AC=2  GT  0|0  0|0  ...
1      591356  .   C    G    .     PASS    VT=SNP;NS=150;DP=70578;AF=0.15;EAS_AF=0.14;EUR_AF=0.05;AFR_AF=0.25;AMR_AF=0.18;SAS_AF=0.1;AN=6;AC=2  GT  0|0  0|0  ...
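
The info column in the sample rows can be unpacked into key/value pairs. The sketch below is one minimal way to do that in Python, assuming keys without "=" (such as EX_TARGET) are flags; the helper name parse_info is hypothetical. The example string is the info value from the row at position 69511. AN=6 in every sample row would be consistent with three diploid samples per trio, though that reading is an inference from the rows rather than something the schema states.

```python
def parse_info(info):
    """Turn a semicolon-separated INFO string into a dict.

    Keys with values map to a list of strings (values are comma-separated);
    keys without "=" are treated as flags and map to True.
    """
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # empty INFO column or stray separator
        key, sep, value = item.partition("=")
        fields[key] = value.split(",") if sep else True
    return fields


info = parse_info(
    "EX_TARGET;VT=SNP;NS=150;DP=1105161;AF=0.87;EAS_AF=1;EUR_AF=0.94;"
    "AFR_AF=0.65;AMR_AF=0.89;SAS_AF=0.96;AN=6;AC=6"
)
allele_count = int(info["AC"][0])   # alternate alleles observed in the trio
allele_number = int(info["AN"][0])  # total called alleles in the trio
print(info["EX_TARGET"], info["VT"], allele_count, allele_number)
```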

1000 Genomes Trios (tgpTrios) Track Description
 

Description

This track shows approximately 4.5 million single nucleotide variants (SNVs) and 0.6 million short insertions/deletions (indels) from 7 parent/child trios, as produced by the International Genome Sample Resource (IGSR) from sequence data generated by the 1000 Genomes Project in its Phase 3 sequencing of 2,504 genomes from 26 populations worldwide.

Variants were called on the autosomes (chromosomes 1 through 22) and on the pseudo-autosomal regions (PARs) of chromosome X. Therefore, this track has no annotations on alternate haplotype sequences, fix patches, chromosome Y, or the non-PAR portion (the majority) of chromosome X.

The variant genotypes have been phased (i.e., the two alleles of each diploid genotype have been assigned to two haplotypes, one inherited from each parent). This information allows us to illustrate which haplotypes in the child have been inherited from which parent.
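
To make the idea of phasing concrete, here is a small Python sketch that turns a series of phased diploid genotypes (one per variant site, written with "|" as in the sample rows) into two haplotypes. The function name split_haplotypes and the example genotypes are illustrative assumptions, not part of the track's implementation.

```python
def split_haplotypes(genotypes):
    """Split phased diploid genotypes such as "0|1" into two allele lists.

    The allele before "|" at every site goes to the first haplotype and the
    allele after it goes to the second, so each list follows one chromosome copy.
    """
    first, second = [], []
    for gt in genotypes:
        if "|" not in gt:
            raise ValueError(f"genotype {gt!r} is not phased")
        a, b = gt.split("|")
        first.append(a)
        second.append(b)
    return first, second


# Hypothetical genotypes of one sample at three consecutive variant sites.
hap_a, hap_b = split_haplotypes(["0|1", "1|1", "0|0"])
print(hap_a, hap_b)
```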

Trios from six populations are available:

  • YRI - Yoruba in Ibadan, Nigeria
  • KHV - Kinh in Ho Chi Minh City, Vietnam
  • PUR - Puerto Ricans from Puerto Rico
  • CEU - CEPH Utah
  • CHS - Southern Han Chinese
  • MXL - Mexican Ancestry from Los Angeles

Display Conventions and Configuration

This track illustrates the vcfPhasedTrio track type, in which two lines, one for each chromosome in the diploid genome, are drawn per sample in the underlying VCF. Variants in the window are then drawn on the line of the haplotype they belong to, such that variants on the same line were likely inherited together. The sorting routine is the same as the one used to draw the haplotype-sorted display in the non-trio 1000 Genomes track, and is described here.

The child haplotypes are drawn in the center of each group, flanked above and below by parent haplotypes, and variants are sorted to show the transmitted alleles:

parent 1 untransmitted haplotype
parent 1 transmitted haplotype
child haplotype inherited from parent 1
child haplotype inherited from parent 2
parent 2 transmitted haplotype
parent 2 untransmitted haplotype
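
The six-line layout above can be imitated with a deliberately simplified Python sketch. It assumes each haplotype is a string of alleles over the variants in the window and picks the child/parent pairing with the fewest mismatches; this mismatch count is a stand-in chosen for illustration and is not the browser's actual sorting routine. All function names and example haplotypes are hypothetical.

```python
def mismatches(a, b):
    """Count positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(child_hap, parent_haps):
    """Return (index, mismatches) of the parent haplotype closest to child_hap."""
    dists = [mismatches(child_hap, h) for h in parent_haps]
    i = 0 if dists[0] <= dists[1] else 1
    return i, dists[i]


def stack_order(child, parent1, parent2):
    """Order the six haplotypes as in the display list (simplified assumption)."""
    options = []
    # Try both ways of assigning the child's two haplotypes to the two parents.
    for c1, c2 in ((0, 1), (1, 0)):
        t1, d1 = best_match(child[c1], parent1)
        t2, d2 = best_match(child[c2], parent2)
        options.append((d1 + d2, c1, c2, t1, t2))
    _, c1, c2, t1, t2 = min(options)
    return [
        ("parent 1 untransmitted", parent1[1 - t1]),
        ("parent 1 transmitted", parent1[t1]),
        ("child from parent 1", child[c1]),
        ("child from parent 2", child[c2]),
        ("parent 2 transmitted", parent2[t2]),
        ("parent 2 untransmitted", parent2[1 - t2]),
    ]


# Hypothetical haplotypes over three variant sites.
order = stack_order(child=("010", "101"), parent1=("101", "000"), parent2=("010", "111"))
for label, hap in order:
    print(f"{label:24s} {hap}")
```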

Track configuration options include:

  • Showing the child haplotypes below the parent(s)
  • Toggling the haplotype labels between mother/father/child and VCF sample IDs
  • Hiding the parent samples

Allele coloring options include:

  • No shading - the default option
  • Shading by functional effect of the variant relative to NCBI RefSeq Curated Transcripts:
    • reference alleles invisible
    • alternate alleles in red for non-synonymous
    • alternate alleles in green for synonymous
    • alternate alleles in blue for UTR/noncoding
    • alternate alleles in black otherwise
  • Child de novo alleles in red - all alternate alleles black except for cases where the child has an allele not present in either parent
  • Child alleles that are "inconsistent" with phasing in red - all alternate alleles black except for cases where the "inherited" child allele does not match the "transmitted" parent allele. Note that as the genomic location changes, and thus the alleles present to use for sorting change, whether an allele is marked as inconsistent can change as well. Because all the variants present in the window are considered a haplotype, what haplotypes are considered "inherited" and "transmitted" varies as the viewing location changes
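
The last two coloring options can be sketched in Python as simple comparisons. This is a minimal sketch of the checks as described in the list above, with hypothetical function names and example genotypes; how the browser decides which parent haplotype counts as "transmitted" in a given window is not modeled here.

```python
def alleles(gt):
    """Return the set of called alleles in a genotype such as "0|1"."""
    return {a for a in gt.replace("/", "|").split("|") if a != "."}


def de_novo_alleles(child_gt, parent1_gt, parent2_gt):
    """Alleles present in the child but in neither parent at one site."""
    return alleles(child_gt) - alleles(parent1_gt) - alleles(parent2_gt)


def inconsistent_sites(child_hap, transmitted_hap):
    """Indexes in the window where the child's inherited haplotype differs
    from the parent haplotype assumed to have been transmitted."""
    return [i for i, (c, p) in enumerate(zip(child_hap, transmitted_hap)) if c != p]


# Hypothetical site: the child carries allele 1, both parents are 0|0.
novel = de_novo_alleles("0|1", "0|0", "0|0")
# Hypothetical window of four sites; the haplotypes disagree at index 2.
conflicts = inconsistent_sites("0110", "0100")
print(novel, conflicts)
```

Because the second check depends on which haplotypes are paired, its result can change with the viewing window, in line with the note in the list above.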

From the subtrack configure menu, there is the option to manually rearrange the family order for each trio by dragging haplotypes.

Clicking on a variant takes one to a details page with the standard VCF details, including INFO column annotations, the REF and ALT alleles, and the genotypes from all three samples.

Methods

The genomes of 2,504 individuals were sequenced using both whole-genome sequencing (mean depth = 7.4x) and targeted exome sequencing (mean depth = 65.7x). Sequence reads were aligned to the reference genome using alt-aware BWA-MEM (Zheng-Bradley et al.). Variant discovery and quality control were performed as described in Lowy-Gallego et al.

UCSC Methods

Trio samples were extracted from both the main 1000 Genomes set and the related-samples set, using the pedigree information from 1000 Genomes. Variants that were homozygous reference across all three samples were removed.
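
The filtering step described above can be approximated as follows. This Python sketch assumes genotypes are given as GT strings for the three trio members; the function names and example genotypes are illustrative and do not reproduce the actual UCSC pipeline.

```python
def is_hom_ref(gt):
    """True if every allele in the genotype is the reference allele (0)."""
    return set(gt.replace("/", "|").split("|")) == {"0"}


def keep_variant(trio_genotypes):
    """Keep a variant unless all three samples are homozygous reference."""
    return not all(is_hom_ref(gt) for gt in trio_genotypes)


# Hypothetical genotype triples for two variant sites.
kept = keep_variant(["0|0", "0|0", "0|1"])     # one sample carries the ALT allele
dropped = keep_variant(["0|0", "0|0", "0|0"])  # no sample carries the ALT allele
print(kept, dropped)
```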

Data Access

Trio VCFs are available for download from our download server.

Credits

Thanks to the International Genome Sample Resource (IGSR) for making these variant calls freely available.

References

Zheng-Bradley X, Streeter I, Fairley S, Richardson D, Clarke L, Flicek P, 1000 Genomes Project Consortium. Alignment of 1000 Genomes Project reads to reference assembly GRCh38. Gigascience. 2017 Jul 1;6(7):1-8. PMID: 28531267; PMC: PMC5522380

Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res. 2019 Oct 4. PMID: 31584097

Lowy-Gallego E, Fairley S, Zheng-Bradley X, Ruffier M, Clarke L, Flicek P, 1000 Genomes Project Consortium. Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project [version 1; peer review: 2 not approved]. Wellcome Open Research. 2019 Mar 11.

1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA et al. A global reference for human genetic variation. Nature. 2015 Oct 1;526(7571):68-74. PMID: 26432245