WGBS (mCG) Track Settings
 
Average mCG in each condition and CG-DMRs

Default display | NeuN status | Brain region | Comparison | Smoothing bandwidth | View | Track name
squish | NeuN+ | Not applicable | Between brain regions in NeuN+ samples | Large | Differentially methylated regions | CG-blocks between brain regions in NeuN+ samples
squish | Not applicable | Not applicable | Between NeuN+ and NeuN- samples | Large | Differentially methylated regions | CG-blocks between NeuN+ and NeuN- samples
squish | Not applicable | Not applicable | Between NeuN+ and NeuN- samples | Small | Differentially methylated regions | CG-DMRs between NeuN+ and NeuN- samples
squish | NeuN+ | Not applicable | Between brain regions in NeuN+ samples | Small | Differentially methylated regions | CG-DMRs between brain regions in NeuN+ samples
squish | NeuN- | Not applicable | Between brain regions in NeuN- samples | Small | Differentially methylated regions | CG-DMRs between brain regions in NeuN- samples
squish | Unsorted | Not applicable | Between brain regions in unsorted samples | Small | Differentially methylated regions | CG-DMRs between brain regions in unsorted samples
squish | NeuN+ | Not applicable | Between NAcc (NeuN+) and BA9 (NeuN+) samples | Small | Differentially methylated regions | CG-DMRs between NAcc (NeuN+) and BA9 (NeuN+) samples
full | NeuN+ | Anterior cingulate gyrus | Not applicable | Large | mCG | Average mCG (large smooth) in BA24 (NeuN+) samples
full | NeuN- | Anterior cingulate gyrus | Not applicable | Large | mCG | Average mCG (large smooth) in BA24 (NeuN-) samples
full | NeuN+ | Dorsolateral prefrontal cortex | Not applicable | Large | mCG | Average mCG (large smooth) in BA9 (NeuN+) samples
full | NeuN- | Dorsolateral prefrontal cortex | Not applicable | Large | mCG | Average mCG (large smooth) in BA9 (NeuN-) samples
full | NeuN+ | Hippocampus | Not applicable | Large | mCG | Average mCG (large smooth) in HC (NeuN+) samples
full | NeuN- | Hippocampus | Not applicable | Large | mCG | Average mCG (large smooth) in HC (NeuN-) samples
full | NeuN+ | Nucleus accumbens | Not applicable | Large | mCG | Average mCG (large smooth) in NAcc (NeuN+) samples
full | NeuN- | Nucleus accumbens | Not applicable | Large | mCG | Average mCG (large smooth) in NAcc (NeuN-) samples
full | Unsorted | Anterior cingulate gyrus | Not applicable | Small | mCG | Average mCG (small smooth) in BA24 (unsorted) samples
full | NeuN+ | Anterior cingulate gyrus | Not applicable | Small | mCG | Average mCG (small smooth) in BA24 (NeuN+) samples
full | NeuN- | Anterior cingulate gyrus | Not applicable | Small | mCG | Average mCG (small smooth) in BA24 (NeuN-) samples
full | Unsorted | Dorsolateral prefrontal cortex | Not applicable | Small | mCG | Average mCG (small smooth) in BA9 (unsorted) samples
full | NeuN+ | Dorsolateral prefrontal cortex | Not applicable | Small | mCG | Average mCG (small smooth) in BA9 (NeuN+) samples
full | NeuN- | Dorsolateral prefrontal cortex | Not applicable | Small | mCG | Average mCG (small smooth) in BA9 (NeuN-) samples
full | Unsorted | Hippocampus | Not applicable | Small | mCG | Average mCG (small smooth) in HC (unsorted) samples
full | NeuN+ | Hippocampus | Not applicable | Small | mCG | Average mCG (small smooth) in HC (NeuN+) samples
full | NeuN- | Hippocampus | Not applicable | Small | mCG | Average mCG (small smooth) in HC (NeuN-) samples
full | Unsorted | Nucleus accumbens | Not applicable | Small | mCG | Average mCG (small smooth) in NAcc (unsorted) samples
full | NeuN+ | Nucleus accumbens | Not applicable | Small | mCG | Average mCG (small smooth) in NAcc (NeuN+) samples
full | NeuN- | Nucleus accumbens | Not applicable | Small | mCG | Average mCG (small smooth) in NAcc (NeuN-) samples
    
Assembly: Human Feb. 2009 (GRCh37/hg19)

Introduction

This document describes the whole genome bisulfite sequencing (WGBS) genome browser tracks for the project ‘Neuronal Brain Region-Specific DNA Methylation and Chromatin Accessibility are Associated with Neuropsychiatric Disease Heritability’. For an overview of the project, please see the README.

Read pre-processing, alignment, and post-processing

We trimmed reads of their adapter sequences using Trim Galore! (v0.4.0) and quality-trimmed using the following parameters: trim_galore -q 25 --paired ${READ1} ${READ2}. We then aligned these trimmed reads to the hg19 build of the human genome (including autosomes, sex chromosomes, mitochondrial sequence, and lambda phage (accession NC_001416.1) but excluding non-chromosomal sequences) using Bismark (v0.14.3) with the following alignment parameters: bismark --bowtie2 -X 1000 -1 ${READ1} -2 ${READ2}. Using the reads aligned to the lambda phage genome, we estimated that all libraries had a bisulfite conversion rate > 99%.
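
The conversion-rate estimate above can be illustrated with a minimal sketch. It assumes the usual reasoning that the lambda phage spike-in is unmethylated, so any cytosine still called methylated on it reflects incomplete conversion. The function name and the counts are hypothetical and are not taken from the study.

```python
def conversion_rate(methylated_calls: int, unmethylated_calls: int) -> float:
    """Fraction of spike-in cytosine calls that were read as unmethylated.

    Assumes the spike-in genome is fully unmethylated, so methylated
    calls are treated as conversion failures.
    """
    total = methylated_calls + unmethylated_calls
    if total == 0:
        raise ValueError("no cytosine calls on the spike-in genome")
    return unmethylated_calls / total


# Hypothetical counts for a single library (illustration only).
rate = conversion_rate(methylated_calls=420, unmethylated_calls=99_580)
print(f"bisulfite conversion rate: {rate:.4f}")
```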

We then used bismark_methylation_extractor to summarize the number of reads supporting a methylated cytosine and the number of reads supporting an unmethylated cytosine for every cytosine in the reference genome. Specifically, we first computed and visually inspected the M-bias of our libraries. Based on these results, we decided to ignore the first 5 bp of read1 and the first 10 bp of read2 in the subsequent call to bismark_methylation_extractor with parameters: --ignore 5 --ignore_r2 10. The final cytosine report file summarizes the methylation evidence at each cytosine in the reference genome.
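
As an illustration of working with such a report, the sketch below reads per-cytosine counts and derives a raw methylation level. It assumes a tab-separated layout of chromosome, position, strand, methylated count, unmethylated count, and cytosine context (with any further columns ignored); check this against the Bismark version in use. The record type, function names, and example rows are hypothetical.

```python
import csv
import io
from typing import Iterator, NamedTuple, Optional


class CytosineRecord(NamedTuple):
    chrom: str
    pos: int
    strand: str
    meth: int      # reads supporting a methylated cytosine
    unmeth: int    # reads supporting an unmethylated cytosine
    context: str   # e.g. CG, CHG, CHH


def read_cytosine_report(handle) -> Iterator[CytosineRecord]:
    """Yield one record per line of an (assumed) tab-separated report."""
    for row in csv.reader(handle, delimiter="\t"):
        chrom, pos, strand, meth, unmeth, context = row[:6]
        yield CytosineRecord(chrom, int(pos), strand, int(meth), int(unmeth), context)


def methylation_level(rec: CytosineRecord) -> Optional[float]:
    """Raw methylation level, or None where the cytosine has no coverage."""
    coverage = rec.meth + rec.unmeth
    return rec.meth / coverage if coverage else None


# Two made-up rows, for illustration only.
example = "chr1\t10469\t+\t3\t1\tCG\tCGC\nchr1\t10470\t-\t0\t0\tCG\tCGG\n"
records = list(read_cytosine_report(io.StringIO(example)))
levels = [methylation_level(r) for r in records]
print(levels)
```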

Smoothing WGBS

We used BSmooth to estimate CpG methylation levels as previously described. Specifically, we ran a ‘small’ smooth to identify small DMRs (smoothing over windows of at least 1 kb containing at least 20 CpGs) and a ‘large’ smooth to identify large-scale blocks (smoothing over windows of at least 20 kb containing at least 500 CpGs). Following smoothing, we analyzed all CpGs that had a sequencing coverage of at least 1 in all samples (n = 45 for sorted data, n = 27 for unsorted data).
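
The coverage filter described above can be sketched as follows. This is a simplified illustration, not the BSmooth implementation; the function name and the toy coverage matrix are hypothetical.

```python
def loci_covered_in_all(coverage, min_cov=1):
    """Return indices of loci whose coverage is >= min_cov in every sample.

    `coverage` is a list of rows, one row per CpG, each row holding one
    read count per sample.
    """
    return [
        i for i, row in enumerate(coverage)
        if all(count >= min_cov for count in row)
    ]


# Toy matrix: 4 CpGs x 3 samples (illustration only).
coverage = [
    [5, 3, 8],
    [0, 2, 4],   # dropped: no reads in the first sample
    [1, 1, 1],
    [7, 0, 0],   # dropped: no reads in two samples
]
kept = loci_covered_in_all(coverage)
print(kept)
```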

We also adapted BSmooth to estimate CpA and CpT methylation levels in NeuN+ samples. Unlike CpGs, CpAs and CpTs are not palindromic, so they were analyzed separately for each strand, giving a total of 4 strand/dinucleotide combinations:

  • mCA (forward strand)
  • mCA (reverse strand)
  • mCT (forward strand)
  • mCT (reverse strand)

For display purposes, mCA and mCT on the reverse strand are shown as negative values.

For each dinucleotide/strand combination we ran a single ‘small-ish’ smooth to identify DMRs (smoothing over windows of at least 3 kb containing at least 200 CpAs or CpTs). Following smoothing, we analyzed all CpAs and CpTs regardless of sequencing coverage.

Identification of differentially methylated regions (DMRs)

We ran separate analyses to identify 6 types of differentially methylated regions (DMRs):

  1. CG-DMRs: Using data from the ‘small’ smooth of CpG methylation levels
  2. CG-blocks: Using data from the ‘large’ smooth of CpG methylation levels
  3. CA-DMRs (forward strand): Using data from the ‘small-ish’ smooth of CpA methylation levels on the forward strand
  4. CA-DMRs (reverse strand): Using data from the ‘small-ish’ smooth of CpA methylation levels on the reverse strand
  5. CT-DMRs (forward strand): Using data from the ‘small-ish’ smooth of CpT methylation levels on the forward strand
  6. CT-DMRs (reverse strand): Using data from the ‘small-ish’ smooth of CpT methylation levels on the reverse strand

Previously, we have used BSmooth to perform pairwise (two-group) comparisons. In the present study, we had up to 8 groups to compare: 4 brain regions (BA9, BA24, HC, NAcc) and, for the sorted data, 2 cell types (NeuN+, NeuN-). Rather than running all 28 pairwise comparisons, we extended the BSmooth method to handle multi-group comparisons, which we refer to as the F-statistic method.
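
The count of 28 pairwise comparisons follows from choosing 2 of the 8 groups. A short sketch, using group names in the style of BA9_pos and BA9_neg, enumerates them:

```python
from itertools import combinations

regions = ["BA9", "BA24", "HC", "NAcc"]
cell_types = ["pos", "neg"]  # NeuN+ and NeuN-

# One group per brain region / cell type combination, e.g. "BA9_pos".
groups = [f"{region}_{cell}" for region in regions for cell in cell_types]
pairs = list(combinations(groups, 2))

print(len(groups), "groups,", len(pairs), "pairwise comparisons")
```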

For the F-statistic method, we constructed a design matrix with a term for each group (e.g., BA9_neg for NeuN- cells from BA9, BA9_pos for NeuN+ cells from BA9, etc.). For each CpX (CpG, CpT, or CpA), we then fitted a linear model of the smoothed methylation levels against the design matrix. To improve standard error estimates, we thresholded the residual standard deviations at the 75th percentile and smoothed these using a running mean over windows containing 101 CpXs.
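
A minimal sketch of these steps is given below. It is not the authors' code and makes several assumptions: the design matrix has one indicator column per group, so the least-squares coefficients reduce to per-group means; "thresholding" is taken to mean raising any standard deviation below the 75th percentile up to that value; and the running mean simply uses shorter windows at the ends of the sequence. All function names are hypothetical.

```python
import math
import statistics


def design_matrix(groups):
    """One indicator column per group, one row per sample."""
    levels = sorted(set(groups))
    rows = [[1.0 if g == level else 0.0 for level in levels] for g in groups]
    return levels, rows


def fit_locus(values, groups):
    """Fit one locus; with indicator columns the coefficients are group means.

    Returns the per-group means and the residual standard deviation.
    """
    levels = sorted(set(groups))
    means = {
        level: statistics.fmean([v for v, g in zip(values, groups) if g == level])
        for level in levels
    }
    rss = sum((v - means[g]) ** 2 for v, g in zip(values, groups))
    return means, math.sqrt(rss / (len(values) - len(levels)))


def stabilise_sds(sds, window=101):
    """Floor SDs at their 75th percentile, then apply a running mean.

    Assumption: windows are truncated at the edges rather than padded.
    """
    floor = statistics.quantiles(sds, n=4, method="inclusive")[2]
    floored = [max(sd, floor) for sd in sds]
    half = window // 2
    smoothed = []
    for i in range(len(floored)):
        chunk = floored[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy example: 4 samples in 2 groups at a single CpX (made-up values).
groups = ["BA9_pos", "BA9_pos", "BA9_neg", "BA9_neg"]
levels, X = design_matrix(groups)
means, sd = fit_locus([0.8, 0.9, 0.2, 0.3], groups)
print(levels, means, round(sd, 4))
```

In practice `stabilise_sds` would be applied to the vector of residual standard deviations across all CpXs on a chromosome, using the default window of 101.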

We then combined the estimated coefficients from the linear model, their estimated correlations, and the smoothed residual standard deviations to form F-statistics to summarize the evidence that methylation differs between the groups at each of the CpXs.
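
For intuition, the sketch below computes a textbook one-way F-statistic at a single CpX from the group means and a (smoothed) residual standard deviation. This is a simplification under stated assumptions: it treats samples as independent and does not reproduce the handling of coefficient correlations described above. The function name and numbers are hypothetical.

```python
def f_statistic(group_means, group_sizes, residual_sd):
    """Between-group mean square divided by the residual variance."""
    n_total = sum(group_sizes.values())
    grand_mean = sum(group_means[g] * group_sizes[g] for g in group_means) / n_total
    between_ss = sum(
        group_sizes[g] * (group_means[g] - grand_mean) ** 2 for g in group_means
    )
    between_ms = between_ss / (len(group_means) - 1)
    return between_ms / residual_sd ** 2


# Made-up values: two groups of two samples each.
f_value = f_statistic(
    group_means={"BA9_pos": 0.85, "BA9_neg": 0.25},
    group_sizes={"BA9_pos": 2, "BA9_neg": 2},
    residual_sd=0.005 ** 0.5,
)
print(round(f_value, 2))
```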

Next, we identified runs of CpXs where the F-statistic exceeded a cutoff and where each CpX was within a maximum distance of the next. Specifically, we used cutoffs of F = 4.62 for CG-DMRs, F = 22 for CG-blocks (following ref. 10), and F = 42 for CA-DMRs and CT-DMRs, and required that the CpXs were within 300 bp of one another for DMRs and 1000 bp of one another for blocks. For blocks, we also required that the average methylation in the block varied by at least 0.1 across the groups. These runs of CpXs formed our candidate DMRs and blocks. Each candidate DMR and block was summarized by the area under the curve formed when treating the F-statistic as a function along the genome (areaStat).
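
The run-finding step can be sketched as below. Two assumptions are made for illustration: a CpX at or below the cutoff ends the current run, and areaStat is approximated as the sum of the F-statistics over the CpXs in the run. The function name and data are hypothetical, and the additional block filter on methylation differences is not shown.

```python
def find_candidate_regions(positions, fstats, cutoff, max_gap):
    """Group consecutive above-cutoff loci into candidate regions.

    `positions` must be sorted and lie on a single chromosome.
    """
    regions = []
    current = None
    for pos, f in zip(positions, fstats):
        above = f > cutoff
        if above and current is not None and pos - current["end"] <= max_gap:
            # Extend the open run.
            current["end"] = pos
            current["n"] += 1
            current["area_stat"] += f
            continue
        if current is not None:
            # Close the open run (below-cutoff locus or gap too large).
            regions.append(current)
            current = None
        if above:
            current = {"start": pos, "end": pos, "n": 1, "area_stat": f}
    if current is not None:
        regions.append(current)
    return regions


# Toy data using the CG-DMR settings quoted above (F > 4.62, gap <= 300 bp).
positions = [100, 150, 200, 900, 950, 1000, 1500]
fstats = [5.0, 6.0, 1.0, 7.0, 8.0, 9.0, 10.0]
candidates = find_candidate_regions(positions, fstats, cutoff=4.62, max_gap=300)
for region in candidates:
    print(region)
```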

We used permutation testing to assign a measure of statistical significance to each candidate DMR/block. We randomly permuted the design matrix, effectively permuting the sample labels, and repeated the F-statistic analysis with the same cutoffs using the permuted design matrix, resulting in a set of null DMRs/blocks for each permutation. We performed 1000 such permutations. We then asked, for each candidate DMR/block, in how many permutations we saw a null DMR/block anywhere in the genome with the same or better areaStat as the candidate DMR/block; dividing this number by the total number of permutations gives a permutation P-value for each DMR/block. Since we are comparing each candidate DMR/block against anything found anywhere in the genome in the permutation set, we are also correcting for multiple testing by controlling the family-wise error rate. Those candidate DMRs/blocks with a permutation P-value ≤ 0.05 form our set of DMRs/blocks.
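
The P-value calculation described above can be sketched as follows, assuming the areaStat values of the null DMRs/blocks from each permutation have already been computed. Function names and numbers are hypothetical; a permutation that yields no null regions is counted as not matching any candidate.

```python
import random


def permute_labels(labels, seed):
    """Return a shuffled copy of the sample labels (one permutation)."""
    return random.Random(seed).sample(labels, len(labels))


def permutation_p_values(candidate_areas, null_areas_by_perm):
    """For each candidate, the fraction of permutations containing at
    least one null region with an areaStat at least as large."""
    null_max = [max(areas, default=float("-inf")) for areas in null_areas_by_perm]
    n_perm = len(null_max)
    return [sum(m >= area for m in null_max) / n_perm for area in candidate_areas]


# Toy example with 4 permutations (the study used 1000).
candidate_areas = [50.0, 10.0]
null_areas_by_perm = [[5.0, 12.0], [], [60.0], [8.0]]
p_values = permutation_p_values(candidate_areas, null_areas_by_perm)
significant = [p <= 0.05 for p in p_values]
print(p_values, significant)
```

Because each candidate is compared with the largest null areaStat found anywhere in the genome in each permutation, the same threshold applies to every candidate, which is what gives the family-wise error rate control.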

Annotation of differentially methylated regions

The F-statistic approach allows us to jointly use all samples for the identification of differentially methylated regions. However, it does not tell us which group(s) are hypomethylated or hypermethylated for the region. To assign such labels to our F-statistic CG-DMRs and CG-blocks, we used a post-hoc analysis for specific pairwise comparisons of interest:

  • NeuN+ vs. NeuN-
  • NeuN+ cells in NAcc vs. NeuN+ cells in BA9, BA24, and HC
  • NeuN+ cells in NAcc vs. NeuN+ cells in BA9

We identified small DMRs and blocks using the original t-statistic method of BSmooth; an F-statistic DMR was assigned a label (e.g., hypermethylated in NeuN+ and hypomethylated in NeuN-) if the corresponding t-statistic DMR or block overlapped at least 50% of the F-statistic DMR or block. This procedure does not change the coordinates of the DMR/block and means an F-statistic DMR/block may be assigned multiple labels.
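
The 50% overlap rule can be illustrated with a short sketch. Regions are represented as half-open (start, end) intervals on the same chromosome; this representation, the function names, and the example labels are assumptions made for illustration.

```python
def overlap_fraction(region, other):
    """Fraction of `region` covered by `other` (half-open intervals)."""
    start, end = region
    other_start, other_end = other
    overlap = max(0, min(end, other_end) - max(start, other_start))
    return overlap / (end - start)


def assign_labels(f_region, t_regions, min_fraction=0.5):
    """Collect the labels of all t-statistic regions that cover at least
    `min_fraction` of the F-statistic region; coordinates are unchanged."""
    return [
        label
        for start, end, label in t_regions
        if overlap_fraction(f_region, (start, end)) >= min_fraction
    ]


# Made-up regions for illustration.
f_region = (1000, 2000)
t_regions = [
    (900, 1600, "hypermethylated in NeuN+"),   # covers 60% of f_region
    (1700, 2500, "hypomethylated in NAcc"),    # covers 30% of f_region
]
labels = assign_labels(f_region, t_regions)
print(labels)
```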

No such analysis was performed for CA-DMRs or CT-DMRs due to computational costs.

Subset analyses of CG-DMRs and CG-blocks

We found, as expected, that the differences between NeuN+ and NeuN- samples dominated our results. To better focus on the differences between brain regions within a given cell type (NeuN+ or NeuN-), we repeated the F-statistic analysis using just the NeuN+ or NeuN- samples. We again found that one group dominated: 11,895 / 13,074 F-statistic NeuN+ DMRs were specific to NAcc. To better focus on the differences between the remaining brain regions, we repeated the analysis using just the BA9, BA24, and HC NeuN+ samples.

Credits

Questions should be directed to Peter Hickey.