All analyses were performed with MethPipe, a bisulfite sequencing data
analysis pipeline developed in the Smith lab at USC.
Mapping bisulfite treated reads: Bisulfite treated reads are mapped
to the genomes with the rmapbs program (rmapbs-pe for paired-end
reads), one of the wildcard based mappers. Input reads are filtered by
their quality, and adapter sequences at the 3' end of reads are
trimmed. Uniquely mapped reads with mismatches below a given threshold
are kept. For paired-end reads, if the two mates overlap, the
overlapping part of the mate with lower quality is clipped. After
mapping, we use the program duplicate-remover to randomly select one
read from multiple reads mapped to exactly the same location.
Estimating methylation levels: After reads are mapped and filtered,
the methcounts program is used to obtain read coverage and estimate
methylation levels at individual cytosine sites. We count the number
of methylated reads (containing C's) and the number of unmethylated
reads (containing T's) at each cytosine site. The methylation level
of that cytosine is estimated as the ratio of methylated reads to
total reads covering that cytosine. For cytosines within the symmetric
CpG sequence context, reads from both strands are used to give a
single estimate.
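The ratio described above can be illustrated with a minimal Python sketch. This is not the methcounts implementation; the function names and the (methylated, unmethylated) tuple layout are hypothetical and chosen only for illustration.

```python
def methylation_level(meth_reads, unmeth_reads):
    """Fraction of reads showing C among all reads covering a cytosine.

    Returns None when the site has no coverage, since the ratio is undefined.
    """
    total = meth_reads + unmeth_reads
    if total == 0:
        return None
    return meth_reads / total


def symmetric_cpg_level(plus_counts, minus_counts):
    """Single estimate for a CpG site, pooling reads from both strands.

    Each argument is a hypothetical (methylated, unmethylated) pair of
    read counts for the cytosine on one strand.
    """
    meth = plus_counts[0] + minus_counts[0]
    unmeth = plus_counts[1] + minus_counts[1]
    return methylation_level(meth, unmeth)


# Example: 3 C reads and 1 T read on the plus strand,
# 1 C read and 3 T reads on the minus strand.
level = symmetric_cpg_level((3, 1), (1, 3))
```

Pooling the counts before dividing, rather than averaging two per-strand ratios, means the strand with more coverage contributes proportionally more to the estimate.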
Estimating bisulfite conversion rate: Bisulfite conversion rate is
estimated with the bsrate program by computing the fraction of
successfully converted reads (read out as Ts) among all reads mapped
to presumably unmethylated cytosine sites, for example, spike-in
lambda DNA, chloroplast DNA or non-CpG cytosines in mammalian
genomes.
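As a rough illustration of this fraction (not the bsrate program itself), the sketch below takes the bases observed in reads at presumably unmethylated cytosines and returns the share read out as T. The function name and the input format are assumptions made for this example.

```python
def bisulfite_conversion_rate(observed_bases):
    """Fraction of T calls among C/T calls at presumably unmethylated cytosines.

    observed_bases: iterable of single-character base calls taken from reads
    at control cytosine positions (hypothetical input format). Calls other
    than C or T are ignored here, which is an assumption of this sketch.
    """
    converted = 0
    unconverted = 0
    for base in observed_bases:
        if base == "T":
            converted += 1      # cytosine was converted and reads as T
        elif base == "C":
            unconverted += 1    # cytosine escaped conversion
    total = converted + unconverted
    if total == 0:
        raise ValueError("no informative reads at control cytosines")
    return converted / total


# Example: 199 converted calls and 1 unconverted call.
rate = bisulfite_conversion_rate("T" * 199 + "C")
```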
Calculating read duplicate rate: Duplicate reads from the same library
caused by PCR overamplification must be removed before estimating
methylation levels and the bisulfite conversion rate. Only reads with
identical start and end positions belonging to the same library are
discarded, and we report the discarded reads as a percentage of the
total initially mapped. A high value corresponds to a low complexity
library.
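The duplicate rate can be sketched as follows. This is only an illustration under stated assumptions, not the duplicate-remover program: reads are represented as hypothetical dictionaries with `library`, `chrom`, `start` and `end` keys, and one read per group of identical positions is kept at random.

```python
import random
from collections import defaultdict


def remove_duplicates(reads, seed=0):
    """Keep one randomly chosen read per (library, chrom, start, end) group.

    Returns the kept reads and the duplicate rate, i.e. the percentage of
    initially mapped reads that were discarded. The dictionary keys and the
    seed parameter are assumptions of this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    groups = defaultdict(list)
    for read in reads:
        key = (read["library"], read["chrom"], read["start"], read["end"])
        groups[key].append(read)
    kept = [rng.choice(group) for group in groups.values()]
    if not reads:
        return kept, 0.0
    duplicate_rate = 100.0 * (len(reads) - len(kept)) / len(reads)
    return kept, duplicate_rate


reads = [
    {"library": "lib1", "chrom": "chr1", "start": 100, "end": 200},
    {"library": "lib1", "chrom": "chr1", "start": 100, "end": 200},  # duplicate
    {"library": "lib2", "chrom": "chr1", "start": 100, "end": 200},  # other library
    {"library": "lib1", "chrom": "chr1", "start": 150, "end": 250},
]
kept, duplicate_rate = remove_duplicates(reads)
```

In this example the read from `lib2` is retained even though its coordinates match, because only reads from the same library are treated as duplicates.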
Calculating mapping rate: This value is the weighted average, across
the SRA FASTA/FASTQ accessions of a sample, of the percentage of reads
that map to the reference genome. It is calculated before removing
duplicates.
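A weighted average of this kind could be computed as in the sketch below. The text does not state the weights, so weighting each accession by its number of input reads is an assumption of this example, as are the function name and the (total reads, mapped reads) input layout.

```python
def weighted_mapping_rate(accessions):
    """Weighted average mapping percentage across accessions.

    accessions: list of (total_reads, mapped_reads) pairs, one per accession
    (hypothetical layout). Each accession's percentage is weighted by its
    total read count, which is an assumption of this sketch.
    """
    total_reads = sum(total for total, _ in accessions)
    if total_reads == 0:
        raise ValueError("no reads in any accession")
    weighted_sum = 0.0
    for total, mapped in accessions:
        if total == 0:
            continue  # an empty accession contributes no weight
        percent_mapped = 100.0 * mapped / total
        weighted_sum += total * percent_mapped
    return weighted_sum / total_reads


# Example: 90% of 100 reads and 50% of 300 reads.
rate = weighted_mapping_rate([(100, 90), (300, 150)])
```

With read-count weights, the result equals the pooled percentage of mapped reads over all accessions, so a small accession with an unusual mapping rate has little influence on the reported value.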
Identifying hypo-methylated regions: In most mammalian cells, the
majority of the genome has high methylation, and regions of low
methylation are typically more interesting. These are called
hypo-methylated regions (HMRs). To identify the HMRs, we use the hmr
program, which implements a hidden Markov model (HMM) approach taking
into account both coverage and methylation level information.
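To give a sense of how an HMM can use both coverage and methylation level, here is a deliberately simplified two-state Viterbi sketch. It is not the hmr program: the binomial emissions, the fixed parameters `p_hypo`, `p_bg` and `stay`, and the function names are all assumptions made for illustration, and no parameters are learned from the data here.

```python
import math


def log_emission(meth, total, p):
    """Binomial log-likelihood of the counts, up to a state-independent constant."""
    return meth * math.log(p) + (total - meth) * math.log(1.0 - p)


def viterbi_states(sites, p_hypo=0.1, p_bg=0.8, stay=0.95):
    """Most likely state path: 0 = hypo-methylated, 1 = background.

    sites: list of (methylated_reads, total_reads) per CpG, in genome order.
    """
    if not sites:
        return []
    probs = (p_hypo, p_bg)
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform initial distribution over the two states.
    score = [math.log(0.5) + log_emission(sites[0][0], sites[0][1], p)
             for p in probs]
    backpointers = []
    for meth, total in sites[1:]:
        new_score, pointers = [], []
        for state in (0, 1):
            candidates = [
                score[prev] + (log_stay if prev == state else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev]
                             + log_emission(meth, total, probs[state]))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def hypo_runs(path):
    """Half-open index ranges [start, end) of consecutive hypo-methylated sites."""
    runs, start = [], None
    for i, state in enumerate(path):
        if state == 0 and start is None:
            start = i
        elif state != 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(path)))
    return runs


sites = [(9, 10)] * 5 + [(0, 10)] * 5 + [(9, 10)] * 5
regions = hypo_runs(viterbi_states(sites))
```

Because the emission term scales with the number of reads, a single unmethylated read at a low-coverage site carries too little evidence to outweigh the cost of switching states, whereas a run of well-covered unmethylated sites does.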
Identifying hyper-methylated regions: Hyper-methylated regions
(HyperMRs) are of interest in plant methylomes, invertebrate
methylomes and other methylomes showing a "mosaic methylation"
pattern. We identify HyperMRs with the hmr_plant program for those
samples showing a "mosaic methylation" pattern.
Identifying partially methylated domains: Partially methylated
domains (PMDs) are large genomic regions showing partial methylation,
observed in immortalized cell lines and cancerous cells. The pmd
program is used to identify PMDs.
Identifying allele-specific methylated regions: Allele-specific
methylated regions are regions where the paternal allele is
differentially methylated compared to the maternal allele. The
program allelicmeth is used to compute an allele-specific methylation
score for each CpG site by testing the linkage between the methylation
status of adjacent reads, and the program amrfinder is used to
identify regions with allele-specific methylation.
For a more detailed description of the methods used in each step,
please refer to the reference by Song et al. For instructions on how
to use MethPipe, see the MethPipe Manual.
Credits
The raw data were produced by He X et al. The data analysis was
performed by members of the Smith lab.
Contact: Benjamin Decato and Liz Ji
Terms of Use
If you use this resource, please cite us! The Smith Lab at USC has developed and is the owner of all analyses and associated browser tracks from the MethBase database (e.g. tracks displayed in the "DNA Methylation" trackhub on the UCSC Genome Browser). Any derivative work or use of the MethBase resource that appears in published literature must cite the most recent publication associated with MethBase (see "References" below). Users who wish to copy the contents of MethBase in bulk into a publicly available resource must additionally have explicit permission from the Smith Lab to do so. We hope the MethBase resource can help you!
References
MethPipe and MethBase
Song Q, Decato B, Hong E, Zhou M, Fang F, Qu J, Garvin T, Kessler M, Zhou J, Smith AD (2013) A reference methylome database and analysis pipeline to facilitate integrative and comparative epigenomics. PLOS ONE 8(12): e81148
Data sources
He X, Chatterjee R, Tillo D, Smith A, FitzGerald P, Vinson C (2014) Nucleosomes are enriched at the boundaries of hypomethylated regions (HMRs) in mouse dermal fibroblasts and keratinocytes. Epigenetics Chromatin 7(1): 34