Published online October 13, 2005
5868–5877 Nucleic Acids Research, 2005, Vol. 33, No. 18
doi:10.1093/nar/gki901
Reduced representation bisulfite sequencing
for comparative high-resolution DNA
methylation analysis
Alexander Meissner1, Andreas Gnirke2, George W. Bell1, Bernard Ramsahoye3,
Eric S. Lander1,2 and Rudolf Jaenisch1,*
1
Whitehead Institute for Biomedical Research and Massachusetts Institute of Technology, Nine Cambridge Center,
Cambridge, MA 02142, USA, 2Broad Institute of MIT and Harvard, Cambridge, MA, USA and
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
3
John Hughes Bennett Laboratory, Division of Oncology, School of Molecular and Clinical Medicine,
University of Edinburgh, Western General Hospital, Edinburgh, UK
Received August 23, 2005; Revised and Accepted September 28, 2005
ABSTRACT INTRODUCTION
We describe a large-scale random approach termed DNA methylation is a key epigenetic modification that pro-
vides heritable information not encoded in the nucleotide
reduced representation bisulfite sequencing (RRBS)
sequence. 5-Methylcytosine is the only known covalent modi-
for analyzing and comparing genomic methylation fication of DNA in vertebrates (1). Mammalian DNA methyla-
patterns. BglII restriction fragments were size- tion serves a wide range of functions including regulation
selected to 500–600 bp, equipped with adapters, of gene expression, genomic imprinting and X-chromosome
treated with bisulfite, PCR amplified, cloned and inactivation. It contributes to genomic stability and serves as
sequenced. We constructed RRBS libraries from a defense mechanism against transposable elements (2–5).
murine ES cells and from ES cells lacking DNA In addition, its role in disease states such as cancer becomes
methyltransferases Dnmt3a and 3b and with increasingly evident (6–10).
knocked-down (kd) levels of Dnmt1 (Dnmt [1kd,3a/, Three catalytically active DNA methyltransferases (Dnmts)
3b/]). Sequencing of 960 RRBS clones from have been described that are responsible for establishing and
Dnmt[1kd,3a/,3b/] cells generated 343 kb of non- maintaining methylation patterns in mammals (4,11–13).
Dnmt1 has been largely viewed as the maintenance enzyme,
redundant bisulfite sequence covering 66 212 cyto-
owing to its preference for hemimethylated DNA (2). Dnmt3a
sines in the genome. All but 38 cytosines had been and Dnmt3b have no preference and are required for de novo
converted to uracil indicating a conversion rate of methylation activity (14). During murine preimplantation
.99.9%. Of the remaining cytosines 35 were found development methylation levels decrease with some notable
in CpG and 3 in CpT dinucleotides. Non-CpG methyla- exceptions including imprinted genes and IAP elements
tion was .250-fold reduced compared with wild-type (3,15). Around the time of implantation normal methylation
ES cells, consistent with a role for Dnmt3a and/or levels are restored by the de novo methyltransferases and later
Dnmt3b in CpA and CpT methylation. Closer inspec- maintained by Dnmt1.
tion revealed neither a consensus sequence around Targeted gene disruption for each of the catalytically active
the methylated sites nor evidence for clustering of Dnmts (1, 3a and 3b) results in a lethal phenotype demonstrat-
residual methylation in the genome. Our findings ing the essential role of DNA methylation in development
(11,13). Interestingly, undifferentiated ES cells deficient for
indicate random loss rather than specific mainten-
Dnmt1, Dnmt3a, Dnmt3b or Dnmt3a/3b do not display any
ance of methylation in Dnmt [1kd,3a/,3b/] cells. obvious abnormalities (13,16). Normally in wild-type ES cells
Near-complete bisulfite conversion and largely most CpG dinucleotides are methylated with the exception of
unbiased representation of RRBS libraries suggest many CpG islands.
that random shotgun bisulfite sequencing can be The intense interest in the biological functions of DNA
scaled to a genome-wide approach. methylation and its role in diseases have led to numerous
*To whom correspondence should be addressed. Tel: +1 617 258 5186; Fax: +1 617 258 6505; Email:
[email protected]
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors
The Author 2005. Published by Oxford University Press. All rights reserved.
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access
version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press
are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but
only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact
[email protected]
Nucleic Acids Research, 2005, Vol. 33, No. 18 5869
techniques to detect and compare DNA methylation [reviewed fragments as measured by PicoGreen fluorescence (Invitro-
in (8,17)]. Global methods such as nearest neighbor analysis gen). The size-selected BglII fragments (1–2 pmol) were
(NNA) and high-performance liquid chromatography are valu- ligated to 700 pmol BglII adapter pre-annealed from
able to quantify the total 5-methylcytosine content of a DNA oligodeoxynucleotides 50 -AGTTATTCCGGACTGTCGAA-
sample, but information on the position in the genome cannot GCTGAATGCCATGG-30 and 50 -pGATCCCATGGCAT-
be gained (18,19). TCAGCTTCGACAGTCCGGAAT-30 in 70 ml containing
Digestion with methylation-sensitive (or methylation- 2400 U T4 DNA ligase (New England Biolabs) for 16 h
dependent) restriction enzymes (MSREs) has been used to at 14 C. Excess adapter was removed by ultrafiltration
selectively enrich the methylated and unmethylated DNA frac- (Millipore Montage) followed by preparative electrophoresis
tions, respectively (20–24). Similarly, methylation-dependent in 2% agarose and electroelution, yielding 50–100 ng of
restriction in a cloning host has been employed as a filter adapter-ligated material.
against methylation-rich sequences in clone libraries (25). Adapter-ligated, size-selected BglII fragments (50 ng) were
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
Another, more recent genome-wide approach used immuno- bisulfite-treated using the reagents and protocol of the CpGen-
preciptation with a methyl cytosine antibody rather than ome DNA modification kit (Chemicon) with the following
restriction digestion to enrich for the methylated fraction modifications: the DNA was alkali-denatured for 20 min at
(26). The enriched genome fractions are analyzed by sequen- 55 C; the total reaction volume was increased from 650 to
cing or by array-hybridization (20,21,26). MSRE-based meth- 750 ml and contained 0.22 g urea (31); and the mixture was
ods are somewhat indirect in that they discriminate for or incubated for 24 h at 55 C. After alkaline desulfonation and
against methylation at the recognition site of the particular final desalting, single-stranded uracil-containing reaction
enzyme used and cannot directly reveal the methylation status products were eluted in 40 ml of TE buffer and
of cytosines or CpG dinucleotides outside the restriction site. converted to double-stranded DNA by PCR with primers
In contrast, methylation-sensitive chemical reactions have 50 -TTGGATTGTTGAAGTTGAATG-30 and 50 -AAACTAT-
no specific recognition sequence. Sodium bisulfite efficiently CAAAACTAAATACCATAAAATC-30 designed to amplify
deaminates unmethylated cytosine to uracil without affecting molecules carrying bisulfite-modified adapter sequences at
5-methyl cytosine. In recent years, PCR amplification and both ends. For each bisulfite reaction, eight 50 ml PCRs
sequencing of bisulfite-converted genomic DNA has emerged were performed, each containing 2.5 ml bisulfite-treated
as the gold standard for analyzing and comparing methylation DNA, 25 pmol of each PCR primer and 2.5 U PfuTurboCx
patterns at specific loci (27). Hotstart DNA polymerase (Stratagene). Thermocycling
Despite these technological advances, in the absence of included eight cycles of ‘touchdown’ (32) at annealing tem-
systematic sequence-based methylation analyses, the genomic peratures from 55 to 52 C (two cycles at each temperature)
methylation landscape in mammals is still largely unexplored. followed by 10 cycles at an annealing temperature of 51 C.
Therefore, the potential diagnostic value of specific methyla- Denaturation (94 C), annealing and extension (72 C) times
tion differences remains largely untapped. were 10 s, 30 s and 3 min, respectively. PCR products were
The human epigenome project (HEP) is aimed at generating cleaned-up by ultrafiltration followed by preparative electro-
a high-resolution DNA methylation map of the human genome phoresis on a 2% agarose gel. Typical yields were between 50
(28,29). To achieve this goal the bisulfite sequencing tech- and 100 ng for each library. Gel-purified PCR product (4 ng)
nique has been scaled-up in a targeted fashion using locus- was incubated for 5 min with 1 ml pCR BluntII TOPO vector
specific PCR primers. and cloned by electroporation of Escherichia coli TOP10
Here we describe a random approach for large-scale high- (Invitrogen). The cloning efficiency was 2000 colonies
resolution DNA methylation analysis termed reduced repres- per ng of PCR product.
entation bisulfite sequencing (RRBS). To test the feasibility of Plasmid DNA was isolated by standard protocols, and
the method, we compared wild-type ES cells and ES cells cloned inserts were sequenced using 2.7 pmol M13 reverse
deficient for Dnmt1, Dnmt3a and Dnmt3b. Our data suggest primer and 2 ml BigDye3.1 mix (Applied Biosystems) in 10 ml
that RRBS provides high-quality data suitable for future large- sequencing reactions (25 cycles). Caused by preferential clon-
scale comparative epigenetic studies of DNA methylation in ing in one orientation, 80% of the sequences were the G-poor
a given cell type or tissue. In addition our sequencing data strand. Most inserts that had been cloned in the other orienta-
confirm and complement previous studies on the role of DNA tion (C-poor strand) sequenced poorly, with peak heights and
methyltransferases in murine ES cells. sequence quality suddenly dropping after 300–400 bases.
Data analysis
METHODS
In silico digestion of the mouse genome (NCBI Build 33, May
RRBS library construction and sequencing 2004) was performed at BglII sites, followed by selection of
Mouse ES DNA (50–100 mg) was digested to completion by fragments ranging from 440 to 640 bases. Cytosines were
overnight incubation with 1000 U of BglII and electrophoresed converted to thymine, with upper/lower case used to differ-
on a 1.8% agarose gel. Marker lanes were stained with SYBR entiate converted from original thymines. Each strand was
Green (Invitrogen). A narrow slice containing the 500–600 bp converted separately. Sequencing reads were mapped to the
fraction was excised from the unstained preparative portion of genome by using NCBI BLAST (without query filtering) to
the gel. DNA was recovered by electroelution, phenol extrac- search the database of size-selected and converted BglII frag-
tion and ethanol precipitation as described elsewhere (30). ments. The top BLAST hit determined the most probable
Typical yields were 300–600 ng of size-selected BglII genome location of each read and also permitted identification
5870 Nucleic Acids Research, 2005, Vol. 33, No. 18
of original and converted cytosines over the high-scoring
segment pair length. The repeat content of the in silico reduced
representation and the sets of sequencing reads were compared
with that of the whole genome using RepeatMasker (http://
www.repeatmasker.org). Locations of all sequence reads
relative to selected genomic landmarks were determined by
comparing fragment coordinates to those of the RefSeq
and Ensembl transcript sets and CpG islands from UCSC.
The expected number of redundant RRBS sequences and the
sequence overlap between two DNA samples were calculated
by composite Poisson statistics in 5 bp bins across the range of
insert sizes. Di is the number of BglII fragments in the refer-
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
ence genome that fall into bin i. Nai is the number of successful
sequencing
P 2 reads from DNA sample a that fall into bin i.
N ai =2Di double-hits in sample a are expected by random
sampling (33). The expected number of BglII fragments
sequenced
P at least
once in sample
a and in sample b is
1 eNai =Di · 1 eNbi =Di · Di :
ES cell manipulation
Lentiviral infections of ES cells were performed as described
previously (34). ES cells were cultivated on irradiated mouse
embryonic fibroblasts in DME containing 15% fetal calf
serum, leukemia inhibiting factor, penicillin/streptomycin,
L-glutamine and non-essential amino acids. All ES cells Figure 1. Reduced representation bisulfite sequencing. Genomic DNA is
were depleted of feeder cells for two passages on 0.2% digested to completion using a restriction enzyme (here BglII). After
size-selection an adapter is added. The DNA is denatured, and unmethylated
gelatine before isolating DNA. cytosines are bisulfite-converted to uracil. The two resulting C-poor strands
are no longer complementary to each other. Primers specific for the converted
Southern blot and methylation analysis adapter sequence are used to fill-in the second (G-poor) strand and for PCR
amplification. PCR products are cloned and sequenced. Sequences generated
To assess the levels of DNA methylation, genomic DNA was from RRBS libraries are projected onto the genome by searching against a
digested with HpaII, and hybridized to pMR150 as a probe for reduced representation database of BglII fragments that had been
the minor satellite repeats (35), or with an IAP-probe (36). For size-selected and bisulfite-converted in silico.
the methylation status of imprinted genes, a combined bisulfite
restriction analysis (COBRA) assay was performed with the were used to synthesize the second strand and to PCR amplify
CpGenome DNA modification kit (Chemicon) using PCR pri- the bisulfite-converted material. Blunt-end PCR products were
mers and conditions described previously (37). PCR products cloned in a plasmid vector and sequenced (Figure 1).
were gel purified, digested with BstUI or HpyCH4 IV and For analysis of the bisulfite sequences and to identify the
resolved on a 2% agarose gel. NNA was done as described corresponding genomic sequence we searched RRBS reads
previously (19). against a reduced representation database of the mouse gen-
ome that contained both strands of BglII fragments that had
been size-selected and bisulfite-converted in silico. When
RESULTS aligned to the original genome sequence, a 5-methyl-
cytosine is thus displayed as a matching C in the bisulfite
Reduced representation bisulfite sequencing sequence, and C to T transitions indicate unmethylated
RRBS is analogous to the reduced representation shotgun cytosines.
sequencing used for single nucleotide polymorphism (SNP) Even though bisulfite sequencing is a widespread technique,
discovery (33). The method is based on size selection of some concerns persist. Since bisulfite converts single-stranded
restriction fragments to generate a ’reduced representation’ but not double-stranded DNA, incomplete denaturation or re-
of the genome of a strain, tissue or cell type. annealing leads to incomplete conversion. This complicates
For this study, we digested genomic DNA with BglII and the data analysis, as it is not always possible to determine
purified fragments between 500 and 600 bp in size on an whether an unconverted cytosine represents bona fide
agarose gel. Based on the available mouse genome sequence, methylation or an experimental artifact.
BglII digestion is expected to generate 21 939 BglII fragments Another potential problem is depurination, strand breakage
in this size range comprising 12 Mb (0.5%) of the genome. and DNA degradation caused by the harsh reaction conditions,
Size-selected BglII fragments were equipped with end which lower the yield of full-length BglII fragments signific-
adapters, denatured and treated with bisulfite to convert all antly. It has been estimated that >90% of the input DNA is lost
unmethylated cytosines to uracil. Bisulfite-converted DNA to DNA degradation during the first hour of a bisulfite reaction
remains single-stranded as the two strands are no longer com- (38). However, to maximize the conversion rate, the reaction is
plementary. Primers specific for the converted adapter usually carried out overnight, necessitating extensive PCR
sequence and a proofreading thermostable DNA polymerase amplification before cloning or sequencing to compensate
Nucleic Acids Research, 2005, Vol. 33, No. 18 5871
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
Figure 2. Generation of Dnmt1, Dnmt3a and Dnmt3b deficient ES cells. (A) Dnmt3a/3b homozygous double knockout ES cells have been described earlier (43). The
knockdown virus is expressing a Dnmt1 shRNA, whereas the control is not. The infection was termed Passage 0. After the infection ES cells were passaged four times
on feeders followed by two additional passages under feeder-free conditions (Passage 6). Number of viral integrations were determined by Southern blotting and
clones with single integration were selected (data not shown). (B) Western blot analysis. The status of the different Dnmts is indicated above. The knockdown ES cells
showed a significant reduction in Dnmt1 levels compared with their sister clone. c/c is a previously reported Dnmt1 null ES line (16).
for the inevitable loss of DNA. Moreover, since most that was largely free of methylation. To this end we generated
proofreading enzymes stall at uracil residues in the template ES cells deficient in all three major DNA methyltransferases.
strand, non-proofreading Taq polymerase is usually prescribed
for second-strand synthesis and PCR amplification which can
ES cells deficient for Dnmt1, Dnmt3a and Dnmt3b
lead to PCR-induced sequencing errors.
These limitations are less worrisome for single-copy loci, We combined knockouts for the de novo Dnmts (Dnmt3a and
but could be significant in a genome-wide setting, where no Dnmt3b) with RNAi-induced knockdown of Dnmt1
preselection against fast-re-annealing repetitive sequences is (Figure 2A) using a lentivirus-based system for stable short
made and where amplification bias and skewed sequence hairpin RNA (shRNA) expression (34). The Dnmt1 knock-
representation creates serious sampling problems. down resulted in a significant albeit not complete loss of
Indeed, our preliminary attempts were plagued by DNA Dnmt1 protein compared with the Dnmt[3a/,3b/] control
degradation, incomplete conversion and poor efficiency of cells (Figure 2B).
PCR amplification, most probably caused by the re- To determine whether the decrease in Dnmt1 levels led to
annealing of repetitive sequences including the common efficient demethylation, we analyzed the methylation status of
adapter sequence at the ends of each DNA molecule. More- minor satellite repeats and IAP elements in a number of
over, certain sequences were clearly overrepresented in the control and knockdown ES cell lines by MSRE analysis.
resulting libraries indicating amplification bias during the Significant repeat demethylation was observed when Dnmt1
PCR. These initial problems were largely remedied by per- was knocked down, and the methylation levels in the
forming the bisulfite reaction in the presence of urea as sug- Dnmt[1kd,3a/,3b/] ES cells closely resembled the digest
gested by Paulin et al. (31) and by fine-tuning experimental of genomic DNA with MspI which cuts irrespective of the
parameters such as DNA concentration, time and temperature methylation status (Figure 3A and B). Loss of methylation at
of the bisulfite reaction, and number of PCR cycles for the these repeat elements appears to be primarily caused by the
double-strand rescue and amplification by a proofreading ther- lack of Dnmt1 and largely independent of the de novo Dnmts
mostable DNA polymerase engineered to accept uracil in the at these early passages.
template strand (39). Using a COBRA assay (40) we observed loss of imprinting
To test if our optimized protocol was sufficient to achieve at four imprinted genes following Dnmt1 knockdown as com-
complete genome-wide bisulfite conversion without com- pared with the controls (Figure 3C). Taken together, these
promising library complexity and representation, we wished experiments showed that Dnmt1 knockdown resulted in
to construct and sequence RRBS libraries from genomic DNA significant loss of methylation at specific genes and repeat
elements.
5872 Nucleic Acids Research, 2005, Vol. 33, No. 18
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
Figure 3. Methylation status of the Dnmt-deficient ES cells. All knockdown and control ES cells were analyzed at Passage 6 after infection. (A) Minor satellite repeat
methylation. HpaII digests of genomic DNA were hybridized to minor satellite probe pMR150. MspI is an isoschizomere of HpaII and cuts irrespective of the
methylation status (i.e. appearance of a ladder in HpaII lane indicates loss of methylation). The status of the different Dnmts is shown above the Southern blot. All
knockdown and control ES cell lines were generated as described in Figure 2. Each knockdown line contains a single lentiviral integration (data not shown). (B) IAP
methylation. HpaII-digested genomic DNA was hybridized to an IAP probe. (C) COBRA analysis for imprinted genes. Genomic DNA was bisulfite treated and after
PCR amplification of H19, Snrpn, Peg1 and Peg3 a restriction digest was performed to analyze the methylation status of the differentially methylated regions
(U ¼ unmethylated, M ¼ methylated). The second (smaller) fragment of the methylated and digest product is not shown. (D) Total mCpG quantification by NNA.
The spots corresponding to CpG and mCpG are indicated in the upper left panel. The per cent mCpG/(CpG+mCpG) are displayed in each panel (estimated error 5%).
To better quantify these results, we used NNA, which pared with 22% in the Dnmt1 null ES cells and 75% in
allowed to determine the global amounts of CpG methylation wild-type ES cells (Figure 3D). Dnmt3b heterozygous and
in wild-type and mutant ES cells. We detected 2% residual homozygous ES cells displayed wild-type methylation
CpG methylation in the Dnmt[1kd,3a/,3b/] cells com- levels in the presence of Dnmt1 and showed similar loss of
Nucleic Acids Research, 2005, Vol. 33, No. 18 5873
methylation within six passages of Dnmt1 knockdown were more pronounced in the Dnmt[1kd,3a/,3b/] library
(Figure 3D and data not shown) confirming the potency of which has an extremely asymmetric base distribution. Of the
the shRNA. sequenced inserts 96% from this library consisted solely of
To test the RRBS approach and to determine whether spe- three bases, i.e. either A, G and T or A, C and T owing to
cific sequences were retaining methylation we generated and complete absence of methylated cytosine in the corresponding
sequenced BglII RRBS libraries from wild-type and Dnmt- genome loci. Directional cloning and sequencing bias has been
deficient ES cells. observed before with bisulfite-treated DNA (38) and is there-
fore not a RRBS specific phenomenon.
Sequencing of RRBS libraries Of the complete RRBS reads from Dnmt[1kd,3a/,3b/]
cells (89%) found a near-perfect match in the reduced repres-
In preliminary experiments we noticed that sequencing RRBS entation reference-sequence database and could be placed with
clones with reverse primer had a significantly higher success high confidence on the mouse genome. The rate of genome
rate and produced longer reads on average than sequencing
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
alignments for sequences from wild-type ES was slightly
with forward primer. We therefore sequenced the RRBS higher (94%). Overall, the success rate of full-length, mapped
clones single-pass using reverse primer. Only clones with bisulfite sequence was 72% of all clones picked. A schematic
high-quality sequence across the entire length of the insert of the distribution of RRBS sequences along the mouse chro-
were used for the final methylation analysis. Table 1 summar- mosomes is available in the Supplementary Data (Supplement-
izes the sequencing statistics from 960 RRBS clones from ary Figures S1 and S2). In addition we have developed a
Dnmt-deficient cells and 192 clones from wild-type ES cells. genome browser that allows a more comprehensive view of
Although blunt-ended PCR products can insert in either the genomic environment of the RRBS libraries and the
orientation into the cloning vector, only a minority had inserts data generated (for a sample screenshot see Supplementary
in the orientation that resulted in the C-poor sequence, i.e. Figure 3).
the strand that has been modified by bisulfite (153 out of 876 Fifty-six loci were hit by more than one RRBS sequence
RRBS reads from the Dnmt[1kd,3a/,3b/] library). The from the Dnmt[1kd,3a/,3b/] library. Ten of these poten-
vast majority of the clones produced the complementary tially represent sequences that occur more than once in the
G-poor reads. Notably, the sequence quality was also signi- genome. The remaining 46 appear to be unique loci that have
ficantly different for the two orientations. Almost all G-poor indeed been cloned and sequenced twice. This is more than the
reads were high-quality across the entire insert whereas peak 23 double-hits expected by random sampling of an ideal lib-
heights and quality of many C-poor reads dropped after a few rary, possibly indicating a slight cloning or sequencing bias.
hundred bases, leaving relatively few complete C-poor Consistent with random cloning, the much smaller number of
sequences for the methylation analysis. Preferential cloning wild-type RRBS sequences produced only one double-hit.
in one orientation and high drop-out rate for C-poor strands Eleven fragments were sequenced in both cell lines, compared
with eight sequence overlaps expected given the number and
Table 1. Sequencing and methylation statistics size distribution of successful reads from each library
(Figure 4). The total length of non-redundant and mapped
ES cell line Dnmt[1kd, wild-type RRBS sequences was 342 556 bp for Dnmt[1kd,3a/,
3a/,3b/]
3b/] and 80 692 bp for wild-type ES cells.
Colonies picked 960 192 To determine whether these RRBS libraries were generally
Bisulfite sequencing readsa 876 186 representative we compared the GC content, the representation
Insert in plus orientationb 153 50
Plus read completec 38 23
Insert in minus orientationd 723 136
Minus read completec 719 134
Complete bisulfite sequencing readsc 757 157
Genome hits 676 148
Non-redundant genome hitse 609 147
Total bp of non-redundant genome hits 342 556 80 692
Cytosines in aligned genome sequence 66 212 15 296
5-Methylcytosine (mC) 38 (0.06%) 707 (4.6%)
CpG in aligned genome sequence 3458 594
mCpG 35 (1.0%) 533 (90%)
CpA in aligned genome sequence 23 046 5601
mCpA 0 (0%) 135 (2.4%)
CpT in aligned genome sequence 25 505 5924
mCpT 3 (0.01%) 39 (0.7%)
CpC in aligned genome sequence 14 203 3177
mCpC 0 (0%) 0 (0%)
a
Excludes growth failures, sequencing failures, mixed clones, vector-only
clones and a total of nine reads that showed no bisulfite conversion at all.
b
Sequenced strand is the bisulfite-converted C-poor strand.
c
High-quality sequence across entire length of BglII fragment.
d
Sequenced strand is the G-poor complementary strand of the bisulfite- Figure 4. Size distributions of the sequenced clones from each library. RRBS
converted strand. reads from wild-type ES cells (black) had a mean of 553 bp and an SD of 17 bp.
e
Includes sequences that are duplicated in the genome. BglII fragments that were Dnmt[1kd,3a/,3b/] reads were (570 ± 20) bp in size (grey bars). The size
hit more than once were counted only once. distributions of the two libraries were overlapping but not identical.
5874 Nucleic Acids Research, 2005, Vol. 33, No. 18
of CpG islands, transcripts, promoter regions and different ple the genome more evenly and increase the complexity of the
classes of repeat elements between the entire mouse genome RRBS libraries.
(41), the 500–600 bp BglII fraction thereof and the genome
sequences hit by the RRBS clones (Table 2). While reducing Comparison of wild-type and Dnmt-deficient ES cells
the representation introduced a noticeable bias, in particular a
reduction of repeats, bisulfite conversion, PCR amplification, The RRBS sequences revealed the methylation status of
cloning and sequencing did not. The GC content of loci cov- 66 212 cytosines in Dnmt[1kd,3a/,3b/] ES cells (Table 1,
ered by RRBS sequences ranged from 32 to 63%, indicating bottom half). Only 38 of these were inferred to be methylated,
satisfactory performance of our protocol over a wide range of 35 of them in CpG and three in CpT dinucleotide context.
GC content. Likewise, the distribution of the sequenced clones Considering the non-random distribution of mC among the
in the genome did not show conspicuous hot or cold spots four dinucleotides, it unlikely that all of them were caused
(Supplementary Figures S1 and S2). Taken together, our data by incomplete bisulfite conversion or PCR or sequencing
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
suggest that RRBS libraries are sufficiently random and rep- errors. Moreover, the 35 mCpGs are 1% of all bisulfite-
resentative of the genome fraction used to make them. sequenced CpGs, which is close to the 2% mCpG level
Reducing the complexity by size fractionation of a limit determined by NNA (Figure 3D). By comparison, 90% of
digest with BglII (recognition site AGATCT) is expected to CpGs were methylated in wild-type ES cells. We also
bias somewhat against GC-rich regions of the genome. Pool- observed a considerable difference in the level of non-CpG
ing two single digests with compatible enzymes such as BglII methylation [(mCpA+mCpT)/C], which was >250-fold
and BamHI (GGATCC) before the size selection would sam- reduced in the Dnmt-deficient ES cells.
In the Dnmt[1kd,3a/,3b/] RRBS sequences, 25 020
bases were covered 2- or 3-fold, comprising 4669 cytosines
Table 2. Fraction (in per cent) of various types of sequences in the mouse including 217 CpGs. Overlapping RRBS sequences agreed for
reference genome, the 500–600 bp BglII reduced representation thereof (RR most loci. In two cases, only one sequenced Dnmt[1kd,3a/,
genome) and RRBS sequences from Dnmt-deficient and wild-type ES cells 3b/] clone displayed a methylcytosine. At another discord-
ant site, the two reads agreed at one mCpG but disagreed at
Genomea RR genome RRBS Dnmt RRBS wt
another.
GC content 42.0 41.5 43.7 43.1 To address the issue of heterogeneity, we selected 10 loci
CpG islandsb 0.4 0.1 0.1 0.0 with mCpGs and 10 loci without methylation and designed
ENSEMBL genes 34.3c 35.0c 41.9d 35.3d specific PCR primers to bisulfite re-sequence them in a tar-
Promoter 5.0e 5.0e 7.0f 4.7f
SINEs 8.2 2.7 2.6 2.3 geted fashion. Multiple clones were sequenced for each locus
LINEs 19.2 10.2 10.7 11.3 in wild-type, Dnmt[3a/,3b/] and the Dnmt[1kd,3a/,
LTR elements 9.9 2.9 2.9 4.3 3b/] cells. In all but one case, at least one re-sequenced
MER DNA elements 0.9 0.2 0.1 0.2 clone matched the previously determined mCpG pattern pre-
a
Repeat and GC content were taken from Ref. (41).
cisely, and the overall level of methylation for each region was
b
CpG islands were taken from the mm6 mouse genome assembly on the UCSC similar in all cases (Figure 5 and data not shown). Thus, as a
genome browser. rule, a single clone from the RRBS library provides a good
c
Fraction of genome sequence that falls within gene bounds of non-overlapping indication of the general methylation pattern at any given site.
ENSEMBL gene models. This is in line with the predominantly bimodal methylation
d
Fraction of RRBS sequences with significant hits to the ENSEMBL gene
fraction of the genome. profiles observed previously [reviewed in Ref. (42)]. For
e
Fraction of genome sequence that falls within 5 kb upstream of the transcription example, >80% of the loci in the HEP survey of the MHC
start site of ENSEMBL gene models.
f
were either hypermethylated or hypomethylated (29).
Fraction of RRBS sequences with significant hits to regions 5 kb upstream of Four representative examples are shown in Figure 5. For the
transcription start sites.
two loci on chromosome 4 and 15, respectively, all clones,
Figure 5. Targeted bisulfite sequencing of specific loci. Ten loci for which RRBS sequencing indicated mCpGs in Dnmt-deficient cells and 10 loci that were devoid
of methylation were bisulfite re-sequenced using specific primers in wild-type (top), 3a/b double knockout (middle) and Dnmt[1kd,3a/,3b/] cells (bottom).
Shown are two examples of each set. Each row represents a single sequenced molecule. Filled squares are methylated CpGs and empty ones indicate unmethylated
sites. The asterisk indicates the original clone sequenced from the library.
Nucleic Acids Research, 2005, Vol. 33, No. 18 5875
including the clone from the RRBS library, indicated complete tissue-specific regulated methylation domains or long-range
absence of methylation in Dnmt[1kd,3a/,3b/] cells. The methylation gradients along a chromosome. We also envision
sister cell line with normal Dnmt1 levels (Figure 2) was also RRBS applications in epigenetic cancer profiling and bio-
considerably demethylated at these sites compared with wild- marker discovery.
type ES cells. The two other loci maintained more mCpGs in
the methylation-impaired cell lines. The two CpGs on chro- Methylation patterns in methylation-impaired ES cells
mosome 17 that were most consistently methylated in Despite the essential role of each known DNA methyltrans-
Dnmt[3a/,3b/] cells showed also residual methylation ferase during mouse development (11,13), DNA methylation
in the Dnmt[1kd,3a/,3b/] cells. One of these two and the enzymes responsible for its establishment and main-
mCpGs was detected in the RRBS clone. Targeted resequen- tenance appear to be largely dispensable in undifferentiated ES
cing detected methylation at the second CpG. This pattern is cells. Dnmt1 knockout ES cells retain 20% CpG methylation
consistent with passive random loss of CpG methylation in probably a result of continuous de novo methylation by
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
Dnmt[1kd,3a/,3b/] cells. Dnmt3a and Dnmt3b. Although early passage Dnmt3a/b dou-
ble mutant ES cells show almost wild-type levels of CpG
methylation (43,44), they progressively lose methylation with
DISCUSSION <1% remaining after 75 passages (44). This gradual loss may
Large-scale random bisulfite sequencing reflect the infidelity of the maintenance enzyme Dnmt1.
Our data showed that ES cells lacking DNA methyltrans-
In this study we explored the feasibility of large-scale shotgun ferases Dnmt3a and 3b and with greatly reduced levels of
bisulfite sequencing for genome-wide analysis of DNA Dnmt1 were viable with 1–2% CpG methylation remaining
methylation. We have shown that bisulfite sequencing libraries after only six passages. The extremely low rate of false-
can be made that are largely unbiased and representative and positive methylcytosines allowed us to identify and inspect
display few false-positive methylcytosines caused by incom- some of the rare sites that retained methylation. There were no
plete cytosine to uracil conversion or PCR and sequencing obvious hotspots of residual mCpGs in the genome (Supple-
errors. mentary Figures). Also, there was no correlation between the
Insert sizes of the libraries were kept very small (500– numbers of CpGs and the residual methylation at a given site.
600 bp) for two reasons. First, the bisulfite reaction requires The distance to CpG islands or to known genes appeared to be
relatively high temperatures (50–60 C) and a low pH (pH 5), random. None of the loci was notably conserved across spe-
conditions that are known to cause depurination and strand cies. Finally, no specific motif was detected upstream and
breakage; smaller molecules are less prone to damage and downstream of the residual mCpG dinucleotides (data not
require fewer PCR cycles to recover intact for cloning than shown). Our findings provide no evidence of specific main-
larger ones, thereby minimizing the risk of a skewed repres- tenance of residual mCpG by yet another DNA methyl-
entation. Second, larger-insert clones would require sequen- transferase. Rather, Dnmt[1kd,3a/,3b/] cells seem to lose
cing of both strands; however, C-poor strands sequenced residual CpG methylation in a random fashion over time.
poorly in our hands. Only 3 of the 25 505 sequenced CpT dinucleotides were
We used limit digestion with BglII and size fractionation to inferred to be methylated in Dnmt-deficient cells. No methyl-
reduce the complexity of the DNA. The resulting RRBS lib- ated CpA was detected. By comparison, wild-type cells
raries cover a small but reproducible fraction of the genome showed 0.7% CpT and 2.4% CpA methylation in agreement
and are therefore suitable for large-scale comparative methyla- with previous observations (45,46). Previous experiments
tion studies across different strains, tissues or cell types. Based have also shown that the presence of Dnmt1 is not required
on the overall success rate (72%) and insert-size distributions for non-CpG methylation (46). In contrast, non-CpG methyla-
encountered during this pilot study (Figure 4), we expect that tion becomes almost undetectable in ES cells lacking Dnmt3a
for a pair-wise comparison, sequencing 100 · 384 RRBS and Dnmt3b (45). Both global nearest neighbor and our
clones from each DNA sample will produce 4.0 Mb of bisulfite-sequencing data therefore suggest that the de novo
high-quality overlapping bisulfite sequence with 2- to 3-fold DNA methyltransferases 3a and/or 3b are responsible for
coverage in each library of fragments within 1 SD of the mean asymmetric CpA and CpT methylation in murine ES cells.
size. Assuming that improvements in sequencing of C-poor
strands (85% success rate) and better libraries with congruent
CONCLUSION
insert-size distributions can be made, the same sequencing
effort would yield 5.8 Mb of pair-wise comparative seq- In this pilot study we have employed a combination of RNAi-
uence which, of course, is still only a tiny fraction of the induced knockdown and complete knockout of DNA methyl-
genome. transferases to generate murine ES cells that were almost devoid
At this level of genome coverage, differential methylation at of DNA methylation. These cells had 1–2% residual CpG
most individual sites in the genome, including many function- methylation left after a few passages, and non-CpG methylation
ally important ones, will escape detection. However, we was >250-fold reduced compared with wild-type ES cells.
expect the coverage to be sufficient to generate methylation Unamplified, nearly methylation-free genomic DNA is an
variable position markers for future bisulfite SNP ’epigeno- ideal substrate to optimize and test conditions for genome-
typing’ (17). A genome-wide set of comparative bisulfite wide bisulfite conversion, PCR amplification and library con-
sequences may prove useful to train computer algorithms struction for future genomic shotgun bisulfite sequencing of
for predicting methylation patterns. RRBS sequencing may mammalian genomes. We have shown that essentially com-
be sufficient to detect genomic imprints (or the loss thereof), plete bisulfite conversion can be achieved without undue
5876 Nucleic Acids Research, 2005, Vol. 33, No. 18
adverse effects on library complexity and sequence representa- 14. Okano,M. and Li,E. (2002) Genetic analyses of DNA methyltransferase
tion. genes in mouse model system. J. Nutr., 132, 2462S–2465S.
Large-scale random bisulfite sequencing complements 15. Lane,N., Dean,W., Erhardt,S., Hajkova,P., Surani,A., Walter,J. and
Reik,W. (2003) Resistance of IAPs to methylation reprogramming may
existing directed bisulfite sequencing strategies, which are provide a mechanism for epigenetic inheritance in the mouse. Genesis,
well suited to analyze a limited number of gene promoters 35, 88–93.
and regulatory sequence elements in a large number of sam- 16. Lei,H., Oh,S.P., Okano,M., Juttermann,R., Goss,K.A., Jaenisch,R. and
ples. One advantage of sequencing clone libraries in a random Li,E. (1996) De novo DNA cytosine methyltransferase activities in mouse
embryonic stem cells. Development, 122, 3195–3205.
fashion is that no target-specific PCR or sequencing primers 17. Murrell,A., Rakyan,V.K. and Beck,S. (2005) From genome to epigenome.
are needed. Once the library is made, the method is amenable Hum. Mol. Genet., 14, R3–R10.
to automation and is scaleable. Since the bisulfite reads are not 18. Ramsahoye,B.H. (2002) Measurement of genome wide DNA methylation
assembled but merely aligned to the reference genome by reversed-phase high-performance liquid chromatography. Methods,
sequence, we expect this method to work well in combination 27, 156–161.
19. Ramsahoye,B.H. (2002) Nearest-neighbor analysis. Methods Mol. Biol.,
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
with highly parallel sequencing technologies that produce sin- 200, 9–15.
gle reads of 100 bases in length (47). Finally, in principle, 20. Lippman,Z., Gendrel,A.V., Colot,V. and Martienssen,R. (2005) Profiling
bisulfite-converted libraries can be constructed from randomly DNA methylation patterns using genomic tiling microarrays. Nature
sheared DNA for future whole-genome bisulfite sequencing. Methods, 2, 219–224.
21. Lippman,Z., Gendrel,A.V., Black,M., Vaughn,M.W., Dedhia,N.,
McCombie,W.R., Lavine,K., Mittal,V., May,B., Kasschau,K.D. et al.
(2004) Role of transposable elements in heterochromatin and epigenetic
SUPPLEMENTARY DATA control. Nature, 430, 471–476.
22. Yamada,Y., Watanabe,H., Miura,F., Soejima,H., Uchiyama,M.,
Supplementary Data are available at NAR Online. Iwasaka,T., Mukai,T., Sakaki,Y. and Ito,T. (2004) A comprehensive
analysis of allelic methylation status of CpG islands on human
chromosome 21q. Genome Res., 14, 247–266.
ACKNOWLEDGEMENTS 23. Bedell,J.A., Budiman,M.A., Nunberg,A., Citek,R.W., Robbins,D.,
Jones,J., Flick,E., Rholfing,T., Fries,J., Bradford,K. et al. (2005) Sorghum
We would like to thank Laurie Jackson Grusby, Konrad genome sequencing by methylation filtration. PLoS Biol., 3, e13.
Hochedlinger and Katrin Plath for critical reading of the 24. Strichman-Almashanu,L.Z., Lee,R.S., Onyango,P.O., Perlman,E.,
manuscript. This work was funded in part by NIH grants Flam,F., Frieman,M.B. and Feinberg,A.P. (2002) A genome-wide screen
5R01CA87869 (RJ) and HG03067-02 (ESL). AM was for normally methylated human CpG islands that can identify novel
imprinted genes. Genome Res., 12, 543–554.
supported by the Boehringer Ingelheim Fonds. Funding to 25. Rabinowicz,P.D., Schutz,K., Dedhia,N., Yordan,C., Parnell,L.D.,
pay the Open Access publication charges for this article was Stein,L., McCombie,W.R. and Martienssen,R.A. (1999) Differential
provided by Whitehead Institute for Biomedical Research. methylation of genes and retrotransposons facilitates shotgun sequencing
of the maize genome. Nature Genet., 23, 305–308.
Conflict of interest statement. None declared. 26. Weber,M., Davies,J.J., Wittig,D., Oakeley,E.J., Haase,M., Lam,W.L. and
Schubeler,D. (2005) Chromosome-wide and promotor-specific analyses
identify sites of differential DNA methylation in normal and transformed
REFERENCES human cells. Nature Genet., 37, 853–862.
27. Frommer,M., McDonald,L.E., Millar,D.S., Collis,C.M., Watt,F.,
1. Jeltsch,A. (2002) Beyond Watson and Crick: DNA methylation and Grigg,G.W., Molloy,P.L. and Paul,C.L. (1992) A genomic sequencing
molecular enzymology of DNA methyltransferases. Chembiochem, 3, protocol that yields a positive display of 5-methylcytosine
274–293. residues in individual DNA strands. Proc. Natl Acad. Sci. USA, 89,
2. Robertson,K.D. and Wolffe,A.P. (2000) DNA methylation in health and 1827–1831.
disease. Nature Rev. Genet., 1, 11–19. 28. Novik,K.L., Nimmrich,I., Genc,B., Maier,S., Piepenbrock,C., Olek,A.
3. Jaenisch,R. (1997) DNA methylation and imprinting: why bother? Trends and Beck,S. (2002) Epigenomics: genome-wide study of methylation
Genet, 13, 323–329. phenomena. Curr. Issues Mol. Biol., 4, 111–128.
4. Bestor,T.H. (2000) The DNA methyltransferases of mammals. Hum. Mol. 29. Rakyan,V.K., Hildmann,T., Novik,K.L., Lewin,J., Tost,J., Cox,A.V.,
Genet., 9, 2395–2402. Andrews,T.D., Howe,K.L., Otto,T., Olek,A. et al. (2004) DNA
5. Jaenisch,R. and Bird,A. (2003) Epigenetic regulation of gene expression: methylation profiling of the human major histocompatibility complex: a
how the genome integrates intrinsic and environmental signals. Nature pilot study for the human epigenome project. PLoS Biol., 2, e405.
Genet., 33 (Suppl), 245–254. 30. Garnes,J., Ciancio,M. and Gnirke,A. (2002) Construction Of Large-Insert
6. Jones,P.A. and Baylin,S.B. (2002) The fundamental role of epigenetic Bacterial Clone Libraries. In Genomic Mapping and Sequencing. Horizon
events in cancer. Nature Rev. Genet., 3, 415–428. Scientific Press, UK.
7. Robertson,K.D. (2002) DNA methylation and chromatin—unraveling the 31. Paulin,R., Grigg,G.W., Davey,M.W. and Piper,A.A. (1998) Urea
tangled web. Oncogene., 21, 5361–5379. improves efficiency of bisulphite-mediated sequencing of 50 -
8. Laird,P.W. (2003) The power and the promise of DNA methylation methylcytosine in genomic DNA. Nucleic Acids Res., 26, 5009–5010.
markers. Nature Rev. Cancer, 3, 253–266. 32. Don,R.H., Cox,P.T., Wainwright,B.J., Baker,K. and Mattick,J.S. (1991)
9. Gaudet,F., Hodgson,J.G., Eden,A., Jackson-Grusby,L., Dausman,J., ’Touchdown’ PCR to circumvent spurious priming during gene
Gray,J.W., Leonhardt,H. and Jaenisch,R. (2003) Induction of tumors in amplification. Nucleic Acids Res., 19, 4008.
mice by genomic hypomethylation. Science, 300, 489–492. 33. Altshuler,D., Pollara,V.J., Cowles,C.R., Van Etten,W.J., Baldwin,J.,
10. Feinberg,A.P. (2004) The epigenetics of cancer etiology. Semin. Cancer Linton,L. and Lander,E.S. (2000) An SNP map of the human genome
Biol., 14, 427–432. generated by reduced representation shotgun sequencing. Nature, 407,
11. Li,E., Bestor,T.H. and Jaenisch,R. (1992) Targeted mutation of the DNA 513–516.
methyltransferase gene results in embryonic lethality. Cell, 69, 915–926. 34. Ventura,A., Meissner,A., Dillon,C.P., McManus,M., Sharp,P.A., Van
12. Okano,M., Xie,S. and Li,E. (1998) Cloning and characterization of a Parijs,L., Jaenisch,R. and Jacks,T. (2004) Cre-lox-regulated conditional
family of novel mammalian DNA (cytosine-5) methyltransferases. RNA interference from transgenes. Proc. Natl Acad. Sci. USA, 101,
Nature Genet., 19, 219–220. 10380–10385.
13. Okano,M., Bell,D.W., Haber,D.A. and Li,E. (1999) DNA 35. Chapman,V., Forrester,L., Sanford,J., Hastie,N. and Rossant,J. (1984)
methyltransferases Dnmt3a and Dnmt3b are essential for de novo Cell lineage-specific undermethylation of mouse repetitive DNA. Nature,
methylation and mammalian development. Cell, 99, 247–257. 307, 284–286.
Nucleic Acids Research, 2005, Vol. 33, No. 18 5877
36. Walsh,C.P., Chaillet,J.R. and Bestor,T.H. (1998) Transcription of IAP 43. Chen,T., Ueda,Y., Dodge,J.E., Wang,Z. and Li,E. (2003)
endogenous retroviruses is constrained by cytosine methylation. Nature Establishment and maintenance of genomic methylation patterns in
Genet., 20, 116–117. mouse embryonic stem cells by Dnmt3a and Dnmt3b. Mol. Cell. Biol.,
37. Lucifero,D., Mertineit,C., Clarke,H.J., Bestor,T.H. and Trasler,J.M. 23, 5594–5605.
(2002) Methylation dynamics of imprinted genes in mouse germ cells. 44. Jackson,M., Krassowska,A., Gilbert,N., Chevassut,T., Forrester,L.,
Genomics, 79, 530–538. Ansell,J. and Ramsahoye,B. (2004) Severe global DNA
38. Grunau,C., Clark,S.J. and Rosenthal,A. (2001) Bisulfite genomic hypomethylation blocks differentiation and induces histone
sequencing: systematic investigation of critical experimental parameters. hyperacetylation in embryonic stem cells. Mol. Cell. Biol., 24,
Nucleic Acids Res., 29, E65–65. 8862–8871.
39. Fogg,M.J., Pearl,L.H. and Connolly,B.A. (2002) Structural basis for 45. Dodge,J.E., Ramsahoye,B.H., Wo,Z.G., Okano,M. and Li,E. (2002)
uracil recognition by archaeal family B DNA polymerases. Nature Struct. De novo methylation of MMLV provirus in embryonic stem cells: CpG
Biol., 9, 922–927. versus non-CpG methylation. Gene, 289, 41–48.
40. Eads,C.A. and Laird,P.W. (2002) Combined bisulfite restriction analysis 46. Ramsahoye,B.H., Biniszkiewicz,D., Lyko,F., Clark,V., Bird,A.P. and
(COBRA). Methods Mol. Biol., 200, 71–85. Jaenisch,R. (2000) Non-CpG methylation is prevalent in embryonic stem
41. Waterston,R.H., Lindblad-Toh,K., Birney,E., Rogers,J., Abril,J.F., cells and may be mediated by DNA methyltransferase 3a. Proc. Natl Acad.
Downloaded from https://academic.oup.com/nar/article/33/18/5868/2401288 by guest on 07 February 2024
Agarwal,P., Agarwala,R., Ainscough,R., Alexandersson,M., An,P. et al. Sci. USA, 97, 5237–5242.
(2002) Initial sequencing and comparative analysis of the mouse genome. 47. Margulies,M., Egholm,M., Altman,W.E., Attiya,S., Bader,J.S.,
Nature, 420, 520–562. Bemben,L.A., Berka,J., Braverman,M.S., Chen,Y.J., Chen,Z. et al.
42. Bird,A. (2002) DNA methylation patterns and epigenetic memory. Genes (2005) Genome sequencing in microfabricated high-density picolitre
Dev., 16, 6–21. reactors. Nature, 437(7057), 376–380.