Published online 19 February 2008 Nucleic Acids Research, 2008, Vol. 36, No. 5 e32
doi:10.1093/nar/gkn074
The use of multiple displacement amplification
to amplify complex DNA libraries
Melissa J. Fullwood, Jack J. S. Tan, Patrick W. P. Ng, Kuo Ping Chiu, Jun Liu,
Chia Lin Wei and Yijun Ruan*
Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street,
Genome #02-01, Singapore 138672
Received December 14, 2007; Revised January 17, 2008; Accepted February 5, 2008
*To whom correspondence should be addressed. Tel: (65) 6478 8073; Fax: (65) 6478 9059; Email: [email protected]
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT
Complex libraries for genomic DNA and cDNA sequencing analyses are typically amplified using bacterial propagation. To reduce biases, large numbers of colonies are plated and scraped from solid-surface agar. This process is time consuming and tedious, and it limits scaling up. At the same time, multiple displacement amplification (MDA) has recently been developed as a method for in vitro amplification of DNA. However, MDA has no selection function for the removal of ligation multimers. We developed a novel method of briefly introducing ligation reactions into bacteria to select single-insert DNA clones, followed by MDA to amplify them. We applied these methods to a Gene Identification Signatures with Paired-End diTags (GIS-PET) library, which is a complex transcriptome library created by pairing short tags from the 5′ and 3′ ends of cDNA fragments, and demonstrated that this selection and amplification strategy is unbiased and efficient.

INTRODUCTION
A mainstay of genomic technologies to interrogate genomes and functional genomic elements is the generation of complex cloning-based DNA libraries. Examples of such libraries include genomic DNA libraries used in the sequencing of the human genome (1) as well as other genomes (2); full-length cDNA (flcDNA) libraries (3); Gene Identification Signatures with Paired-End diTags (GIS-PET) libraries used for elucidating the transcriptome (4); and Chromatin Immunoprecipitation with Paired-End diTags (ChIP-PET) libraries used for elucidating transcription factor binding sites (5).

In constructing such libraries, the starting DNA samples are often limited, and DNA amplification is therefore often necessary. The method of choice has been bacterial propagation of DNA fragments in plasmid vectors. To ensure accurate representation, the bacteria must not be allowed to compete with each other for nutrients. Growth on and scraping from solid-surface agar is therefore commonly used, because colonies spread out on solid-surface agar do not encounter each other and compete. As the libraries are complex and contain many different DNA molecules, a large number of colonies must be scraped from the agar to ensure that the resulting library has sufficient coverage of the different DNA molecules present in the original pool. Plating and scraping large numbers of bacterial clones from solid-surface agar makes these methods tedious, time consuming and difficult to scale up.

Multiple displacement amplification (MDA) has recently been developed as a method for in vitro amplification of DNA. MDA amplifies plasmids and long strands of DNA in a cell-free system using phi29 DNA polymerase, an enzyme with very high fidelity (6), proofreading activity (7) and high processivity (8). Such a system would be ideal for replacing the tedious solid-surface agar scraping steps used for the amplification of complex cloning-based libraries. The use of MDA would remove this bottleneck, as MDA is able to amplify complex mixtures with high accuracy and efficiency.

However, one obstacle to the use of MDA for the amplification of complex cloning-based libraries is that ligating DNA fragments into cloning vectors typically produces multimers of plasmid vectors and DNA fragments. Bacterial propagation can remove multimers because constructs that contain multiple origins of replication do not survive bacterial replication,
while MDA alone is not capable of such selection and cannot eliminate multimers during amplification.

To overcome this problem, we developed a method called Selection-MDA, which combines the selection capability of bacterial replication for single vector/insert constructs with the efficiency and convenience of MDA. In this method, we first transfer the vector/insert ligation into electrocompetent E. coli for a short period of replication and selection in liquid media. Because the bacteria are harvested after only a short period of growth in liquid media, they will not have multiplied to the extent that they begin to compete for nutrients, yet plasmids with multiple origins of replication are selected out. The multimer-free pool of plasmids is then purified from the liquid media and used for MDA, which amplifies large quantities of multimer-free DNA, thus eliminating the tedious and time-consuming plating and scraping of solid-surface agar. In this way, the selective advantage of bacterial propagation is combined with the efficiency and convenience of the MDA method, without the disadvantages of sample bias or chimeras. The end result is an MDA-amplified library of the same quality as a similar library prepared by bacterial propagation.

To validate the Selection-MDA method in a complex library, we prepared a GIS-PET library (4) with the Selection-MDA method and compared it with the same library prepared by conventional bacterial amplification on solid-surface agar (9). Short Paired-End diTag (PET) libraries, including GIS-PET, were conceived to improve sequencing efficiency. In GIS-PET, the 5′ and 3′ signatures of each full-length cDNA are covalently linked into structures in which the 5′ and 3′ tags are paired together and then sequenced, giving a 20- to 30-fold increase in efficiency compared with bidirectional sequencing of DNA (10). The paired-end nature of the method also allows GIS-PET to be used to study unconventional fusion transcripts (11). The same concept has also been applied to ChIP DNA characterization (ChIP-PET) (5). PET analysis involves the construction of two libraries: the original DNA insert library (the flcDNA library for GIS-PET) and the single-PET library, which is derived from the original DNA insert library. Amplifying these libraries by bacterial propagation is time consuming and labor intensive. To further improve PET analysis, we applied the Selection-MDA method to replace the single-PET library amplification step.

MATERIALS AND METHODS
HES3 human embryonic stem (ES) cells were grown and prepared as described (9). Briefly, cells were obtained from ES Cell International and cultured in a feeder-free medium. Flow cytometry analysis was used to confirm that the cells were human ES cells.

A flcDNA library was constructed from the human embryonic stem cells, and PETs were prepared for sequencing as described in the classic bacterial propagation protocol (12). Briefly, RNA was isolated from HES3 cells (Figure 1A), and poly A+ RNA was isolated from total RNA using the µMACS mRNA isolation kit (Figure 1B). The poly A+ RNA was converted into cDNA by oligo-dT-primed reverse transcription. RNA ends were biotinylated, and Cap-trapper selection was performed to select full-length first-strand cDNA. 5′ adapters were added to prime second-strand cDNA synthesis, and the material was then digested to give rise to sticky ends for cloning. The flcDNA was then ligated into the pGIS4b vector cut with NotI (NEB) and GsuI (Fermentas). The flcDNA library was amplified by bacterial amplification at 37°C on solid-surface agar Q-trays (Figure 1C), followed by scraping and plasmid extraction by Maxiprep (Qiagen). An aliquot of the Maxiprep was used to prepare a GIS-PET library by the classic bacterial propagation GIS-PET protocol (12). Briefly, MmeI digestion was performed, and the single-PET plasmids were end-polished with T4 DNA polymerase (Promega). The single-PET plasmids were then self-ligated and amplified by bacterial amplification at 37°C on solid-surface agar Q-trays (Figure 1C), followed by scraping and plasmid extraction by Maxiprep (Qiagen). Single PETs were released with BseRI, purified and concatenated. The concatemers were then blunted with T4 DNA polymerase (Promega) and cloned into EcoRV-cut pZErO-1 vectors (Invitrogen) (Figure 1D), and 300 384-well plates were sequenced by Sanger capillary sequencing. This library was called SHE001. The library was analyzed, and the results were reported separately (9).

Figure 1. Library quality controls. (A) HES3 human embryonic stem cells were grown and prepared as described (9). Total RNA was prepared by the Trizol isolation method. A smear of RNA with two bright bands corresponding to the 28S and 18S rRNA was obtained. The ladder used in all panels is GeneRuler 1 kb (Fermentas) (http://www.fermentas.com/catalog/electrophoresis/images/generuler031123.jpg). (B) The mRNA prepared using the µMACS mRNA isolation kit on total RNA showed no bright bands corresponding to the rRNA. (C) A flcDNA library was prepared by the Cap-trapper method, with a titer of 4.6 × 10^6 cfu. Colony PCR quality control of the library was performed. An empty vector produces a PCR product of 260 bp (corresponding to the first band of the ladder); insert sizes were therefore calculated by subtracting the size of the empty vector. Colony PCR showed a range of insert sizes from 250 to 2000 bp (corresponding to the second to seventh bands of the ladder). This is expected, as a flcDNA library should give a range of different-sized inserts, with no single dominant size. Given that the library was of good quality, as shown by the colony PCR, it was used to prepare two libraries: a single-PET library by the classic method, and a single-PET library by the Selection-MDA method. (D) A single-PET library was prepared from the full-length library as per the classic bacterial propagation method. Colony PCR quality control of this library showed a single predominant fixed size of 300 bp in many colonies, which is expected, as single-PET plasmids all have a fixed size of 2800 bp and hence give a 300 bp band upon PCR. Certain clones do not show this fixed size, which could be the result of the incorporation of foreign DNA, or other factors. Colony PCR quality control showed an insert ratio of 75%, based on the number of wells that had PCR products of the correct size (300 bp).

To construct the MDA-amplified library using the new Selection-MDA protocol (Figure 2), we took an 8 ng aliquot of the maxiprep from the GIS-PET full-length cDNA library and added it to 50 µl of TempliPhi 500 sample buffer (GE Healthcare). The sample was denatured at 95°C for 3 min and then cooled to 4°C. 2 µl of TempliPhi 500 enzyme mix (GE Healthcare) was added to 50 µl of TempliPhi sample buffer on ice, and the mixture was then added to the 50 µl of sample buffer containing the denatured template. The reaction was incubated at 30°C for 18 h and then heat inactivated at 65°C for 10 min. The material was quantitated with PicoGreen fluorimetry (Invitrogen), and an MmeI (New England Biolabs) digestion was performed following the single-PET construction method as described (12). 800 ng of the self-ligation reaction was purified by phenol/chloroform extraction and isopropanol precipitation to remove salts before electroporation, as described (12). The pellet was resuspended in 5 µl of Elution Buffer (Qiagen). The entire ligation mix was transformed into 50 µl of Top10 E. coli electrocompetent cells (Invitrogen) and recovered in 1 ml of Lucigen Recovery Medium (Lucigen) with shaking at 37°C for 4 h. Because recovery lasted only 4 h, the bacteria would not have multiplied sufficiently to compete with each other; hence the library should contain no size bias. To monitor bacterial growth, the optical density at 600 nm (OD600) of aliquots taken at various time points was measured with an ND-1000 spectrophotometer (NanoDrop). Cells were spun down at 10 000 × g for 5 min and washed twice with 750 µl of Lucigen Recovery Medium to remove free-floating DNA that had not been introduced into the cells. Next, plasmids were extracted by Miniprep (Qiagen). 40 µl of elution buffer was used for the elution, and the DNA
was quantitated with PicoGreen fluorimetry. 1 µl was run on a PAGE gel to check that the plasmids were prepared correctly (Figure 2B, ‘purified plasmids’). Plasmid-Safe DNase (Epicentre) treatment was then performed to remove any linear species, such as bacterial genomic DNA, that might be present. Phenol/chloroform extraction and ethanol precipitation were then performed, and the pellets were resuspended in 20 µl of Elution Buffer (Qiagen). MDA was performed on 8 ng aliquots of this material as described earlier. The material was quantitated with PicoGreen fluorimetry and digested with BamHI (New England Biolabs) according to the manufacturer’s protocols. The PETs were PAGE gel-purified (Figure 2B, ‘50 bp ditags obtained after BamHI digest’), then cloned, concatenated (Figure 2B, ‘concatenated BamHI-cut PETs’), partially digested with BamHI, cloned into BamHI-cut pZErO-1 vectors (Invitrogen), and prepared for sequencing as described (12). Ten 384-well plates of colonies containing concatenated PETs were sequenced as a GIS-PET library, SHE002. A more detailed protocol is provided in the Supplementary Data.

Data analysis was performed using PET-Tool for PET extraction and genome mapping (13), followed by visualization in the T2G browser, a visualization system specially designed for PETs mapped to genome assemblies (4). Calculations were performed with Microsoft Excel. Gene categories were identified using the RefSeq (14), UCSC Known Genes (15), GenBank mRNA (http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide), MGC (16), Ensembl (17), ESTs (18), Twinscan (19), SGPGene (20,21) and Genescan (22) databases.

RESULTS
The starting point for this analysis was HES3 human ES cell RNA, from which we generated a flcDNA library (Figure 1A, B and C). We then generated two libraries: (1) a GIS-PET library prepared by the standard approach, called SHE001 (Figure 1D), which comprised 613 905 unique PETs that were collapsed into 25 845 transcriptional units; and (2) a GIS-PET library prepared by the Selection-MDA approach, called SHE002 (Figure 2), which comprised 12 888 unique PETs that were collapsed into 3584 transcriptional units. To construct the MDA-amplified library (schematic in Figure 2B), a single-PET ligation mixture was generated from the maxiprep of the flcDNA library, transformed into bacteria, and recovered for 4 h in the ‘Selection’ part of the procedure. The short 4 h growth in liquid media allows single-insert clones to be selected, because multiple-insert clones contain multiple origins of replication and cannot survive. However, the time is not long enough for the bacteria to crowd the liquid media, so size bias is minimized. To investigate whether the bacteria would have multiplied enough to crowd, we measured the optical density of the liquid media at 0, 1, 2 and 4 h. The optical density at 600 nm (OD600) of the media increased from 0.728 at 0 h to 0.897 at 4 h. Using the approximation that 1 OD600 unit corresponds to 1 × 10^9 cells/ml (23), the bacteria in the 1 ml culture increased from 7.3 × 10^8 to 9.0 × 10^8 cells over 4 h. Hence, the bacteria were still in log-phase growth and not yet saturated (23), so the increase in cell number should not have been sufficient to cause crowding. At the end of 4 h, the bacteria were washed well and harvested.
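The cell-count estimate above can be reproduced with a short Python sketch. It assumes only what the text states: the rule-of-thumb factor of 1 × 10^9 cells/ml per OD600 unit (ref. 23) and the 1 ml recovery volume from Materials and Methods. The helper names `cells_from_od600` and `growth_summary` are hypothetical, and the conversion factor is an approximation rather than a calibrated value for this strain or for the ND-1000 instrument.

```python
import math

# Rule-of-thumb conversion quoted in the text (ref. 23): 1 OD600 unit ~ 1e9 cells/ml.
# Assumption: this approximation holds here; the true factor depends on strain and instrument.
CELLS_PER_ML_PER_OD600 = 1e9


def cells_from_od600(od600: float, volume_ml: float = 1.0) -> float:
    """Approximate total number of cells in `volume_ml` of culture at a given OD600."""
    return od600 * CELLS_PER_ML_PER_OD600 * volume_ml


def growth_summary(od_start: float, od_end: float, volume_ml: float = 1.0) -> dict:
    """Cell numbers at both time points, fold increase and population doublings."""
    start = cells_from_od600(od_start, volume_ml)
    end = cells_from_od600(od_end, volume_ml)
    fold = end / start
    return {
        "cells_start": start,
        "cells_end": end,
        "fold_increase": fold,
        # log2 of the fold change gives the number of population doublings in the interval
        "doublings": math.log2(fold),
    }


if __name__ == "__main__":
    # OD600 at 0 h and 4 h of recovery in 1 ml of recovery medium, as reported in the text
    s = growth_summary(0.728, 0.897, volume_ml=1.0)
    print(f"{s['cells_start']:.1e} -> {s['cells_end']:.1e} cells")  # 7.3e+08 -> 9.0e+08 cells
    print(f"fold increase {s['fold_increase']:.2f}, doublings {s['doublings']:.2f}")  # fold increase 1.23, doublings 0.30
```

On these numbers the culture grew about 1.23-fold, roughly 0.3 population doublings, during the 4 h recovery. The same helper could be applied to the 1 h and 2 h readings, although their values are not reported in the text.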
Plasmids were prepared by miniprep and DNase cleanup. A quality control check showed that clean plasmids were obtained (Figure 2B). PETs were then released by BamHI digestion (Figure 2B), and the released PETs were concatenated for Sanger sequencing (Figure 2B). These quality controls indicate that the Selection-MDA procedure successfully produced PETs for sequencing.

Figure 2. Schematic of the Selection-MDA approach. (A) Schematic showing the differences between the GIS-PET approach using bacterial propagation and the GIS-PET approach using Selection-MDA. The Selection-MDA version allows further amplification of the flcDNA library maxiprep by MDA, as well as amplification of the single-PET library solely by Selection-MDA, without the need for tedious plating and scraping of large numbers of bacterial colonies from solid-surface agar. Approximate times required for steps that differ between the protocols are given in brackets. Comparing the steps of Selection-MDA and the bacterial propagation method shows that Selection-MDA requires much less hands-on labor and time and, in absolute time, is at least 4 h shorter. (B) Detailed schematic of the MDA protocol. The flcDNA maxiprep was cut with MmeI, self-ligated and transformed into bacteria, which were recovered for 4 h. The cells were then washed with media and plasmids were extracted. MDA was then performed, followed by enzymatic digestion, concatenation, cloning and sequencing. We ran quality control aliquots of the reactions on PAGE gels after plasmid purification. Clean plasmids of the correct size, 2800 bp, were obtained. After BamHI digestion, 50 bp PETs were successfully recovered, as shown by the PAGE gel, which has a band of 50 bp (marked by a white box) separated from a high-molecular-weight smear from the plasmid backbone. PETs were successfully excised and concatenated, as shown by the smear of concatemers on a third PAGE gel. The concatemers were excised from the PAGE gel and prepared for subsequent cloning and sequencing.

We analyzed the library of PET sequences derived from the MDA approach using standard GIS-PET quality control measures (4), to investigate whether libraries
prepared by the MDA approach are of good quality. Of a total of 12 888 unique PETs sequenced, 22.9% could not be mapped to the human genome. This is comparable to the percentage of unmappable PETs (26%) in a mouse embryonic stem cell library (4), and indicates that the MDA approach yields a low percentage of chimeras due to multimers as well as high-accuracy amplification, which allows the amplified sequences to map well to the genome. In addition, the mapping accuracy (the percentage of tags within 100 bp of the transcription start site or polyadenylation site) for all known PETs in SHE002 was 92.5% for 5′ tags and 91.9% for 3′ tags, comparable to the mouse ES cell GIS-PET library (4), which showed 90.7% for 5′ tags and 86.9% for 3′ tags. Overall, the percentage of PETs with both 5′ and 3′ tags mapping accurately is 88.4% for the entire library. While high, this measure includes mRNAs that undergo alternative splicing or use alternative transcription start sites, and hence represents a lower bound. The 12 888 unique PETs were collapsed into 3584 transcriptional units. To measure the mapping accuracy of the library more precisely, we examined PET sequences from the 20 most abundant transcriptional units, which are well annotated. The overall mapping accuracy is 98.5% for the top 20 transcriptional units of SHE002. This high level of mapping accuracy indicates that the Selection-MDA method can accurately capture gene identification signatures.

To compare the performance of the Selection-MDA protocol directly with the standard protocol, we compared the quality control measures of the MDA-prepared GIS-PET library with those of a GIS-PET library (SHE001) prepared by conventional bacterial amplification. As the data sampled from library SHE001 (613 905 PETs in total) are almost 50-fold larger than those sampled from library SHE002 (12 888 PETs in total), a direct comparison of these two libraries would not be meaningful. Therefore, to compare the two libraries at the same number of PETs, we created three smaller virtual libraries, SHE004, SHE005 and SHE006 (Table 1), by random selection of data from the bacterial propagation library SHE001, such that each virtual library had approximately the same size as the MDA-prepared SHE002. Differences among these three virtual libraries reflect sampling variation. Hence, if the differences between the MDA approach and the conventional approach are significant, the differences between SHE002 and each of SHE004, SHE005 and SHE006 should be much larger than the differences among SHE004, SHE005 and SHE006. The percentages of PET matches to the genome, numbers of transcriptional units and mapping accuracies of SHE004, SHE005 and SHE006 are comparable to those of SHE002, indicating that the MDA-prepared library is of similar quality to the conventionally prepared library constructed from the same starting material (Table 1).

Table 1. Analysis of GIS-PET library quality control measures

| Category | SHE002 (Selection-MDA) | SHE004 (Classic) | SHE005 (Classic) | SHE006 (Classic) |
|---|---|---|---|---|
| PET sequences: total number of unique PETs | 12 888 | 13 196 | 12 988 | 13 102 |
| PET matches to the genome: 0 matches | 2953 (22.9%) | 2903 (22.0%) | 2895 (22.3%) | 2925 (22.3%) |
| PET matches to the genome: 1 match | 9641 (74.8%) | 8266 (62.5%) | 9851 (75.8%) | 9936 (75.8%) |
| PET matches to the genome: >1 match | 294 (2.3%) | 2027 (15.4%) | 242 (1.9%) | 241 (1.8%) |
| Mapping accuracy: all PETs | 88.4% | 89.1% | 88.2% | 88.4% |
| Mapping accuracy: PETs from the top 20 transcriptional units | 98.5% | 97.9% | 98.5% | 99.2% |
| GC percentage | 49.7% | 48.9% | 48.2% | 48.3% |
| Categories of PETs with 1 match to the genome: Known | 5697 (59.1%) | 5253 (63.6%) | 6080 (61.7%) | 6083 (61.2%) |
| Categories of PETs with 1 match to the genome: ESTs | 3512 (36.4%) | 2678 (32.4%) | 3291 (33.4%) | 3385 (34.1%) |
| Categories of PETs with 1 match to the genome: Gene predictions | 380 (3.9%) | 303 (3.7%) | 431 (4.4%) | 420 (4.2%) |
| Categories of PETs with 1 match to the genome: Novel | 52 (0.5%) | 31 (0.4%) | 48 (0.5%) | 48 (0.5%) |
| Transcriptional units: total number | 3584 | 3362 | 3780 | 3776 |
| Categories of transcriptional units: Known | 2278 (63.6%) | 2309 (68.7%) | 2490 (65.9%) | 2506 (66.4%) |
| Categories of transcriptional units: ESTs | 997 (27.8%) | 817 (24.3%) | 965 (25.5%) | 956 (25.3%) |
| Categories of transcriptional units: Gene predictions | 265 (7.4%) | 209 (6.2%) | 287 (7.6%) | 280 (7.4%) |
| Categories of transcriptional units: Novel | 44 (1.2%) | 27 (0.8%) | 38 (1.0%) | 34 (0.9%) |

Next, we checked whether the MDA procedure caused any biases in the sample. Because MDA is a different amplification method from bacterial amplification, we investigated whether there was any base bias. Base bias was measured by calculating the GC percentage of each library. There is minimal base bias between the MDA method and the conventional method (Table 1).

Again, because MDA is a different amplification method, we investigated whether there was any bias towards particular categories of genes, such as novel genes. We grouped the PETs and transcriptional units into ‘known genes’, ‘gene predictions’, ‘ESTs’ and ‘novel genes’. All libraries showed similar distributions, indicating minimal category bias (Table 1).

The Selection-MDA step could not have introduced a length bias in this particular library, because Selection-MDA was performed on single-PET clones, which are all of a fixed size. Therefore, we could not test whether
Selection-MDA would result in length biases. However, because MDA was performed on the full-length cDNA library maxiprep to obtain more material for constructing the single-PET library in the MDA procedure, we reasoned that this step might have introduced a length bias, and we therefore investigated whether one was present. We tested for length bias by examining the mRNA lengths of the best-matching known genes, ESTs or gene predictions, and found a length bias towards shorter mRNAs in the Selection-MDA library, but the bias is small (Figure 3). Given that the bias is small, the apparent bias could still be the result of sampling variation.

[Figure 3 plot: percentage of all known mRNAs (%), y-axis 0 to 45, against mRNA span in 500 bp bins, x-axis 0 to >5000 bp; series SHE002 (Selection-MDA), SHE004 (Classic), SHE005 (Classic) and SHE006 (Classic).]
Figure 3. Analysis of length bias between the MDA approach and the bacterial amplification approach. We tested for the presence of length bias by classifying the mRNA lengths of the best-matching Known Genes, ESTs or Gene Predictions from each library into 500 bp bins, which were then plotted on a graph. There is a small length bias. Because the length bias is small, it is possible that the apparent bias is due to sampling variation.

Next, we reasoned that the contents of the SHE002, SHE004, SHE005 and SHE006 libraries should be similar, because the same starting full-length cDNA library was used to prepare both the MDA-prepared and the conventionally prepared libraries. Hence, we compared the 20 most abundant transcriptional units of each library with one another. The average number of transcriptional units shared between SHE002 (the MDA-prepared library) and any of the libraries randomly sampled from the bacterial propagation library is 13. The average number of transcriptional units shared between pairs of the bacterial propagation libraries is 14, suggesting that the agreement between the MDA method and the bacterial amplification method is similar to the agreement between libraries randomly sampled from the same bacterial propagation library (Table 2). This analysis thus indicates that the contents of the MDA-prepared library match those of the conventionally prepared library well.

Table 2. Identities of the top 20 most abundant transcriptional units (genes)

| Rank | SHE002 (Selection-MDA) | SHE004 (Classic) | SHE005 (Classic) | SHE006 (Classic) |
|---|---|---|---|---|
| 1 | FTL | FTL | FTL | FTL |
| 2 | GAPDH | MIF | ENO1 | MIF |
| 3 | MIF | ENO1 | MIF | ENO1 |
| 4 | TPI1 | PRDX1 | RPL13 | LOC388817 |
| 5 | ENO1 | IFITM1 | RPS2 | RPS2 |
| 6 | LOC388817 | C14orf172 | TPI1 | RPL13 |
| 7 | RPL13 | K-ALPHA-1 | LOC388817 | H3F3A |
| 8 | OAZ1 | PGK1 | RPL9 | PRDX1 |
| 9 | FTH1 | RPL13 | K-ALPHA-1 | TMSL3 |
| 10 | TMSL3 | PFN1 | FTH1 | TPI1 |
| 11 | H3F3A | LOC388817 | MDK | H2AFZ |
| 12 | IFITM1 | RPL18 | H2AFZ | K-ALPHA-1 |
| 13 | H2AFZ | IFITM3 | H3F3A | FTH1 |
| 14 | PRDX1 | ACTG1 | PGK1 | PRDX4 |
| 15 | C14orf172 | PRDX4 | RPL18 | PFN1 |
| 16 | PFN1 | MDK | TMSL3 | C14orf172 |
| 17 | RPL15 | OAZ1 | IFITM1 | IFITM1 |
| 18 | TPT1 | RPL8 | PRDX1 | RPL9 |
| 19 | RPL9 | RPLP0 | C14orf172 | RPL18 |
| 20 | RPL10 | STOML2 | OAZ1 | PGK1 |

DISCUSSION
Taken together, we have shown that inserting plasmids into bacteria for a short selection interval followed by MDA is a feasible method for the construction of a complex library. We have successfully applied Selection-MDA to the construction of a complex GIS-PET library and found that the Selection-MDA method produces a library with similar content and quality control statistics to a library constructed from the same starting material that was amplified in bacteria and harvested by scraping bacterial colonies from solid-surface agar.

Comparing the steps of the MDA version and the bacterial propagation method, it is clear that the MDA version requires much less hands-on labor. In terms of physical handling, the MDA version uses small-scale 1.5 ml tubes of material, whereas the bacterial propagation method uses 10 large Q-trays and many maxiprep columns. The approximate time for each step that differed between the two protocols was estimated (Figure 2A). In absolute terms, the MDA method requires 4 h less than the bacterial propagation method. Because many of the time-consuming steps in MDA do not require hands-on activity, and hence allow other projects to be carried out in parallel, the effective time requirement of the MDA method is much lower than that of the bacterial propagation method. With recent improvements in the MDA method (for example, the Illustra GenomiPhi V2 DNA Amplification kit from GE Healthcare), further time savings could be possible.

The concept of performing bacterial selection followed by MDA (Selection-MDA) may be used to replace amplification steps in complex libraries, and represents a substantial improvement to existing cloning-based protocols. The Selection-MDA method is an effective and simple method for the unbiased amplification of a pool of complex clones, which allows scale-up and eliminates tedious scraping steps in library-preparation protocols. The method may be readily integrated into current cloning-based protocols.

In conclusion, Selection-MDA is a novel method for the amplification of cloned libraries consisting of complex DNA. We applied Selection-MDA to a GIS-PET
library, an example of a cloned, complex DNA library, to illustrate the benefits of Selection-MDA. Library preparation was made simpler, and differences between the MDA-prepared library and a library prepared by the classic protocol were minimal. Hence, Selection-MDA is an effective and useful improvement to current cloning-based protocols.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS
The authors gratefully acknowledge Mr H. Thoreau and the Genome Technology & Biology Group at the Genome Institute of Singapore for high-throughput sequencing support. National Institutes of Health (1R01HG003521-01 to C.L.W. and Y.J.R.); Agency for Science, Technology and Research (A*STAR grants to C.L.W. and Y.J.R.; A*STAR National Science Scholarship to M.J.F.).

Funding to pay the Open Access publication charges for this article was provided by the Agency for Science, Technology and Research.

Conflict of interest statement. None declared.

REFERENCES
1. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
2. Waterston,R.H., Lindblad-Toh,K., Birney,E., Rogers,J., Abril,J.F., Agarwal,P., Agarwala,R., Ainscough,R., Alexandersson,M., An,P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.
3. Strausberg,R.L., Feingold,E.A., Klausner,R.D. and Collins,F.S. (1999) The mammalian gene collection. Science, 286, 455–457.
4. Ng,P., Wei,C.L., Sung,W.K., Chiu,K.P., Lipovich,L., Ang,C.C., Gupta,S., Shahab,A., Ridwan,A., Wong,C.H. et al. (2005) Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat. Methods, 2, 105–111.
5. Wei,C.L., Wu,Q., Vega,V.B., Chiu,K.P., Ng,P., Zhang,T., Shahab,A., Yong,H.C., Fu,Y., Weng,Z. et al. (2006) A global map of p53 transcription-factor binding sites in the human genome. Cell, 124, 207–219.
6. Esteban,J.A., Salas,M. and Blanco,L. (1993) Fidelity of phi 29 DNA polymerase. Comparison between protein-primed initiation and DNA polymerization. J. Biol. Chem., 268, 2719–2726.
7. Garmendia,C., Bernad,A., Esteban,J.A., Blanco,L. and Salas,M. (1992) The bacteriophage phi 29 DNA polymerase, a proofreading enzyme. J. Biol. Chem., 267, 2594–2599.
8. Blanco,L., Bernad,A., Lazaro,J.M., Martin,G., Garmendia,C. and Salas,M. (1989) Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication. J. Biol. Chem., 264, 8935–8940.
9. Zhao,X.D., Xu,H., Chew,J.L., Liu,J., Chiu,K.P., Choo,A., Orlov,Y.L., Sung,K.W., Shahab,A., Kuznetsov,V.A. et al. (2007) Whole-genome mapping of histone H3 Lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells. Cell Stem Cell, 1, 286–298.
10. Ng,P., Tan,J.J., Ooi,H.S., Lee,Y.L., Chiu,K.P., Fullwood,M.J., Srinivasan,K.G., Perbost,C., Du,L., Sung,W.K. et al. (2006) Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res., 34, e84.
11. Ruan,Y., Ooi,H.S., Choo,S.W., Chiu,K.P., Zhao,X.D., Srinivasan,K.G., Yao,F., Choo,C.Y., Liu,J., Ariyaratne,P. et al. (2007) Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res., 17, 828–838.
12. Ng,P., Wei,C.L. and Ruan,Y. (2006) Paired-end diTagging for transcriptome and genome analysis. In Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G., Smith,J.A. and Struhl,K. (eds), Current Protocols in Molecular Biology, 2006, Unit 21.12. John Wiley and Sons, Inc, New York.
13. Chiu,K.P., Wong,C.H., Chen,Q., Ariyaratne,P., Ooi,H.S., Wei,C.L., Sung,W.K. and Ruan,Y. (2006) PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data. BMC Bioinformatics, 7, 390.
14. Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 35, D61–D65.
15. Hsu,F., Kent,W.J., Clawson,H., Kuhn,R.M., Diekhans,M. and Haussler,D. (2006) The UCSC known genes. Bioinformatics, 22, 1036–1046.
16. Gerhard,D.S., Wagner,L., Feingold,E.A., Shenmen,C.M., Grouse,L.H., Schuler,G., Klein,S.L., Old,S., Rasooly,R., Good,P. et al. (2004) The status, quality, and expansion of the NIH full-length cDNA project: the mammalian gene collection (MGC). Genome Res., 14, 2121–2127.
17. Hubbard,T.J., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2007) Ensembl 2007. Nucleic Acids Res., 35, D610–D617.
18. Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) dbEST—database for ‘expressed sequence tags’. Nat. Genet., 4, 332–333.
19. Korf,I., Flicek,P., Duan,D. and Brent,M.R. (2001) Integrating genomic homology into gene structure prediction. Bioinformatics, 17 (Suppl 1), S140–S148.
20. Guigo,R., Dermitzakis,E.T., Agarwal,P., Ponting,C.P., Parra,G., Reymond,A., Abril,J.F., Keibler,E., Lyle,R., Ucla,C. et al. (2003) Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc. Natl Acad. Sci. USA, 100, 1140–1145.
21. Parra,G., Agarwal,P., Abril,J.F., Wiehe,T., Fickett,J.W. and Guigo,R. (2003) Comparative gene prediction in human and mouse. Genome Res., 13, 108–117.
22. Burge,C. and Karlin,S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78–94.
23. Elbing,K. and Brent,R. (2002) Growth in liquid media. In Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G., Smith,J.A. and Struhl,K. (eds), Current Protocols in Molecular Biology, 2002, Unit 1.2.1. John Wiley and Sons, Inc, New York.