The use of multiple displacement amplification to amplify complex DNA libraries

Jack Tan

doi:10.1093/NAR/GKN074

2008, Nucleic Acids Research

https://doi.org/10.1093/NAR/GKN074

Published online 19 February 2008
Nucleic Acids Research, 2008, Vol. 36, No. 5 e32
doi:10.1093/nar/gkn074

The use of multiple displacement amplification to amplify complex DNA libraries

Melissa J. Fullwood, Jack J. S. Tan, Patrick W. P. Ng, Kuo Ping Chiu, Jun Liu, Chia Lin Wei and Yijun Ruan*

Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome #02-01, Singapore 138672

Received December 14, 2007; Revised January 17, 2008; Accepted February 5, 2008

*To whom correspondence should be addressed. Tel: (65) 6478 8073; Fax: (65) 6478 9059; Email: [email protected]

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

© 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from http://nar.oxfordjournals.org/ by guest on February 22, 2016

ABSTRACT

Complex libraries for genomic DNA and cDNA sequencing analyses are typically amplified using bacterial propagation. To reduce biases, large numbers of colonies are plated and scraped from solid-surface agar. This process is time consuming, tedious and limits scaling up. At the same time, multiple displacement amplification (MDA) has been recently developed as a method for in vitro amplification of DNA. However, MDA has no selection function for the removal of ligation multimers. We developed a novel method of briefly introducing ligation reactions into bacteria to select single insert DNA clones followed by MDA to amplify. We applied these methods to a Gene Identification Signatures with Paired-End diTags (GIS-PET) library, which is a complex transcriptome library created by pairing short tags from the 5′ and 3′ ends of cDNA fragments together, and demonstrated that this selection and amplification strategy is unbiased and efficient.

INTRODUCTION

A mainstay of genomic technologies to interrogate genomes and functional genomic elements is the generation of complex cloning-based DNA libraries. Examples of such libraries include genomic DNA libraries used in the sequencing of the human genome (1) as well as other genomes (2); full-length cDNA (flcDNA) libraries (3) and Gene Identification Signatures with Paired-End diTags (GIS-PET) libraries used for elucidating the transcriptome (4); as well as Chromatin Immunoprecipitation with Paired-End diTags (ChIP-PET) libraries used for elucidating transcription factor binding sites (5).

In constructing such libraries, the starting DNA samples are often limited, and therefore DNA amplification is often necessary. The method of choice has been bacterial propagation of DNA fragments in plasmid vectors. To ensure accurate representation, the bacteria must not be allowed to compete with each other for nutrients. Therefore, growth on and scraping from solid-surface agar is commonly used, because colonies are spread out on the agar such that they will not encounter each other and compete. As the libraries are complex and contain many different DNA molecules, a large number of colonies must be scraped from the agar to ensure that the resulting library contains sufficient coverage of the different DNA molecules present in the original pool. Plating and scraping large numbers of bacterial clones from solid-surface agar is therefore tedious, time consuming and difficult to scale up.

Multiple displacement amplification (MDA) has been recently developed as a method for in vitro amplification of DNA. MDA is a method for amplifying plasmids and long strands of DNA in a cell-free system using phi29 polymerase, a newly discovered polymerase enzyme that has very high fidelity (6), proof-reading activity (7) and processivity (8). Such a system would be ideal for replacing the tedious solid-phase agar scraping steps used for the amplification of complex cloning-based libraries. The use of MDA would remove this bottleneck, as MDA is able to amplify complex mixtures with high accuracy and efficiency.

However, one obstacle to the use of MDA for the amplification of complex cloning-based libraries is the fact that cloning ligation reactions into vectors typically results in multimers of plasmid vectors and DNA fragments. Bacterial propagation can remove multimers because constructs that contain multiple origins of replication will not survive during bacterial replication, while MDA alone is not capable of such selection to eliminate multimers during amplification.

To overcome this problem, we developed a method, called Selection-MDA, which combines the selection capability of bacterial replication for single vector/insert constructs with the efficiency and convenience of MDA. In this method, we first transfer the vector/insert ligation into electrocompetent E. coli for a short period of replication and selection in liquid media. Because the bacteria are harvested after a short period of growth in liquid media, the bacteria would not have multiplied to such an extent that they begin to compete for nutrients, yet plasmids with multiple origins of replication would be selected out. The multimer-free pool of plasmids is then purified from liquid media and used for MDA, which amplifies large quantities of multimer-free DNA, thus eliminating tedious and time-consuming plating and scraping of solid-surface agar. As such, the selective advantage of bacterial propagation can be combined with the efficiency and convenience of the MDA method without the disadvantages of sample bias or chimeras. The end result is an MDA-amplified library of the same quality as a similar library prepared by bacterial propagation.

To validate the Selection-MDA method in a complex library, we prepared a GIS-PET library (4) with the Selection-MDA method, and compared it with the same library prepared by conventional bacterial amplification on solid-surface agar (9). Short Paired-End diTag (PET) libraries, including GIS-PET, were conceived in order to improve sequencing efficiency. In GIS-PET, the 5′ and 3′ signatures of each full-length cDNA are covalently linked into structures in which the 5′ and 3′ tags are paired together, and then sequenced, allowing a 20- to 30-fold increase in efficiency compared with bidirectional sequencing of DNA (10). The paired-end nature of the method also allows the use of GIS-PET to study unconventional fusion transcripts (11). The same concept has also been applied to ChIP DNA characterization (ChIP-PET) (5). The PET analysis method involves the construction of two libraries: the original DNA insert library (flcDNA library for GIS-PET), and the single PET library, which is derived from the original DNA insert library. The amplification of the libraries using bacterial propagation is time consuming and labor intensive. To further improve PET analysis, we applied the Selection-MDA method to replace the single PET library amplification step.

MATERIALS AND METHODS

HES3 human embryonic stem (ES) cells were grown and prepared as described (9). Briefly, cells were obtained from ES Cell International, and cultured in a feeder-free medium. Flow cytometry analysis was used to ensure that cells were human ES cells.

A flcDNA library was constructed from the human embryonic stem cells and PETs were prepared for sequencing as described in the classic bacterial propagation protocol (12). Briefly, RNA was isolated from HES3 cells (Figure 1A), and poly A+ RNA was isolated from the total RNA using the µMACS mRNA isolation kit (Figure 1B). The poly A+ RNA was converted into cDNA by oligo-dT-primed reverse transcription. RNA ends were biotinylated. Cap-trapper selection was performed to select full-length first strand cDNA. 5′ adapters were added to prime for second strand cDNA synthesis, and the material was then digested to give rise to sticky ends for cloning. The flcDNA was then ligated with pGIS4b vector cut with NotI (NEB) and GsuI (Fermentas). The flcDNA library was amplified by bacterial amplification at 37°C on solid-surface agar Q-trays (Figure 1C) followed by scraping and plasmid extraction by Maxiprep (Qiagen). An aliquot of the Maxiprep was used to prepare a GIS-PET library by the classic bacterial propagation GIS-PET protocol (12). Briefly, MmeI digestion was performed, and the single-PET plasmids were end-polished with T4 polymerase (Promega). The single-PET plasmids were then self-ligated and amplified by bacterial amplification at 37°C on solid-surface agar Q-trays (Figure 1C) followed by scraping and plasmid extraction by Maxiprep (Qiagen). Single PETs were released with BseRI, purified and concatenated. The concatemers were then blunted by T4 DNA polymerase (Promega), cloned into EcoRV-cut pZErO-1 vectors (Invitrogen) (Figure 1D), and 300 384-well plates were sequenced with Sanger capillary sequencing. This library was called SHE001. The library was analyzed, and the results were reported separately (9).

To construct the MDA-amplified library using the new Selection-MDA protocol (Figure 2), we took an aliquot of 8 ng of maxiprep from the GIS-PET full-length cDNA library and added it to 50 µl of Templiphi 500 sample buffer (GE Healthcare). The sample was denatured at 95°C for 3 min, and then cooled to 4°C. 2 µl of Templiphi 500 enzyme mix (GE Healthcare) was added to 50 µl Templiphi sample buffer on ice, and the mixture was then added to the 50 µl sample buffer with denatured template. The reaction was incubated at 30°C for 18 h, and then heat inactivated at 65°C for 10 min. The material was quantitated with Picogreen fluorimetry (Invitrogen), and an MmeI (New England Biolabs) digestion was performed following the single PET construction method as described (12). 800 ng of self-ligation reaction was purified to remove salts before electroporation by phenol/chloroform isopropanol precipitation as described (12). The pellet was resuspended in 5 µl of Elution Buffer (Qiagen). The entire ligation mix was transformed into 50 µl of Top10 E. coli electrocompetent cells (Invitrogen) and recovered in 1 ml of Lucigen Recovery Medium (Lucigen) with shaking at 37°C for 4 h. Because recovery was for only 4 h, the bacteria would not have multiplied sufficiently so as to compete with each other; hence the library should contain no size bias. To monitor bacterial growth, the optical density at 600 nm (OD600) of aliquots was measured at various time points with an ND-1000 spectrophotometer (Nanodrop). Cells were spun down at 10 000 g for 5 min and washed twice with 750 µl of Lucigen Recovery Medium to remove free-floating DNA that was not introduced into the cells. Next, plasmids were extracted by performing Miniprep (Qiagen). 40 µl of elution buffer was used for the elution, and the DNA was quantitated with Picogreen fluorimetry. 1 µl was run on a PAGE gel to check that plasmids were prepared correctly (Figure 2B, ‘purified plasmids’). Plasmid-safe DNase (Epicentre) treatment was then performed to remove any linear species, such as bacterial genomic DNA, that might be present. Phenol/chloroform ethanol precipitation was then performed and pellets were resuspended in 20 µl of Elution Buffer (Qiagen). MDA was performed on aliquots of 8 ng of material as described earlier. The material was quantitated with Picogreen fluorimetry, and digested with BamHI (New England Biolabs) according to the manufacturer’s protocols. The PETs were PAGE gel-purified (Figure 2B, ‘50 bp ditags obtained after BamHI digest’), then cloned, concatenated (Figure 2B, ‘concatenated BamHI-cut PETs’), partially digested with BamHI, cloned into BamHI-cut pZErO-1 vectors (Invitrogen), and prepared for sequencing as described (12). Ten plates of 384 colonies consisting of concatenated PETs were sequenced as a GIS-PET library, SHE002. A more detailed protocol is provided in the Supplementary Data.

Data analysis was performed using PET-Tool for PET extraction and genome mapping (13), followed by visualization in the T2G browser, a specially designed visualization system for PETs mapped to genome assemblies (4). Calculations were performed with Microsoft Excel. Categories of the genes were identified using RefSeq (14), UCSC Known Genes (15), Genbank mRNA (http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide), MGC (16), Ensembl (17), ESTs (18), Twinscan (19), SGPGene (20,21) and Genscan (22) databases.

Figure 1. Library quality controls. (A) HES3 human embryonic stem cells were grown and prepared as described (9). Total RNA was prepared by the Trizol isolation method. A smear of RNA with two bright bands corresponding to the 28S and 18S rRNA was obtained. The ladder used in all panels is Generuler 1 Kb (Fermentas) (http://www.fermentas.com/catalog/electrophoresis/images/generuler031123.jpg). (B) The mRNA prepared by the use of the µMACS mRNA isolation kit on total RNA showed no bright bands corresponding to the rRNA. (C) A flcDNA library was prepared by the Cap-trapper method, which had a titer of 4.6 × 10^6 cfu. Colony PCR quality control of the library was performed. An empty vector will produce a PCR product of size 260 bp (corresponds to the first band of the ladder); insert sizes were therefore calculated by subtracting the size of the empty vector. Colony PCR showed a range of insert sizes from 250 to 2000 bp (corresponds to the second to seventh bands of the ladder). This is expected, as a flcDNA library should give a range of different-sized inserts, with no single dominant size. Given that the library was of good quality, as can be seen from the colony PCR, the library was used to prepare two libraries: a single-PET library by the classic method, and a single-PET library by the Selection-MDA method. (D) A single-PET library was prepared from the full-length library as per the classic bacterial propagation method. Colony PCR quality control of this library showed a single predominant fixed size of 300 bp in many colonies, which is expected, as single-PET plasmids all have a fixed size of 2800 bp, and hence upon PCR, will give a band of 300 bp. Certain clones do not show this fixed size, which could be the result of the incorporation of foreign DNA, or other factors. Colony PCR quality control showed an insert ratio of 75% based on the number of wells that had PCR products of the correct size (300 bp).

RESULTS

The starting point for this analysis was HES3 human ES cell RNA, from which we generated a flcDNA library (Figure 1A, B and C). We then generated two libraries: (1) a GIS-PET library by the standard approach, called SHE001 (Figure 1D), which comprised 613 905 unique PETs that were collapsed into 25 845 transcriptional units; and (2) a GIS-PET library prepared by the Selection-MDA approach, called SHE002 (Figure 2), which comprised 12 888 unique PETs that were collapsed into 3584 transcriptional units. To construct the MDA-amplified library (schematic in Figure 2B), a single-PET ligation mixture was generated from the maxiprep of the flcDNA library, transformed into bacteria, and recovered for 4 h in the ‘Selection’ part of the procedure. The short 4 h growth in liquid media allows for the selection of single insert clones, because multiple insert clones have multiple origins of replication and cannot survive. However, the time is not long enough to result in crowding of bacteria in liquid media, such that size bias is minimized. To investigate whether the bacteria would have multiplied such that they crowd, we analyzed the optical density of the liquid media at 0, 1, 2 and 4 h. The optical density at 600 nm (OD600) of the media increased from 0.728 at 0 h to 0.897 over 4 h. Using the approximation that 1 OD600 is 1 × 10^9 cells/ml (23), our bacteria increased from 7.3 × 10^8 to 9.0 × 10^8 cells over 4 h. Hence, our bacteria are still in log growth and not yet saturated (23), thus the increase in cell number should not be sufficient to cause crowding. At the end of 4 h, the bacteria were washed well and harvested.

Figure 2. Schematic of the Selection-MDA approach. (A) Schematic showing the differences between the GIS-PET approach using bacterial propagation, and the GIS-PET approach using Selection-MDA. The Selection-MDA version allows for further amplification of the flcDNA library maxiprep by MDA, as well as amplification of the single-PET library solely by Selection-MDA without the need for tedious plating and scraping of large numbers of bacterial colonies from solid-surface agar. Approximate times required for steps that are different between different protocols are given in brackets. Comparing the steps between Selection-MDA and the bacterial propagation method, it is clear that Selection-MDA requires much less hands-on labor and time, and also, in terms of absolute time, is at least 4 h shorter. (B) Detailed schematic of the MDA protocol. FlcDNA maxiprep was cut with MmeI, self-ligated, and transformed into bacteria, which were recovered for 4 h. After this, cells were washed with media and plasmids were extracted. MDA was then performed, followed by enzymatic digestion, concatenation and then cloning and sequencing. We ran quality control aliquots of the reactions on PAGE gels after the plasmid purification. Clean plasmids of the correct size, 2800 bp, were obtained. After BamHI digestion, 50 bp PETs were successfully recovered, as may be seen from the PAGE gel which shows a band of 50 bp (marked by a white box) separated from a high molecular weight smear from the plasmid backbone. PETs were successfully excised and concatenated, as may be seen from the smear from the concatemers, which was seen on a third PAGE gel. The concatemers were excised from the PAGE gel and prepared for subsequent cloning and sequencing.
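The OD600 reasoning in the Results (0.728 at 0 h rising to 0.897 at 4 h, with roughly 1 × 10^9 cells/ml per OD600 unit) can be checked with a few lines of arithmetic. The following is a minimal sketch and not part of the published protocol: the function and variable names are illustrative assumptions, and the conversion factor is simply the approximation quoted in the text (23).

```python
import math

# Approximation quoted in the text (ref. 23): 1 OD600 unit ~ 1e9 cells/ml.
# This is a rule of thumb, not a calibrated value for this strain or instrument.
CELLS_PER_ML_PER_OD600 = 1e9


def od600_to_cells_per_ml(od600: float) -> float:
    """Convert an OD600 reading to an approximate cell density (cells/ml)."""
    return od600 * CELLS_PER_ML_PER_OD600


# OD600 readings reported in the Results at 0 h and 4 h.
od_start, od_end = 0.728, 0.897

cells_start = od600_to_cells_per_ml(od_start)
cells_end = od600_to_cells_per_ml(od_end)

# Fold increase in cell number and the equivalent number of doublings.
fold_increase = cells_end / cells_start
doublings = math.log2(fold_increase)

print(f"start: {cells_start:.2e} cells/ml, end: {cells_end:.2e} cells/ml")
print(f"fold increase: {fold_increase:.2f}, doublings: {doublings:.2f}")
```

Under these assumptions the culture grows about 1.23-fold over the 4 h recovery, which is well under one doubling and agrees with the statement that the increase in cell number should not be sufficient to cause crowding.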
Plasmids were prepared by miniprep and DNase cleanup. A quality control check showed that clean plasmids (Figure 2B) were obtained. PETs were then released by BamHI digestion (Figure 2B). Released PETs were concatenated for Sanger sequencing (Figure 2B). These quality controls indicate that the Selection-MDA procedures were successful in producing PETs for sequencing.

We analyzed the library of PET sequences derived from the MDA approach using standard GIS-PET quality control measures (4), to investigate whether libraries prepared by the MDA approach are of good quality. Of a total of 12 888 unique PETs sequenced, the proportion of PETs that could not be mapped to the human genome was 22.9%. This number is comparable to the percentage of unmappable PETs (26%) shown in a mouse embryonic stem cell library (4), and indicates that the MDA approach has a low percentage of chimeras due to multimers as well as high accuracy amplification, which allows the amplified sequences to map well to the genome. In addition, the mapping accuracy (percentage within 100 bp of the transcription start site or polyadenylation site) for all known PETs in SHE002 was 92.5% for 5′ tags and 91.9% for 3′ tags, comparable to the mouse ES cell GIS-PET (4), which showed results of 90.7% for 5′ tags and 86.9% for 3′ tags. Overall, the percentage of PETs with both 5′ and 3′ tags that map accurately is 88.4% for the entire library. While high, this measure includes mRNAs that have alternative splicing and alternative transcription start sites and hence represents a lower bound. The 12 888 unique PETs were collapsed into 3584 transcriptional units. To more accurately measure the mapping accuracy of the library, we examined PET sequences from the top 20 most abundant transcriptional units, which are well-annotated. The overall mapping accuracy is 98.5% for the top 20 transcriptional units of SHE002. This high level of mapping accuracy indicates that the Selection-MDA method can accurately capture gene identification signatures.

In order to directly compare the performance of the Selection-MDA protocol with the standard protocol, we wanted to compare the quality control measures of the MDA-prepared GIS-PET library with those of a GIS-PET library (SHE001) prepared by conventional bacterial amplification. As the size of the data sampled from library SHE001 (the total number of PETs is 613 905) is almost 50-fold larger than the size sampled from library SHE002 (the total number of PETs is 12 888), a direct comparison of these two libraries will not be meaningful. Therefore, in order to compare the two libraries at the same number of PETs, we created three smaller virtual libraries, SHE004, SHE005 and SHE006 (Table 1), by random selection of data from bacterial propagation library SHE001, such that the virtual libraries had approximately the same size as the MDA-prepared SHE002. Differences within the set of these three virtual libraries would reflect sampling variation. Hence, if the differences between the MDA approach and the conventional approach are significant, then the differences between SHE002 and each of SHE004, SHE005 and SHE006 should be much larger than the differences among SHE004, SHE005 and SHE006. The percentages of PET matches to the genome, numbers of transcriptional units, as well as mapping accuracies of SHE004, SHE005 and SHE006 are comparable to those of SHE002, indicating that the MDA-prepared library is of similar quality to the conventionally-prepared library constructed from the same starting material (Table 1).

Table 1. Analysis of GIS-PET library quality control measures

Category                                        SHE002            SHE004          SHE005          SHE006
                                                (Selection-MDA)   (Classic)       (Classic)       (Classic)
PET sequences
  Total number of unique PETs                   12 888            13 196          12 988          13 102
PET matches to the genome
  0 matches                                     2953 (22.9%)      2903 (22.0%)    2895 (22.3%)    2925 (22.3%)
  1 match                                       9641 (74.8%)      8266 (62.5%)    9851 (75.8%)    9936 (75.8%)
  >1 match                                      294 (2.3%)        2027 (15.4%)    242 (1.9%)      241 (1.8%)
Mapping accuracy
  All PETs                                      88.4%             89.1%           88.2%           88.4%
  PETs from the top 20 transcriptional units    98.5%             97.9%           98.5%           99.2%
GC percentage                                   49.7%             48.9%           48.2%           48.3%
Categories of PETs with 1 match to the genome
  Known                                         5697 (59.1%)      5253 (63.6%)    6080 (61.7%)    6083 (61.2%)
  ESTs                                          3512 (36.4%)      2678 (32.4%)    3291 (33.4%)    3385 (34.1%)
  Gene predictions                              380 (3.9%)        303 (3.7%)      431 (4.4%)      420 (4.2%)
  Novel                                         52 (0.5%)         31 (0.4%)       48 (0.5%)       48 (0.5%)
Transcriptional units
  Total number of transcriptional units         3584              3362            3780            3776
Categories of transcriptional units
  Known                                         2278 (63.6%)      2309 (68.7%)    2490 (65.9%)    2506 (66.4%)
  ESTs                                          997 (27.8%)       817 (24.3%)     965 (25.5%)     956 (25.3%)
  Gene predictions                              265 (7.4%)        209 (6.2%)      287 (7.6%)      280 (7.4%)
  Novel                                         44 (1.2%)         27 (0.8%)       38 (1.0%)       34 (0.9%)

Next, we checked whether the MDA procedure caused any biases in the sample. Because MDA is a different amplification method from bacterial amplification, we wished to investigate if there was any base bias. Base bias was measured by calculating the GC percentage of the library. There is minimal base bias between the MDA method and the conventional method (Table 1).

Again because MDA is a different amplification method, we investigated whether there is any bias towards any category of genes, such as novel genes. We grouped the PETs and transcriptional units into ‘known genes’, ‘gene predictions’, ‘ESTs’ and ‘novel genes’. All libraries showed similar distributions, indicating minimal category bias (Table 1).

The Selection-MDA step could not have introduced a length bias in this particular library, because Selection-MDA was performed on single PET clones, which are all of a fixed size. Therefore, we could not test whether Selection-MDA would result in length biases or not. However, given that MDA was performed on the full-length cDNA library maxiprep to obtain more material for the construction of the single-PET library in the MDA procedure, we reasoned that this step might have introduced a length bias, and hence investigated whether there was a length bias. We tested for the presence of length bias by investigating the mRNA lengths of the best-matching known genes, ESTs or gene predictions, and found there was a length bias towards shorter mRNAs on the part of Selection-MDA, but the bias is small (Figure 3). Given that the bias is small, it is possible that the apparent bias could still be the result of sampling variation.

Figure 3. Analysis of length bias between the MDA approach and the bacterial amplification approach. We tested for the presence of length bias by classifying the mRNA lengths of the best-matching Known Genes, ESTs, or Gene Predictions from each library into 500-bp bins, which were then plotted on a graph. There is a small length bias. Because the length bias is small, it is possible that the apparent bias is due to sampling variation. [Plot: x-axis, mRNA span (in 500 bp bins, 0 to >5000); y-axis, percentage of all known mRNAs (%), 0 to 45; series, SHE002 (Selection-MDA), SHE004 (Classic), SHE005 (Classic) and SHE006 (Classic).]

Next, we reasoned that the contents of the SHE002, SHE004, SHE005 and SHE006 libraries should be similar, because the same starting full-length cDNA library was used for the preparation of the two libraries. Hence, we compared the top 20 most abundant transcriptional units of each library with each other. The average number of transcriptional units that are the same between SHE002 (the MDA-prepared library) and any randomly selected library from a bacterial propagation library is 13. The average number of transcriptional units that are the same between the bacterial propagation libraries is 14, suggesting that the agreement between the MDA method and the bacterial amplification method is similar to the agreement between randomly selected libraries chosen from the same bacterial propagation library (Table 2). This analysis thus indicates that the contents of the MDA-prepared library show a good match to those of the conventionally prepared library.

Table 2. Identities of Top 20 most abundant transcriptional units (genes)

Rank   SHE002            SHE004       SHE005       SHE006
       (Selection-MDA)   (Classic)    (Classic)    (Classic)
1      FTL               FTL          FTL          FTL
2      GAPDH             MIF          ENO1         MIF
3      MIF               ENO1         MIF          ENO1
4      TPI1              PRDX1        RPL13        LOC388817
5      ENO1              IFITM1       RPS2         RPS2
6      LOC388817         C14orf172    TPI1         RPL13
7      RPL13             K-ALPHA-1    LOC388817    H3F3A
8      OAZ1              PGK1         RPL9         PRDX1
9      FTH1              RPL13        K-ALPHA-1    TMSL3
10     TMSL3             PFN1         FTH1         TPI1
11     H3F3A             LOC388817    MDK          H2AFZ
12     IFITM1            RPL18        H2AFZ        K-ALPHA-1
13     H2AFZ             IFITM3       H3F3A        FTH1
14     PRDX1             ACTG1        PGK1         PRDX4
15     C14orf172         PRDX4        RPL18        PFN1
16     PFN1              MDK          TMSL3        C14orf172
17     RPL15             OAZ1         IFITM1       IFITM1
18     TPT1              RPL8         PRDX1        RPL9
19     RPL9              RPLP0        C14orf172    RPL18
20     RPL10             STOML2       OAZ1         PGK1

DISCUSSION

Taken together, we have shown that inserting plasmids into bacteria for a short selection interval followed by MDA is a feasible method for the construction of a complex library. We have successfully applied Selection-MDA to the construction of a complex GIS-PET library and found that the Selection-MDA method results in a library with similar content and quality control statistics as compared with a library constructed from the same starting material that was amplified with bacteria and harvested through scraping bacterial colonies from solid-surface agar.

Comparing the steps between the MDA version and the bacterial propagation method, it is clear that the MDA version requires much less hands-on labor. In terms of the physical handling, the MDA version uses small scale 1.5 ml tubes of material whereas the bacterial propagation method uses 10 large Q-trays and many maxiprep columns. The approximate times for each step that differed between the two protocols were estimated (Figure 2A). Comparing the absolute times required, the MDA method requires 4 h less time than the bacterial propagation method. Considering that many of the time-consuming steps in MDA do not require hands-on activities and hence allow other projects to be carried out in parallel, the time requirement of the MDA method is much less than that of the bacterial propagation method. With recent improvements in the MDA method (for example, the Illustra Genomiphi V2 DNA Amplification kit from GE Healthcare), further time savings could be possible.

The concept of performing bacterial selection followed by MDA (Selection-MDA) may be used to replace amplification steps in complex libraries, and represents a substantial improvement to existing cloning-based protocols. The Selection-MDA method is an effective and simple method for the unbiased amplification of a pool of complex clones, which allows scale-up and elimination of tedious scraping steps in library-preparation protocols. The method may be readily integrated and applied to current cloning-based protocols.

In conclusion, Selection-MDA is a novel method for the amplification of cloned libraries consisting of complex DNA. We applied Selection-MDA to a GIS-PET library, an example of a cloned, complex DNA library, to illustrate the benefits of Selection-MDA. Library preparation was made simpler, and differences between the MDA-prepared library and a library prepared by the classic protocol were minimal. Hence, Selection-MDA is an effective and useful improvement to current cloning-based protocols.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge Mr H. Thoreau and the Genome Technology & Biology Group at the Genome Institute of Singapore for high-throughput sequencing support. National Institutes of Health (1R01HG003521-01 to C.L.W. and Y.J.R.); Agency for Science, Technology and Research (A*STAR grants to C.L.W. and Y.J.R.; A*STAR National Science Scholarship to M.J.F.). Funding to pay the Open Access publication charges for this article was provided by the Agency for Science, Technology and Research.

Conflict of interest statement. None declared.

REFERENCES

1. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
2. Waterston,R.H., Lindblad-Toh,K., Birney,E., Rogers,J., Abril,J.F., Agarwal,P., Agarwala,R., Ainscough,R., Alexandersson,M., An,P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.
3. Strausberg,R.L., Feingold,E.A., Klausner,R.D. and Collins,F.S. (1999) The mammalian gene collection. Science, 286, 455–457.
4. Ng,P., Wei,C.L., Sung,W.K., Chiu,K.P., Lipovich,L., Ang,C.C., Gupta,S., Shahab,A., Ridwan,A., Wong,C.H. et al. (2005) Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat. Methods, 2, 105–111.
5. Wei,C.L., Wu,Q., Vega,V.B., Chiu,K.P., Ng,P., Zhang,T., Shahab,A., Yong,H.C., Fu,Y., Weng,Z. et al. (2006) A global map of p53 transcription-factor binding sites in the human genome. Cell, 124, 207–219.
6. Esteban,J.A., Salas,M. and Blanco,L. (1993) Fidelity of phi 29 DNA polymerase. Comparison between protein-primed initiation and DNA polymerization. J. Biol. Chem., 268, 2719–2726.
7. Garmendia,C., Bernad,A., Esteban,J.A., Blanco,L. and Salas,M. (1992) The bacteriophage phi 29 DNA polymerase, a proofreading enzyme. J. Biol. Chem., 267, 2594–2599.
8. Blanco,L., Bernad,A., Lazaro,J.M., Martin,G., Garmendia,C. and Salas,M. (1989) Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication. J. Biol. Chem., 264, 8935–8940.
9. Zhao,X.D., Xu,H., Chew,J.L., Liu,J., Chiu,K.P., Choo,A., Orlov,Y.L., Sung,K.W., Shahab,A., Kuznetsov,V.A. et al. (2007) Whole-genome mapping of histone H3 Lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells. Cell Stem Cell, 1, 286–298.
10. Ng,P., Tan,J.J., Ooi,H.S., Lee,Y.L., Chiu,K.P., Fullwood,M.J., Srinivasan,K.G., Perbost,C., Du,L., Sung,W.K. et al. (2006) Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res., 34, e84.
11. Ruan,Y., Ooi,H.S., Choo,S.W., Chiu,K.P., Zhao,X.D., Srinivasan,K.G., Yao,F., Choo,C.Y., Liu,J., Ariyaratne,P. et al. (2007) Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res., 17, 828–838.
12. Ng,P., Wei,C.L. and Ruan,Y. (2006) Paired-end diTagging for transcriptome and genome analysis. In Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G., Smith,J.A. and Struhl,K. (eds), Current Protocols in Molecular Biology, 2006, Unit 21.12. John Wiley and Sons, Inc, New York.
13. Chiu,K.P., Wong,C.H., Chen,Q., Ariyaratne,P., Ooi,H.S., Wei,C.L., Sung,W.K. and Ruan,Y. (2006) PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data. BMC Bioinformatics, 7, 390.
14. Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 35, D61–D65.
15. Hsu,F., Kent,W.J., Clawson,H., Kuhn,R.M., Diekhans,M. and Haussler,D. (2006) The UCSC known genes. Bioinformatics, 22, 1036–1046.
16. Gerhard,D.S., Wagner,L., Feingold,E.A., Shenmen,C.M., Grouse,L.H., Schuler,G., Klein,S.L., Old,S., Rasooly,R., Good,P. et al. (2004) The status, quality, and expansion of the NIH full-length cDNA project: the mammalian gene collection (MGC). Genome Res., 14, 2121–2127.
17. Hubbard,T.J., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2007) Ensembl 2007. Nucleic Acids Res., 35, D610–D617.
18. Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) dbEST: database for ‘expressed sequence tags’. Nat. Genet., 4, 332–333.
19. Korf,I., Flicek,P., Duan,D. and Brent,M.R. (2001) Integrating genomic homology into gene structure prediction. Bioinformatics, 17 (Suppl 1), S140–S148.
20. Guigo,R., Dermitzakis,E.T., Agarwal,P., Ponting,C.P., Parra,G., Reymond,A., Abril,J.F., Keibler,E., Lyle,R., Ucla,C. et al. (2003) Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc. Natl Acad. Sci. USA, 100, 1140–1145.
21. Parra,G., Agarwal,P., Abril,J.F., Wiehe,T., Fickett,J.W. and Guigo,R. (2003) Comparative gene prediction in human and mouse. Genome Res., 13, 108–117.
22. Burge,C. and Karlin,S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78–94.
23. Elbing,K. and Brent,R. (2002) Growth in liquid media. In Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G., Smith,J.A. and Struhl,K. (eds), Current Protocols in Molecular Biology, 2002, Unit 1.2.1. John Wiley and Sons, Inc, New York.

REFERENCES

1. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860-921.
2. Waterston,R.H., Lindblad-Toh,K., Birney,E., Rogers,J., Abril,J.F., Agarwal,P., Agarwala,R., Ainscough,R., Alexandersson,M., An,P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520-562.
3. Strausberg,R.L., Feingold,E.A., Klausner,R.D. and Collins,F.S. (1999) The mammalian gene collection. Science, 286, 455-457.
4. Ng,P., Wei,C.L., Sung,W.K., Chiu,K.P., Lipovich,L., Ang,C.C., Gupta,S., Shahab,A., Ridwan,A., Wong,C.H. et al. (2005) Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat. Methods, 2, 105-111.
5. Wei,C.L., Wu,Q., Vega,V.B., Chiu,K.P., Ng,P., Zhang,T., Shahab,A., Yong,H.C., Fu,Y., Weng,Z. et al. (2006) A global map of p53 transcription-factor binding sites in the human genome. Cell, 124, 207-219.
6. Esteban,J.A., Salas,M. and Blanco,L. (1993) Fidelity of phi 29 DNA polymerase. Comparison between protein-primed initiation and DNA polymerization. J. Biol. Chem., 268, 2719-2726.
7. Garmendia,C., Bernad,A., Esteban,J.A., Blanco,L. and Salas,M. (1992) The bacteriophage phi 29 DNA polymerase, a proofreading enzyme. J. Biol. Chem., 267, 2594-2599.
8. Blanco,L., Bernad,A., Lazaro,J.M., Martin,G., Garmendia,C. and Salas,M. (1989) Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication. J. Biol. Chem., 264, 8935-8940.
9. Zhao,X.D., Xu,H., Chew,J.L., Liu,J., Chiu,K.P., Choo,A., Orlov,Y.L., Sung,K.W., Shahab,A., Kuznetsov,V.A. et al. (2007) Whole-genome mapping of histone H3 Lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells. Cell Stem Cell, 1, 286-298.
10. Ng,P., Tan,J.J., Ooi,H.S., Lee,Y.L., Chiu,K.P., Fullwood,M.J., Srinivasan,K.G., Perbost,C., Du,L., Sung,W.K. et al. (2006) Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res., 34, e84.
11. Ruan,Y., Ooi,H.S., Choo,S.W., Chiu,K.P., Zhao,X.D., Srinivasan,K.G., Yao,F., Choo,C.Y., Liu,J., Ariyaratne,P. et al. (2007) Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res., 17, 828-838.
12. Ng,P., Wei,C.L. and Ruan,Y. (2006) Paired-end diTagging for transcriptome and genome analysis. In Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G., Smith,J.A. and Struhl,K. (eds), Current Protocols in Molecular Biology, 2006, Unit 21.12. John Wiley and Sons, Inc, New York.
13. Chiu,K.P., Wong,C.H., Chen,Q., Ariyaratne,P., Ooi,H.S., Wei,C.L., Sung,W.K. and Ruan,Y. (2006) PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data. BMC Bioinformatics, 7, 390.
14. Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 35, D61-D65.
15. Hsu,F., Kent,W.J., Clawson,H., Kuhn,R.M., Diekhans,M. and Haussler,D. (2006) The UCSC known genes. Bioinformatics, 22, 1036-1046.
16. Gerhard,D.S., Wagner,L., Feingold,E.A., Shenmen,C.M., Grouse,L.H., Schuler,G., Klein,S.L., Old,S., Rasooly,R., Good,P. et al. (2004) The status, quality, and expansion of the NIH full-length cDNA project: the mammalian gene collection (MGC). Genome Res., 14, 2121-2127.
17. Hubbard,T.J., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2007) Ensembl 2007. Nucleic Acids Res., 35, D610-D617.
18. Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) dbEST-database for 'expressed sequence tags'. Nat. Genet., 4, 332-333.
19. Korf,I., Flicek,P., Duan,D. and Brent,M.R. (2001) Integrating genomic homology into gene structure prediction. Bioinformatics, 17 (Suppl 1), S140-S148.
20. Guigo,R., Dermitzakis,E.T., Agarwal,P., Ponting,C.P., Parra,G., Reymond,A., Abril,J.F., Keibler,E., Lyle,R., Ucla,C. et al. (2003) Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc. Natl Acad. Sci. USA, 100, 1140-1145.
21. Parra,G., Agarwal,P., Abril,J.F., Wiehe,T., Fickett,J.W. and Guigo,R. (2003) Comparative gene prediction in human and mouse. Genome Res., 13, 108-117.
22. Burge,C. and Karlin,S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78-94.
23. Elbing,K. and Brent,R. (2002) Growth in liquid media. In Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G., Smith,J.A. and Struhl,K. (eds), Current Protocols in Molecular Biology, 2002, Unit 1.2.1. John Wiley and Sons, Inc, New York.
