Our intention is to illustrate how some impossibility results from voting theory would apply in this setting and may apply to other protein folding problems as well. We consider concepts and results from voting theory and expose methodological difficulties for the approach mentioned above. With these observations, we intend to highlight how key theoretical barriers, already identified by economists, can be relevant to the development of new methods and algorithms for problems related to protein folding.
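The simplest barrier of this kind is the Condorcet paradox: combining individually consistent rankings by pairwise majority vote can produce a cyclic, and therefore unusable, collective ranking. The sketch below assumes, hypothetically, that the approach aggregates rankings of candidate structures produced by several score functions; the structure labels `S1` to `S3` and all function names are illustrative only.

```python
from itertools import combinations


def majority_prefers(rankings, a, b):
    """Return True if a strict majority of rankings place a above b."""
    wins = sum(1 for r in rankings if r.index(a) < r.index(b))
    return wins > len(rankings) / 2


def pairwise_majority_edges(rankings):
    """Collect directed (winner, loser) edges of the pairwise-majority relation.

    Assumes every ranking is a permutation of the same candidates.
    """
    candidates = rankings[0]
    edges = set()
    for a, b in combinations(candidates, 2):
        if majority_prefers(rankings, a, b):
            edges.add((a, b))
        elif majority_prefers(rankings, b, a):
            edges.add((b, a))
    return edges


def has_cycle(edges):
    """Detect a directed cycle in the majority relation with depth-first search."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GRAY
        for v in graph.get(u, []):
            state = color.get(v, WHITE)
            if state == GRAY:
                return True  # back edge: a cycle exists
            if state == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


# Three hypothetical score functions, each ranking three candidate structures.
rankings = [
    ["S1", "S2", "S3"],
    ["S2", "S3", "S1"],
    ["S3", "S1", "S2"],
]
edges = pairwise_majority_edges(rankings)
print(sorted(edges))  # [('S1', 'S2'), ('S2', 'S3'), ('S3', 'S1')]
print(has_cycle(edges))  # True
```

Each score function is internally consistent, yet the majority relation says S1 beats S2, S2 beats S3 and S3 beats S1, so no "best" structure emerges from the vote.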
The CYRENE Project focuses on the study of cis-regulatory genomics and gene regulatory networks (GRN) and has three components: a cisGRN-Lexicon, a cisGRN-Browser, and the Virtual Sea Urchin software system. The project has been carried out in collaboration with Eric Davidson and is deeply inspired by his experimental work on genomic regulatory systems and gene regulatory networks. The current CYRENE cisGRN-Lexicon contains the regulatory architecture of 200 genes encoding transcription factors and 100 other regulatory genes in eight species: human, mouse, fruit fly, sea urchin, nematode, rat, chicken, and zebrafish, with higher priority on the first five species. The only regulatory genes included in the cisGRN-Lexicon (CYRENE genes) are those whose regulatory architecture is validated by what we call the Davidson Criterion: they contain sites functionally authenticated by site-specific mutagenesis, conducted in vivo and followed by gene transfer and a functional test. This is recognized as the most stringent experimental validation criterion to date for such a genomic regulatory architecture. The CYRENE cisGRN-Browser is a full genome browser tailored for cis-regulatory annotation and investigation. It began as a branch of the Celera Genome Browser (available as open source at http://sourceforge.net/projects/celeragb/) and has been transformed into a genome browser fully devoted to regulatory genomics. Its access paradigm for genomic data is zoom-to-the-DNA-base in real time. A more recent component of the CYRENE project is the Virtual Sea Urchin system (VSU), an interactive visualization tool that provides a four-dimensional (spatial and temporal) map of the gene regulatory networks of the sea urchin embryo.
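The inclusion rule for CYRENE genes can be read as a filter over per-site experimental evidence. The sketch below is a hypothetical data model, not the cisGRN-Lexicon schema: the class and field names are invented for illustration, and it assumes that one site satisfying all parts of the Davidson Criterion is enough to admit a gene.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SiteEvidence:
    """Hypothetical evidence record for one cis-regulatory binding site."""
    site_id: str
    site_specific_mutagenesis: bool
    in_vivo: bool
    gene_transfer: bool
    functional_test: bool

    def meets_davidson_criterion(self) -> bool:
        # Every experimental condition named in the criterion must hold.
        return (self.site_specific_mutagenesis and self.in_vivo
                and self.gene_transfer and self.functional_test)


@dataclass
class RegulatoryGene:
    """Hypothetical lexicon entry: a gene plus the evidence for its sites."""
    name: str
    species: str
    sites: List[SiteEvidence] = field(default_factory=list)

    def is_cyrene_gene(self) -> bool:
        # Assumption: at least one authenticated site admits the gene.
        return any(s.meets_davidson_criterion() for s in self.sites)


genes = [
    RegulatoryGene("geneA", "sea urchin",
                   [SiteEvidence("s1", True, True, True, True)]),
    RegulatoryGene("geneB", "mouse",
                   [SiteEvidence("s2", True, False, True, True)]),  # not in vivo
]
lexicon = [g.name for g in genes if g.is_cyrene_gene()]
print(lexicon)  # ['geneA']
```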
We focus on the combinatorial analysis of physical mapping with repeated probes. We present computational complexity results, and we describe and analyze an algorithmic strategy. We follow the research avenue proposed by Karp [9] of modeling the problem as a combinatorial problem, the Hypergraph Superstring Problem, which is intimately related to the Lander-Waterman stochastic model [16]. We show that a sparse version of the problem is MAXSNP-complete, a result that carries over to the general case. We show that the minimum Sperner decomposition of a set collection, a problem related to the Hypergraph Superstring Problem, is NP-complete. Finally, we show that the Generalized Hypergraph Superstring Problem is also MAXSNP-hard. We present an efficient algorithm for retrieving the PQ-tree of optimal zero-repetition solutions, which provides a constant-factor approximation to the optimal solution on sparse data. We provide experimental results on simulated data.
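In the hypergraph view, probes are vertices and each clone is a hyperedge listing the probes it contains. Assuming that reading of the model, a zero-repetition solution uses every probe exactly once, so each clone's probes must occupy a contiguous block of the probe order (the consecutive-ones property); a PQ-tree compactly encodes all orders with that property. The sketch below only verifies one candidate order by brute force. It does not build a PQ-tree, and the function name is hypothetical.

```python
def is_zero_repetition_solution(order, clones):
    """Check that each clone's probe set occupies a contiguous block of `order`.

    order:  a sequence of probes, each expected to appear exactly once
    clones: iterable of sets of probes (the hyperedges)
    """
    if len(set(order)) != len(order):
        return False  # a probe is repeated, so this is not zero-repetition
    position = {probe: i for i, probe in enumerate(order)}
    for clone in clones:
        if not clone:
            continue  # an empty clone imposes no constraint
        if not set(clone).issubset(position):
            return False  # clone mentions a probe absent from the order
        idx = [position[p] for p in clone]
        # The block is contiguous iff its span equals the number of probes.
        if max(idx) - min(idx) + 1 != len(clone):
            return False
    return True


clones = [{"a", "b"}, {"b", "c", "d"}, {"d", "e"}]
print(is_zero_repetition_solution(["a", "b", "c", "d", "e"], clones))  # True
print(is_zero_repetition_solution(["a", "c", "b", "d", "e"], clones))  # False
```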
Images form a rich information source that remains underutilized in biomedical document classification. We present work that uses both image- and text-based features to identify articles of interest, in this case articles pertaining to cis-regulatory modules in the context of gene networks. Extending our recently introduced idea of using OCR-based features to identify DNA content in images, we combine image- and text-based classifiers to categorize documents as relevant or irrelevant to cis-regulatory modules. Using a set of hundreds of articles marked by experts as relevant or irrelevant to cis-regulatory modules, we train and test image-based and text-based classifiers, as well as classifiers integrating both. Our results indicate that the integrated classifiers perform best, with Recall, F-measure and Utility all above 0.9, demonstrating the value of incorporating image data, and specifically OCR-based features, into the document categorization process. Moreover, representing images by their character distribution properties is directly relevant to other biomedical images containing text (e.g., RNA, proteins). Diagrams and other images containing text are also prevalent outside the biomedical domain, so the work may also benefit other application areas.
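The idea of identifying DNA content from OCR output can be illustrated with a simple character-distribution feature. The sketch below is an assumption about what such features might look like, not the features used in this work: it takes the OCR text extracted from one image and reports the fraction of letters that are A, C, G or T, plus the longest run of consecutive DNA letters. The function and key names are hypothetical.

```python
from collections import Counter

DNA_LETTERS = set("ACGT")


def ocr_character_features(ocr_text: str) -> dict:
    """Summarize the letter distribution of the OCR output from one image.

    Returns the fraction of alphabetic characters that are A, C, G or T,
    and the longest run of consecutive DNA letters (a crude sequence hint).
    """
    text = ocr_text.upper()
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return {"dna_fraction": 0.0, "longest_dna_run": 0}
    counts = Counter(letters)
    dna_fraction = sum(counts[b] for b in DNA_LETTERS) / len(letters)
    longest = run = 0
    for c in text:
        run = run + 1 if c in DNA_LETTERS else 0
        longest = max(longest, run)
    return {"dna_fraction": dna_fraction, "longest_dna_run": longest}


print(ocr_character_features("ACGT ACGT"))  # {'dna_fraction': 1.0, 'longest_dna_run': 4}
```

An image dominated by sequence text scores near 1.0, while ordinary labels such as "Figure" score low; such values could then be concatenated with text-based features before training an integrated classifier.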
Mathematical methods for protein structure analysis and design : C.I.M.E. Summer School, Martina Franca, Italy, July 9-15, 2000 : advanced lectures
Springer eBooks, 2003
... OPTIMA: A New Score Function for the Detection of Remote Homologs (Maricel Kann, Richard A ...) ... a Combination of Many Neural Networks (Claus Lundegaard, Thomas Nordahl Petersen, Morten Nielsen, Henrik Bohr, Jacob Bohr, Søren ...)
We present performance-guaranteed approximation algorithms for the protein folding problem in the hydrophobic-hydrophilic model, Dill (1985). To our knowledge, our algorithms are the first approximation algorithms in the literature with guaranteed performance for this model, Dill (1994). The hydrophobic-hydrophilic model abstracts the dominant force of protein folding: the hydrophobic interaction. The protein is modeled as a chain of n amino acids, each of one of two types: H (hydrophobic, i.e., nonpolar) or P (hydrophilic, i.e., polar). Although this model is a simplification of more complex protein folding models, the protein folding structure prediction problem is notoriously difficult for this model. Our algorithms run in linear time and achieve a three-dimensional protein conformation whose free energy is guaranteed to be within 3/8 of optimal. By achieving speed and near-optimality simultaneously, our algorithms are consistent with the recently proposed framework of protein folding by Sali, Shakhnovich and Karplus (1994). Equally important, the folding pathways and final conformations of our algorithms are biologically plausible. The algorithms define folding pathways that fit within the diffusion-collision framework of protein folding proposed by Karplus and Weaver (1979), and the final conformations generated by the algorithms have significant secondary structure (anti-parallel sheets, beta sheets, hydrophobic core). Previous algorithms have employed exhaustive search of protein sequences and conformations for sequences of length 11 or less. For longer sequences (length ~30), previous algorithms have performed random sampling of sequences, followed by exhaustive search of the conformations of each sampled sequence.
Our result answers the open problem of Ngo, Marks and Karplus (1994) about the possible existence of an approximation algorithm for protein structure prediction in any well-studied model of protein folding.
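In the usual formulation of the HP model, a conformation is a self-avoiding walk on the lattice, and its free energy is minus the number of H-H contacts: pairs of H residues that are lattice neighbours but not consecutive in the chain. The 3/8 guarantee compares this quantity with that of an optimal fold. The sketch below scores a given conformation on the three-dimensional cubic lattice; it is not the approximation algorithm itself, and the function name is hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of an HP conformation on the three-dimensional cubic lattice.

    sequence: string over {'H', 'P'}
    coords:   list of integer (x, y, z) tuples, one per residue
    Each H-H pair that is adjacent on the lattice but not adjacent
    in the chain contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    index = {c: i for i, c in enumerate(coords)}
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in steps:
            j = index.get((x + dx, y + dy, z + dz))
            # Count each contact once (j > i) and skip chain neighbours (j == i + 1).
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))  # -1: the two ends touch
```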
Celera Genomics. His research interests include simulation of biochemical systems, sequence analysis and data mining, and computational methods for studying genome polymorphisms.
Point matching under non-uniform distortions (T. Akutsu, K. Kanaya, A. Ohyama, A. Fujiyama); Dynamic maintenance and visualization of molecular surfaces (C.L. Bajaj, V. Pascucci, A. Shamir, R.J. Holt, A.N. Netravali); Point placement on the line by distance data (P. Damaschke); On the consistency of the minimum evolution principle of phylogenetic inference (F. Denis, O. Gascuel); Algorithm for statistical alignment of two sequences derived from a Poisson sequence length distribution (I. Miklos); Recognizing DNA graphs is difficult (R. Pendavingh, P. Schuurman, G.J. Woeginger); Weighted sequence graphs: boosting iterated dynamic programming using locally suboptimal solutions (B. Schwikowski, M. Vingron); Aligning two fragmented sequences (V. Veeramachaneni, P. Berman, W. Miller); The algorithmics of folding proteins on lattices (V. Chandru, A. DattaSharma, V.S. Anil Kumar); Approximate protein folding in the HP side chain model on extended cubic lattices (V. Heun)
The American oyster Crassostrea virginica, an ecologically and economically important estuarine organism, can suffer high mortality in parts of the northeastern United States due to Roseovarius Oyster Disease (ROD), caused by the Gram-negative bacterial pathogen Roseovarius crassostreae. The goals of this research were to provide insights into: 1) the responses of American oysters to R. crassostreae, and 2) potential mechanisms of resistance or susceptibility to ROD. The responses of oysters to bacterial challenge were characterized by exposing oysters from ROD-resistant and ROD-susceptible families to R. crassostreae, followed by high-throughput sequencing of cDNA samples collected at various time points after challenge. Sequence data were assembled into a reference transcriptome and analyzed for differential gene expression and functional enrichment to uncover genes and processes potentially involved in the response to ROD in the American oyster. While susceptible oysters experienced constant levels of mortality when challenged with R. crassostreae, resistant oysters showed mortality levels similar to those of non-challenged oysters. Oysters exposed to R. crassostreae showed differential expression of transcripts involved in immune recognition, signaling, protease inhibition, detoxification, and apoptosis. Transcripts involved in metabolism were enriched in susceptible oysters, suggesting that bacterial infection places a large metabolic demand on these oysters. Transcripts differentially expressed in resistant oysters in response to infection included the immune modulators IL-17 and arginase, as well as several genes involved in extracellular matrix remodeling. The identification of genes and processes potentially responsible for defense against R. crassostreae in the American oyster provides insights into possible mechanisms of disease resistance.
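The abstract does not say which functional enrichment test was used. A common choice is the one-sided hypergeometric test, which asks how surprising it is that k of the n differentially expressed transcripts carry a given annotation when K of all N background transcripts do. The sketch below computes that p-value under this assumption; the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: transcripts in the background (e.g. the reference transcriptome)
    K: background transcripts annotated with the term
    n: differentially expressed transcripts
    k: differentially expressed transcripts annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of observing k or more annotated transcripts.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: all 4 DE transcripts carry a term held by 5 of 20.
print(enrichment_p_value(k=4, n=4, K=5, N=20))
```

For these counts the result is 5/4845, about 0.00103. In practice such p-values are usually corrected for the many terms tested.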
Research in Computational Molecular Biology: 9th Annual International Conference, RECOMB 2005, Cambridge, MA, USA, May 14-18, 2005, Proceedings (Lecture ... Science / Lecture Notes in Bioinformatics)
Papers by Sorin Istrail