Papers by Jonathan Warrell

bioRxiv (Cold Spring Harbor Laboratory), Jun 3, 2024
Motivation: Spatial transcriptomics technologies, which generate a spatial map of gene activity, can deepen the understanding of tissue architecture and its molecular underpinnings in health and disease. However, their high cost makes these technologies difficult to use in practice. Histological images co-registered with targeted tissues are more affordable and routinely generated in many research and clinical studies. Hence, predicting spatial gene expression from the morphological clues embedded in tissue histological images provides a scalable alternative approach to decoding tissue complexity. Results: Here, we present a graph neural network-based framework to predict the spatial expression of highly expressed genes from tissue histological images. Extensive experiments on two separate breast cancer data cohorts demonstrate that our method improves the prediction performance compared to the state-of-the-art, and that our model can be used to better delineate spatial domains of biological interest.
Hybrid Quantum-Classical Stochastic Networks with Boltzmann Layers
Nature Communications, Jul 29, 2020
Biology and Philosophy, Sep 15, 2020

A new tool for technical standardization of the Ki67 immunohistochemical assay
Modern Pathology, 2021
Ki67, a nuclear proliferation-related protein, is heavily used in anatomic pathology but has not become a companion diagnostic or a standard-of-care biomarker due to analytic variability in both assay protocols and interpretation. The International Ki67 Working Group in breast cancer has published and has ongoing efforts in the standardization of the interpretation of Ki67, but they have not yet assessed technical issues of assay production representing multiple sources of variation, including antibody clones, antibody formats, staining platforms, and operators. The goal of this work is to address these issues with a new standardization tool. We have developed a cell line microarray system in which mixes of human Karpas 299 or Jurkat cells (Ki67+) with Sf9 (Spodoptera frugiperda) (Ki67-) cells are present in incremental standardized ratios. To validate the tool, six different antibodies, including both ready-to-use and concentrate formats from six vendors, were used to measure Ki67 proliferation indices using IHC protocols for manual (bench-top) and automated platforms. The assays were performed by three different laboratories at Yale and analyzed using two image analysis software packages, QuPath and Visiopharm. Results showed statistically significant differences in Ki67 reactivity among the antibody clones. However, subsets of Ki67 assays using three clones performed in three different labs show no significant differences. This work shows the need for analytic standardization of the Ki67 assay and provides a new tool to do so. We show here how a cell line standardization system can be used to normalize the staining variability in proliferation indices between different antibody clones in a triple negative breast cancer cohort.
We believe that this cell line standardization array has the potential to improve reproducibility among Ki67 assays and laboratories, which is critical for establishing Ki67 as a standard-of-care assay.

Covid-19 has resulted in the death of more than 1,500,000 individuals. Due to the pandemic's severity, thousands of genomes have been sequenced and publicly stored with extensive records, an unprecedented amount of data for an outbreak in a single year. Simultaneously, prediction models offered region-specific and often contradictory results, while states or countries implemented mitigation strategies with little information on success, precision, or agreement with neighboring regions. Even though viral transmissions have already been documented in a historical and geographical context, few studies have aimed to model geographic and temporal flow from viral sequence information. Here, using a case study of 7 states, we model the flow of the Covid-19 outbreak with respect to phylogenetic information, viral migration, inter- and intra-regional connectivity, and epidemiologic and demographic characteristics. By assessing regional connectivity from genomic variants, we can significantly impr...
We report the integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types. By studying whole genomes we have been able to catalogue non-coding cancer driver events, study patterns of structural variation, infer tumour evolution, probe the interactions among variants in the germline genome, the tumour genome and the transcriptome, and derive an understanding of how coding and non-coding variations together contribute to driving individual patients' tumours. This work represents the most comprehensive look at cancer whole genomes to date. NOTE TO READERS: This is an incomplete draft of the marker paper for the Pan-Cancer Analysis of Whole Genomes Project, and is intended to provide the background information for a series of in-depth papers that will be posted to bioRxiv during the summer of 2017.

Nature, 2020
The discovery of drivers of cancer has traditionally focused on protein-coding genes1–4. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium5 of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers6,7, raise doubts about others and identify novel candidates, including point mutations in the 5′ region of TP53, in the 3′ untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangeme...

While many quantum computing (QC) methods promise theoretical advantages over classical counterparts, quantum hardware remains limited. Exploiting near-term QC in computer-aided drug design (CADD) thus requires judicious partitioning between classical and quantum calculations. We present HypaCADD, a hybrid classical-quantum workflow for finding ligands binding to proteins, while accounting for genetic mutations. We explicitly identify modules of our drug design workflow currently amenable to replacement by QC: non-intuitively, we identify the mutation-impact predictor as the best candidate. HypaCADD thus combines classical docking and molecular dynamics with quantum machine learning (QML) to infer the impact of mutations. We present a case study with the SARS-CoV-2 protease and associated mutants. We map a classical machine-learning module onto QC, using a neural network constructed from qubit-rotation gates. We have implemented this in simulation and on two commercial quantum compu...
We introduce approaches to simplifying neural networks and enhancing their interpretability using activation-based neuron tuning and personalized weight matrix products. Inspired by the evolutionary principle of the survival of the fittest, we gradually remove neurons with little to no learning efficacy during training and hypothesize that their absence renders opaque models more interpretable. Experimental results pertaining to cancer and diabetes treatment appear to favor our hypothesis and generate more biomedically salient results. Our approaches also allow for interpretations at the sample level, a feature of particular importance in relation to personalized medicine.
We introduce Probabilistic Dependent Type Systems (PDTS) via a functional language based on a subsystem of intuitionistic type theory including dependent sums and products, which is expanded to include stochastic functions. We provide a sampling-based semantics for the language based on non-deterministic beta reduction. Further, we derive a probabilistic logic from the PDTS introduced as a direct result of the Curry-Howard isomorphism. The probabilistic logic derived is shown to provide a universal representation for finite discrete distributions.