Papers by Jayson Harshbarger
Connectome: V0.1.0
connectome

BMC Genomics, 2017
Background: Differential gene expression (DGE) analysis is a technique to identify statistically significant differences in RNA abundance for genes or arbitrary features between different biological states. The result of a DGE test is typically further analyzed using statistical software, spreadsheets or custom ad hoc algorithms. We identified a need for a web-based system to share DGE statistical test results, and locate and identify genes in DGE statistical test results with a very low barrier of entry. Results: We have developed DEIVA, a free and open source, browser-based single page application (SPA) with a strong emphasis on being user friendly that enables locating and identifying single or multiple genes in an immediate, interactive, and intuitive manner. By design, DEIVA scales with very large numbers of users and datasets. Conclusions: Compared to existing software, DEIVA offers a unique combination of design decisions that enable inspection and analysis of DGE statistical test results with an emphasis on ease of use.
FANTOM5 web resource for the large-scale genome-wide transcription start site activity profiles of wide-range of mammalian cells
Human Genomics, 2016

Database, 2016
The Functional Annotation of the Mammalian Genome project (FANTOM5) mapped transcription start sites (TSSs) and measured their activities in a diverse range of biological samples. The FANTOM5 project generated a large data set; including detailed information about the profiled samples, the uncovered TSSs at high base-pair resolution on the genome, their transcriptional initiation activities, and further information of transcriptional regulation. Data sets to explore transcriptome in individual cellular states encoded in the mammalian genomes have been enriched by a series of additional analysis, based on the raw experimental data, along with the progress of the research activities. To make the heterogeneous data set accessible and useful for investigators, we developed a web-based database called Semantic catalog of Samples, Transcription initiation And Regulators (SSTAR). SSTAR utilizes the open source wiki software MediaWiki along with the Semantic MediaWiki (SMW) extension, which provides flexibility to model, store, and display a series of data sets produced during the course of the FANTOM5 project. Our use of SMW demonstrates the utility of the framework for dissemination of large-scale analysis

Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells
Science, 2015
Although it is generally accepted that cellular differentiation requires changes to transcriptional networks, dynamic regulation of promoters and enhancers at specific sets of genes has not been previously studied en masse. Exploiting the fact that active promoters and enhancers are transcribed, we simultaneously measured their activity in 19 human and 14 mouse time courses covering a wide range of cell types and biological stimuli. Enhancer RNAs, then messenger RNAs encoding transcription factors, dominated the earliest responses. Binding sites for key lineage transcription factors were simultaneously overrepresented in enhancers and promoters active in each cellular system. Our data support a highly generalizable model in which enhancer transcription is the earliest event in successive waves of transcriptional change during cellular differentiation or activation.
Corrigendum: Functional annotation of human long noncoding RNAs via molecular phenotyping
Genome Research
Long non-coding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes and yet, their functions remain largely unknown. We systematically suppressed 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). The resulting transcriptomic profiles recapitulated the observed cellular phenotypes, yielding specific roles for over 40% of analyzed lncRNAs in regulating distinct biological pathways, transcriptional machinery, alternative promoter activity and architecture usage. Overall, combining cellular and molecular profiling provided a powerful approach to unravel the distinct functions of lncRNAs, which we highlight with specific functional roles for ZNF213-AS1 and lnc-KHDC3L-2.

Scientific data, Nov 28, 2017
The promoter landscape of several non-human model organisms is far from complete. As a part of FANTOM5 data collection, we generated 13 profiles of transcription initiation activities in dog and rat aortic smooth muscle cells, mesenchymal stem cells and hepatocytes by employing CAGE (Cap Analysis of Gene Expression) technology combined with single molecule sequencing. Our analyses show that the CAGE profiles recapitulate known transcription start sites (TSSs) consistently, in addition to uncover novel TSSs. Our dataset can be thus used with high confidence to support gene annotation in dog and rat species. We identified 28,497 and 23,147 CAGE peaks, or promoter regions, for rat and dog respectively, and associated them to known genes. This approach could be seen as a standard method for improvement of existing gene models, as well as discovery of novel genes. Given that the FANTOM5 data collection includes dog and rat matched cell types in human and mouse as well, this data would al...

FANTOM enters 20th year: expansion of transcriptomic atlases and functional annotation of non-coding RNAs
Nucleic Acids Research
The Functional ANnoTation Of the Mammalian genome (FANTOM) Consortium has continued to provide extensive resources in the pursuit of understanding the transcriptome, and transcriptional regulation, of mammalian genomes for the last 20 years. To share these resources with the research community, the FANTOM web-interfaces and databases are being regularly updated, enhanced and expanded with new data types. In recent years, the FANTOM Consortium's efforts have been mainly focused on creating new non-coding RNA datasets and resources. The existing FANTOM5 human and mouse miRNA atlas was supplemented with rat, dog, and chicken datasets. The sixth (latest) edition of the FANTOM project was launched to assess the function of human long non-coding RNAs (lncRNAs). From its creation until 2020, FANTOM6 has contributed to the research community a large dataset generated from the knock-down of 285 lncRNAs in human dermal fibroblasts; this is followed with extensive expression profiling and ...

A draft network of ligand–receptor-mediated multicellular signalling in human
Nature Communications, 2015
Cell-to-cell communication across multiple cell types and tissues strictly governs proper functioning of metazoans and extensively relies on interactions between secreted ligands and cell-surface receptors. Herein, we present the first large-scale map of cell-to-cell communication between 144 human primary cell types. We reveal that most cells express tens to hundreds of ligands and receptors to create a highly connected signalling network through multiple ligand-receptor paths. We also observe extensive autocrine signalling with approximately two-thirds of partners possibly interacting on the same cell type. We find that plasma membrane and secreted proteins have the highest cell-type specificity, they are evolutionarily younger than intracellular proteins, and that most receptors had evolved before their ligands. We provide an online tool to interactively query and visualize our networks and demonstrate how this tool can reveal novel cell-to-cell interactions with the prediction that mast cells signal to monoblastic lineages via the CSF1-CSF1R interacting pair.

Nucleic acids research, Jan 20, 2015
The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one m...
Genome Biology, 2015
The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

Interactive visualization and analysis of large-scale sequencing datasets using ZENBU
Nature Biotechnology, 2014
ABSTRACT ZENBU (fantom.gsc.riken.jp/zenbu) is a data integration, data processing, and visualization system based around three main web interfaces: an expression data enhanced genome browser interface, a secured user system for data upload and secured data sharing, and a data explorer interface to find and manipulate data across the many supported experimental data types and to find shared user configurations. One of the key differences is that ZENBU allows for novel data exploration through data integration and "on-demand" data processing within the system. This means that more raw or unprocessed data can be loaded into the ZENBU system, and then ZENBU can perform many of the basic data manipulations that previously required bioinformatics experts with knowledge of the unix command line and a collection of bioinformatics tools. In ZENBU, the data is not a static picture, but instead it is a living melting pot where scientists can explore and discover. Have a look at our case studies to see powerful examples of the ZENBU data processing and visualization capabilities. Another key concept in ZENBU is that of data-pooling from multiple data sources into a single merged Track. It is becoming much easier to do many experiments within a study. The simple process of managing different experimental combinations into different visualization tracks is becoming unmanageable. Data-pooling allows one to easily compare experimental expression within a series of related experiments that would previously require bioinformaticians to externally process each group analysis and upload each as different precalculated visualization tracks. With ZENBU, the data can be loaded independently and the system can perform the pooling and group analysis.
Because the system performs the pooling/group operations, the data can be interactively explored via region selection within a pooled track or through filtering of experiments within the pool. These realtime interactions within a pooled data track are immediately reflected in both the expression profile visualization and in the "experimental expression bar graph". ZENBU also provides a platform for scientific data social-networking through a secured user environment for data upload and controlled data sharing within user managed collaborations. Collaborations and data sharing are managed in a Facebook style of "friend requests", providing users with the flexibility to create and manage their own collaborations without needing central administrators. ZENBU also provides guest access to view published, public data without any data upload functions. User profiles are available to anyone and are managed through OpenID cooperation with major sites like google, yahoo, mixi, genomespace.org and many others. Website: http://fantom.gsc.riken.jp/zenbu/ Documentation/wiki: http://fantom.gsc.riken.jp/zenbu/wiki/index.php/Main_Page
A promoter-level mammalian expression atlas
Nature, 2014
Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly…