Works at IIT Delhi on computational/mathematical models of learning, information processing and decision-making in complex systems (biological, cognitive, social).
2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021
In the last few years, the compression of deep neural networks has become an important strand of machine learning and computer vision research. Deep models require sizeable computation and storage when used, for instance, for Human Action Recognition (HAR) from videos, making them unsuitable for deployment on edge devices. In this paper, we address this issue and propose a method to effectively compress Recurrent Neural Networks (RNNs) such as Gated Recurrent Units (GRUs) and Long Short-Term Memory units (LSTMs) that are used for HAR. We use a Variational Information Bottleneck (VIB) theory-based pruning approach to limit the information flow through the sequential cells of RNNs to a small subset. Further, we combine our pruning method with a specific group-lasso regularization technique that significantly improves compression. The proposed techniques reduce model parameters and memory footprint from latent representations, with little or no reduction in the validation accuracy while increasing the inference speed several-fold. We perform experiments on three widely used action recognition datasets, viz. UCF11, HMDB51, and UCF101, to validate our approach. We show that our method achieves over 70 times greater compression than the nearest competitor with comparable accuracy for action recognition on UCF11. Some essential resources in edge devices, such as storage, computational units, and battery power, are limited, which poses several challenges in incorporating large DNNs under low-cost/resource settings.
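As a rough illustration of the group-lasso idea mentioned in the abstract, the sketch below computes the textbook unweighted penalty: the sum of Euclidean norms over groups of weights. The abstract does not describe the specific variant the authors use, so the grouping, the `strength` parameter, and the toy weights are all assumptions made for this example.

```python
import math

def group_lasso_penalty(groups, strength=1.0):
    """Return strength * sum of L2 norms, one norm per weight group.

    Penalising whole-group norms pushes entire groups toward zero,
    which is what makes the penalty useful for structured pruning.
    """
    return strength * sum(math.sqrt(sum(w * w for w in group)) for group in groups)

# Hypothetical weights: each inner list is one group
# (for instance, all weights attached to one hidden unit).
weights = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
penalty = group_lasso_penalty(weights, strength=0.5)  # 0.5 * (5 + 0 + 1)
```

A group that is entirely zero contributes nothing to the penalty, so the corresponding unit could be removed from the network.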
For the task of predicting a reference sentence amidst grammatical variants, what is the role of Uniform Information Density (UID) effects?
Cognitive Factors Influencing Word Order Variation in Hindi Actives and Passives
Forward Surprisal Models Production Planning in Reading Aloud
The main subject and the associated verb in English must agree in grammatical number as per the Subject-Verb Agreement (SVA) phenomenon. It has been found that the presence of a noun between the verb and the main subject, whose grammatical number is opposite to that of the main subject, can cause speakers to produce a verb that agrees with the intervening noun rather than the main noun; the former thus acts as an agreement attractor. Such attractors have also been shown to make it hard for RNN models without explicit hierarchical bias to perform well on SVA tasks. Previous work suggests that syntactic cues in the input can help such models choose hierarchical rules over linear rules for number agreement. In this work, we investigate the effects of the choice of training data, training algorithm, and architecture on hierarchical generalization. We observe that the models under consideration fail to perform well on sentences with no agreement attractor when trained solely on nat...
According to the UNIFORM INFORMATION DENSITY (UID) hypothesis (Levy and Jaeger, 2007; Jaeger, 2010), speakers tend to distribute information density across the signal uniformly while producing language. The prior works cited above studied syntactic reduction in language production at particular choice points in a sentence. In contrast, we use a variant of the above UID hypothesis in order to investigate the extent to which word order choices in Hindi are influenced by the drive to minimize the variance of information across entire sentences. To this end, we propose multiple lexical and syntactic measures (at both word and constituent levels) to capture the uniform spread of information across a sentence. Subsequently, we incorporate these measures in machine learning models that aim to distinguish between a naturally occurring corpus sentence and its grammatical variants (expressing the same idea). Our results indicate that our UID measures are not a significant factor in predicting th...
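One simple way to make "variance of information across a sentence" concrete is to take the variance of per-word surprisal, where surprisal is the negative log probability of a word in context. The sketch below assumes this reading; the abstract does not spell out the paper's actual measures, and the probability values are invented for illustration.

```python
import math

def surprisals(probs):
    """Per-word surprisal in bits: -log2 of each word's probability."""
    return [-math.log2(p) for p in probs]

def uid_variance(probs):
    """Population variance of per-word surprisal (lower = more uniform)."""
    s = surprisals(probs)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# Hypothetical per-word probabilities for two orderings of the same sentence.
# Both carry 8 bits in total, but they spread those bits differently.
uniform_order = [0.25, 0.25, 0.25, 0.25]   # surprisals 2, 2, 2, 2
uneven_order = [0.5, 0.5, 0.125, 0.125]    # surprisals 1, 1, 3, 3

var_uniform = uid_variance(uniform_order)
var_uneven = uid_variance(uneven_order)
```

Under this reading, a UID-style preference would favour the first ordering, since it conveys the same total information with lower variance.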
Deep extreme multi-label learning (XML) requires training deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. XML applications such as ad and product recommendation involve labels rarely seen during training but which nevertheless hold the key to recommendations that delight users. Effective utilization of label metadata and high quality predictions for rare labels at the scale of millions of labels are thus key challenges in contemporary XML research. To address these, this paper develops the SiameseXML framework based on a novel probabilistic model that naturally motivates a modular approach melding Siamese architectures with high-capacity extreme classifiers, and a training pipeline that effortlessly scales to tasks with 100 million labels. SiameseXML offers predictions 2–13% more accurate than leading XML methods on public benchmark datasets, as well as in live A/B tests on the Bing search engine, it offers sign...
Machine learning [1] is concerned with algorithmically finding patterns and relationships in data, and using these to perform tasks such as classification and prediction in various domains. We now introduce some relevant terminology and provide an overview of a few sorts of machine learning approaches.
Gene Regulatory Networks (GRNs) hold the key to understanding and solving many problems in biological sciences, with critical applications in medicine and therapeutics. However, discovering GRNs in the laboratory is a cumbersome and tricky affair, since the numbers of genes and interactions, say in a mammalian cell, are very large. We aim to discover these GRNs computationally, by using gene expression levels as a “time-series” dataset. We research and employ techniques from probability and information theory, theory of dynamical systems, and graph structure estimation, to establish causal relations between genes, on synthetic datasets. Furthermore, we suggest methods for global estimation of gene networks, thereby narrowing the space of genetic interactions to be examined when discovering these GRNs in the lab.
We investigate the relative impact of two influential theories of language comprehension, viz., Dependency Locality Theory (DLT; Gibson 2000) and Surprisal Theory (Hale 2001, Levy 2008), on preverbal constituent ordering in Hindi, a predominantly SOV language with flexible word order. Prior work in Hindi has shown that word order scrambling is influenced by information structure constraints in discourse. However, the impact of cognitively grounded factors on Hindi constituent ordering is relatively underexplored. We test the hypothesis that dependency length minimization is a significant predictor of syntactic choice, once information status and surprisal measures (estimated from n-gram, i.e., trigram, and incremental dependency parsing models) have been added to a machine learning model. Towards this end, we set up a framework to generate meaning-equivalent grammatical variants of Hindi sentences by linearizing preverbal constituents of projective dependency trees in the Hindi-Urdu Tre...
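To illustrate what dependency length minimization compares, the sketch below sums head-dependent distances (counted in word positions) for two orderings of a toy sentence. This is a simple word-count version of dependency length chosen for illustration; the abstract does not specify the exact measure used, and the head arrays are hypothetical rather than taken from the treebank.

```python
def total_dependency_length(heads):
    """Sum of |head position - dependent position| over all non-root words.

    heads[i] is the 1-based position of the head of word i + 1;
    a value of 0 marks the root, which has no incoming dependency.
    """
    return sum(abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0)

# Hypothetical 4-word, verb-final sentence in two preverbal orderings.
# Word 4 is the verb (root) in both.
variant_a = [4, 3, 4, 0]   # distances 3, 1, 1
variant_b = [2, 4, 4, 0]   # distances 1, 2, 1

length_a = total_dependency_length(variant_a)
length_b = total_dependency_length(variant_b)
```

A preference for shorter dependencies would favour the second ordering here, all else being equal.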
Deep extreme classification (XC) seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of XC comes from predicting labels that are rarely seen during training. Such rare labels hold the key to personalized recommendations that can delight and surprise a user. However, the large number of rare labels and small amount of training data per rare label pose significant statistical and computational challenges. State-of-the-art deep XC methods attempt to remedy this by incorporating textual descriptions of labels but do not adequately address the problem. This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text, but also label correlations, to offer accurate real-time predictions within a few milliseconds. Core contributions of ECLARE include a frugal architecture and scalable techniques to train deep models along with label correlation graphs at the scale of millions of labels. In particular, ECLARE offers predictions that are 2-14% more accurate on both publicly available benchmark datasets and proprietary datasets for a related products recommendation task sourced from the Bing search engine. Code for ECLARE is available at https://github.com/Extreme-classification/ECLARE. CCS Concepts: Computing methodologies → Machine learning; Supervised learning by classification.
Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021
Extreme multi-label classification (XML) involves tagging a data point with its most relevant subset of labels from an extremely large label set, with several applications such as product-to-product recommendation with millions of products. Although leading XML algorithms scale to millions of labels, they largely ignore label metadata such as textual descriptions of the labels. On the other hand, classical techniques that can utilize label metadata via representation learning using deep networks struggle in extreme settings. This paper develops the DECAF algorithm that addresses these challenges by learning models enriched by label metadata that jointly learn model parameters and feature representations using deep networks and offer accurate classification at the scale of millions of labels. DECAF makes specific contributions to model architecture design, initialization, and training, enabling it to offer up to 2-6% more accurate prediction than leading extreme classifiers on publicly available benchmark product-to-product recommendation datasets, such as LF-AmazonTitles-1.3M. At the same time, DECAF was found to be up to 22× faster at inference than leading deep extreme classifiers, which makes it suitable for real-time applications that require predictions within a few milliseconds. The code for DECAF is available at https://github.com/Extreme-classification/DECAF.
High grade gliomas (HGGs) are infiltrative in nature. Differentiation between vasogenic edema and non-contrast enhancing tumor is difficult as both appear hyperintense in T2-W/FLAIR images. Most studies involving differentiation between vasogenic edema and non-enhancing tumor consider radiologist-based tumor delineation as the ground truth. However, analysis by a radiologist can be subjective and there remain both inter- and intra-rater differences. The objective of the current study is to develop a methodology for differentiation between non-enhancing tumor and vasogenic edema in HGG patients based on T1 perfusion MRI parameters, using a ground truth which is independent of a radiologist's manual delineation of the tumor. This study included 9 HGG patients with pre- and post-surgery MRI data and 9 metastasis patients with pre-surgery MRI data. MRI data included conventional T1-W, T2-W, and FLAIR images and DCE-MRI dynamic images. In this study, the authors hypothesize that surgerie...
Incomplete immunisation coverage causes preventable illness and death in both developing and developed countries. Identification of factors that might modulate coverage could inform effective immunisation programmes and policies. We constructed a performance indicator that could quantitatively approximate measures of the susceptibility of immunisation programmes to coverage losses, with an aim to identify correlations between trends in vaccine coverage and socioeconomic factors. We undertook a data-driven time-series analysis to examine trends in coverage of diphtheria, tetanus, and pertussis (DTP) vaccination across 190 countries over the past 30 years. We grouped countries into six world regions according to WHO classifications. We used Gaussian process regression to forecast future coverage rates and provide a vaccine performance index: a summary measure of the strength of immunisation coverage in a country. Overall vaccine coverage increased in all six world regions between 1980...
The identification of transition models of biological systems (Petri Net models, for example) in noisy environments has not been examined to any significant extent, although they have been used to model the ideal behaviour of metabolic, signalling and genetic networks. Progress has been made in identifying such models from sequences of qualitative states of the system; and, more recently, with additional logical constraints as background knowledge. Both forms of model identification assume the data are correct, which is often unrealistic since biological systems are inherently stochastic. In this paper, we model the transition noise that can affect model identification as a Markov process where the corresponding transition functions are assumed to be known. We investigate, in the presence of this transition noise, the identification of transitions in a target model. The experiments are reconstructions of known networks from simulated data with varying amounts of transition-noise added. In each case, the target model traces a specific trajectory through the state-space. Model structures that explain the noisy state-sequences are obtained based on recent work which formulates the identification of transition models as logical consequence-finding. With noisy data, we need to extend this formulation by allowing the abduction of new transitions. The resulting structures may be both incorrect and incomplete with respect to the target model. We quantify the ability to identify the transitions in the target model, using probability estimates computed from transition-sequences using PRISM. Empirical results suggest that we are able to identify correctly the transitions in the target model with transition noise levels ranging from low to high values.
Loops are irregular structures which connect two secondary structure elements in proteins. They often play important roles in function, including enzyme reactions and ligand binding. Despite their importance, their structure remains difficult to predict. Most protein loop structure prediction methods sample local loop segments and score them. In particular, protein loop classifications and database search methods depend heavily on local properties of loops. Here we examine the distance between a loop's end points (span). We find that the distribution of loop span appears to be independent of the number of residues in the loop; in other words, the separation between the anchors of a loop does not increase with an increase in the number of loop residues. Loop span is also unaffected by the secondary structures at the end points, unless the two anchors are part of an anti-parallel beta sheet. As loop span appears to be independent of global properties of the protein, we suggest that its distribution can be described by a random fluctuation model based on the Maxwell-Boltzmann distribution. It is believed that the primary difficulty in protein loop structure prediction comes from the number of residues in the loop. Following the idea that loop span is an independent local property, we investigate its effect on protein loop structure prediction and show how normalised span (loop stretch) is related to the structural complexity of loops. Highly contracted loops are more difficult to predict than stretched loops.
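For readers unfamiliar with the Maxwell-Boltzmann distribution named in the abstract, the sketch below evaluates its textbook density, f(x; a) = sqrt(2/π) · x² · exp(−x² / (2a²)) / a³, and checks numerically that it integrates to one. The abstract does not report fitted parameters, so the scale value `a` here is a made-up number used only for illustration.

```python
import math

def maxwell_boltzmann_pdf(x, a):
    """Textbook Maxwell-Boltzmann density with scale parameter a (zero for x < 0)."""
    if x < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * x * x * math.exp(-x * x / (2.0 * a * a)) / a ** 3

a = 5.0                    # hypothetical scale parameter, not a fitted value
mode = math.sqrt(2.0) * a  # the density peaks at sqrt(2) * a

# Crude Riemann-sum check that the density integrates to about 1 on [0, 20a].
step = 0.01
area = sum(maxwell_boltzmann_pdf(i * step, a) * step for i in range(int(20 * a / step)))
```

The density is zero at a span of zero, rises to a single peak, and then decays, which is the general shape one would expect if spans behave like magnitudes of random three-dimensional displacements.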
Primary thanks go to my supervisors, Nick Jones, Charlotte Deane, and Mason Porter, whose ideas and guidance have of course played a major role in shaping this thesis. I would also like to acknowledge the very useful suggestions of my examiners, Mark Fricker and Jukka-Pekka Onnela, which have helped improve this work. I am very grateful to all the members of the three Oxford groups I have had the fortune to be associated with: Systems and Signals, Protein Informatics, and the Systems Biology Doctoral Training Centre. Their companionship has served greatly to educate and motivate me during the course of my time in Oxford. In particular, Anna Lewis and Ben Fulcher, both working on closely related D.Phil. projects, have been invaluable throughout, and have assisted and inspired my work in many different ways. Gabriel Villar and Samuel Johnson have been collaborators and co-authors who have helped me to develop some of the ideas and methods used here. There are several other people who have generously provided data, code, or information that has been directly useful for
Node and link roles in protein-protein interaction networks
A key question in modern biology is how the complexity of protein-protein interaction networks relates to biological functionality. One way of understanding the set of proteins and their interactions (the interactome) is to look at them as a network of nodes connected by ...
Our study investigates the impact of linguistic complexity and planning on word durations in Hindi read aloud speech. Reading aloud involves both comprehension and production processes, and we use measures defined by two influential theories of sentence comprehension, Surprisal Theory and Dependency Locality Theory, to model the time taken to enunciate individual words. We model planning processes using an information-theoretic measure we call FORWARD SURPRISAL, inspired by surprisal theory, which has been prominent in recent psycholinguistic work. Forward surprisal aims to capture articulatory planning when readers incorporate parafoveal viewing during reading aloud. Using a Linear Mixed Model containing memory and surprisal costs as predictors of word duration in read aloud speech (parts of speech and speakers being intercept terms), we investigate the following hypotheses: 1. High values of linguistic complexity measures (lexical + PCFG surprisal and DLT memory costs) lead to high word durations. 2. High values of forward lexical surprisal tend to induce high word durations. 3. High-frequency words are read aloud faster than low-frequency words. We validate the above hypotheses using data from the TDIL corpus of read aloud speech. Further, using a Generalized Linear Model to predict content and function word labels, we show that lexical surprisal measures do not help distinguish between these two classes. Thus reading aloud might not involve distinct access strategies for content and function words, unlike spontaneous speech.
Referee report. For: Stewarding antibiotic stewardship in intensive care units with Bayesian artificial intelligence [version 1; referees: 1 approved with reservations]
Papers by Sumeet Agarwal