Pierre Baldi

    QCD-jets at the LHC are described by simple physics principles. We show how super-resolution generative networks can learn the underlying structures and use them to improve the resolution of jet images. We test this approach on massless QCD-jets and on fat top-jets and find that the network reproduces their main features even without training on pure samples. In addition, we show how a slim network architecture can be constructed once we have control of the full network performance.
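    A minimal sketch of the idea above, assuming a simple upsampling convolutional generator for calorimeter jet images; the image size, filter counts, and pixel-wise loss are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch (not the paper's architecture): an upsampling convolutional
# generator that maps a coarse calorimeter jet image to a finer grid.
# Image sizes, filter counts, and the pixel-wise loss are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_jet_superres(low_res=(20, 20, 1), upscale=2):
    inp = layers.Input(shape=low_res)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(size=upscale)(x)                          # coarse -> fine grid
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 3, padding="same", activation="relu")(x)   # non-negative energy deposits
    return models.Model(inp, out)

model = build_jet_superres()
model.compile(optimizer="adam", loss="mse")   # simple pixel-wise loss as a stand-in
```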
    Sherpa is a free open-source hyperparameter optimization library for machine learning models. It is designed for problems with computationally expensive iterative function evaluations, such as the hyperparameter tuning of deep neural networks. With Sherpa, scientists can quickly optimize hyperparameters using a variety of powerful and interchangeable algorithms. Additionally, the framework makes it easy to implement custom algorithms. Sherpa can be run on either a single machine or a cluster via a grid scheduler with minimal configuration. Finally, an interactive dashboard enables users to view the progress of models as they are trained, cancel trials, and explore which hyperparameter combinations are working best. Sherpa empowers machine learning researchers by automating the tedious aspects of model tuning and providing an extensible framework for developing automated hyperparameter-tuning strategies. Its source code and documentation are available at https://github.com/LarsHH/she...
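    For illustration, a bare-bones random-search loop of the kind Sherpa automates (the real library adds interchangeable algorithms, the dashboard, and cluster scheduling); the search space and the train_and_score function are hypothetical stand-ins, not Sherpa's API.

```python
# Illustrative only: a minimal random-search loop of the kind Sherpa automates.
# `train_and_score` is a hypothetical user-supplied function, not part of Sherpa.
import random

space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "num_units":     lambda: random.choice([32, 64, 128, 256]),
    "dropout":       lambda: random.uniform(0.0, 0.5),
}

def train_and_score(params):
    """Train a model with `params` and return a validation loss (stub)."""
    raise NotImplementedError

def random_search(n_trials=20):
    best = None
    for _ in range(n_trials):
        params = {name: sample() for name, sample in space.items()}
        score = train_and_score(params)          # expensive evaluation
        if best is None or score < best[0]:      # lower is better
            best = (score, params)
    return best
```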
    Computer-based learning systems enable interactive learning opportunities that are not possible in a traditional teaching setting. We have previously developed Reaction Explorer, an interactive tutorial system for organic chemistry, synthesis, and mechanisms at the college level. The tutorial is powered by an underlying organic chemistry expert system comprising over 1,500 reaction rules, allowing it to generate a virtually infinite collection of problems, and has been used by students at our University for the past three years. The work presented here seeks to develop novel intelligent modules to optimize and personalize student learning trajectories by monitoring each step in a student's progress and learning to propose optimal individualized problems that are at the boundary of a student's knowledge. Specifically, the system is being upgraded with modules for computer-based dynamic assessment and personalized instruction based on concepts from the theory of knowledge spaces.
    Modern therapeutic research is a time-consuming, complex, and costly process that can considerably benefit from the use of statistical machine learning techniques. In particular, using predictive models to quantify the toxicity or activity of a molecule considerably reduces the cost of the discovery and development of a new drug. We develop and study structure-based feature representations of small molecules and successfully leverage them to create predictors for several of their chemical, physical and biological properties. We address the prediction of biological activity in more depth by studying virtual high-throughput screening (vHTS), which aims at exploiting a first exploratory biological screen to learn how to rank untested compounds according to their activity against a particular target. More specifically, we present a new algorithm, the Influence Relevance Voter (IRV), particularly tailored to that problem, and show that it is preferable to state-of-the-art methods. One of the most desirable qualities of a vHTS algorithm is its ability to present the most active compounds in the very top-ranked molecules. This capacity for what is called “early recognition” allows experimentalists to focus only on a small fraction of the compounds. To properly analyze and compare virtual high-throughput screening algorithms, we develop the concentrated receiver operating characteristic (CROC) framework, an extension of the ROC framework for the quantitative evaluation, visualization, and optimization of early recognition. Finally, we develop machine learning methods for the challenging problem of reaction prediction. Inspired by human chemists, we study elementary reaction steps; in this approach reaction prediction becomes a matter of learning to rank elementary mechanisms by favorability. We do not address this task directly, but rather undertake two necessary preliminary problems. We first develop a large database of elementary mechanisms, annotated with favorability information. We then propose a feature representation of the atoms of a molecule, which we leverage to predict whether or not they belong to a site of reactivity; eventually such a classifier can be used to filter out disfavored elementary reactions.
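    A minimal sketch of the early-recognition idea behind CROC: warp the false-positive-rate axis with a monotone transform that magnifies the top of the ranking before computing an area; the exponential transform and the alpha value are illustrative choices, not necessarily the paper's exact formulation.

```python
# Sketch of the early-recognition idea behind CROC: apply a monotone transform
# that magnifies small false-positive rates before computing the area under the
# curve. The exponential transform and alpha value shown here are assumptions.
import numpy as np
from sklearn.metrics import roc_curve, auc

def croc_auc(y_true, y_score, alpha=20.0):
    fpr, tpr, _ = roc_curve(y_true, y_score)
    # Exponential magnification of the early part of the ranking.
    fpr_warped = (1.0 - np.exp(-alpha * fpr)) / (1.0 - np.exp(-alpha))
    return auc(fpr_warped, tpr)
```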
    Humans perceive light in the visible spectrum (400-700 nm). Some night vision systems use infrared light that is not perceptible to humans and the images rendered are transposed to a digital display presenting a monochromatic image in the visible spectrum. We sought to develop an imaging algorithm powered by optimized deep learning architectures whereby infrared spectral illumination of a scene could be used to predict a visible spectrum rendering of the scene as if it were perceived by a human with visible spectrum light. This would make it possible to digitally render a visible spectrum scene to humans when they are otherwise in complete “darkness” and only illuminated with infrared light. To achieve this goal, we used a monochromatic camera sensitive to visible and near infrared light to acquire an image dataset of printed images of faces under multispectral illumination spanning standard visible red (604 nm), green (529 nm) and blue (447 nm) as well as infrared wavelengths (718,...
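    A minimal sketch, under stated assumptions, of the kind of mapping described: a convolutional network from a stack of monochrome images taken under different infrared illuminations to a predicted RGB image; the channel count and layers are assumptions, not the paper's optimized architecture.

```python
# Minimal sketch (not the paper's optimized architecture): a convolutional network
# that maps a stack of monochrome images acquired under different infrared
# illuminations to a predicted visible-spectrum (RGB) image. The number of
# infrared channels and the layer sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

N_IR_CHANNELS = 3   # one monochrome image per infrared illumination wavelength (assumed)

def build_ir_to_rgb(h=128, w=128):
    inp = layers.Input(shape=(h, w, N_IR_CHANNELS))
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(3, 1, padding="same", activation="sigmoid")(x)   # RGB in [0, 1]
    return models.Model(inp, out)

model = build_ir_to_rgb()
model.compile(optimizer="adam", loss="mae")   # per-pixel loss as a simple stand-in
```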
    Motivation Accurately predicting protein secondary structure and relative solvent accessibility is important for the study of protein evolution and structure, and is an early-stage component of typical protein 3D structure prediction pipelines. Results We present a new, improved version of the SSpro/ACCpro suite of predictors for the prediction of protein secondary structure (in three and eight classes) and relative solvent accessibility. The changes include improved, TensorFlow-trained, deep learning predictors, a richer set of profile features (232 features per residue position) and sequence-only features (71 features per position), a more recent Protein Data Bank (PDB) snapshot for training, better hyperparameter tuning, and improvements made to the HOMOLpro module, which leverages structural information from protein segment homologs in the PDB. The new SSpro 6 outperforms the previous version (SSpro 5) by 3–4% in Q3 accuracy and, when used with HOMOLpro, reaches accuracy in the 95–100% r...
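    A minimal sketch of a per-residue predictor over the feature vectors described above (232 profile features per position, three secondary-structure classes); the window size and architecture are assumptions, not SSpro 6 itself.

```python
# Minimal sketch (illustrative, not the SSpro 6 architecture): a per-residue
# classifier mapping a window of profile features to one of three secondary
# structure classes. The feature count (232) comes from the abstract; the
# window size and layer sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

N_FEATURES = 232   # profile features per residue position (from the abstract)
WINDOW = 15        # residues of context around the central position (assumed)

def build_ss3_predictor():
    inp = layers.Input(shape=(WINDOW, N_FEATURES))
    x = layers.Bidirectional(layers.LSTM(64))(inp)   # summarize the sequence window
    x = layers.Dense(128, activation="relu")(x)
    out = layers.Dense(3, activation="softmax")(x)   # helix / strand / coil
    return models.Model(inp, out)

model = build_ss3_predictor()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```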
    The high degree of overlap in features across multiple mental health disorders suggests the existence of common psychopathology factor(s) (p-factors) that mediate similar phenotypic presentations across distinct but related disorders. In this perspective, we argue that circadian rhythm disruption (CRD) is a common underlying p-factor that bridges across mental health disorders within their age and sex contexts. We present and analyze evidence from the literature for the critical roles circadian rhythmicity plays in regulating mental, emotional, and behavioral functions throughout the lifespan. A review of the literature shows that coarse CRD, such as sleep disruption, is prevalent in all mental health disorders at the level of etiological and pathophysiological mechanisms and clinical phenotypical manifestations. Finally, we discuss the subtle interplay of CRD with sex in relation to these disorders across different stages of life. Our perspective highlights the need to s...
    We study the effectiveness of theoretically-motivated high-level jet observables in the extreme context of jets with a large number of hard sub-jets (up to N = 8). Previous studies indicate that high-level observables are powerful, interpretable tools to probe jet substructure for N ≤ 3 hard sub-jets, but that deep neural networks trained on low-level jet constituents match or slightly exceed their performance. We extend this work for up to N = 8 hard sub-jets, using deep particle-flow networks (PFNs) and Transformer-based networks to estimate a loose upper bound on the classification performance. A fully-connected neural network operating on a standard set of high-level jet observables, 135 N-subjettiness observables and jet mass, reaches a classification accuracy of 86.90% but falls short of the PFN and Transformer models, which reach classification accuracies of 89.19% and 91.27%, respectively, suggesting that the constituent networks utilize information not captured by the set of high...
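    A sketch of the high-level baseline described above: a fully-connected network on 136 inputs (135 N-subjettiness observables plus jet mass); the layer widths and the assumption of one class per sub-jet multiplicity are illustrative.

```python
# Sketch of the high-level baseline: a fully connected network on 136 inputs
# (135 N-subjettiness observables + jet mass) classifying jets by their number
# of hard sub-jets. The depth, widths, and 8-class setup are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

N_INPUTS = 136    # 135 N-subjettiness observables + jet mass
N_CLASSES = 8     # one class per sub-jet multiplicity N = 1..8 (assumed)

model = models.Sequential([
    layers.Input(shape=(N_INPUTS,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```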
    Colorectal cancer (CRC) is a leading cause of mortality worldwide, and preventive screening modalities such as colonoscopy have been shown to noticeably decrease CRC incidence and mortality. Improving colonoscopy quality remains a challenging task due to limiting factors including the training levels of colonoscopists and the variability in polyp sizes, morphologies, and locations. Deep learning methods have led to state-of-the-art systems for the identification of polyps in colonoscopy videos. In this study, we show that deep learning can also be applied to the segmentation of polyps in real time, and the underlying models can be trained using mostly weakly labeled data, in the form of bounding box annotations that do not contain precise contour information. A novel dataset, Polyp-Box-Seg of 4070 colonoscopy images with polyps from over 2000 patients, is collected, and a subset of 1300 images is manually annotated with segmentation masks. A series of models is trained to evaluate v...
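    A sketch of the weak-supervision idea: convert a bounding-box annotation into a coarse binary mask that can serve as a noisy segmentation target; the mask construction shown is an illustrative assumption, not the paper's exact procedure.

```python
# Sketch of the weak-supervision idea described above: a bounding-box annotation
# is converted into a coarse binary mask, which can then serve as a noisy training
# target for a segmentation network. The mask construction and the suggested
# pixel-wise loss are illustrative assumptions, not the paper's method.
import numpy as np

def box_to_mask(height, width, box):
    """box = (x_min, y_min, x_max, y_max) in pixel coordinates."""
    mask = np.zeros((height, width), dtype=np.float32)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = 1.0          # every pixel inside the box is treated as polyp
    return mask

weak_mask = box_to_mask(256, 256, (40, 60, 120, 150))
# `weak_mask` can be paired with its frame and fed to any segmentation model
# trained with a pixel-wise loss (e.g., binary cross-entropy).
```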
    We report a method for the phase reconstruction of an ultrashort laser pulse based on the deep learning of the nonlinear spectral changes induced by self-phase modulation. The neural networks were trained on simulated pulses with random initial phases and spectra, with pulse durations between 8.5 and 65 fs. The reconstruction is valid with moderate spectral resolution and is robust to noise. The method was validated on experimental data produced from an ultrafast laser system, where near real-time phase reconstructions were performed. This method can be used in systems with known linear and nonlinear responses, even when the fluence is not known, making it ideal for difficult-to-measure beams such as the high-energy, large-aperture beams produced in petawatt systems.
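    A sketch of the kind of simulated example such a network is trained on: a pulse acquires a nonlinear temporal phase proportional to its instantaneous intensity (self-phase modulation), which reshapes its spectrum; the grid, pulse, and B-integral value are assumptions.

```python
# Sketch of a simulated training example: a pulse propagates through an
# instantaneous Kerr nonlinearity (self-phase modulation), which changes its
# spectrum. The time grid, pulse duration, and B-integral value are assumptions.
import numpy as np

N = 1024
t = np.linspace(-200e-15, 200e-15, N)                      # time grid (s)

# A transform-limited ~20 fs Gaussian field envelope, purely illustrative.
field_in = np.exp(-t**2 / (2 * (8.5e-15)**2)).astype(complex)

def spm(field, b_integral=3.0):
    """Self-phase modulation: nonlinear temporal phase proportional to intensity."""
    intensity = np.abs(field)**2
    return field * np.exp(1j * b_integral * intensity / intensity.max())

field_out = spm(field_in)
spectrum_in  = np.abs(np.fft.fftshift(np.fft.fft(field_in)))**2
spectrum_out = np.abs(np.fft.fftshift(np.fft.fft(field_out)))**2   # SPM-broadened spectrum
# A network of the kind described would map the measured spectra back to the pulse phase.
```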
    Attention plays a fundamental role in both natural and artificial intelligence systems. In deep learning, attention-based neural architectures, such as transformer architectures, are widely used to tackle problems in natural language processing and beyond. Here we investigate the fundamental building blocks of attention and their computational properties. Within the standard model of deep learning, we classify all possible fundamental building blocks of attention in terms of their source, target, and computational mechanism. We identify and study the three most important mechanisms: additive activation attention, multiplicative output attention (output gating), and multiplicative synaptic attention (synaptic gating). The gating mechanisms correspond to multiplicative extensions of the standard model and are used across all current attention-based deep learning architectures. We study their functional properties and estimate the capacity of several attentional building blocks in the case...
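    Toy single-layer versions of the three mechanisms named above; the shapes and nonlinearity are illustrative assumptions, and the exact formulations studied in the paper may differ.

```python
# Toy single-layer versions of the three attention mechanisms named above;
# shapes and the tanh nonlinearity are assumptions for illustration only.
import numpy as np

def f(x):                                   # standard-model nonlinearity (here tanh)
    return np.tanh(x)

def additive_activation_attention(W, x, a):
    # the attention signal a is added to each unit's pre-activation
    return f(W @ x + a)

def output_gating(W, x, g):
    # multiplicative output attention: a gate multiplies each unit's output
    return g * f(W @ x)

def synaptic_gating(W, x, G):
    # multiplicative synaptic attention: gates modulate the synaptic weights themselves
    return f((G * W) @ x)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))     # weights of a small layer
x = rng.normal(size=8)          # input vector
a = rng.normal(size=4)          # additive attention signal (one per unit)
g = rng.normal(size=4)          # output gate (one per unit)
G = rng.normal(size=(4, 8))     # synaptic gates (one per connection)

y1, y2, y3 = additive_activation_attention(W, x, a), output_gating(W, x, g), synaptic_gating(W, x, G)
```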
    A simple way to generate a Boolean function is to take the sign of a real polynomial in n variables. Such Boolean functions are called polynomial threshold functions. How many low-degree polynomial threshold functions are there? The special case of degree d=1 was solved by Zuev in 1989, who showed that the number T(n,1) of linear threshold functions satisfies log_2 T(n,1) ≈ n^2, up to smaller order terms. However, the number of polynomial threshold functions for any higher degree, including d=2, has remained open. We settle this problem for all fixed degrees d>1, showing that log_2 T(n,d) ≈ n·binom(n, ≤d), where binom(n, ≤d) = C(n,0) + C(n,1) + ... + C(n,d) counts the monomials of degree at most d. The solution relies on connections between the theory of Boolean threshold functions, hyperplane arrangements, and random tensors. Perhaps surprisingly, it also uses a recent result of E. Abbe, A. Shpilka, and A. Wigderson on Reed-Muller codes.
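    A quick empirical illustration of the definition (not of the asymptotic result): sample random degree-1 polynomials in n = 3 variables and count the distinct sign patterns they induce on the Boolean cube; the count approaches T(3,1).

```python
# Sanity-check sketch of the definition above: sample random linear (degree-1)
# polynomials in n = 3 variables and count the distinct Boolean sign patterns
# they induce on {-1, +1}^3. With enough samples the count approaches T(3, 1),
# the total number of linear threshold functions of 3 variables.
import itertools
import numpy as np

n = 3
cube = np.array(list(itertools.product([-1, 1], repeat=n)))   # the 2^n points

def count_threshold_functions(samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(samples, n))                # random coefficient vectors
    b = rng.normal(size=(samples, 1))                # random constant terms
    signs = np.sign(w @ cube.T + b)                  # sign pattern of each polynomial
    return len(np.unique(signs, axis=0))

print(count_threshold_functions())   # approaches T(3, 1)
```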
    In a physical neural system, backpropagation is faced with a number of obstacles including: the need for labeled data, the violation of the locality learning principle, the need for symmetric connections, and the lack of modularity. Tourbillon is a new architecture that addresses all these limitations. At its core, it consists of a stack of circular autoencoders followed by an output layer. The circular autoencoders are trained in self-supervised mode by recirculation algorithms and the top layer in supervised mode by stochastic gradient descent, with the option of propagating error information through the entire stack using non-symmetric connections. While the Tourbillon architecture is meant primarily to address physical constraints, and not to improve current engineering applications of deep learning, we demonstrate its viability on standard benchmark datasets including MNIST, Fashion MNIST, and CIFAR10. We show that Tourbillon can achieve comparable performance to models trained...
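    One common form of the recirculation update for a single autoencoder, given as an illustration of local, backprop-free training; this is a sketch under assumed sizes and learning rate, not necessarily the exact variant used in Tourbillon.

```python
# One common form of the recirculation update for a single autoencoder, shown as
# an illustration of local, backprop-free training; sizes, learning rate, and the
# sigmoid nonlinearity are assumptions, not necessarily Tourbillon's exact variant.
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 64, 16, 0.05
W = rng.normal(scale=0.1, size=(n_hid, n_vis))   # encoder weights
V = rng.normal(scale=0.1, size=(n_vis, n_hid))   # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recirculation_step(x):
    global W, V
    h1    = sigmoid(W @ x)        # first pass: encode the input
    x_hat = sigmoid(V @ h1)       # reconstruct
    h2    = sigmoid(W @ x_hat)    # second pass: re-encode the reconstruction
    # Local updates: each weight only uses the activities at its two ends.
    V += lr * np.outer(x - x_hat, h1)
    W += lr * np.outer(h1 - h2, x_hat)
    return float(np.mean((x - x_hat) ** 2))

for _ in range(1000):             # toy training loop on random binary inputs
    loss = recirculation_step(rng.integers(0, 2, size=n_vis).astype(float))
```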
    Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds. In this paper we explore how this affects hyperparameter optimization when the goal is to find hyperparameter settings that perform well across random seeds. In particular, we benchmark whether it is better to explore a large quantity of hyperparameter settings by pruning bad performers, or to aim for quality of the collected results by using repetitions. For this, we consider the Successive Halving, Random Search, and Bayesian Optimization algorithms, the latter two with and without repetitions. We apply these to tuning the PPO2 algorithm on the Cartpole balancing task and the Inverted Pendulum Swing-up task. We demonstrate that pruning may negatively affect the optimization and that repeated sampling does not help in finding hyperparameter settings that perform better across random seeds. From our experiments we conclude that Bayesian optimiz...
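    A sketch of the repetition strategy examined above: score each hyperparameter setting by its average return over several seeds and hand that average to the outer optimizer; train_ppo is a hypothetical stand-in for training PPO2 with a given seed.

```python
# Sketch of the "repetitions" idea: score a hyperparameter setting by its average
# return over several random seeds and feed that average to the outer optimizer
# (random search or Bayesian optimization). `train_ppo` is a hypothetical stand-in
# for training PPO2 with a given seed, not an actual library call.
import statistics

def train_ppo(hyperparams, seed):
    """Train PPO2 with `hyperparams` and `seed`; return the final average return (stub)."""
    raise NotImplementedError

def seed_averaged_objective(hyperparams, seeds=(0, 1, 2, 3, 4)):
    returns = [train_ppo(hyperparams, seed) for seed in seeds]
    return statistics.mean(returns), statistics.stdev(returns)
```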
    Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests. In domains where agents can choose to take their own action or delegate their action to a central mediator, an open question is how mediators should take actions on behalf of delegating agents. The main existing approach uses delegating agents to punish non-delegating agents in an attempt to get all agents to delegate, which tends to be costly for all. We introduce a Pareto Mediator which aims to improve outcomes for delegating agents without making any of them worse off. Our experiments in random normal form games, a restaurant recommendation game, and a reinforcement learning sequential social dilemma show that the Pareto Mediator greatly increases social welfare. Also, even when the Pareto Mediator is based on an incorrect model of agent utility, performance gracefully degrades to the pre-intervention level, due to the individual autonomy preserved by the voluntar...
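    A toy illustration of the mediator idea for a two-player normal-form game in which both players delegate: among joint actions that leave neither delegator worse off than the default, pick the one with the highest total payoff; the payoffs and default are made up, and this is not the paper's exact algorithm.

```python
# Toy illustration of the Pareto-mediator idea for a two-player normal-form game
# in which both players delegate: among joint actions that leave neither player
# worse off than the default joint action, pick the one with the highest total
# payoff. The payoff matrices and default actions are made up for illustration.
import numpy as np

# payoffs[i, a1, a2] = payoff to player i when players choose actions (a1, a2)
payoffs = np.array([
    [[3, 0], [5, 1]],     # player 1
    [[3, 5], [0, 1]],     # player 2
])
default = (1, 1)          # the joint action the players would take on their own

def pareto_mediator(payoffs, default):
    base = payoffs[:, default[0], default[1]]
    best, best_welfare = default, base.sum()
    n1, n2 = payoffs.shape[1:]
    for a1 in range(n1):
        for a2 in range(n2):
            u = payoffs[:, a1, a2]
            if np.all(u >= base) and u.sum() > best_welfare:   # no delegator worse off
                best, best_welfare = (a1, a2), u.sum()
    return best

print(pareto_mediator(payoffs, default))   # -> (0, 0): both players improve from 1 to 3
```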
    Particle colliders are the primary experimental instruments of high-energy physics. By creating conditions that have not occurred naturally since the Big Bang, collider experiments aim to probe the most fundamental properties of matter and the universe. These costly experiments generate very large amounts of noisy data, creating important challenges and opportunities for machine learning. In this work we use deep learning to greatly improve the statistical power on three benchmark problems involving: (1) Higgs bosons; (2) supersymmetric particles; and (3) Higgs boson decay modes. This approach increases the expected discovery significance over traditional shallow methods, by 50%, 2%, and 11% respectively. In addition, we explore the use of model compression to transfer information (dark knowledge) from deep networks to shallow networks.
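    A sketch of the dark-knowledge transfer mentioned above: a shallow student network is trained to match the temperature-softened outputs of a trained deep teacher as well as the true labels; the sizes, temperature, and loss weighting are assumptions, not the paper's settings.

```python
# Sketch of the "dark knowledge" transfer mentioned above: a shallow student is
# trained to match the temperature-softened outputs of a trained deep teacher in
# addition to the true labels. Network sizes, feature count, temperature, and the
# loss weighting are assumptions, not the paper's exact settings.
import tensorflow as tf
from tensorflow.keras import layers, models

N_FEATURES, T = 28, 5.0      # example event-feature count; softening temperature (assumed)

teacher = models.Sequential([layers.Input(shape=(N_FEATURES,))] +
                            [layers.Dense(300, activation="tanh") for _ in range(5)] +
                            [layers.Dense(1)])                    # deep network (logit output)
student = models.Sequential([layers.Input(shape=(N_FEATURES,)),
                             layers.Dense(300, activation="tanh"),
                             layers.Dense(1)])                    # shallow network

def distillation_loss(x, y_true, alpha=0.5):
    """Mix of hard-label loss and soft-target loss at temperature T."""
    t_soft = tf.sigmoid(teacher(x, training=False) / T)           # teacher's softened score
    s_logit = student(x, training=True)
    hard = tf.keras.losses.binary_crossentropy(y_true, tf.sigmoid(s_logit))
    soft = tf.keras.losses.binary_crossentropy(t_soft, tf.sigmoid(s_logit / T))
    return alpha * hard + (1.0 - alpha) * soft
```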

    And 667 more