JAN-WILLEM VAN DE MEENT

I am an Associate Professor (Universitair Hoofddocent) at the University of Amsterdam, where I co-direct the AMLab with Max Welling. I am also an Assistant Professor (on leave) at Northeastern University, where I continue to advise and collaborate.

My research develops models for artificial intelligence by combining probabilistic programming and deep learning. Our work seeks to understand what inductive biases enable models to generalize from limited data. These inductive biases can take the form of a simulator that incorporates knowledge of an underlying physical system, a causal structure, or symmetries of the domain. We combine model development with research on methods for inference in these models. We also put this work into practice in collaborations with researchers in robotics, NLP, healthcare, and the physical sciences.

The technical backbone of much of our work is probabilistic programming. I am one of the creators of Anglican, a probabilistic programming language based on Clojure. My group currently develops Probabilistic Torch, a library for deep generative models that extends PyTorch. I am writing a book on probabilistic programming, a draft of which is available on arXiv. I am also a co-chair of the International Conference on Probabilistic Programming (PROBPROG).
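As a rough illustration of the core idea behind these systems, the sketch below writes a tiny model as an ordinary program that samples a latent variable and scores an observation, then reuses that program for self-normalized importance sampling. This is plain PyTorch written for illustration only; it is not the Anglican or Probabilistic Torch API, and the model and prior-as-proposal strategy are deliberately simplistic.

```python
import torch
from torch.distributions import Normal

def model(y_obs):
    """A tiny generative program: latent mean -> noisy observation."""
    mu = Normal(0.0, 1.0).sample()                 # sample a latent variable from the prior
    log_weight = Normal(mu, 1.0).log_prob(y_obs)   # score the observation under the likelihood
    return mu, log_weight

def importance_posterior_mean(y_obs, num_samples=10_000):
    """Self-normalized importance sampling, using the prior as the proposal."""
    samples, log_weights = zip(*(model(y_obs) for _ in range(num_samples)))
    samples = torch.stack(samples)
    weights = torch.softmax(torch.stack(log_weights), dim=0)
    return (weights * samples).sum()

print(importance_posterior_mean(torch.tensor(2.0)))  # ≈ 1.0 for this conjugate example
```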

News

SEP 2021 ∙ As of 1 September, I will start at the University of Amsterdam and will be on leave from Northeastern University. I will continue to advise current students, but am not recruiting PhD students or postdocs at Northeastern this cycle. I will participate in PhD recruiting through ELLIS, as will other faculty at AMLab. Please contact me at my UvA address with inquiries.

JUL 2021 ∙ I am delighted to have received the NSF CAREER award for my research on deep learning and probabilistic programming!

JUL 2021 ∙ Robin Walters has received his first grant as PI for his work on representation-theoretic foundations of deep learning under the NSF Scale MoDL program!

JUN 2021 ∙ The NSF has funded joint work on MDP abstractions with Lawson Wong, Rob Platt, and Robin Walters!

MAY 2021 ∙ Sam and Heiko’s paper on inference combinators will appear at UAI 2021!

MAY 2021 ∙ Hao’s and Babak’s paper on Conjugate Energy-based Models will appear at ICML 2021!

JAN 2021 ∙ Alican’s and Babak’s paper on rate-regularization and generalization in VAEs will appear at AISTATS 2021.

JAN 2021 ∙ Our group will be presenting 3 extended abstracts at AABI this year [1, 2, 3].

DEC 2020 ∙ Ondrej’s paper on action priors will appear at AAMAS 2021.


Jan-Willem van de Meent

Associate Professor (UHD)
University of Amsterdam
Informatics Institute
Science Park 904, Room C3.259
Postbus 94323, 1090 GH Amsterdam

Assistant Professor (on leave)
Northeastern University
Khoury College of Computer Sciences

Current Students and Postdocs

Robin Walters
Postdoctoral Fellow
Ondrej Biza
Ph.D. Candidate
Co-advised with
Robert Platt and
Lawson Wong
Babak Esmaeili
Ph.D. Candidate
Jered McInerney
Ph.D. Candidate
Co-advised with
Byron Wallace
Eli Sennesh
Ph.D. Candidate
Co-advised with
Lisa Feldman Barrett and
Karen Quigley
Sam Stites
Ph.D. Candidate
Hao Wu
Ph.D. Candidate
Xiongyi Zhang
Ph.D. Candidate
Co-advised with
Byron Wallace
Heiko Zimmermann
Ph.D. Candidate

Working Papers

vandemeent_ftml_2018
An Introduction to Probabilistic Programming
This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at readers who have an undergraduate-level understanding of either, or ideally both, probabilistic machine learning and programming languages.

Selected Papers

sennesh_biopsych_2021
Interoception as modeling, allostasis as control
The brain regulates the body by anticipating its needs and attempting to meet them before they arise – a process called allostasis. Allostasis requires a model of the changing sensory conditions within the body, a process called interoception. In this paper, we examine how interoception may provide performance feedback for allostasis. We suggest studying allostasis in terms of control theory, reviewing control theory’s applications to related issues in physiology, motor control, and decision making. We synthesize these by relating them to the important properties of allostatic regulation as a control problem. We then sketch a novel formalism for how the brain might perform allostatic control of the viscera by analogy to skeletomotor control, including a mathematical view on how interoception acts as performance feedback for allostasis. Finally, we suggest ways to test implications of our hypotheses.
zimmermann_aabi_2021
Nested Variational Inference
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly-used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path, (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model, and (c) perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.
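As a rough sketch of the single-level building block, the snippet below fits a Gaussian proposal to a hypothetical unnormalized target by minimizing an inclusive (forward) KL divergence with self-normalized importance weights. The target, proposal family, and optimization settings are illustrative assumptions; the full nested scheme, which composes such updates across levels, is not shown.

```python
import torch
from torch.distributions import Normal

# Illustrative unnormalized target standing in for one level of nesting.
def log_target(x):
    return Normal(3.0, 0.5).log_prob(x)

# Gaussian proposal with learnable parameters.
loc = torch.zeros(1, requires_grad=True)
log_scale = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([loc, log_scale], lr=1e-2)

for step in range(2000):
    q = Normal(loc, log_scale.exp())
    x = q.sample((256,))                               # draw from the current proposal
    log_w = log_target(x) - q.log_prob(x).detach()     # importance weights (held constant)
    w = torch.softmax(log_w, dim=0).detach()           # self-normalize
    # Inclusive (forward) KL: maximize the importance-weighted log density of
    # the proposal, i.e. minimize the negative weighted log-likelihood.
    loss = -(w * q.log_prob(x)).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```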
zhang_arxiv_2021
Disentangling Representations of Text by Masking Transformers
Representations in large language models such as BERT encode a range of features into a single vector. In this work, we explore whether it is possible to learn disentangled representations by identifying subnetworks in pre-trained models that encode distinct, complementary aspects of the representation. Concretely, we learn binary masks over transformer weights or hidden units to uncover the subset of features that correlate with a specific factor of variation. This sidesteps the need to train a disentangled model from scratch within a particular domain. We evaluate the ability of this method to disentangle representations of syntax and semantics, and sentiment from genre in the context of movie reviews.
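The sketch below illustrates the general masking idea on a single frozen linear layer, with a learnable binary mask over its output units and a straight-through estimator for the hard threshold. It is not the paper's exact parameterization, and the layer size is an arbitrary placeholder.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Wraps a frozen pre-trained linear layer with a learnable binary mask
    over its output units (straight-through estimator for the hard threshold)."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.layer = pretrained
        for p in self.layer.parameters():
            p.requires_grad_(False)                    # keep the pre-trained weights fixed
        self.mask_logits = nn.Parameter(torch.zeros(pretrained.out_features))

    def forward(self, x):
        soft = torch.sigmoid(self.mask_logits)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()             # hard mask forward, soft gradient backward
        return self.layer(x) * mask

layer = MaskedLinear(nn.Linear(768, 768))              # 768 is a BERT-sized placeholder
out = layer(torch.randn(4, 768))
```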
wu_aabi_2021
Conjugate Energy-Based Models
We propose conjugate energy-based models (CEBMs), a class of deep latent-variable models with a tractable posterior. Conjugate EBMs have similar use cases as variational autoencoders, in the sense that they learn an unsupervised mapping between data and latent variables. However, these models omit a generator, which allows them to learn more flexible notions of similarity between data points. Our experiments demonstrate that conjugate EBMs achieve competitive results in terms of image modelling, predictive power of latent space, and out-of-distribution detection on a variety of datasets.
stites_combinators_2021
Learning Proposals for Probabilistic Programs with Inference Combinators
We develop operators for construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of transition kernels and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. We demonstrate the flexibility of this framework in applications to advanced variational methods based on amortized Gibbs sampling and annealing.
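As a minimal illustration of one primitive in such a grammar, the function below performs multinomial importance resampling while preserving proper weighting, by assigning each surviving particle the average incoming weight. It is a sketch written for this page, not the library's actual combinator interface.

```python
import math
import torch

def resample(particles, log_weights):
    """Multinomial resampling that preserves proper weighting: particles are
    drawn in proportion to their weights, and each survivor is assigned the
    average incoming weight so downstream estimates remain unbiased.

    particles:   tensor of shape (num_particles, ...)
    log_weights: tensor of shape (num_particles,)
    """
    probs = torch.softmax(log_weights, dim=0)
    idx = torch.multinomial(probs, num_samples=len(particles), replacement=True)
    avg_log_weight = torch.logsumexp(log_weights, dim=0) - math.log(len(particles))
    return particles[idx], avg_log_weight.expand_as(log_weights)
```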
biza_aamas_2021
Action Priors for Large Action Spaces in Robotics
In robotics, it is often not possible to learn useful policies using pure model-free reinforcement learning without significant reward shaping or curriculum learning. As a consequence, many researchers rely on expert demonstrations to guide learning. However, acquiring expert demonstrations can be expensive. This paper proposes an alternative approach where the solutions of previously solved tasks are used to produce an action prior that can facilitate exploration in future tasks. The action prior is a probability distribution over actions that summarizes the set of policies found solving previous tasks. Our results indicate that this approach can be used to solve robotic manipulation problems that would otherwise be infeasible without expert demonstrations.
bozkurt_arxiv_2019
Rate-regularization and Generalization in VAEs
Variational autoencoders (VAEs) optimize an objective that comprises a reconstruction loss (the distortion) and a KL term (the rate). The rate is an upper bound on the mutual information, which is often interpreted as a regularizer that controls the degree of compression. We here examine whether inclusion of the rate term also improves generalization. We perform rate-distortion analyses in which we control the strength of the rate term, the network capacity, and the difficulty of the generalization problem. Lowering the strength of the rate term paradoxically improves generalization in most settings, and reducing the mutual information typically leads to underfitting. Our results suggest that the standard spherical Gaussian prior is not an inductive bias that typically improves generalization, prompting further work to understand what choices of priors improve generalization in VAEs.
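The sketch below shows the kind of objective this analysis manipulates: a minimal Gaussian-prior VAE whose loss separates distortion (reconstruction) from rate (KL to the prior), with a coefficient beta that controls the strength of the rate term. The architecture sizes are placeholders and the code assumes inputs scaled to [0, 1]; it is an illustration, not the paper's experimental setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence

class GaussianVAE(nn.Module):
    """Minimal VAE whose objective separates distortion from rate."""
    def __init__(self, x_dim=784, z_dim=32, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def loss(self, x, beta=1.0):
        loc, log_scale = self.enc(x).chunk(2, dim=-1)
        q = Normal(loc, log_scale.exp())
        z = q.rsample()                                        # reparameterized sample
        distortion = F.binary_cross_entropy_with_logits(
            self.dec(z), x, reduction="none").sum(-1)          # reconstruction loss (x in [0, 1])
        rate = kl_divergence(q, Normal(0.0, 1.0)).sum(-1)      # KL to the spherical Gaussian prior
        return (distortion + beta * rate).mean()               # beta scales the rate term
```

Sweeping beta in an objective of this form is what "controlling the strength of the rate term" amounts to in practice.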
biza_arxiv_2020
Learning discrete state abstractions with deep variational inference
Abstraction is crucial for effective sequential decision making in domains with large state spaces. In this work, we propose a variational information bottleneck method for learning approximate bisimulations, a type of state abstraction. Our method is suited for environments with high-dimensional states and learns from a stream of experience collected by an agent acting in a Markov decision process. Through a learned discrete abstract model, we can efficiently plan for unseen goals in a multi-goal Reinforcement Learning setting. We test our method in simplified robotic manipulation domains with image states. We also compare it against previous model-based approaches to finding bisimulations in discrete grid-world-like environments.
sennesh_arxiv_2019
Neural Topographic Factor Analysis for fMRI Data
Neuroimaging experiments produce a large volume (gigabytes) of high-dimensional spatio-temporal data for a small number of sampled participants and stimuli. To enable the analysis of variation across participants and stimuli in fMRI experiments, we propose Neural Topographic Factor Analysis (NTFA), a deep generative model that parameterizes factors as functions of embeddings for participants and stimuli.
wu_arxiv_2019
Amortized Population Gibbs Samplers with Neural Sufficient Statistics
We develop amortized population Gibbs (APG) samplers, a new class of autoencoding variational methods for deep probabilistic models. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. We train block proposals to approximate Gibbs conditionals by minimizing an inclusive KL divergence. To ensure that proposals generalize across input datasets that vary in size, we introduce a new parameterization in terms of neural sufficient statistics. Experiments demonstrate that learned proposals converge to the known analytical conditional posterior in conjugate models, and that APG samplers can learn inference networks for highly-structured deep generative models when the conditional posteriors are intractable.
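The sketch below illustrates the sufficient-statistics parameterization in isolation: per-datapoint encodings are sum-pooled, so the same proposal network applies to datasets of any size. The layer sizes and the Gaussian proposal family are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SufficientStatisticsProposal(nn.Module):
    """Proposal parameterized via pooled per-point encodings, so a single
    network handles input datasets that vary in size."""
    def __init__(self, x_dim=2, stat_dim=64, z_dim=2):
        super().__init__()
        self.point_encoder = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, stat_dim))
        self.to_params = nn.Linear(stat_dim, 2 * z_dim)        # proposal mean and log-scale

    def forward(self, x):                                      # x: (num_points, x_dim)
        stats = self.point_encoder(x).sum(dim=0)               # permutation-invariant pooling
        loc, log_scale = self.to_params(stats).chunk(2, dim=-1)
        return torch.distributions.Normal(loc, log_scale.exp())

proposal = SufficientStatisticsProposal()
q_small = proposal(torch.randn(10, 2))                         # same network,
q_large = proposal(torch.randn(500, 2))                        # different dataset sizes
```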
mcinerney_arxiv_2020
Query-Focused EHR Summarization to Aid Imaging Diagnosis
Electronic Health Records (EHRs) provide vital contextual information to radiologists and other physicians when making a diagnosis. Unfortunately, because a given patient's record may contain hundreds of notes and reports, identifying relevant information within these in the short time typically allotted to a case is very difficult. We propose and evaluate Transformer-based models that extract text snippets from patient records to aid diagnosis. We train these models using groups of International Classification of Diseases (ICD) codes observed in 'future' records as noisy proxies for 'downstream' diagnoses. Evaluations by radiologists demonstrate that these distantly supervised models yield better extractive summaries than do unsupervised approaches.
esmaeili_arxiv_2018
Structured Disentangled Representations
Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control the relative degree of statistical independence between blocks of variables and individual variables within blocks.
esmaeili_arxiv_2018b
Structured Neural Topic Models for Reviews
We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. VALTA defines a user-item encoder that maps bag-of-words vectors for combined reviews associated with each paired user and item onto structured embeddings, which in turn define per-aspect topic weights. We model individual reviews in a structured manner by inferring an aspect assignment for each sentence in a given review, where the per-aspect topic weights obtained by the user-item encoder serve to define a mixture over topics, conditioned on the aspect. The result is an autoencoding neural topic model for reviews, which can be trained in a fully unsupervised manner to learn topics that are structured into aspects.
seaman_arxiv_2018
Modeling Theory of Mind for Autonomous Agents with Probabilistic Programs
As autonomous agents become more ubiquitous, they will eventually have to reason about the mental state of other agents, including those agents' beliefs, desires and goals, so-called theory of mind reasoning. We introduce a collection of increasingly complex theory of mind models of a "chaser" pursuing a "runner", which are implemented as nested probabilistic programs. We show that planning can be performed using nested importance sampling methods, resulting in rational behaviors from both agents, and show that allocating additional computation to perform nested reasoning about agents results in lower-variance estimates of expected utility.
sennesh_arxiv_2018
Composing Modeling and Inference Operations with Probabilistic Program Combinators
We introduce a combinator library for the Probabilistic Torch framework. Combinators are functions that accept models and return transformed models. We assume that models are dynamic, but that model composition is static, in the sense that combinator application takes place prior to evaluating the model on data. Model combinators use classic functional constructs such as map and reduce to define a computation at a coarsened level of representation. Inference combinators alter the evaluation strategy using operations such as importance resampling and application of a transition kernel, whilst preserving proper weighting.
jain_emnlp_2018
Learning Disentangled Representations of Texts with Application to Biomedical Abstracts
We propose a method for learning disentangled representations of texts that code for distinct and complementary aspects, with the aim of affording efficient model transfer and interpretability. To induce disentangled embeddings, we propose an adversarial objective based on the (dis)similarity between triplets of documents with respect to specific aspects. Our motivating application is embedding biomedical abstracts describing clinical trials in a manner that disentangles the populations, interventions, and outcomes in a given trial. We show that our method learns representations that encode these clinically salient aspects, and that these can be effectively used to perform aspect-specific retrieval.
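As a minimal illustration of triplet-based (dis)similarity, the snippet below applies a standard triplet margin loss to aspect-specific embeddings. The random tensors are placeholders standing in for encoded documents; the paper's full adversarial objective is not shown.

```python
import torch
import torch.nn as nn

# For a chosen aspect: pull the anchor document toward a document that matches
# it on that aspect, and push it away from one that differs.
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = torch.randn(8, 128, requires_grad=True)   # aspect-specific embeddings (placeholders)
positive = torch.randn(8, 128)                     # same aspect value as the anchor
negative = torch.randn(8, 128)                     # different aspect value
loss = triplet(anchor, positive, negative)
loss.backward()
```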
rainforth_arxiv_2018
Inference Trees: Adaptive Inference with Exploration
We introduce inference trees (ITs), a new adaptive Monte Carlo inference method building on ideas from Monte Carlo tree search. Unlike most existing methods which are implicitly based on pure exploitation, ITs explicitly aim to balance exploration and exploitation in the inference process, alleviating common pathologies and ensuring consistency. More specifically, ITs use bandit strategies to adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner.
siddharth_nips_2017
Learning Disentangled Representations with Semi-Supervised Deep Generative Models
We propose to learn disentangled representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables.
rainforth_nips_2016
Bayesian Optimization for Probabilistic Programs
We present the first general purpose framework for marginal maximum a posteriori estimation of probabilistic program variables. By using a series of code transformations, the evidence of any probabilistic program, and therefore of any graphical model, can be optimized with respect to an arbitrary subset of its sampled variables. To carry out this optimization, we develop the first Bayesian optimization package to directly exploit the source code of its target, leading to innovations in problem-independent hyperpriors, unbounded optimization, and implicit constraint satisfaction.
tolpin_fpl_2016
Design and Implementation of Probabilistic Programming Language Anglican
We describe the design and implementation of Anglican, a probabilistic programming language integrated with Clojure.
vandemeent_aistats_2016
Black-Box Policy Search with Probabilistic Programs
In this work we show how to represent policies as programs: that is, as stochastic simulators with tunable parameters. To learn the parameters of such policies we develop connections between black box variational inference and existing policy search approaches. We then explain how such learning can be implemented in a probabilistic programming system. We demonstrate both conciseness of policy representation and automatic policy parameter learning for a set of canonical reinforcement learning problems.
vandemeent_aistats_2015
Particle Gibbs with Ancestor Sampling for Probabilistic Programs
Particle Markov chain Monte Carlo techniques rank among current state-of-the-art methods for probabilistic program inference. A drawback of these techniques is that they rely on importance resampling, which results in degenerate particle trajectories and a low effective sample size for variables sampled early in a program. We here develop a formalism to adapt ancestor resampling, a technique that mitigates particle degeneracy, to the probabilistic programming setting.
vandemeent_aistats_2014
A New Approach to Probabilistic Programming Inference
We demonstrate a new approach to inference in expressive probabilistic programming languages based on particle Markov chain Monte Carlo. It applies to Turing-complete probabilistic programming languages and supports accurate inference in models that make use of complex control flow, including stochastic recursion. It also includes primitives from Bayesian nonparametric statistics. Our experiments show that this approach can be more efficient than previously introduced single-site Metropolis-Hastings methods.
vandemeent_bpj_2014
Empirical Bayes Methods Enable Advanced Population-Level Analyses of Single-Molecule FRET Experiments
We develop empirical Bayes methods that enable population-level analyses of single-molecule FRET experiments.