Friday, Jan. 22, 2021
02:30PM - 03:30PM
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences

Rob Fergus
Research Scientist, DeepMind
Professor of Computer Science, New York University

Abstract
In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity. The resulting model, trained on sequences alone, contains information about biological properties in its representations. The learned representation space has a multi-scale organization reflecting structure from the level of biochemical properties of amino acids to remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and can be identified by linear projections.
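As an illustration of the linear-projection probing described above, here is a minimal sketch, assuming per-residue embeddings from a pretrained protein language model and secondary-structure labels are already available; the arrays below are random stand-ins and all names are hypothetical. With real embeddings and labels, above-chance probe accuracy indicates the structural information is linearly decodable from the representations.

```python
# A minimal sketch of probing learned representations with a linear model.
# `embeddings` (n_residues, d) and `labels` (n_residues,) are hypothetical
# stand-ins for per-residue features and 3-class secondary structure
# (helix/strand/coil) from real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5000, 768))   # stand-in for real model features
labels = rng.integers(0, 3, size=5000)      # stand-in for real SS3 labels

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```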
Friday, Jan. 29, 2021
04:10PM - 05:10PM
Lessons from Multilingual Machine Translation

Rico Sennrich
Professor of Computational Linguistics, University of Zurich

Abstract
Neural models have brought rapid advances to the field of machine translation, and have also opened up new opportunities. One of these is the training of machine translation models in two or more translation directions to transfer knowledge between languages, potentially even allowing for zero-shot translation in directions with no parallel training data. However, multilingual modelling also brings new challenges and questions: how can we represent multiple languages and alphabets with a compact vocabulary of symbols? Does multilingual modelling scale to many languages, and at what point does model capacity become a bottleneck? How can we increase the reliability of zero-shot translation? In this talk, I will discuss recent research and open problems in multilingual machine translation. (slides) (video)
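One standard answer to the compact-vocabulary question is a shared subword vocabulary trained jointly over text from all languages. A minimal sketch using the SentencePiece library follows; the corpus file name is a hypothetical placeholder, and this is an illustration rather than the speaker's exact setup.

```python
# A sketch of building one shared subword vocabulary over several languages
# with SentencePiece; "corpus.multilingual.txt" is a hypothetical file that
# concatenates training text from all translation directions.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.multilingual.txt",  # mixed-language text, one sentence per line
    model_prefix="shared_bpe",
    vocab_size=32000,                 # one compact symbol set for all languages
    character_coverage=0.9995,        # keep rare characters from non-Latin scripts
    model_type="bpe",
)

sp = spm.SentencePieceProcessor(model_file="shared_bpe.model")
print(sp.encode("Ein Beispiel auf Deutsch.", out_type=str))
```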
Friday, Feb. 5, 2021
04:10PM - 05:10PM
Machine Learning for Mathematical Reasoning

Christian Szegedy
Staff Research Scientist, Google

Abstract
In this talk I will discuss the application of transformer-based language models and graph neural networks to automated reasoning tasks in first-order and higher-order logic. After a short introduction to the types of problems addressed and the general search procedure, I will present applications of graph neural networks to premise selection and tactic prediction, and demonstrate the power of language models on various generative reasoning tasks: type inference, conjecturing, and assumption and equation completion. I will also give an overview of graph encodings of formulas and their uses. (video)
Friday, Feb. 12, 2021
04:10PM - 05:10PM
Evolving Machine Learning Algorithms

Esteban Real
Software Engineer, Google Brain

Abstract
The effort devoted to hand-crafting machine learning (ML) models has motivated the use of automated methods. These methods, collectively known as AutoML, can today optimize the models’ architectures to surpass the performance of manual designs. I will discuss how evolutionary techniques can allow AutoML not only to perform such optimization, but also to discover complete ML algorithms from scratch. Using only basic mathematical operations as building blocks, our experiments give rise to ML techniques such as backpropagation, simple neural networks, and weight averaging. This is the case even if derivatives are not provided among the building blocks; gradients simply arise as a consequence of the search process. I will pay special attention to the role of death in allowing evolution to handle noisy measurements. (slides) (video)
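For a concrete picture of the search procedure, here is a toy sketch of regularized ("aging") evolution, the strategy used in this line of work, in which the oldest individual dies at every step; the genome and fitness function below are stand-ins, not an actual ML-algorithm search space.

```python
# A toy sketch of regularized ("aging") evolution: tournament selection picks
# a parent, a mutated child is evaluated and appended, and the OLDEST
# individual is removed, so every candidate eventually dies. In AutoML the
# genome would encode an architecture or a whole ML algorithm; here it is a
# list of floats and the fitness is a noisy stand-in.
import random

def fitness(genome):                      # hypothetical noisy evaluation
    return -sum((g - 0.5) ** 2 for g in genome) + random.gauss(0, 0.01)

def mutate(genome):
    child = list(genome)
    i = random.randrange(len(child))
    child[i] += random.gauss(0, 0.1)
    return child

population = [[random.random() for _ in range(8)] for _ in range(50)]
scores = [fitness(g) for g in population]

for _ in range(1000):
    sample = random.sample(range(len(population)), 10)   # tournament
    parent = population[max(sample, key=lambda i: scores[i])]
    child = mutate(parent)
    population.append(child)
    scores.append(fitness(child))
    population.pop(0)                                    # death of the oldest
    scores.pop(0)

print("best fitness found:", max(scores))
```

Removing the oldest rather than the worst individual keeps a lucky noisy measurement from dominating the population forever, which is the role of death the abstract highlights.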
Friday, Feb. 19, 2021
04:10PM - 05:10PM
Demystifying deep learning through high-dimensional statistics

Jeffrey Pennington
Research Scientist, Google Brain

Abstract
As deep learning continues to amass ever more practical success, its novelty has slowly faded, but a sense of mystery persists and we still lack satisfying explanations for how and why these models perform so well. Among the various empirical observations that contribute to this sense of mystery is the apparent violation of the so-called bias-variance tradeoff, which specifies that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, striking a balance between simpler models that exhibit high bias and more complex models that exhibit high variance of the predictive function. Far from being unique to deep learning, the violation of this classical tenet of learning theory is in fact commonplace in high-dimensional inference. In this talk, I will describe a high-dimensional asymptotic analysis of random feature kernel regression that allows for a precise understanding of how the bias and variance behave in this simple model. I will then connect this analysis to neural network training through the Neural Tangent Kernel, and describe how a multivariate decomposition of the variance enables a more complete understanding of the rich empirical phenomena observed in practice. (slides) (video)
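For reference, the classical decomposition behind the tradeoff, for squared loss on targets y = f(x) + ε with noise variance σ² and training set D, reads:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(\hat f_D(x) - y\big)^2\right]
\;=\;
\underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
\;+\;
\underbrace{\mathrm{Var}_D\big[\hat f_D(x)\big]}_{\text{variance}}
\;+\;
\underbrace{\sigma^2}_{\text{noise}}.
```

The multivariate decomposition mentioned in the abstract refines the variance term by separating the distinct sources of randomness (such as data sampling and initialization) rather than lumping them into a single Var_D.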
Wednesday, Feb. 24, 2021
04:00PM - 05:00PM
Joint Physics and Applied Mathematics Colloquium: Evidence-Based Elections

Philip B. Stark
Professor of Statistics, University of California, Berkeley

Abstract
Elections rely on people, hardware, and software, all of which are fallible and subject to manipulation. Well-resourced nation-states continue to attack U.S. elections. Voting equipment is built by private vendors, some foreign, but all using foreign parts. Many states even outsource election results reporting to foreign firms. How can we conduct and check elections in a way that provides evidence that the reported winners really won, despite malfunctions and malfeasance? Evidence-based elections require voter-verified (generally, hand-marked) paper ballots kept demonstrably secure throughout the canvass, and manual audits of election results against the trustworthy paper trail. Hand-marked paper ballots are far more trustworthy than machine-marked ballots for a variety of reasons. Two kinds of audits are required to provide affirmative evidence that outcomes are correct: compliance audits, to establish whether the paper trail is complete and trustworthy, and risk-limiting audits (RLAs). RLAs test the hypothesis that an accurate manual tabulation of the votes would find that one or more reported winners did not win. Rejecting that hypothesis means there is convincing evidence that a full hand tally would confirm the reported results. For a broad variety of social choice functions, including plurality, multi-winner plurality, supermajority, proportional representation rules such as D’Hondt, Borda count, approval voting, and instant-runoff voting (aka ranked-choice voting), the hypothesis that one or more outcomes is wrong can be reduced to the hypothesis that the means of one or more lists of nonnegative numbers are not greater than 1/2. Martingale methods for testing such nonparametric hypotheses sequentially are especially practical; a sketch of one such test appears below. RLAs are in law in several states and have been piloted in more than a dozen; there have been roughly 60 pilots in jurisdictions of all sizes, including roughly 10 audits of statewide contests. Open-source software to support RLAs is available.
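To make the martingale idea concrete, here is a simplified sketch, not any jurisdiction's actual audit procedure: for nonnegative values X_i with null hypothesis mean(X) ≤ 1/2, the running betting product below is a nonnegative supermartingale under the null, so stopping when it exceeds 1/α limits the risk to α by Ville's inequality.

```python
# A simplified sequential "betting" test for the reduced null hypothesis
# H0: mean(X) <= 1/2, where the X_i are nonnegative (illustration only).
# With 0 <= lam <= 2 every factor 1 + lam*(x - 1/2) is nonnegative, and
# under H0 each factor has expectation <= 1, so M_t is a supermartingale
# and P(M_t ever >= 1/alpha) <= alpha by Ville's inequality.
def betting_test(samples, alpha=0.05, lam=1.0):
    m = 1.0
    for t, x in enumerate(samples, start=1):
        assert x >= 0, "values must be nonnegative"
        m *= 1.0 + lam * (x - 0.5)
        if m >= 1.0 / alpha:
            return "reject H0 (outcome confirmed)", t
    return "cannot reject H0 (escalate toward full hand count)", len(samples)

# Hypothetical audit sample: 1.0 = ballot supports the reported winner.
print(betting_test([1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0] * 10))
```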
Friday, Feb. 26, 2021
10:00AM - 11:00AM
Learning through Interaction in Cooperative Multi-Agent Systems

Kalesha Bullard
Postdoctoral Researcher, Facebook AI Research

Abstract
Effective communication is an important skill for enabling information exchange and cooperation in multi-agent systems, in which agents coexist in a shared environment with humans and/or other artificial agents. Indeed, human domain experts can be a highly informative source of instructive guidance and feedback (supervision). My prior work explores this type of interaction in depth, as a mechanism for enabling learning for artificial agents. However, dependence upon human partners for acquiring or adapting skills has important limitations. Human time and cognitive load are typically constrained (particularly in realistic settings), and data collection from humans, though potentially qualitatively rich, can be slow and costly. Yet the ability to learn through interaction with other agents represents another powerful mechanism for interactive learning. Though other artificial agents may also be novices, agents can co-learn by providing each other evaluative feedback (reinforcement), provided the learning task is sufficiently structured and allows for generalization to novel settings. This talk presents research that investigates methods for enabling agents to learn general communication skills through interactions with other agents. In particular, the talk will focus on my ongoing work within Multi-Agent Reinforcement Learning, investigating emergent communication protocols inspired by communication in real-world problem settings. We present a novel problem setting and a general approach that allows for zero-shot coordination (ZSC), i.e., discovering protocols that can generalize to independently trained agents. We also explore and analyze specific difficulties associated with finding globally optimal protocols as the complexity of the communication task increases. Overall, this work opens up exciting avenues for learning general communication protocols in complex domains. (video)
Friday, Feb. 26, 2021
04:10PM - 05:10PM
AutoML for Efficient Vision Learning

Mingxing Tan
Staff Software Engineer, Google Brain

Abstract
This talk will focus on recent progress we have made on AutoML, particularly on neural architecture search for efficient convolutional neural networks. We will first discuss the challenges and solutions in designing network architecture search spaces, algorithms, and constraints, as well as hyperparameter auto-tuning. Afterwards, we will discuss how to scale neural networks for better accuracy and efficiency. We will conclude the talk with some representative AutoML applications to image classification, detection, and segmentation.
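The scaling discussion presumably includes the compound scaling rule of EfficientNet (Tan & Le, 2019), which couples network depth d, width w, and input resolution r through a single compound coefficient φ:

```latex
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
\qquad \text{s.t.}\quad \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1,
```

where α, β, γ are found by a small grid search at φ = 1; the constraint is chosen so that increasing φ by one roughly doubles the FLOPS of the scaled network.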
Monday, Mar. 1, 2021
04:10PM - 05:10PM
Enabling world models via unsupervised representation learning of environments

Dumitru Erhan
Staff Research Scientist, Google Brain

Abstract
In order to build intelligent agents that quickly adapt to new scenes, conditions, and tasks, we need to develop techniques, algorithms, and models that can operate on little data or that can generalize from training data that is not similar to the test data. World Models have long been hypothesized to be a key piece of the solution to this problem. In this short talk I will describe our recent advances in modeling sequential observations. These approaches can help with building agents that interact with the environment and mitigate the sample-complexity problems in reinforcement learning. They can also enable agents that generalize more quickly to new scenarios, tasks, objects, and situations, and are thus more robust to environment changes. (slides) (video)
Friday, Mar. 5, 2021
04:10PM - 05:10PM
Mathematical aspects of neural network approximation and learning

Joan Bruna
Associate Professor
Courant Institute of Mathematical Sciences
New York University

Abstract
High-dimensional learning remains an outstanding phenomenon where experimental evidence outpaces our current mathematical understanding, mostly due to the recent empirical successes of Deep Learning. Neural Networks provide a rich yet intricate class of functions with statistical abilities to break the curse of dimensionality, and where physical priors can be tightly integrated into the architecture to improve sample efficiency. Despite these advantages, an outstanding theoretical challenge in these models is computational: providing an analysis that explains successful optimization and generalization in the face of existing worst-case computational hardness results. In this talk, we will describe snippets of this challenge, covering optimization and approximation respectively. First, we will focus on the framework that lifts parameter optimization to an appropriate measure space. We will overview existing results that guarantee global convergence of the resulting Wasserstein gradient flows, and present our recent results that study typical fluctuations of the dynamics around their mean-field evolution, as well as extensions of this framework beyond vanilla supervised learning to account for symmetries in the function and in competitive optimization. Next, we will discuss the role of depth in terms of approximation, and present novel results establishing so-called ‘depth separation’ for a broad class of functions. We will conclude by discussing consequences in terms of optimization, highlighting current and future mathematical challenges. (slides) (video)
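Concretely, the measure-space lift replaces a width-m two-layer network by an integral against a probability measure μ over its parameters:

```latex
f_m(x) \;=\; \frac{1}{m}\sum_{i=1}^{m} a_i\,\sigma(\langle w_i, x \rangle)
\quad\longrightarrow\quad
f_\mu(x) \;=\; \int a\,\sigma(\langle w, x \rangle)\, d\mu(a, w),
```

and as m → ∞, gradient descent on the parameters becomes a Wasserstein gradient flow on μ, the object whose global convergence and fluctuations the talk analyzes.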
Friday, Mar. 12, 2021
04:10PM - 05:10PM
Joint AWM (Association for Women in Mathematics) and APPM (Applied Mathematics) Department Colloquium: Computing Fluid Flows in Complex Geometry

Marsha Berger
Silver Professor of Computer Science and Mathematics, Courant Institute of Mathematical Sciences, New York University

Abstract
We give an overview of the difficulties in simulating fluid flow in complex geometry. The principal approaches use either overlapping or patched body-fitted grids, unstructured grids, or Cartesian (non-body-fitted) grids, with our work focusing on the latter. Cartesian methods have the advantage that no explicit mesh generation is needed, greatly reducing the human effort involved in complex flow computations. However, it is a challenge to find stable and accurate difference formulas for the irregular Cartesian cells cut by the boundary. We discuss some of the steps involved in preparing for and carrying out a fluid flow simulation in complicated geometry. We present some of the technical issues involved in this approach, including the special discretizations needed to avoid loss of accuracy and stability at the irregular cells, as well as how we obtain highly scalable parallel performance. This method is in routine use for aerodynamic calculations in several organizations, including NASA Ames Research Center. Several open problems are discussed.
Wednesday, Mar. 17, 2021
04:10PM - 05:10PM
SARS-CoV-2 across scales

Stephen Kissler
Postdoctoral Fellow at Harvard T.H. Chan School of Public Health

Abstract
Mathematical models have provided key insights into SARS-CoV-2 dynamics at the global, local, and physiological scales. I will summarize some of our research efforts within each of these contexts. Early in the pandemic, a deterministic multi-strain coronavirus transmission model revealed how SARS-CoV-2 could behave during its transition from a pandemic to a seasonally epidemic virus and allowed us to compare a variety of intervention strategies. As the racial and ethnic disparities in COVID-19 morbidity and mortality became clear, Bayesian inferential methods helped us demonstrate that poor health outcomes in New York City were associated with high community prevalence in neighborhoods where individuals could not stay home from work. Most recently, models of within-host SARS-CoV-2 viral dynamics have allowed us to evaluate potential mechanisms for the increased infectiousness of the SARS-CoV-2 variant B.1.1.7. I will discuss the mathematical and epidemiological aspects of each of these efforts and discuss how they provide important context for one another. (slides) (video)
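For a flavor of the first kind of model, here is an illustrative seasonally forced SEIRS system: a deliberately simplified sketch, not the authors' multi-strain model, with all parameter values hypothetical. Seasonal forcing of transmission plus waning immunity is the basic mechanism by which a pandemic pathogen can settle into recurring seasonal epidemics.

```python
# An illustrative seasonally forced SEIRS model (NOT the authors' exact
# model): transmission beta oscillates over the year and immunity wanes at
# rate omega. Time is in days; all parameter values are hypothetical.
import numpy as np
from scipy.integrate import solve_ivp

def seirs(t, y, R0_mean=2.0, R0_amp=0.4, sigma=1/5, gamma=1/7, omega=1/(45*7)):
    S, E, I, R = y
    beta = gamma * (R0_mean + R0_amp * np.cos(2 * np.pi * t / 365))  # seasonality
    return [
        -beta * S * I + omega * R,   # dS/dt: new infections, waning immunity
        beta * S * I - sigma * E,    # dE/dt: exposed
        sigma * E - gamma * I,       # dI/dt: infectious
        gamma * I - omega * R,       # dR/dt: recovered
    ]

sol = solve_ivp(seirs, (0, 5 * 365), [0.999, 0.0, 0.001, 0.0],
                t_eval=np.linspace(0, 5 * 365, 2000), rtol=1e-8)
print("final-year peak prevalence:", sol.y[2][sol.t > 4 * 365].max())
```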
Friday, Mar. 19, 2021
04:10PM - 05:10PM
T5 and large language models: The good, the bad, and the ugly

Colin Raffel
Assistant Professor of Computer Science
University of North Carolina, Chapel Hill
Staff Research Scientist, Google Brain

Abstract
T5 and other large pre-trained language models have proven to be a crucial component of the modern natural language processing pipeline. In this talk, I will discuss the good and bad characteristics of these models through the lens of five recent papers. In the first, we empirically survey the field of transfer learning for NLP and scale up our findings to attain state-of-the-art results on many popular benchmarks. Then, I show how we can straightforwardly extend our model to be able to process text in over 100 languages. The strong performance of these models gives rise to a natural question: what kind of knowledge and skills do they pick up during pre-training? I will provide some answers by first showing that they are surprisingly good at answering trivia questions that test basic “world knowledge”, but also by demonstrating that they memorize non-trivial amounts of (possibly private) pre-training data, even when no overfitting is evident. Finally, I will wrap up with a sober take on recent progress in improving the architectures of language models. (slides) (video)
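For context, T5's text-to-text framing casts every task as string-in, string-out generation with a task prefix. A brief usage sketch with the public t5-small checkpoint and the Hugging Face transformers library; this is an illustration of the framing, not the talk's experimental setup.

```python
# T5 casts every task as text-to-text: a task prefix in, a string out.
# Illustrative only; uses the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompt = "translate English to German: The house is wonderful."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```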
Friday, Mar. 26, 2021
04:10PM - 05:10PM
Integrating Domain-Knowledge into Deep Learning

Ruslan Salakhutdinov
UPMC Professor of Computer Science
Carnegie Mellon University

Abstract
In this talk I will first discuss deep learning models that can find semantically meaningful representations of words, learn to read documents and answer questions about their content. I will introduce methods that can augment neural representation of text with structured data from Knowledge Bases (KBs) for question answering, and show how we can answer complex multi-hop questions using a text corpus as a virtual KB. In the second part of the talk, I will show how we can design modular hierarchical reinforcement learning agents for visual navigation that can perform tasks specified by natural language instructions, perform efficient exploration and long-term planning, learn to build the map of the environment, while generalizing across domains and tasks. (video)
Wednesday, Mar. 31, 2021
04:10PM - 05:10PM
Towards AI for 3D Content Creation

Sanja Fidler
Associate Professor, University of Toronto
Director of AI, NVIDIA

Abstract
3D content is key in several domains such as architecture, film, gaming, and robotics. However, creating 3D content can be very time consuming: artists need to sculpt high-quality 3D assets, compose them into large worlds, and bring these worlds to life by writing behaviour models that “drive” the characters around in the world. In this talk, I’ll discuss some of our recent efforts to introduce automation into the 3D content creation process using AI.
Friday, Apr. 2, 2021
03:00 PM - 04:00 PM
Ultrasound Image Formation in the Deep Learning Age

Muyinatu A. Lediju Bell
Assistant Professor of Electrical and Computer Engineering, Biomedical Engineering, and Computer Science, Johns Hopkins University

Abstract
The success of diagnostic and interventional medical procedures is deeply rooted in the ability of modern imaging systems to deliver clear and interpretable information. After raw sensor data is received by ultrasound and photoacoustic imaging systems in particular, the beamforming process is often the first line of software defense against poor quality images. Yet, with today’s state-of-the-art beamformers, ultrasound and photoacoustic images remain challenged by channel noise, reflection artifacts, and acoustic clutter, which combine to complicate segmentation tasks and confuse overall image interpretation. These challenges exist because traditional beamforming and image formation steps are based on flawed assumptions in the presence of significant inter- and intra-patient variations. In this talk, I will introduce the PULSE Lab’s novel alternative to beamforming, which improves ultrasound and photoacoustic image quality by learning from the physics of sound wave propagation. We replace traditional beamforming steps with deep neural networks that only display segmented details and structures of interest. Our pioneering image formation methods hold promise for robotic tracking tasks, visualization and visual servoing of surgical tool tips, and assessment of relative distances between the surgical tool and nearby critical structures (e.g., major blood vessels and nerves that if injured will cause severe complications, paralysis, or patient death). (slides) (video)
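For orientation, the conventional baseline being replaced is delay-and-sum beamforming, which forms each image point by delaying the channel signals according to time of flight and summing across the array. A schematic sketch with toy geometry and random data, assuming for simplicity a plane-wave transmit so the transmit path is just the depth:

```python
# A schematic delay-and-sum (DAS) beamformer, the conventional baseline that
# learned image formation replaces. Geometry and channel data are toy values.
import numpy as np

def das_point(rf, element_x, fs, c, px, pz):
    """Beamform one image point (px, pz) from rf data (n_elements, n_samples)."""
    value = 0.0
    for i, ex in enumerate(element_x):
        # two-way travel time: plane-wave transmit depth + return path to element
        t = (pz + np.hypot(px - ex, pz)) / c
        s = int(round(t * fs))                  # nearest-sample delay
        if 0 <= s < rf.shape[1]:
            value += rf[i, s]
    return value

fs, c = 40e6, 1540.0                            # sample rate (Hz), sound speed (m/s)
element_x = np.linspace(-0.02, 0.02, 64)        # 64-element linear array (m)
rf = np.random.randn(64, 4096)                  # stand-in for real channel data
print(das_point(rf, element_x, fs, c, 0.0, 0.03))
```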
Friday, Apr. 2, 2021
04:10 PM - 05:10 PM
Image understanding and image-to-image translation through the lens of information loss

Andrey Zhmoginov
Research Software Engineer, Google AI

Abstract
The computation performed by a deep neural network is typically composed of multiple processing stages during which the information contained in the model input gradually “dissipates” as different areas of the input space end up being mapped to the same output values. This seemingly simple observation provides a useful perspective for designing and understanding computation performed by various deep learning models from convolutional networks used in image classification and segmentation to recurrent neural networks and generative models. In this talk, we will review three such examples. First, we discuss the design of the MobileNetV2 model and the properties of the expansion layer that plays the central role in this architecture. In another example, we will look at the CycleGAN model and discuss the unexpected properties that emerge as a result of using “cyclic consistency loss” for training it. Finally, we discuss the information bottleneck approach and show how this formalism can be used to identify salient regions in images. (slides) (video)
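For reference, the information bottleneck formalism mentioned at the end seeks a representation Z of the input X that is maximally compressed while staying predictive of the target Y:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta\, I(Z; Y),
```

where β trades off compression against retained task-relevant information; salient image regions are then those the representation must retain to keep I(Z; Y) high.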
Wednesday, Apr. 7, 2021
03:00 PM - 04:00 PM
Volumetric medical image processing with deep learning

Fausto Milletarì
Johnson & Johnson

Abstract
One of the fundamental capabilities of deep learning is its ability to accomplish a multitude of tasks by learning features directly from raw data instead of relying on a fixed set of features purposefully engineered by humans. Although this has brought both a simplification of implementations and a great increase in performance, most work has been limited to the analysis of 2D images. Moving to the realm of 3D or even N-D images, such as those obtained when scanning the structures of the human body, brings the undeniable advantage of perceiving structures beyond the 2D image plane, at the expense of computational load and memory. When considering image content in more than two dimensions, it is also possible to reformulate the learning objective to achieve a simpler and often more effective formulation. In this talk I will present my recent contributions in the field of medical image analysis, with a special focus on techniques applied to signals with three or more dimensions. (slides) (video)
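One well-known example of such a reformulated objective is the soft Dice loss introduced for volumetric segmentation in V-Net (Milletari et al., 2016). A minimal PyTorch sketch, with toy tensors standing in for real volumes:

```python
# A sketch of the soft Dice loss for volumetric segmentation (cf. V-Net).
# `pred` holds per-voxel foreground probabilities and `target` binary labels;
# both are (batch, D, H, W) tensors. eps avoids division by zero on volumes
# with no foreground.
import torch

def soft_dice_loss(pred, target, eps=1e-6):
    dims = (1, 2, 3)                                # sum over the 3D volume
    intersection = torch.sum(pred * target, dims)
    denominator = torch.sum(pred * pred, dims) + torch.sum(target * target, dims)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice.mean()                        # minimize => maximize overlap

pred = torch.rand(2, 32, 64, 64)                    # toy probabilities
target = (torch.rand(2, 32, 64, 64) > 0.5).float()  # toy binary labels
print(soft_dice_loss(pred, target))
```

Unlike per-voxel cross-entropy, this objective directly optimizes region overlap, which makes it robust to the severe foreground/background imbalance typical of medical volumes.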
Friday, Apr. 9, 2021
04:10 PM - 05:10 PM
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Yanping Huang
Staff Software Engineer, Google Brain

Abstract
Neural network scaling has been critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and computation. Although this trend of scaling is affirmed to be a sure-fire approach for better model quality, there are challenges on the path such as the computation cost, ease of programming, and efficient implementation on parallel devices. GShard is a module composed of a set of lightweight annotation APIs and an extension to the XLA compiler. It provides an elegant way to express a wide range of parallel computation patterns with minimal changes to the existing model code. GShard enabled us to scale up multilingual neural machine translation Transformer models with Sparsely-Gated Mixture-of-Experts beyond 600 billion parameters using automatic sharding. We demonstrate that such a giant model can efficiently be trained on 2048 TPU v3 accelerators in 4 days to achieve far superior quality for translation from 100 languages to English compared to the prior art. (slides) (video)
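Schematically, the Sparsely-Gated Mixture-of-Experts layer routes each token to its top-2 experts, so parameter count grows with the number of experts while per-token compute stays nearly constant. A toy NumPy sketch of the gating idea follows; GShard's actual contribution, per the abstract, is expressing and sharding such layers across accelerators via annotations and the XLA compiler.

```python
# A schematic top-2 sparsely gated MoE layer: each token is routed to its two
# highest-scoring experts and their outputs are combined with renormalized
# gate weights. Illustration of the routing idea only.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts = 8, 16, 4
x = rng.normal(size=(n_tokens, d_model))
W_gate = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

logits = x @ W_gate
gates = np.exp(logits - logits.max(axis=1, keepdims=True))
gates /= gates.sum(axis=1, keepdims=True)           # softmax over experts

y = np.zeros_like(x)
for t in range(n_tokens):
    top2 = np.argsort(gates[t])[-2:]                # indices of top-2 experts
    w = gates[t, top2] / gates[t, top2].sum()       # renormalize the two gates
    for e, weight in zip(top2, w):
        y[t] += weight * (x[t] @ experts[e])        # dispatch and combine
print(y.shape)
```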
Wednesday, Apr. 14, 2021
10:00AM - 11:00AM
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning

Oriol Vinyals
Research Scientist, Google DeepMind

Abstract
Games have been used for decades as an important way to test and evaluate the performance of artificial intelligence systems. As capabilities have increased, the research community has sought games with increasing complexity that capture different elements of intelligence required to solve scientific and real-world problems. In recent years, StarCraft, considered to be one of the most challenging Real-Time Strategy (RTS) games and one of the longest-played esports of all time, has emerged by consensus as a “grand challenge” for AI research. In this talk, I will introduce our StarCraft II program AlphaStar, the first Artificial Intelligence to reach Grandmaster status without any game restrictions. The focus will be on the technical contributions that made this milestone in AI possible. (video)
Friday, Apr. 16, 2021
10:00 AM - 11:00 AM
Why medicine is creating exciting new frontiers for machine learning

Mihaela van der Schaar
John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine, University of Cambridge
Turing Fellow, The Alan Turing Institute in London
Chancellor’s Professor, UCLA

Abstract
Medicine stands apart from other areas where machine learning can be applied. While we have seen advances in other fields with lots of data, it is not the volume of data that makes medicine so hard; it is the challenges arising from extracting actionable information from the complexity of the data. It is these challenges that make medicine the most exciting area for anyone who is really interested in the frontiers of machine learning, giving us real-world problems where the solutions are societally important and potentially impact us all. Think COVID-19! In this talk I will show how machine learning is transforming medicine and how medicine is driving new advances in machine learning, including new methodologies in automated machine learning, interpretable and explainable machine learning, dynamic forecasting, and causal inference. (video)
Friday, Apr. 23, 2021
04:10 PM - 05:10 PM
Unsupervised Reinforcement Learning

Pieter Abbeel
Professor of Electrical Engineering and Computer Science
UC Berkeley

Abstract
Deep reinforcement learning (Deep RL) has seen many successes, including learning to play Atari games, the classical game of Go, robotic locomotion and manipulation. However, past successes are ultimately in fairly narrow problem domains compared to the complexity of the real world. In this talk I will describe key limitations of existing deep reinforcement learning methods and discuss how advances in unsupervised representation learning and in unsupervised reinforcement learning could play a key role in solving more complex problems. Ultimately, of course, we need our AI agents to do the things we want them to do, and I’ll discuss recent progress on human-in-the-loop reinforcement learning, which empowers a human supervisor to teach an AI agent new skills without the usual extensive reward engineering or curriculum design efforts. (slides) (video)
Monday, Apr. 26, 2021
04:10 PM - 05:10 PM
Recent advances in speech representation learning

Abdelrahman Mohamed
Research scientist, Facebook AI Research

Abstract
Self-supervised representation learning methods recently achieved great successes in NLP and computer vision domains, reaching new performance levels while reducing required labels for many downstream scenarios. Speech representation learning is experiencing similar progress, with work primarily focused on automatic speech recognition (ASR) as the downstream task. This talk will focus on our recent work on weakly-, semi-, and self-supervised speech representation learning. Learning such high-quality speech representations enabled our research on the Generative Spoken Language Modeling (GSLM) task, where both acoustic and linguistic characteristics of a language are learned directly from raw audio without any lexical or text resources. (slides)
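As one representative objective from this line of work, the contrastive loss of wav2vec 2.0 (Baevski et al., 2020) trains a context representation c_t at a masked timestep to identify the true quantized latent q_t among a set of distractors Q_t at temperature κ; this is cited for background, since the abstract does not specify which objectives the talk covers:

```latex
\mathcal{L}_t \;=\; -\log
\frac{\exp\!\big(\mathrm{sim}(c_t, q_t)/\kappa\big)}
     {\sum_{\tilde q \in Q_t} \exp\!\big(\mathrm{sim}(c_t, \tilde q)/\kappa\big)},
```

where sim denotes cosine similarity. Discrete units learned this way are what make tasks like Generative Spoken Language Modeling possible without any text.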
Friday, Apr. 30, 2021
04:10 PM - 05:10 PM
A guided tour of contextual word representations for language understanding

Matthew Peters
Senior Research Scientist
Allen Institute for Artificial Intelligence

Abstract
The last 3-4 years have seen a tremendous increase in the abilities of natural language understanding systems to perform tasks such as text generation, question answering, and information extraction. These gains have been largely been driven by improvements in methods for transfer learning, where a large neural network is pretrained with a huge unannotated text corpus, and then further optimized for a target end task. The optimization objective function used during pretraining encourages the network to encode complex characteristics of word usage and meaning into its word representations. These representations provide a feature vector capturing the meaning of each word in its context in a way that is usable for many end applications. In this talk, I’ll provide a guided tour of these methods. I’ll start with the key ideas behind the Sesame Street models; ELMo (Peters et al 2018), BERT (Devlin et al 2019), and others. Then, we’ll dive into the inner workings of these models by probing and analyzing their internal states and show they learn a surprising amount of linguistic and world knowledge (Peters et al 2018, Liu et al 2019). I’ll also describe an approach to allow sparse access to human curated knowledge (Peters et al 2019), as well as algorithmic improvements to scale them to long text documents (Beltagy et al 2020). Finally, I’ll conclude with a framework and benchmark dataset for moving beyond the current supervised learning approaches, to allow these models to generalize to unseen end tasks without any labeled data (Weller et al 2020). (slides)