I am a Research Scientist at DeepMind, part of the Deep Learning team. In March 2021, I completed my PhD at the Department of Computer Science and Technology, University of Cambridge, where I was supervised by Prof Pietro Liò and was a member of King’s College. My thesis received no corrections during the viva and is available here.
My professional experience includes undergraduate Software Engineering internships at Google and Facebook. During the PhD, I was a Research Intern at Mila, an AI Resident at X, an ML consultant for Relation Therapeutics and a Research Scientist intern at DeepMind, hosted by Piotr Mirowski in the Robotics, Embodied Agents and Lifelong learning team. I have also been active in the research community, having most recently co-organised the ViGIL workshop at NAACL 2021.
My great passion for teaching has led me to supervise undergraduate courses, final-year projects and Master’s research projects (200+ hours, ~60 students), interview CS applicants, chair women@CL and introduce professionals to Machine Learning concepts as a Cambridge Spark Teaching Fellow. I also helped prepare and deliver Master’s courses, both on the practical side (L42 NN practicals in 2018 and 2019) and the theoretical side (R250 GNN seminars in 2020 and 2021).
Outside work, I love rowing, travelling, playing piano and guitar, singing in a rock band, and chasing my favourite bands on tour. 🎼 I’ve recently revived an old habit of writing poetry and taken up cycling!
PhD in Machine Learning, 2021
University of Cambridge
MPhil in Advanced Computer Science, 2017
University of Cambridge
BA in Computer Science, 2016
University of Cambridge
An essential aim of artificial intelligence research is to design agents that will eventually cooperate with humans in the real world. Embodied learning is emerging as one of the machine learning community's most important efforts towards this goal. Recently emerging sub-fields address various aspects of such systems: visual reasoning, language representations, causal mechanisms and robustness to out-of-distribution inputs, to name only a few. In particular, multimodal learning and language grounding are vital to achieving a strong understanding of the real world. Humans build internal representations by interacting with their environment, learning complex associations between visual, auditory and linguistic concepts. Since the world abounds with structure, graph-based encodings are also likely to be incorporated in reasoning and decision-making modules. Furthermore, these relational representations are rather symbolic in nature, providing advantages over other formats such as raw pixels, and can encode various types of links (temporal, causal, spatial) that can be essential for understanding and acting in the real world. This thesis presents three research works that study and develop likely aspects of future intelligent agents. The first contribution centres on vision-and-language learning, introducing a challenging embodied task that shifts the focus of an existing one to the visual reasoning problem. By extending popular visual question answering (VQA) paradigms, I also designed several models that were evaluated on the novel dataset, producing initial performance estimates for environment understanding through the lens of a more challenging VQA downstream task. The second work presents two ways of obtaining hierarchical representations of graph-structured data.
These methods either scaled to much larger graphs than the ones processed by the best-performing method at the time, or incorporated theoretical properties via the use of topological data analysis algorithms. Both approaches competed with contemporary state-of-the-art graph classification methods, even outside social domains in the second case, where the inductive bias was PageRank-driven. Finally, the third contribution delves further into relational learning, presenting a probabilistic treatment of graph representations in complex settings such as few-shot, multi-task learning and scarce-labelled data regimes. By adding relational inductive biases to neural processes, the resulting framework can model an entire distribution of functions which generate datasets with structure. This yielded significant performance gains, especially in the aforementioned complex scenarios, with semantically-accurate uncertainty estimates that drastically improved over the neural process baseline. This type of framework may eventually contribute to developing lifelong-learning systems, due to its ability to adapt to novel tasks and distributions. The benchmark, methods and frameworks that I have devised during my doctoral studies suggest important future directions for embodied and graph representation learning research. These areas have increasingly proved their relevance to designing intelligent and collaborative agents, which we may interact with in the near future. By addressing several challenges in this problem space, my contributions therefore take a few steps towards building machine learning systems to be deployed in real-life settings.
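The PageRank-driven inductive bias mentioned above can be illustrated with a minimal sketch. This is my own illustrative code, not the thesis implementation: the function names `pagerank` and `pagerank_pool` are hypothetical, and the idea shown is simply that nodes are scored by PageRank and the top-ranked fraction is kept, inducing a coarsened subgraph.

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank on a dense adjacency matrix."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    out_deg[out_deg == 0] = 1  # avoid division by zero for sink nodes
    transition = adj / out_deg[:, None]  # row-stochastic transition matrix
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * (transition.T @ rank)
    return rank

def pagerank_pool(adj, features, ratio=0.5):
    """Keep the top-ranked fraction of nodes and the subgraph they induce."""
    k = max(1, int(np.ceil(adj.shape[0] * ratio)))
    keep = np.argsort(-pagerank(adj))[:k]
    return adj[np.ix_(keep, keep)], features[keep]
```

Applying such a pooling step repeatedly yields a hierarchy of increasingly coarse graph summaries, which is the general shape of the hierarchical representations discussed here.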
Neural Processes (NPs) are powerful and flexible models able to incorporate uncertainty when representing stochastic processes, while maintaining a linear time complexity. However, NPs produce a latent description by aggregating independent representations of context points and lack the ability to exploit relational information present in many datasets. This renders NPs ineffective in settings where the stochastic process is primarily governed by neighbourhood rules, such as cellular automata (CA), and limits performance for any task where relational information remains unused. We address this shortcoming by introducing Message Passing Neural Processes (MPNPs), the first class of NPs that explicitly makes use of relational structure within the model. Our evaluation shows that MPNPs thrive at lower sampling rates, on existing benchmarks and newly-proposed CA and Cora-Branched tasks. We further report strong generalisation over density-based CA rule-sets and significant gains in challenging arbitrary-labelling and few-shot learning setups.
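The core contrast in the abstract, between independent aggregation of context points and relational aggregation, can be sketched in a few lines. This is a simplified illustration of the idea rather than the MPNP architecture itself; the function names and the mean-over-neighbours propagation rule are my own simplifications.

```python
import numpy as np

def np_aggregate(context_reprs):
    """Vanilla NP aggregation: a permutation-invariant mean over
    independently encoded context points (no relational information)."""
    return context_reprs.mean(axis=0)

def mp_aggregate(node_reprs, adj, steps=2):
    """MPNP-flavoured sketch: run a few rounds of mean-over-neighbours
    message passing before the global aggregation, so the latent
    description reflects neighbourhood structure."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1  # isolated nodes keep a zero message
    h = node_reprs
    for _ in range(steps):
        h = adj @ h / deg  # each node averages its neighbours' representations
    return h.mean(axis=0)
```

With identical node features, `np_aggregate` produces the same latent description for any graph, whereas `mp_aggregate` is sensitive to the edge structure, which is exactly the property needed for neighbourhood-governed processes such as cellular automata.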
Embodied Question Answering (EQA) is a recently proposed task in which an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement learning algorithms have shown that EQA might be too complex and challenging for these techniques. To investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset, which contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task. We evaluate several models, adapted from popular VQA methods, on our benchmark, establishing an initial understanding of how well VQA-style methods can perform within the novel EQA paradigm.
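To give a feel for what "adapting popular VQA methods" means in this setting, here is a deliberately minimal late-fusion baseline. It is not one of the VideoNavQA models: the encoders, weight names and shapes are all hypothetical stand-ins (a mean-pooled frame encoder instead of a CNN, a bag-of-words question encoder instead of an LSTM).

```python
import numpy as np

def encode_video(frames, w_v):
    """Project per-frame features and mean-pool into one video vector."""
    return np.tanh(frames @ w_v).mean(axis=0)

def encode_question(token_embeddings, w_q):
    """Bag-of-words question encoder (stand-in for a recurrent encoder)."""
    return np.tanh(token_embeddings @ w_q).mean(axis=0)

def answer_logits(frames, tokens, w_v, w_q, w_out):
    """Late fusion by concatenation, then a linear answer classifier."""
    fused = np.concatenate([encode_video(frames, w_v),
                            encode_question(tokens, w_q)])
    return fused @ w_out
```

Real models for this task replace each stand-in with a learned component and often fuse the modalities earlier, e.g. by conditioning the visual processing on the question, but the overall encode-fuse-classify shape is the same.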
Master’s research projects: Structure-aware Generation of Molecules in Protein Pockets (Pavol Drotar, 2020-21, 92/100), Machine Unlearning (Mukul Rathi, 2020-21, 91/100), Goal-Conditioned Reinforcement Learning in the Presence of an Adversary (Carlos Purves, 2019-20, 87/100), Representation Learning for Spatio-Temporal Graphs (Felix Opolka, 2018-19, 85/100, presented at ICLR RLGM), Dynamic Temporal Analysis for Graph Structured Data (Aaron Solomon, 2018-19, presented at ICLR RLGM).
Computer Science Tripos Part II projects: Benchmarking Graph Neural Networks using Wikipedia (Péter Mernyei, 2019-20, Novel Applications spotlight talk at ICML GRL+), Multimodal Relational Reasoning for Visual Question Answering (Aaron Tjandra, 2019-20), The PlayStation Reinforcement Learning Environment (Carlos Purves, 2018-19, 80/100, presented at NeurIPS Deep RL), Deep Learning for Music Recommendation (Andrew Wells, 2017-18, 76/100).
Undergraduate courses for Murray Edwards, King’s, and Queens’ Colleges: AI, Databases, Discrete Mathematics, Foundations of Computer Science, Logic and Proof, Machine Learning and Real-world Data.