I am a Senior AI researcher, most recently co-lead of Generative Music at Google DeepMind, with 9 years of ML experience and extensive background in computer science and competitive programming.
I’m motivated by work with a clear purpose, focus, and positive real-world impact. Music is my greatest passion - I’ve made core contributions to AI tools that enhance the creative process of making music and allow quicker exploration of ideas. I was part of the core Lyria and Music AI Sandbox teams, and was GDM tech lead for the YouTube Dream Track quality workstream.
My PhD at King’s College, University of Cambridge was awarded with no thesis corrections. I also earned a First Class BA and an MPhil with Distinction from Cambridge. During my studies, I interned at Big Tech companies, startups, and top research labs in both academia and industry. Since my graduate years, I’ve mentored and taught for hundreds of hours. I love giving demos, talks, and lectures, and always get positive energy from a room full of people!
Outside work, I love cycling, rowing, travelling, playing/recording the piano/guitar and chasing my favourite bands on tour. 🎼 I sometimes write poetry and lyrics for (ever-)future songs :)
PhD in Machine Learning, 2021
University of Cambridge
MPhil in Advanced Computer Science, 2017
University of Cambridge
BA in Computer Science, 2016
University of Cambridge
We introduce a challenging decision-making task that we call active acquisition for multimodal temporal data (A2MT). In many real-world scenarios, input features are not readily available at test time and must instead be acquired at significant cost. With A2MT, we aim to learn agents that actively select which modalities of an input to acquire, trading off acquisition cost and predictive performance. A2MT extends a previous task called active feature acquisition to temporal decision making about high-dimensional inputs. Further, we propose a method based on the Perceiver IO architecture to address A2MT in practice. Our agents are able to solve a novel synthetic scenario requiring practically relevant cross-modal reasoning skills. On two large-scale, real-world datasets, Kinetics-700 and AudioSet, our agents successfully learn cost-reactive acquisition behavior. However, an ablation reveals they are unable to learn adaptive acquisition strategies, emphasizing the difficulty of the task even for state-of-the-art models. Applications of A2MT may be impactful in domains like medicine, robotics, or finance, where modalities differ in acquisition cost and informativeness.
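To give a flavor of the cost/performance trade-off at the heart of A2MT, here is a deliberately toy sketch (not the paper's Perceiver IO-based agent): a greedy policy that acquires the modalities with the best estimated information gain per unit cost under a fixed budget. The modality names, costs, and gain values are all illustrative assumptions.

```python
# Toy illustration of cost-aware modality acquisition.
# Costs and information-gain estimates are made-up numbers,
# not values from the A2MT paper.
COSTS = {"audio": 1.0, "video": 5.0}


def choose_acquisitions(info_gain, costs, budget):
    """Greedily acquire modalities with the highest estimated
    information gain per unit cost, until the budget is spent."""
    ranked = sorted(costs, key=lambda m: info_gain[m] / costs[m], reverse=True)
    chosen, spent = [], 0.0
    for m in ranked:
        if spent + costs[m] <= budget:
            chosen.append(m)
            spent += costs[m]
    return chosen, spent


gain = {"audio": 0.4, "video": 0.9}
chosen, spent = choose_acquisitions(gain, COSTS, budget=3.0)
print(chosen, spent)  # ['audio'] 1.0
```

Here video is more informative in absolute terms, but its gain-per-cost ratio (0.18) is below audio's (0.4), so under a tight budget only audio is acquired; a learned A2MT agent would make such decisions adaptively, conditioned on the input seen so far.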
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64 x 64 ImageNet images and PG-19 books.
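The key scaling idea, cross-attention from a long input sequence onto a small set of latents, can be sketched in a few lines of NumPy. This is a minimal single-head illustration with random weights; it omits the learned projections, multi-head structure, and the causal masking that Perceiver AR adds, and all shapes are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(latents, inputs):
    """Map a long input sequence onto a small set of latents.

    latents: (L, D) query vectors, with L much smaller than T
    inputs:  (T, D) long token sequence
    Returns (L, D): each latent attends over all T input positions,
    so later layers only operate on L elements instead of T.
    """
    d = latents.shape[-1]
    scores = latents @ inputs.T / np.sqrt(d)  # (L, T) attention logits
    weights = softmax(scores, axis=-1)        # each latent: distribution over inputs
    return weights @ inputs                   # (L, D) compressed summary


rng = np.random.default_rng(0)
T, L, D = 10_000, 256, 64  # long context compressed into few latents
inputs = rng.standard_normal((T, D)).astype(np.float32)
latents = rng.standard_normal((L, D)).astype(np.float32)
out = cross_attend(latents, inputs)
print(out.shape)  # (256, 64)
```

The cost of this initial cross-attention is O(L·T) rather than the O(T²) of full self-attention over the inputs, which is what makes contexts of over a hundred thousand tokens tractable.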
2024: One of 3 research leads for the generative music effort, working with the other leads to solicit research ideas, plan workstreams, ensure communication between leadership and team members, maintain momentum, and run recurring team meetings. IC work on model controls and finetuning for product use-cases. Regularly delivered demos of our music AI tech to industry stakeholders. The work of our team was presented at various events including Google I/O 2024.
2023: I was a core contributor to Lyria and Music AI Tools (Sandbox), and GDM tech lead for one of the YouTube Shorts Dream Track workstreams, where I coordinated with several YouTube teams to help our research team hit the quality launch bar and inform leadership product decisions.
Master’s research projects: Structure-aware Generation of Molecules in Protein Pockets (Pavol Drotar, 2020-21) (92/100) (presented at NeurIPS MLSB), Machine Unlearning (Mukul Rathi, 2020-21) (91/100), Goal-Conditioned Reinforcement Learning in the Presence of an Adversary (Carlos Purves, 2019-20) (87/100), Representation Learning for Spatio-Temporal Graphs (Felix Opolka, 2018-19) (85/100) (presented at ICLR RLGM), Dynamic Temporal Analysis for Graph Structured Data (Aaron Solomon, 2018-19) (presented at ICLR RLGM).
Computer Science Tripos Part II projects: Benchmarking Graph Neural Networks using Wikipedia (Péter Mernyei, 2019-20, Novel Applications spotlight talk at ICML GRL+), Multimodal Relational Reasoning for Visual Question Answering (Aaron Tjandra, 2019-20), The PlayStation Reinforcement Learning Environment (Carlos Purves, 2018-19) (80/100) (presented at NeurIPS Deep RL), Deep Learning for Music Recommendation (Andrew Wells, 2017-18) (76/100).
Undergraduate courses for Murray Edwards, King’s, and Queens’ Colleges: AI, Databases, Discrete Mathematics, Foundations of Computer Science, Logic and Proof, Machine Learning and Real-world Data.