Sergio Casas

Sr Staff Researcher and Tech Lead Manager

Waymo

About me

I am a researcher and engineering leader driven by the mission of bringing AI from the lab into the real world. I am currently a Sr Staff Researcher and Tech Lead Manager at Waymo, developing Foundation Models for Self-Driving Cars. My work builds on a deep background in autonomous technology, including my time leading the Perception and Behavior Reasoning team at Waabi, my role as a Research Scientist at Uber ATG, and my PhD in end-to-end autonomy from the University of Toronto, where I was advised by Raquel Urtasun.

Interests

  • Artificial Intelligence
  • Machine Learning
  • Computer Vision
  • Generative Models
  • Autonomous Driving
  • Robotics

Education

  • PhD in Computer Science, 2020 - 2024

    University of Toronto

  • MSc in Computer Science, 2018 - 2020

    University of Toronto

  • BSc in Computer Science, 2013 - 2017

    Universitat Politècnica de Catalunya

  • BSc in Industrial Tech. Engineering, 2012 - 2017

    Universitat Politècnica de Catalunya

Selected Publications

For a complete and up-to-date list of publications, visit my Google Scholar.

(* denotes equal contribution)

Scaling Laws of Motion Forecasting and Planning

Technical Report 2025
Studying how motion forecasting and planning models scale with compute and data, in both open-loop and closed-loop settings.

MAD: Memory-Augmented Detection of 3D Objects

CVPR 2025
Pushing the boundaries of memory-based perception.

DIO: Decomposable Implicit 4D Occupancy-Flow World Model

CVPR 2025
An object-centric occupancy foundation model.

DeTra: A Unified Model for Object Detection and Trajectory Forecasting

ECCV 2024
Unified object detection and trajectory prediction as trajectory refinement.

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

CVPR 2024 (Oral)
An unsupervised occupancy foundation model.

Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion

ICLR 2024
A LiDAR world model learned via discrete diffusion.

ImplicitO: Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

CVPR 2023 (Highlight)
An efficient implicit occupancy perception and forecasting model.

MP3: A Unified Model to Map, Perceive, Predict and Plan

CVPR 2021 (Best Paper Candidate, Oral)
Interpretable end-to-end neural motion planning without high-definition maps.

TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors

CVPR 2021
Realistic long-term vehicle behavior simulation learned from imitation and common sense.

Implicit Latent Variable Model for Scene-Consistent Motion Forecasting

ECCV 2020
ILVM characterizes the joint distribution over multiple actors' future trajectories.

End-to-end Interpretable Neural Motion Planner

CVPR 2019 (Oral)
A neural motion planner from LiDAR and HD maps.

IntentNet: Learning to Predict Intention from Raw Sensor Data

CoRL 2018 (Spotlight)
Joint perception and prediction from LiDAR point clouds and HD maps.