Sergio Casas

Sr Staff TLM @ Waabi

Ph.D. @ UofT

About me

I am a Sr Staff Tech Lead Manager at Waabi, where I lead our Perception and Behavior Reasoning team. I completed my PhD at the University of Toronto, supervised by Professor Raquel Urtasun. You can download my thesis here. My research lies at the intersection of computer vision, machine learning, and robotics.

Interests

  • Artificial Intelligence
  • Machine Learning
  • Computer Vision
  • Robotics - Autonomous Driving
  • Generative Models
  • Imitation Learning

Education

  • PhD in Computer Science, 2020 - 2024

    University of Toronto

  • MSc in Computer Science, 2018 - 2020

    University of Toronto

  • BSc in Computer Science, 2013 - 2017

    Universitat Politècnica de Catalunya

  • BSc in Industrial Tech. Engineering, 2012 - 2017

    Universitat Politècnica de Catalunya

Selected Publications

For a complete and up-to-date list of publications, visit my Google Scholar profile.

(* denotes equal contribution)

DeTra: A Unified Model for Object Detection and Trajectory Forecasting

ECCV 2024
Unified object detection and trajectory prediction as trajectory refinement.

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

CVPR 2024 (Oral)
Unsupervised occupancy foundation model.

QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving

ICRA 2024
End-to-end autonomy leveraging implicit occupancy.

Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion

ICLR 2024
LiDAR World Model.

ImplicitO: Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

CVPR 2023 (Highlight)
Efficient implicit occupancy perception and forecasting model.

MP3: A Unified Model to Map, Perceive, Predict and Plan

CVPR 2021 (Best Paper Candidate, Oral)
Interpretable end-to-end neural motion planning without high-definition maps

LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving

ICCV 2021
Contingency planning from diverse joint trajectory samples for all actors in the scene

TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors

CVPR 2021
Realistic long-term vehicle behavior simulation learned from imitation and common sense

Implicit Latent Variable Model for Scene-Consistent Motion Forecasting

ECCV 2020
ILVM characterizes the joint distribution over multiple actors’ future trajectories

End-to-end Interpretable Neural Motion Planner

CVPR 2019 (Oral)
Neural motion planner from LiDAR and HD maps

IntentNet: Learning to Predict Intention from Raw Sensor Data

CoRL 2018 (Spotlight)
Joint perception and prediction from LiDAR point clouds and HD maps