Learning World Models with Large-Scale Public Satellite Datasets

Currently open to new students?

Yes
No

Description

This project extends a previous SCS undergraduate research project that was accepted to NeurIPS 2024: https://mapitanywhere.github.io/
Humans often reason about unknown spaces to navigate effectively. For example, even when their view is occluded by cars, humans can reasonably predict where road crossings are from building placements and prior knowledge about how the world is structured.
We address the task of map prediction: given a first-person-view (FPV) RGB image, we predict the bird's-eye-view (BEV) semantic layout of the surrounding world, including occluded and unknown areas. While many map prediction methods exist, they are task-specific: each is limited to the small number of labeled FPV→BEV datasets available. This paradigm hinders the generalizability of such methods.
Our project seeks to scale map prediction learning using large-scale public datasets such as satellite imagery and street-view images. By leveraging these vast resources, we expect the model to generalize well enough that it can be fine-tuned for specific map prediction tasks from little labeled data, and ideally to show emergent behaviors that help in other related tasks: top-down semantic mapping, localization, robot exploration, and planning.
One exciting downstream application of this work is offroad navigation, where understanding the robot's surroundings over long distances by predicting BEV maps is critical for safe and efficient operation. We will validate the module's robustness and its potential for long-range perception in diverse and challenging environments through real-world testing on our own and our collaborator's autonomous offroad vehicles.



Skills Required

What skills are required to be able to successfully execute the project?
  • Foundation in computer vision and deep learning
  • Strong coding skills in Python / C++
  • Familiarity with at least one machine learning development framework, such as PyTorch or TensorFlow
  • Experience with large-scale dataset processing or ROS is a plus

Student Learning Objectives

What will the student be able to do by the end of the independent study?
  • Learn and develop innovative vision algorithms
  • Develop a strong understanding of map prediction, large-scale data-driven methods, and their applications in robotics
  • Potentially publish a paper at a CV or robotics conference

Classes Accepted into Project

Junior
Senior
Graduate Student

Compensation

Units 9
Units 12
Pay


Contact

Scientist, postdoc or student to contact about the project: