Learning World Models with Large-Scale Public Satellite Datasets

Currently open to new students? Yes
No
Description
This is an extension of a successful previous SCS undergrad research project, which was accepted to NeurIPS 2024: https://mapitanywhere.github.io/

Map It Anywhere (MIA)
Humans often reason about unknown spaces to navigate effectively. For example, even when their view is occluded by cars, humans can reasonably predict where road crossings are, based on building placements and prior knowledge of how the world is structured.
We address the task of map prediction: given a first-person-view (FPV) RGB image, we predict the bird's eye view (BEV) semantic layout of the surrounding world, including occluded and unknown areas. Many map prediction methods exist, but they are task-specific, because they are limited to the small number of labeled FPV→BEV datasets available. This paradigm hinders the generalizability of such methods.
Our project seeks to scale map prediction learning using large-scale public datasets such as satellite imagery and street-view images. By leveraging these vast resources, we expect the model to generalize well enough that it can be fine-tuned for specific map prediction tasks with little labeled data. Ideally, it will also show emergent behaviors that help in related tasks: top-down semantic mapping, localization, robot exploration, and planning.
One exciting downstream application of this work is offroad navigation, where understanding the robot's surroundings over long distances by predicting BEV maps is critical for safe and efficient operation. We will validate the module's robustness and its potential for long-range perception in diverse and challenging environments through real-world testing on our own and our collaborators' autonomous offroad vehicles.

One of our robot platforms: offroad ATV

Skills Required
What skills are required to be able to successfully execute the project?
Foundation in computer vision and deep learning
Strong coding skills in Python / C++
Familiarity with at least one machine learning framework, such as PyTorch or TensorFlow
Experience with large-scale dataset processing or ROS is a plus
Student Learning Objectives
What will the student be able to do by the end of the independent study?
Learn and develop innovative vision algorithms
Develop a strong understanding of map prediction, large-scale data-driven methods, and their applications in robotics
Potentially publish a paper at a CV/robotics conference
Classes Accepted into Project
Junior
Senior
Graduate Student
Compensation
Units 9
Units 12
Pay

Contact
Scientist, postdoc or student to contact about the project:
Yifei Liu (yifeil5@andrew.cmu.edu)
Cherie Ho (cherieh@andrew.cmu.edu)