This is a very simple well defined project. Our goal is to have a fast single-forward pass open-vocabulary zero-shot encoder which can be queried to get semantic segmentation. We will evaluate both 2D and 3D segmentation performance. Current leaders in this space are:
Improving beyond RayFronts and Trident is easy. We need someone to push for lots of benchmarking. The resulting model will be used in many downstream robotics applications.
Target venues: CVPR / ECCV / ICCV
- Strong coding skills in python and particularly PyTorch.
- Ability to read, understand, and replicate results of papers.
- Proactive.
Scientist, postdoc or student to contact about the project: