SUBJECT: Ph.D. Proposal Presentation
   
BY: Muhammad Zubair Irshad
   
TIME: Monday, April 17, 2023, 1:00 p.m.
   
PLACE: CODA, C1108
   
TITLE: Learning Object and Agent-centric Neural 3D Scene Representations
   
COMMITTEE: Dr. Zsolt Kira, Chair (IC)
Dr. Aaron Young (ME)
Dr. Nader Sadegh (ME)
Dr. Shreyas Kousik (ME)
Dr. Adrien Gaidon (TRI)
 

SUMMARY

Recent advances in deep learning over the last decade have produced a form of 'data-centric intelligence': artificially intelligent models that ingest large amounts of data and excel at digital tasks such as text-to-image generation, producing elaborate text for human-machine conversation, detailed 3D understanding from multi-view images alone, and grounding language in robotic affordances. This thesis addresses learning with structured inductive biases to design approaches that unlock 'principle-centric intelligence' for the real world, i.e., building intelligent systems capable of performing complex real-world tasks using strong inductive priors in the form of hierarchy, decomposition, and large-scale synthetic data. In essence, these strong priors enable improved real-world 3D understanding involving perception, language, and decision-making without requiring large amounts of labeled real-world data: the agent learns only in simulation and transfers that knowledge to reality (sim2real), generalizing to novel scenes and new environments.

This thesis demonstrates these 'principles' and applies them to a variety of robotics tasks, such as navigating to a goal purely from visual inputs given natural-language instructions without access to a prior map, reconstructing 3D objects (including their shapes and textures) for downstream manipulation, and object-centric holistic 3D scene understanding. Whereas classical techniques are brittle and fail to generalize to unseen scenarios, and data-centric approaches require large amounts of labeled data, the aim of this thesis is to build intelligent agents that require very little real-world data, or data acquired only in simulation, to generalize to highly dynamic and cluttered environments in novel simulations (sim2sim) or in unseen real-world environments (sim2real), achieving a holistic understanding of the 3D world.