 
Speaker: Ziwei Liu (Nanyang Technological University)
Talk Title: From Multimodal Generative Models to Dynamic World Modeling
Abstract: Beyond the confines of flat screens, multimodal generative models are crucial for creating immersive experiences in virtual reality, not only for human users but also for robotics. Virtual environments and real-world simulators, often composed of complex 3D/4D assets, benefit significantly from the accelerated creation enabled by generative AI. In this talk, we will introduce our latest research progress on multimodal generative models for objects, avatars, scenes, motions, and ultimately dynamic world models.
Short Bio: Ziwei Liu is currently an Associate Professor at Nanyang Technological University, Singapore. His research revolves around computer vision, machine learning, and computer graphics. He has published extensively in top-tier conferences and journals in relevant fields, including CVPR, ICCV, ECCV, NeurIPS, ICLR, SIGGRAPH, TPAMI, TOG, and Nature Machine Intelligence. He is the recipient of the PAMI Mark Everingham Prize, a CVPR Best Paper Award Candidate, the Asian Young Scientist Fellowship, the International Congress of Basic Science Frontiers of Science Award, and MIT Technology Review Innovators under 35 Asia Pacific. He serves as an Area Chair of CVPR, ICCV, ECCV, NeurIPS, and ICLR, as well as an Associate Editor of IJCV.
 
Speaker: Mike Zheng Shou (National University of Singapore)
Talk Title: Video Intelligence in the Era of Multimodal
Bio: Mike Shou is an Assistant Professor under the Presidential Young Professorship at the National University of Singapore. He was previously a Research Scientist at Facebook AI in the Bay Area. He obtained his Ph.D. degree at Columbia University with Prof. Shih-Fu Chang. His research mainly focuses on video understanding and multimodal AI. He was a Best Paper Finalist at CVPR 2022, received a Best Student Paper Nomination at CVPR 2017, and won the EgoVis Distinguished Paper Award 2022/23. His team won 1st place in international challenges including ActivityNet, EPIC-Kitchens, and Ego4D. He is an ST Engineering Distinguished Professor and a Fellow of the National Research Foundation Singapore. He is on the Forbes 30 Under 30 Asia list.
 
Speaker: Guosheng Lin (Nanyang Technological University)
Talk Title: Recent Advances in 3D Generation: From 3D Assets to CAD Models
Abstract: In this talk, I will present our latest progress in 3D generative learning guided by text or image input. I will begin with our method for high-quality 3D asset generation from images, where we propose an efficient coarse-to-fine framework that combines compact coarse-level representations and part-aware voxel refinement. I will then introduce our approach for generating parametric CAD models directly from real-world images, eliminating the need for expensive 3D scanning. Together, these works push the frontier of visual generation and bring us closer to practical, scalable 3D modelling for real applications.
Bio: Guosheng Lin is an Associate Professor at the College of Computing and Data Science, Nanyang Technological University. His research interests are in computer vision, with a focus on data-efficient learning and generative learning. He has published over 100 research articles in prestigious venues. He serves as an Associate Editor for the IEEE journals TMM and TCSVT. He also serves as an Area Chair or Senior PC member for flagship conferences such as CVPR, ACM MM, IJCAI, and AAAI.
 
Speaker: Hao Fei (National University of Singapore)
Talk Title: On Path to Multimodal Generalist: General-Level and General-Bench
Abstract: AI systems are increasingly capable of handling diverse types of data, such as text, images, and audio. However, many of these multimodal systems excel only at specific tasks or data modalities, lacking the broad adaptability seen in human intelligence. Moreover, the existing evaluation paradigm, which simply assumes that higher performance across tasks indicates stronger MLLM capability, can be problematic. This research introduces two tools/resources: General-Level, a framework that assesses an AI model's ability to integrate and apply knowledge across different tasks and data types; and General-Bench, a comprehensive dataset comprising over 700 tasks and 325,000 examples designed to evaluate this integrative capability. By applying these tools to over 100 existing AI models, we found that while some models perform well on individual tasks, they often struggle to transfer knowledge between different types of tasks or data. This indicates a gap in achieving truly general-purpose multimodal AGI. Our work aims to guide the development of more versatile multimodal AI systems that can seamlessly understand and generate multiple forms of data, moving us closer to AI that mirrors human-like general intelligence.
Bio: Hao Fei is a Senior Research Fellow at the National University of Singapore. His research focuses on vision-language understanding and generation and multimodal large foundation models. He has published over 60 papers in top-tier venues such as IEEE TPAMI, IEEE TKDE, ACM TOIS, AI, ICML, NeurIPS, CVPR, ACL, ICLR, and AAAI, with over 7k citations. He has received many accolades, including the 2022 Outstanding CIPSC PhD Dissertation Award, the 2023 WAIC Rising Star Award, the 2024 WAIC Excellent Young Paper Award, and recognition as a Stanford World's Top 2% Scientist. His representative contributions include NExT-GPT, Vitron, and the Generalist Benchmark. He has organized over 20 international workshops, tutorials, and shared tasks, and frequently serves as an Area Chair/Senior PC member for top conferences including CVPR, ICML, NeurIPS, AAAI, ACL, IJCAI, and ACM MM, as well as an Associate Editor of ACM TALLIP and Neurocomputing.
| Time (Dublin) | Session |
| --- | --- |
| 13:30 – 13:40 | Welcome Message from the Chairs | 
| 13:40 – 14:20 | Keynote 1 — From Multimodal Generative Models to Dynamic World Modeling (Ziwei Liu) | 
| 14:20 – 15:00 | Keynote 2 — Video Intelligence in the Era of Multimodal (Mike Zheng Shou) | 
| 15:00 – 15:40 | Keynote 3 — Recent Advances in 3D Generation: From 3D Assets to CAD Models (Guosheng Lin) | 
| 15:40 – 16:20 | Keynote 4 — On Path to Multimodal Generalist: General-Level and General-Bench (Hao Fei) | 
| 16:20 – 16:30 | Paper Presentation 1 — Annotation-Free Prompt Expansion for Chinese Text-to-Image Generation | 
| 16:30 – 16:40 | Paper Presentation 2 — Enabling Dynamic Storytelling via Training-Free Multimodal Synchronized Video Synthesis with Character Consistency | 
| 16:40 – 17:00 | Closing & Networking | 
For any questions, please email LGM3A2024@gmail.com.