News

  • 7/5/2023 - CFP is released.
  • 7/5/2023 - Workshop homepage is now available.

Call for Papers

This workshop intends to 1) provide a platform for researchers to present their latest work and receive feedback from experts in the field, 2) foster discussions on current challenges in multimodal analysis and application, 3) identify emerging trends and opportunities in the field, and 4) explore their potential impact on future research and development. Potential topics include, but are not limited to:
  • Multimodal data augmentation
  • Multimodal data analysis and understanding
  • Multimodal question answering
  • Multimodal generation
  • Multimodal retrieval augmentation
  • Multimodal recommendation
  • Multimodal summarization and text generation
  • Multimodal agents
  • Multimodal prompting
  • Multimodal continual learning
  • Multimodal fusion and integration of information
  • Multimodal applications/pipelines
  • Multimodal systems management and indexing
  • Multimodal mobile/lightweight deployment
Important dates:
  • Workshop Papers Submission: July 19, 2024
  • Workshop Papers Notification: August 5, 2024
  • Camera-ready Submission: August 19, 2024
  • Conference dates: October 28 - November 1, 2024
Please note: the submission deadline is 11:59 p.m. Anywhere on Earth (AoE) on the stated deadline date.

Submission

  • Submission Guidelines:
  • Submitted papers (.pdf format) must use the same format and template as the main conference. The manuscript's length is limited to one of two options: a) 4 pages plus a 1-page reference; or b) 8 pages plus up to 2 pages of references. All papers will be peer-reviewed by experts in the field. Acceptance will be based on relevance to the workshop, scientific novelty, and technical quality.
  • Submission Site: https://easychair.org/conferences/?conf=lgm3a

Organizers

  • Shihao Xu (Huawei Singapore Research Center, Singapore)
  • Yiyang Luo (Huawei Singapore Research Center, Singapore)
  • Justin Dauwels (Delft University of Technology)
  • Andy Khong (Nanyang Technological University, Singapore)
  • Zheng Wang (Huawei Singapore Research Center, Singapore)
  • Qianqian Chen (Huawei Singapore Research Center, Singapore)
  • Chen Cai (Huawei Singapore Research Center, Singapore)
  • Wei Shi (Huawei Singapore Research Center, Singapore)
  • Tat-Seng Chua (National University of Singapore, Singapore)

Speakers

Keynote 1

Prof. Ziwei Liu is a Nanyang Assistant Professor (2020-) at the College of Computing and Data Science, Nanyang Technological University, with MMLab@NTU. Previously, he was a research fellow (2018-2020) at CUHK with Prof. Dahua Lin and a postdoctoral researcher (2017-2018) at UC Berkeley with Prof. Stella Yu. His research interests include computer vision, machine learning, and computer graphics. Ziwei received his Ph.D. (2013-2017) from the Multimedia Lab at CUHK, advised by Prof. Xiaoou Tang and Prof. Xiaogang Wang. He was fortunate to intern at Microsoft Research and Google Research. Ziwei is the recipient of the MIT Technology Review Innovators Under 35 Asia Pacific award, the ICBS Frontiers of Science Award, a CVPR Best Paper Award candidacy, and the WAIC Yunfan Award. His work has been transferred to products, including Microsoft Pix, SenseGo, and Google Clips.

Talk Title: Multi-Modal Generative AI with Foundation Models
Abstract: Generating photorealistic and controllable visual content has been a long-pursued goal of artificial intelligence (AI), with extensive real-world applications. It is also at the core of embodied intelligence. In this talk, I will discuss our work on AI-driven visual content generation of humans, objects, and scenes, with an emphasis on combining the power of neural rendering with large multimodal foundation models. Our generative AI framework has shown its effectiveness and generalizability on a wide range of tasks.

Keynote 2

Prof. Mike Zheng Shou is a tenure-track Assistant Professor at the National University of Singapore and a former Research Scientist at Facebook AI in the Bay Area. He holds a PhD from Columbia University in the City of New York, where he worked with Prof. Shih-Fu Chang. He was awarded the Wei Family Private Foundation Fellowship. He was a best paper finalist at CVPR'22 and received a best student paper nomination at CVPR'17. His team won 1st place in multiple international challenges, including ActivityNet 2017, EPIC-Kitchens 2022, and Ego4D 2022 & 2023. He is a Fellow of the National Research Foundation (NRF) Singapore and has been named on the Forbes 30 Under 30 Asia list.

Talk Title: Multimodal Video Understanding and Generation
Abstract: Exciting progress has been made in multimodal video intelligence, spanning both understanding and generation, the two pillars of video. Despite this promise, several key challenges remain. In this talk, I will introduce our attempts to address some of them. (1) For understanding, I will share All-in-one, which employs a single unified network for efficient video-language modeling, and EgoVLP, the first video-language pre-trained model for egocentric video. (2) For generation, I will introduce our study of efficient video diffusion models (i.e., Tune-A-Video, 4K GitHub stars). (3) Finally, I would like to discuss our recent exploration, Show-o, a single LLM that unifies multimodal understanding and generation.