Large Generative Models Meet Multimodal Applications (LGM3A)
Workshop at ACM Multimedia 2024
Scope and Topics
This workshop aims to explore the potential of large generative models to revolutionize the way we interact with multimodal information.
A Large Language Model (LLM) is a sophisticated form of artificial intelligence engineered to comprehend and produce natural language text, exemplified by models such as GPT, LLaMA, Flan-T5, ChatGLM, and Qwen.
These models are trained on extensive text datasets and exhibit strong capabilities, including fluent language generation, zero-shot transfer, and In-Context Learning (ICL).
With the recent surge in multimodal content, including images, videos, audio, and 3D models, Large MultiModal Models (LMMs) have advanced significantly.
These models extend conventional LLMs to accept multimodal inputs or produce multimodal outputs, as seen in BLIP, Flamingo, KOSMOS, LLaVA, Gemini, and GPT-4.
Concurrently, several research efforts have explored generating specific modalities, with Kosmos2 and MiniGPT-5 focusing on image generation, and SpeechGPT on speech production.
There are also efforts to integrate LLMs with external tools to achieve near 'any-to-any' multimodal comprehension and generation, as illustrated by projects such as Visual-ChatGPT, ViperGPT, MMREACT, HuggingGPT, and AudioGPT.
Collectively, these models, spanning not only text and image generation but also other modalities, are referred to as large generative models.
This workshop will provide an opportunity for researchers, practitioners, and industry professionals to explore the latest trends and best practices in the field of multimodal applications of large generative models.
We remark that submissions are not limited to the use of such models: the workshop will also explore the challenges and opportunities of integrating large language models with other AI technologies, such as computer vision and speech recognition.
Additionally, the workshop will provide a platform for participants to present their research, share their experiences, and discuss potential collaborations.