What Does Multimodal Mean In the Context of Machine Learning?

Multimodal Model

Multimodal models process and relate information from different types of inputs, like text and images. They are often used in tasks that require understanding of both visual and textual content.

They are akin to tour guides at a museum who explain an artifact by combining what they see (image) with what they know (text).

Example

MMBT

Sign up for our monthly newsletter, The Drift.

Subscribe