After ChatGPT, Microsoft working on AI model that takes images as cues

by Editor | Mar 3, 2023

New Delhi, March 3, 2023: As the war over artificial intelligence (AI) chatbots heats up, Microsoft has unveiled Kosmos-1, a new AI model that can respond to visual cues or images as well as text prompts or messages.

The multimodal large language model (MLLM) can help with an array of new tasks, including image captioning, visual question answering and more.

Kosmos-1 could pave the way for the next stage beyond ChatGPT’s text prompts.

“A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context and follow instructions,” said Microsoft’s AI researchers in a paper.

The paper suggests that multimodal perception, or knowledge acquisition and “grounding” in the real world, is needed to move beyond ChatGPT-like capabilities to artificial general intelligence (AGI), reports ZDNet.

“More importantly, unlocking multimodal input greatly widens the applications of language models to more high-value areas, such as multimodal machine learning, document intelligence, and robotics,” the paper read.

The goal is to align perception with LLMs, so that the models are able to see and talk.

Experimental results showed that Kosmos-1 achieves impressive performance on language understanding and generation, and even on language tasks where it is fed document images directly.

It also showed good results in perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and vision tasks, such as image recognition with descriptions (specifying classification via text instructions).

“We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs,” said the team.
