Elon Musk’s xAI Unveils Grok 1.5 Vision AI Model in Preview, To Compete With GPT-4 Vision and Gemini Pro 1.5

Elon Musk’s artificial intelligence (AI) firm xAI has unveiled a new AI model dubbed Grok 1.5 Vision. This large language model (LLM) is an enhanced version of the recently released Grok 1.5 model. With this upgrade, the model is equipped with computer vision, so it can accept visual media as input: it can process images and answer questions about them. Notably, the announcement came just days after OpenAI introduced its own computer vision-powered GPT-4 model.

The announcement was made by the official X (formerly known as Twitter) account of xAI. The firm published a blog post detailing the new AI model along with some of its benchmark scores. Since the vision capabilities were added to the recently unveiled Grok 1.5 model, most of the details remain the same. It has the same context window of 128,000 tokens, and its general benchmark scores are also likely to remain the same.

xAI also shared Grok 1.5 Vision’s scores on a benchmark developed by the company itself. The AI firm calls it the RealWorldQA benchmark, and it measures “real-world spatial understanding”. The firm also tested the model on several other benchmarks, such as MMMU, MathVista, ChartQA, and more. While Grok outperformed OpenAI’s GPT-4 with Vision and Gemini 1.5 Pro on RealWorldQA, it scored lower on MMMU and ChartQA.

For the uninitiated, computer vision is a branch of computer science that deals with equipping computers (and AI models) with the ability to identify and understand objects in the real world using images and videos. The aim is to help computers see and process visual signals the way humans do. With the rise of multimodal AI models, many firms are now focusing on developing vision-focused models. Google’s Gemini 1.5 Pro and OpenAI’s GPT-4 with Vision both have this capability.

This technology also offers a wide range of applications. The Indian calorie tracking and nutrition feedback platform Healthify recently added a feature called Snap, where users can take a picture of a food item or dish, and a GPT-4 with Vision-powered AI chatbot suggests how the recipe can be made healthier and how much exercise one needs to do to burn the extra calories. In the future, AI models with computer vision could assist in diagnosing diseases, building self-driving cars, and more.

