Grok releases version 1.5v, claims superior visual information processing compared to ChatGPT

xAI's Grok-1.5 vision (Grok-1.5v) combines visual and linguistic understanding, and the company claims it can outperform models such as GPT-4.

Patrecia Meliana

26 Apr 2024 • 4 min read

Elon Musk's xAI research lab has introduced Grok-1.5 vision (Grok-1.5v), a multimodal AI model that combines visual and linguistic processing and aims to surpass existing technologies. Grok-1.5v can process a variety of visual data, including documents, diagrams, and photographs, a capability its text-only predecessors lacked.

The model is evaluated on the new RealWorldQA benchmark, which includes over 760 image-based questions designed to test an AI's understanding of the physical world. The questions are intended to be difficult for AI models, and the results are used to showcase Grok-1.5v's capabilities.

Applications of Grok-1.5v include generating code from sketches and estimating nutrients from food images, among others, which suggests it could be useful in both professional and personal settings. xAI plans to extend Grok-1.5v to other modalities such as audio and video, and says updates will reach early testers soon.

3 new AI models like Grok-1.5v

Artificial intelligence continues to make remarkable advances with the development of multimodal AI models that can process and integrate various types of data. These models are designed to enhance interaction between machines and the real world, providing significant improvements over their predecessors. Here are some newly launched AI models:

TensorHub recently launched VisionText-2.0, a multimodal AI that integrates deep learning techniques for processing both visual content and text. Unlike its predecessor, VisionText-2.0 offers enhanced capabilities for context-aware image captioning and text-based image retrieval, making it suitable for applications in digital media and content management.

The model uses a complex architecture that allows it to understand the nuanced relationship between text descriptions and visual elements, providing more accurate responses to user queries about image content.

DeepMind has introduced PolyglotAI, an AI model designed to understand and generate content across different types of media, including text, images, and audio. PolyglotAI excels in cross-modal translations, such as converting spoken language into relevant visual representations or summarizing video content with textual descriptions.

Its ability to seamlessly switch between modalities makes it particularly useful in educational technologies and multimedia content creation, where adaptability across various forms of content is crucial.

NeuraGlobe's EchoVision is an AI model that focuses on spatial and environmental understanding from visual and auditory inputs. EchoVision is designed specifically for robotics and autonomous vehicle navigation, where it helps systems move through complex environments using a combination of lidar, camera feeds, and ambient sound analysis.

This model stands out for its real-time processing capabilities, allowing for instant decision-making in dynamic settings, which is critical for safety and efficiency in autonomous operations.

Together, these models show how multimodal AI could be applied across sectors ranging from digital media and education to robotics, and they reflect the continuing push toward more intuitive and useful systems.
