Computer Vision

28 Posts

Image illustrates data flow from raw satellite sources through processing to embeddings for climate tracking.

Earth Modeled in 10-Meter Squares: Google’s AlphaEarth Foundations tracks the whole planet’s climate, land use, and potential for disasters in detail and at scale

Researchers built a model that integrates satellite imagery and other sensor readings across the entire surface of the Earth to reveal patterns of climate, land use, and other features.

Table comparing DINO, DINOv2, DINOv3, SigLIP 2, and PE on segmentation, depth estimation, tracking, and classification tasks.

Better Image Processing Through Self-Supervised Learning: Meta’s DINOv3 gets an updated loss term and improved vision performance

DINOv2 showed that a vision transformer pretrained on unlabeled images could produce embeddings that are useful for a wide variety of tasks. Now it has been updated to improve the performance of its embeddings in segmentation and other vision tasks.

Researcher adjusts sensors on a quadruped robot disguised as a Tibetan antelope in a high-altitude reserve.

Robot Antelope Joins Herd: Chinese scientists disguise modified robot dog as antelope to study herd behavior

Researchers in China disguised a quadruped robot as a Tibetan antelope to help study the animals close-up.

Hand holding a Pixel 10 smartphone using Google’s Magic Cue AI assistant, which suggests a location reply in a text message conversation.

Proactive AI Assistance for Phones: Inside Magic Cue, Google’s new AI assistant for Pixel 10

Google’s latest smartphone sports an AI assistant that anticipates the user’s needs and presents helpful information without prompting.

A fully autonomous surgical robot clips and cuts a bile duct in an ex-vivo gallbladder removal experiment using the da Vinci system, guided by AI.

Robot Surgeon Cuts and Clips: Doctors at Stanford, Johns Hopkins, and Optosurgical operate on animal organs without human intervention

An autonomous robot performed intricate surgical operations without human intervention.

Diagram of Walmart’s Element platform for AI app development, which unifies data and containerizes processing across multiple cloud providers.

Inside Walmart’s AI App Factory: Walmart’s Element platform for industrial-scale AI app development — a progress report

The world’s biggest retailer by revenue revealed new details about its cloud- and model-agnostic AI application development platform.

Robotic Beehive for Healthier Bees: Beewise’s robotic beehive uses AI to save pollinators

An automated beehive uses computer vision and robotics to help keep bees healthy and crops pollinated.

Meta Aria Gen 2 smart glasses for AI research, equipped with cameras, microphones, and other sensors for real-time data capture.

Meta’s Smart Glasses Come Into Focus: Meta reveals further details of Aria Gen 2 smart glasses for multisensory AI research

Meta revealed new details about its latest Aria eyeglasses, which aim to give AI models a streaming, multisensory, human perspective.

3D scene comparison of human-object interaction for ZeroHSI, LINGO, and CHOIS models in a synthetic indoor environment.

Human Action in 3D: Stanford researchers use generated video to animate 3D interactions without motion capture

AI systems designed to generate animated 3D scenes that include active human characters have been limited by a shortage of training data, such as matched 3D scenes and human motion-capture examples. Generated video clips can get the job done without motion capture.

AI model leaderboard comparing performance across tasks like math, vision, and document analysis.

Alibaba’s Answer to DeepSeek: Alibaba debuts Qwen2.5-VL, a powerful family of open vision-language models

While Hangzhou’s DeepSeek flexed its muscles, Chinese tech giant Alibaba vied for the spotlight with new open vision-language models.

GIF of two humanoid robots walking, one on grass and the other on a paved surface.

Humanoid Robot Price Break: Unitree and EngineAI showcase affordable humanoid robots

Chinese robot makers Unitree and EngineAI showed off relatively low-priced humanoid robots that could bring advanced robotics closer to everyday applications.

X-CLR loss: training models to link text captions and image similarity.

Calibrating Contrast: X-CLR, an approach to contrastive learning for better vision models

Contrastive loss functions make it possible to produce good embeddings without labeled data. A twist on this idea makes even more useful embeddings.

Table comparing model performance on Mathvista, MMMU, ChartQA, DocVQA, and other tasks.

Mistral’s Vision-Language Contender: Mistral unveils Pixtral Large, a rival to top vision-language models

Mistral AI unveiled Pixtral Large, which rivals top models at processing combinations of text and images.

Grounding DINO animation depicting object detection with bounding boxes on images.

Object Detection for Small Devices: Grounding DINO 1.5, an edge device model built for faster, smarter object detection

An open source model is designed to perform sophisticated object detection on edge devices like phones, cars, medical equipment, and smart doorbells.

Landmine Recognition: AI supports specialists on battlefields by detecting landmines and other unexploded ordnance

An AI system is scouring battlefields for landmines and other unexploded ordnance, enabling specialists to defuse them.
