Arize AI

Software Development

San Francisco, CA 26,769 followers

Ship Agents that Work. Arize AI & Agent Engineering Platform - one place for development, observability, and evaluation.


About us

The AI engineering platform for teams shipping reliable AI agents and LLM applications. Ship agents that work.

Website: http://www.arize.com
Industry: Software Development
Company size: 51-200 employees
Headquarters: San Francisco, CA
Type: Privately Held

Locations

Primary

San Francisco, CA, US


Employees at Arize AI


Updates

Arize AI

26,769 followers
39m
Model swaps look like configuration changes, but they behave more like product migrations. The product question is harder: if you change only the model, does the system still behave the way users expect?

We tested 7 model targets under the same agent harness: same tasks, same fixture repo, same tools, same evaluator setup. Only the model changed. In the harnessed sweep, correctness stayed relatively close: 79.6% to 85.1%. The models landed in a similar correctness band, but they did not behave the same operationally.

Can you swap models safely? Yes, sometimes. But only when the eval shows the behavior still meets the product bar. Nancy Chauhan wrote up what changed when we tested 7 models under the same agent harness. https://lnkd.in/gq2_B2qf

What we learned testing 7 models under the same agent harness arize.com
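One way to picture the "product bar" idea from the post is a gate that checks a candidate model on correctness and on operational behavior at the same time. The sketch below is illustrative only: `ModelRun`, `swap_regressions`, the choice of tool calls and latency as operational signals, and the 1.25 ratio are all assumptions, not details taken from the Arize write-up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelRun:
    """Aggregate results for one model under a fixed harness (hypothetical schema)."""
    model: str
    correctness: float     # fraction of tasks the evaluator marked correct, 0.0 to 1.0
    avg_tool_calls: float  # mean tool calls per task (assumed operational signal)
    avg_latency_s: float   # mean wall-clock seconds per task (assumed operational signal)


def swap_regressions(
    baseline: ModelRun,
    candidate: ModelRun,
    min_correctness: float,
    max_operational_ratio: float = 1.25,  # assumed tolerance, tune per product
) -> List[str]:
    """Return the reasons the candidate misses the bar; an empty list means the swap passes."""
    reasons: List[str] = []
    if candidate.correctness < min_correctness:
        reasons.append("correctness")
    if candidate.avg_tool_calls > baseline.avg_tool_calls * max_operational_ratio:
        reasons.append("tool_calls")
    if candidate.avg_latency_s > baseline.avg_latency_s * max_operational_ratio:
        reasons.append("latency")
    return reasons
```

A candidate can sit inside the same correctness band as the baseline and still fail this gate, which is the distinction the post draws between correctness and operational behavior.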

Arize AI

26,769 followers
5h
Always excited to partner with Google Cloud—this is a fun one!
Google Cloud Partners

139,434 followers
3d

The era of "just answering questions" is over. It’s time to build AI that gets things done. 🛠️ Join the Building Agents for Real-World Challenges hackathon! Combine Gemini’s reasoning with exclusive tools from our partners to build autonomous agents that execute and solve real problems. Are you building, or just talking? Let’s see what you’ve got → https://goo.gle/4eLNKQR
Arize AI

26,769 followers
5h
Excited to partner with GCG for this session on AI observability in production. Join us Tuesday May 26 at 7:00 AM PST to learn how enterprise teams can tackle model degradation, fragmented observability, and evaluation at scale. *Session is in Spanish*

Great Challenge Group

279 followers
1d Edited

We invite you to a webinar with Arize AI where we will talk about the hidden cost of #AI without observability. We will look at how enterprise teams can monitor, evaluate, and improve models and agents in production while avoiding silent degradation, missing traceability, and costs that are hard to anticipate. 🚀 We will also share live demos and a real case from a financial services company operating with more than 50M spans per month. 🙌 Join the session here: https://lnkd.in/dmeFU5b8 #AI #EnterpriseAI #AIObservability #LLMOps #MLOps #Arize #GCG

The hidden cost of AI without observability

www.linkedin.com

Arize AI

26,769 followers
6h
Hot off the presses, Gemini 3.5 Flash is now available in the Prompt Playground and throughout Arize AX! https://app.arize.com
Arize AI

26,769 followers
7h
The agent framework space has gotten busy fast. Sam Bhagwat of Mastra is joining Observe to talk about what production teams actually need from a TypeScript-first agent stack. If you're a JS/TS shop trying to decide where to anchor your agent code, this conversation will save you a quarter of trial and error. June 4, SF → https://arize.com/observe
1 Comment

Arize AI

26,769 followers
9h
🛠️ One AI Question with Elizabeth Hutton We asked our Senior Software Engineer: Why should you learn about evals? Her answer: Complex AI needs more trust, not less. As systems get smarter, evaluations are the only way to verify performance and ensure your AI is actually doing what it's supposed to do. Evals aren't optional—they're the foundation. #AI #AIEvals #LLM

1 Comment

Arize AI

26,769 followers
10h
LLM-as-a-judge only works in production when the judge knows exactly what it is judging. A fluent answer is not the same as correct system behavior. If a refund agent says “your refund was processed” but never called the refund tool, a “helpful” score is a bad eval.

Instead, you should:
- Use code for deterministic checks.
- Use LLM judges for semantic checks.
- Use humans to calibrate edge cases.
- Use traces to explain where the failure came from.

For agents, judging the final answer is not enough. The response may look right while the trajectory is wrong: bad tool choice, hallucinated arguments, ignored tool errors, redundant loops, or unsupported claims.

A judge is good when it improves engineering decisions. More here: https://lnkd.in/dhXjtkkc

How to build LLM-as-a-Judge evaluators that hold up in production arize.com
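The refund example in the post lends itself to a deterministic, code-based check rather than an LLM judge. The sketch below is a minimal illustration under stated assumptions: the `ToolCall` and `AgentRun` shapes, the `issue_refund` tool name, and the phrase matching are all hypothetical and do not reflect any Arize API or trace schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToolCall:
    """One tool invocation from the agent's trajectory (hypothetical schema)."""
    name: str
    arguments: Dict[str, str]
    error: Optional[str] = None  # set when the tool returned an error


@dataclass
class AgentRun:
    """Final answer plus the tool calls that led to it (hypothetical schema)."""
    final_answer: str
    tool_calls: List[ToolCall] = field(default_factory=list)


def check_refund_claim(run: AgentRun, refund_tool: str = "issue_refund") -> Dict[str, str]:
    """Deterministic check: a refund claim must be backed by a successful tool call."""
    # Naive phrase match, used here only to keep the sketch short.
    claims_refund = "refund was processed" in run.final_answer.lower()
    refund_calls = [c for c in run.tool_calls if c.name == refund_tool]
    succeeded = [c for c in refund_calls if c.error is None]

    if not claims_refund:
        return {"label": "ok", "reason": "no refund claim in the answer"}
    if succeeded:
        return {"label": "ok", "reason": "claim backed by a successful tool call"}
    if refund_calls:
        return {"label": "unsupported_claim", "reason": "refund tool errored but the answer claims success"}
    return {"label": "unsupported_claim", "reason": "refund tool was never called"}
```

A check like this inspects the trajectory rather than the wording of the answer, so a fluent but false "your refund was processed" gets flagged regardless of how helpful it sounds. Semantic questions, such as whether the tone was appropriate, would still go to an LLM judge.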

2 Comments

Arize AI

26,769 followers
1d
Your AI agent disagrees with your human reviewers all day. Most teams treat those disagreements as noise. They are the most useful training data in your stack, because they capture context the agent lacks: vendor relationships, deadline pressure, the way your CFO actually thinks. Jim Bennett wrote up how to mine the gap and feed it back to the agent. https://lnkd.in/gMq2Ygmu
4 Comments

Arize AI

26,769 followers
2d
Docs aren't just for humans anymore. Every coding agent, RAG pipeline, and copilot is reading them too, and they read differently. They truncate, skip pages they can't parse, and trim content before it reaches the model. We built our docs to hold up for every agent that reaches for them. Find us near the top of the Mintlify agent score leaderboard: mintlify.com/score
1 Comment

Arize AI

26,769 followers
2d
Can you get world-class agents using harness engineering instead of fine-tuning? OpenAI thinks so.

Laurie Voss
3d

OpenAI is shutting down its fine-tuning APIs. It doesn't mean fine-tuning is dead, but it's a strong signal that fine-tuning isn't what the average AI engineer wants to do. So what are they doing instead?

The end of fine-tuning Laurie Voss on LinkedIn


Affiliated pages

Arize Phoenix

Technology, Information and Internet

San Francisco, CA


Funding

Arize AI 4 total rounds

Last Round

Series C Mar 20, 2025

US$ 70.0M

Investors

Adams Street Partners + 10 Other investors



About us

Arize AX | AI and agent engineering platform; one place for development, observability, and evals

Data Science & Machine Learning Platforms

Arize Phoenix - Open Source (OSS)

Data Science & Machine Learning Platforms

Employees at Arize AI

Ashu Garg

Todd Graham

Dharmesh Thakker

Ajay Chopra
