Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable guidance on their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
This paper discusses the migration of data orchestration workflows from a legacy tool like Autosys to a modern, cloud-based solution, Google Cloud Composer. It explores the transition from traditional job scheduling to Directed Acyclic Graph (DAG)-based workflows using Apache Airflow, culminating in the deployment and management of these workflows in Cloud Composer. The benefits and challenges of this migration are examined, highlighting the advantages of scalability, flexibility, and cloud integration offered by Cloud Composer.
Several resource allocation settings involve agents with unequal entitlements represented by weights. We analyze weighted fair division from an asymptotic perspective: if m items are divided among n agents whose utilities are independently sampled from a probability distribution, when is it likely that a fair allocation exists? We show that if the ratio between the weights is bounded, a weighted envy-free allocation exists with high probability provided that m = Ω(n log n/ log log n), generalizing a prior unweighted result. For weighted proportionality, we establish a sharp threshold of m = n/(1 − μ) for the transition from non-existence to existence, where μ ∈ (0, 1) denotes the mean of the distribution. In addition, we prove that for two agents, a weighted envy-free (and weighted proportional) allocation is likely to exist if m = ω(√r), where r denotes the ratio between the two weights.
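As an illustration of the setting studied in this abstract, the following sketch brute-forces the existence of a weighted envy-free allocation on a small random instance. The function name, the uniform utility distribution, and the instance sizes are illustrative assumptions, not the paper's experimental setup:

```python
import itertools
import random

def wef_allocation_exists(utilities, weights):
    """Brute-force search for a weighted envy-free allocation.

    utilities[i][g] is agent i's utility for item g; weights[i] is agent i's
    entitlement. Agent i envies agent j if u_i(A_j)/w_j > u_i(A_i)/w_i, so an
    allocation is weighted envy-free when no such pair exists.
    """
    n, m = len(utilities), len(utilities[0])
    # Each assignment maps every item to one of the n agents.
    for assignment in itertools.product(range(n), repeat=m):
        bundles = [[g for g in range(m) if assignment[g] == i] for i in range(n)]
        # vals[i][j] = agent i's utility for agent j's bundle.
        vals = [[sum(utilities[i][g] for g in bundles[j]) for j in range(n)]
                for i in range(n)]
        if all(vals[i][i] / weights[i] >= vals[i][j] / weights[j]
               for i in range(n) for j in range(n)):
            return True
    return False

random.seed(0)
n, m = 2, 6
utilities = [[random.random() for _ in range(m)] for _ in range(n)]
print(wef_allocation_exists(utilities, [2.0, 1.0]))
```

Since the search is over n^m allocations, this only scales to tiny instances; it is meant to make the existence question concrete, not to reproduce the asymptotic analysis.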
PlanGEN: A Framework Utilizing Inference-Time Algorithms with LLM Agents for Planning and Reasoning
Hootan Nakhost
Mihir Parmar
Swaroop Mishra
Chitta Baral
Jindong Gu
2025
Scaling inference-time computation in Large Language Models (LLMs) dramatically improves their capabilities for solving complex problems. While test-time scaling has shown promise in many tasks such as code generation and mathematical reasoning, integration of inference-time algorithms into multi-agent frameworks for planning and reasoning remains under-explored. To this end, we explore popular inference-time algorithms—Best of N, Tree of Thought (ToT), and REward BAlanced SEarch (REBASE)—with proposed feedback-driven refinement. Our feedback-driven refinement employs specialized agents: a constraint agent to enforce task instance-specific constraints, and a verifier agent to evaluate plan quality. Furthermore, we hypothesize that test-time scaling can be proportional to instance-level complexity. Thus, we propose an additional selection agent to dynamically optimize algorithm choice. We evaluate our proposed approaches on four different benchmarks, i.e., NATURAL PLAN, GPQA, OlympiadBench, and DocFinQA. Experimental results show that our methods outperform strong baselines, achieving state-of-the-art results in NATURAL PLAN, OlympiadBench, and DocFinQA. Our key findings demonstrate that constraint-guided iterative refinement and algorithm selection improve both planning and downstream reasoning in LLMs.
A Foot in the Backdoor
Richard Bondi
Ruben Barroso
Garrett Holthaus
John P. Thomas
(2025)
We applied systems theory control loops to the 2024 cyberattack https://nvd.nist.gov/vuln/detail/CVE-2024-3094, in which a backdoor was inserted into Linux distros by modifying the xz utils compression package. Our work illustrates how to apply STAMP, CAST, and STPA to cyberattacks, and the advantages of these methods over traditional threat modeling.
Synthesizing and Adapting Error Correction Data for Mobile Large Language Model Applications
Yanxiang Zhang
Zheng Xu
Yuanbo Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track) (2025)
Error correction is an important capability when applying large language models (LLMs) to facilitate user typing on mobile devices. In this paper, we use LLMs to synthesize a high-quality dataset of error correction pairs to evaluate and improve LLMs for mobile applications. We first prompt LLMs with error correction domain knowledge to build a scalable and reliable addition to the existing data synthesis pipeline. We then adapt the synthetic data distribution to match the mobile application domain by reweighting the samples. The reweighting model is learnt by predicting (a handful of) live A/B test metrics when deploying LLMs in production, given the LLM performance on offline evaluation data and scores from a small privacy-preserving on-device language model. Finally, we present best practices for mixing our synthetic data with other data sources to improve model performance on error correction in both offline evaluation and production live A/B testing.
Faster electronic structure quantum simulation by spectrum amplification
Guang Hao Low
Robbie King
Alec White
Rolando Somma
Dominic Berry
Qiushi Han
Albert Eugene DePrince III
arXiv (2025) (to appear)
We discover that many interesting electronic structure Hamiltonians have a compact and close-to-frustration-free sum-of-squares representation with a small energy gap. We show that this gap enables spectrum amplification in estimating ground state energies, which improves the cost scaling of previous approaches from the block-encoding normalization factor $\lambda$ to just $\sqrt{\lambda E_{\text{gap}}}$. For any constant-degree polynomial basis of fermionic operators, a sum-of-squares representation with optimal gap can be efficiently computed using semi-definite programming. Although the gap can be made arbitrarily small with an exponential-size basis, we find that the degree-$2$ spin-free basis in combination with approximating two-body interactions by a new Double-Factorized (DF) generalization of Tensor-Hyper-Contraction (THC) gives an excellent balance of gap, $\lambda$, and block-encoding costs. For classically-hard FeMoco complexes -- candidate applications for first useful quantum advantage -- this combination improves the Toffoli gate cost of the first estimates with DF [Phys. Rev. Research 3, 033055] or THC [PRX Quantum 2, 030305] by over two orders of magnitude.
https://drive.google.com/file/d/1hw4zFv_X0GeMpE4et6SS9gAUM9My98iJ/view?usp=sharing
Tighter Privacy Analysis for Truncated Poisson Sampling
Arun Ganesh
(2025)
We give a new privacy amplification analysis for truncated Poisson sampling, a Poisson sampling variant that truncates a batch if it exceeds a given maximum batch size.
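A minimal sketch of the sampling scheme being analyzed, assuming truncation drops a uniformly random subset of an oversized batch; the paper's exact truncation rule may differ, and the function name is illustrative:

```python
import random

def truncated_poisson_sample(n, p, max_batch, rng):
    """Truncated Poisson sampling: include each of n records independently
    with probability p (Poisson sampling), then, if the batch exceeds
    max_batch, truncate it by keeping a uniformly random subset of that size.
    """
    batch = [i for i in range(n) if rng.random() < p]
    if len(batch) > max_batch:
        batch = rng.sample(batch, max_batch)
    return batch

rng = random.Random(0)
batch = truncated_poisson_sample(n=1000, p=0.01, max_batch=12, rng=rng)
print(len(batch) <= 12)  # True: the batch never exceeds max_batch
```

The truncation step is what distinguishes this variant from plain Poisson sampling, and it is exactly the step whose interaction with privacy amplification the abstract says is analyzed.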
The integration of vector search into databases, driven by advancements in embedding models, semantic search, and Retrieval-Augmented Generation (RAG), enables powerful combined querying of structured and unstructured data. This paper focuses on filtered vector search (FVS), a core operation where relational predicates restrict the dataset before or during the vector similarity search (top-k). While approximate nearest neighbor (ANN) indices are commonly used to accelerate vector search by trading latency for recall, the addition of filters complicates performance optimization and makes achieving stable, declarative recall guarantees challenging. Filters alter the effective dataset size and distribution, impacting the search effort required. We discuss the primary FVS execution strategies – pre-filtering, post-filtering, and inline-filtering – whose efficiencies depend on factors like filter selectivity, cardinality, and data correlation. We review existing approaches that modify index structures and search algorithms (e.g., iterative post-filtering, filter-aware index traversal) to enhance FVS performance. This tutorial provides a comprehensive overview of filtered vector search, discussing its use cases, classifying current solutions and their trade-offs, and highlighting crucial research challenges and future directions for developing efficient and accurate FVS systems.
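The pre- and post-filtering strategies contrasted in this abstract can be sketched with exact brute-force inner-product search (illustrative only: real FVS systems run these strategies over ANN indices rather than exact scans, and all names here are assumptions):

```python
def pre_filter_topk(vectors, mask, query, k):
    """Pre-filtering: restrict to rows passing the predicate first, then take
    the exact top-k by inner-product similarity over the filtered set."""
    dot = lambda v: sum(a * b for a, b in zip(v, query))
    cands = [(dot(v), i) for i, v in enumerate(vectors) if mask[i]]
    return [i for _, i in sorted(cands, reverse=True)[:k]]

def post_filter_topk(vectors, mask, query, k, overfetch=4):
    """Post-filtering: take an enlarged top-(k * overfetch) over all rows,
    then apply the predicate; may return fewer than k hits when the filter is
    selective, which is why over-fetching (or iteration) is needed."""
    dot = lambda v: sum(a * b for a, b in zip(v, query))
    ranked = sorted(((dot(v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for _, i in ranked[: k * overfetch] if mask[i]][:k]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
mask = [False, True, True, True]  # result of the relational predicate
query = [1.0, 0.0]
print(pre_filter_topk(vectors, mask, query, k=2))   # [1, 2]
print(post_filter_topk(vectors, mask, query, k=2))  # [1, 2]
```

The trade-off the abstract describes is visible here: pre-filtering pays to evaluate the predicate over the whole set but returns exact filtered top-k, while post-filtering reuses the unfiltered ranking and risks under-filling the result.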
Security Assurance in the Age of Generative AI
Tom Grzelak
Kara Olive
Moni Pande
Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043 (2025)
Artificial Intelligence (AI) is a rapidly growing field known for experimentation and quick iteration, qualities that can pose challenges for traditional enterprise security approaches. Because AI introduces unique assets and surfaces—AI-driven applications, agents, assistants, vast training datasets, the models themselves, and supporting infrastructure—we’re continually updating our security controls, guided by Google’s Secure AI Framework (SAIF).
To address the new challenges, we’ve expanded our traditional security approaches to cover the new attack surfaces by scanning for more types of vulnerabilities, analyzing more intel, preparing to respond to new kinds of incidents, and continually testing our controls in novel ways to strengthen our security posture.
This white paper is one of a series describing our approaches to implementing Google’s SAIF. In this paper we explain how we’re applying security assurance—a cross functional effort aiming to achieve high confidence that our security features, practices, procedures, controls, and architecture accurately mediate and enforce our security policies—to AI development. Security assurance efforts help to both ensure the continued security of our AI products and address relevant policy requirements.
Just as quality assurance (QA) in manufacturing meticulously examines finished products and the processes that create them to ensure they meet quality standards, security assurance serves a complementary role to the broader security efforts within an organization. Those broader security efforts span the design, implementation, and operation of controls to create secure software products; security assurance focuses on verifying and improving those efforts. Security assurance identifies gaps, weaknesses, and areas where controls may not be operating as intended, to drive continuous improvement across all security domains. It’s two-party review in action—security assurance helps build confidence that the software was not just built securely, but continues to run securely.
Since AI systems—those that use AI models for reasoning—present a combination of well understood and novel risks, AI technologies require a combination of both common and novel controls. No matter how strong these controls are, a security assurance program is essential to ensure they are working as intended and that they are continually updated and improved.
The paper opens with an overview of security assurance functions, covering several teams and capabilities that work together to ensure security controls are working across any software development lifecycle, including the AI development lifecycle. In particular, we focus on four functions—Red Teaming, Vulnerability Management, Detection & Response, and Threat Intelligence, and how those work together to address issues through Remediation.
We then describe the features specific to AI that affect assurance functions and give examples of how we’re adapting our approaches to account for AI-specific technologies and risks. We also include guidance for organizations considering creating their own AI assurance programs, including best practices for assuring training data, models, the AI software supply chain, and product integrations.
We intend this paper to be useful for a broad technical audience, including both assurance specialists who are new to AI technologies, and AI developers who are new to assurance practices.
Collaborative Diffusion Model for Recommender System
Gyuseok Lee
Yaochen Zhu
Hwanjo Yu
Yao Zhou
Jundong Li
2025
Diffusion-based recommender systems (DRs) have gained increasing attention for their advanced generative and denoising capabilities. However, existing DRs face two central limitations: (i) a trade-off between enhancing generative capacity via noise injection and preserving personalized information, and (ii) the underutilization of rich item-side information. To address these challenges, we present a Collaborative Diffusion model for Recommender System (CDiff4Rec). Specifically, CDiff4Rec generates pseudo-users from item features and leverages collaborative signals from both real and pseudo personalized neighbors identified through behavioral similarity, thereby effectively reconstructing nuanced user preferences. Experimental results on three public datasets show that CDiff4Rec outperforms competitors by effectively mitigating the loss of personalized information through the integration of item content and collaborative signals.
Enhancing Performance of the Tesseract Decoder for Quantum Error Correction
Dragana Grbic
Laleh Beni
Noah Shutty
2025
In this paper I describe and document the performance enhancements I implemented in Tesseract, an open-source quantum-error-correction decoder developed at Google, and the speedups they achieved.
Dynamical-generative downscaling of climate model ensembles
Tapio Schneider
John Anderson
Fei Sha
Proceedings of the National Academy of Sciences, 122 (2025), e2420288122
Regional high-resolution climate projections are crucial for many applications, such as agriculture, hydrology, and natural hazard risk assessment. Dynamical downscaling, the state-of-the-art method to produce localized future climate information, involves running a regional climate model (RCM) driven by an Earth System Model (ESM), but it is too computationally expensive to apply to large climate projection ensembles. We propose an approach combining dynamical downscaling with generative AI to reduce the cost and improve the uncertainty estimates of downscaled climate projections. In our framework, an RCM dynamically downscales ESM output to an intermediate resolution, followed by a generative diffusion model that further refines the resolution to the target scale. This approach leverages the generalizability of physics-based models and the sampling efficiency of diffusion models, enabling the downscaling of large multimodel ensembles. We evaluate our method against dynamically downscaled climate projections from the Coupled Model Intercomparison Project 6 (CMIP6) ensemble. Our results demonstrate its ability to provide more accurate uncertainty bounds on future regional climate than alternatives such as dynamical downscaling of smaller ensembles, or traditional empirical statistical downscaling methods. We also show that dynamical-generative downscaling results in significantly lower errors than popular statistical downscaling techniques, and captures more accurately the spectra, tail dependence, and multivariate correlations of meteorological fields. These characteristics make the dynamical-generative framework a flexible, accurate, and efficient way to downscale large ensembles of climate projections, currently out of reach for pure dynamical downscaling.
Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information
Dirk Bergemann
Marek Bojko
Paul Duetting
Haifeng Xu
EC '25: Proceedings of the 26th ACM Conference on Economics and Computation (2025), pp. 507
We study mechanism design when agents have private preferences and private information about a common payoff-relevant state. We show that standard message-driven mechanisms cannot implement socially efficient allocations when agents have multidimensional types, even under favorable conditions.
To overcome this limitation, we propose data-driven mechanisms that leverage additional post-allocation information, modeled as an estimator of the payoff-relevant state. Our data-driven mechanisms extend the classic Vickrey-Clarke-Groves class. We show that they achieve exact implementation in posterior equilibrium when the state is either fully revealed or the utility is affine in an unbiased estimator. We also show that they achieve approximate implementation with a consistent estimator, converging to exact implementation as the estimator converges, and present bounds on the convergence rate.
We demonstrate applications to digital advertising auctions and large language model (LLM)-based mechanisms, where user engagement naturally reveals relevant information.
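For context, here is a minimal sketch of the classic single-item Vickrey-Clarke-Groves (VCG) mechanism that the proposed data-driven mechanisms extend. The paper's mechanisms additionally condition on a post-allocation state estimator, which this baseline sketch omits; the function name is an assumption:

```python
def vcg_single_item(bids):
    """Classic VCG for one indivisible item: the highest bidder wins and pays
    the externality imposed on the others, i.e. the second-highest bid
    (reducing to a second-price auction in this single-item case)."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    others = [b for i, b in enumerate(bids) if i != winner]
    payment = max(others) if others else 0.0
    return winner, payment

print(vcg_single_item([3.0, 7.0, 5.0]))  # (1, 5.0): bidder 1 wins, pays 5.0
```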
We conduct a theoretical analysis of techniques for preference-based RL from offline datasets annotated with pairwise preferences, such as DPO. We identify key properties of the learning objective that influence the quality of the learned policy, such as the coverage of the offline dataset, the presence or absence of a normalizing baseline and the choice of loss function. Informed by the theory, we further conduct an empirical analysis of some key variants to corroborate our theoretical findings.