The Power Problem Behind AI Data Center Performance
November 11, 2025
Blog
While discussion of AI adoption tends to focus on semiconductor shortages and algorithm optimization, the larger constraint is power infrastructure. According to a recent Deloitte survey of 120 US-based power company and data center executives, 72% identify power and cooling limitations as significant barriers to AI data center growth over the next three to five years.
Modern AI workloads demand three to five times higher power densities than traditional data center applications, with GPU clusters requiring up to 100kW per rack compared to the 10-15kW typical of conventional servers. This increased power consumption affects every layer of embedded system design, from chip-level thermal management to rack-level cooling distribution.
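To put those densities in perspective, here is a minimal back-of-the-envelope sketch of what the jump from roughly 15kW to 100kW per rack means at facility scale. The rack count and PUE value are assumptions chosen purely for illustration, not figures from the survey.

```python
# Back-of-the-envelope comparison of facility-level power draw at traditional
# versus AI rack densities. Rack count and PUE are illustrative assumptions.

RACKS = 200        # hypothetical deployment size
PUE = 1.3          # assumed power usage effectiveness (cooling/distribution overhead)

def facility_load_mw(kw_per_rack: float, racks: int = RACKS, pue: float = PUE) -> float:
    """Total facility draw in megawatts, including overhead captured by PUE."""
    return kw_per_rack * racks * pue / 1000.0

conventional_mw = facility_load_mw(15)    # traditional servers, ~10-15 kW per rack
ai_mw = facility_load_mw(100)             # dense GPU training racks, ~100 kW per rack

print(f"Conventional hall: {conventional_mw:.1f} MW")
print(f"AI training hall:  {ai_mw:.1f} MW  ({ai_mw / conventional_mw:.1f}x)")
```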
Having the power available isn't the only concern. AI workloads stress traditional power delivery systems in unconventional ways. Training large language models (LLMs) creates power spikes that require infrastructure capable of handling both sustained high loads and rapid transients. Faced with these kinds of challenges, engineers are having to rethink power distribution unit (PDU) sizing, uninterruptible power supply (UPS) capacity, and backup generator specifications.
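As a rough illustration of why PDU and UPS sizing changes, the sketch below estimates the peak load and the step swing a power delivery path might have to ride through during training. The racks-per-PDU count, idle floor, and design margin are invented for the example and are not design guidance.

```python
# Rough sizing check for power delivery under AI training transients.
# Rack counts, the idle floor, and the design margin are illustrative
# assumptions, not vendor specifications.

PEAK_KW_PER_RACK = 100     # sustained training draw per rack
IDLE_FLOOR = 0.35          # assumed fraction of peak between training steps
RACKS_PER_PDU = 4          # hypothetical distribution layout
DESIGN_MARGIN = 1.25       # assumed headroom for transients

pdu_peak_kw = PEAK_KW_PER_RACK * RACKS_PER_PDU
pdu_rating_kw = pdu_peak_kw * DESIGN_MARGIN
step_swing_kw = pdu_peak_kw * (1 - IDLE_FLOOR)   # swing the UPS must ride through

print(f"PDU peak load:          {pdu_peak_kw} kW")
print(f"PDU rating with margin: {pdu_rating_kw:.0f} kW")
print(f"Transient swing:        {step_swing_kw:.0f} kW per PDU")
```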
Thermal Management at Breaking Point
The heat load generated by these high-density AI clusters exceeds the capabilities of conventional air cooling. Liquid cooling solutions are becoming necessary for AI deployments at the rack level. However, implementing them at scale necessitates a redesign of traditional board and rack architectures.
When processors operate at sustained high utilization rates, chip-to-chip thermal management becomes critical. Unlike bursty traditional workloads, AI training offers no idle periods in the duty cycle for thermal recovery; the silicon must sustain high-temperature operation continuously. This drives the need for sophisticated thermal interface materials, heat spreaders, and cooling distribution networks. Thermal design power (TDP) ratings that were adequate for traditional system design are now insufficient for sustained AI workloads.
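A quick comparison makes the duty-cycle point concrete. The TDP, idle floor, and duty-cycle figures below are assumed for illustration; the point is simply that the time-averaged heat load converges on the full TDP once there is no recovery time.

```python
# Time-averaged heat load under a bursty duty cycle versus sustained AI
# utilization. The TDP, idle floor, and duty-cycle figures are assumed.

TDP_W = 700          # hypothetical accelerator thermal design power
IDLE_FRACTION = 0.15 # assumed idle power as a fraction of TDP

def average_power_w(duty_cycle: float, peak_w: float = TDP_W) -> float:
    """Average dissipation for a given fraction of time spent at peak power."""
    return duty_cycle * peak_w + (1 - duty_cycle) * IDLE_FRACTION * peak_w

bursty_w = average_power_w(0.40)      # traditional workload: ~40% of time at peak
sustained_w = average_power_w(0.98)   # AI training: effectively always at peak

print(f"Bursty workload average:    {bursty_w:.0f} W")
print(f"Sustained training average: {sustained_w:.0f} W")
```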
Grid-Scale Bottlenecks
As the Deloitte survey revealed, power capacity is a primary point of competition in resource allocation, and this constraint impacts deployment timelines and system performance.
New power generation capacity faces long lead times: new power plant projects may not come online until the 2030s, and renewable energy projects with battery storage can wait more than a decade for transmission infrastructure. AI development cycles, by contrast, run on six-month sprints.
A "Power-First" Methodology
Engineers must now balance computational throughput against power efficiency, implementing dynamic voltage and frequency scaling (DVFS) techniques optimized for AI workload requirements. This power-first approach treats power as a budget to be allocated, adjusting processing intensity to fit the available headroom.
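A minimal sketch of what such a control loop might look like, assuming a rack power budget handed down from the facility: step accelerator clocks down when the rack exceeds its allocation and back up when headroom returns. The telemetry and frequency-capping hooks (read_rack_power_kw, set_gpu_frequency_mhz) are hypothetical stand-ins, stubbed out so the loop runs standalone; a real deployment would use vendor telemetry and driver-level clock controls.

```python
# Sketch of a power-first DVFS loop: shed power by stepping clocks down when
# over budget, and recover performance when headroom returns.
import random

RACK_POWER_BUDGET_KW = 95.0                       # assumed facility allocation
FREQ_STEPS_MHZ = [1980, 1710, 1440, 1170, 900]    # illustrative clock steps

def read_rack_power_kw() -> float:
    """Telemetry stub: simulates a noisy training load around the budget."""
    return random.uniform(75.0, 110.0)

def set_gpu_frequency_mhz(mhz: int) -> None:
    """Frequency-cap stub: a real system would call driver or BMC tooling."""
    print(f"  cap clocks at {mhz} MHz")

def run_control_loop(iterations: int = 10) -> None:
    step = 0
    for _ in range(iterations):
        draw = read_rack_power_kw()
        headroom = RACK_POWER_BUDGET_KW - draw
        if headroom < 0 and step < len(FREQ_STEPS_MHZ) - 1:
            step += 1        # over budget: step clocks down
        elif headroom > RACK_POWER_BUDGET_KW * 0.05 and step > 0:
            step -= 1        # comfortable margin: step clocks back up
        print(f"draw {draw:5.1f} kW, headroom {headroom:6.1f} kW", end="")
        set_gpu_frequency_mhz(FREQ_STEPS_MHZ[step])

run_control_loop()
```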
Sites chosen for traditional criteria such as connectivity or real estate costs may not be the best locations for AI deployments, because AI requires a different scale of power infrastructure. A power-first approach instead targets "stranded power" assets and innovative power purchase agreements (PPAs) to unlock capacity faster than traditional development timelines allow.
Supply Chain and Component Optimization
The challenge of constrained power ripples throughout the supply chain, affecting the availability of critical components, including transformers, switchgear, cooling distribution units (CDUs), and backup power systems. Long-lead equipment must be factored into the design process even before final system specifications are complete. This calls for flexible, modular design approaches that allow component substitutions without compromising system performance.
AI accelerators also call for more sophisticated power management integrated circuits (PMICs) that provide fine-grained control over power domains within the chip. These enable dynamic power gating, voltage regulation optimization, and thermal throttling that maximize computational performance within available power budgets.
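As a rough sketch of the per-domain decisions such a scheme makes, the snippet below gates idle domains and throttles hot ones. The domain names, thresholds, and PowerDomain interface are invented for illustration; real PMICs expose this control through vendor-specific registers and firmware.

```python
# Sketch of per-domain power management decisions: gate idle blocks, throttle
# hot ones, leave the rest at nominal. Names and thresholds are invented.
from dataclasses import dataclass

@dataclass
class PowerDomain:
    name: str
    utilization: float   # 0.0 to 1.0
    temp_c: float
    voltage_v: float

THROTTLE_TEMP_C = 90.0   # assumed junction-temperature throttle point
GATE_UTILIZATION = 0.02  # below this, the domain is considered idle
MIN_VOLTAGE_V = 0.65     # assumed lower limit for voltage scaling

def manage(domain: PowerDomain) -> str:
    """Pick an action for a single on-chip power domain."""
    if domain.utilization < GATE_UTILIZATION:
        return "power-gate"                                   # cut idle leakage
    if domain.temp_c > THROTTLE_TEMP_C:
        domain.voltage_v = max(MIN_VOLTAGE_V, domain.voltage_v - 0.05)
        return f"throttle, Vdd -> {domain.voltage_v:.2f} V"   # thermal throttling
    return "nominal"

domains = [
    PowerDomain("tensor-cluster-0", utilization=0.95, temp_c=93.0, voltage_v=0.80),
    PowerDomain("hbm-phy",          utilization=0.60, temp_c=78.0, voltage_v=0.75),
    PowerDomain("video-decode",     utilization=0.00, temp_c=45.0, voltage_v=0.75),
]

for d in domains:
    print(f"{d.name}: {manage(d)}")
```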
Conclusion
AI is forcing a fundamental rethink of embedded system design. Power must move from a constraint considered late in the cycle to the first design parameter. This shift involves early collaboration between embedded system designers, power engineers, and facility planners to ensure optimal alignment between computational requirements and infrastructure capabilities.
Developers who master power-efficient AI deployment will have a competitive advantage in a world that will continue to face power constraints. Delivering on AI’s promise requires more than incremental chipset improvements; it requires reimagining how we build and deploy computational infrastructure at scale.
James Kolb is Director of Operations at hi-tequity, specializing in AI-ready power infrastructure and mission-critical systems. With a background in aerospace, defense, and energy manufacturing, he leads operations and engineering initiatives that optimize reliability, capacity, and performance in high-density data center environments.
