Liquid Cooling’s Hidden Risk: Why Commissioning, Controls, and Workforce Readiness Matter as Much as Design

By Dave Farrell, Senior Vice President of Operations, T5 Services

As AI data centers move faster, operational maturity is becoming the difference between liquid cooling success and avoidable risk.

The shift to liquid cooling is necessary, but it also introduces a new category of operational risk.

Much of the industry conversation still focuses on the design of liquid cooling systems. That includes the cooling architecture, CDU configuration, piping strategy, fluid type, thermal performance, and the broader engineering approach. These are important decisions. But they do not tell the full story. 

The harder question is what happens after the system is installed.

From my perspective, the industry is still underestimating the operational complexity created when fluid moves from the back-of-house mechanical environment into direct proximity with IT equipment. Historically, water treatment, fluid management, pressure control, and related systems were largely separated from the server environment. Liquid cooling changes that relationship. Disciplines that were once managed mostly behind the scenes are now directly tied to uptime, equipment protection, and customer performance expectations.

For customers deploying high-density AI infrastructure, this distinction matters. Liquid cooling is not just a design challenge. It is a commissioning, controls, workforce, and operations challenge. 

Liquid Cooling Moves Facilities Risk Closer to IT Risk

In traditional air-cooled data centers, many mechanical risks were buffered by physical separation, system redundancy, and mature operating procedures. Water systems, treatment processes, pumps, valves, and related infrastructure were critical, but they generally did not sit directly inside the server environment. 

Liquid cooling changes that model. 

When fluid is brought closer to the rack and, in some cases, directly to the chip, the margin for error becomes much smaller. Pressure management, fluid quality, air in the system, and chemistry control become front-line operational issues. A failure in one of these areas may not be isolated or gradual. It can affect equipment quickly and at scale. 

That is why liquid cooling requires a different operating mindset. Facilities teams are no longer managing support infrastructure in the background. They are managing systems that are directly connected to customer compute performance. 

This does not mean liquid cooling is too risky. It means the risk profile has changed. 

The organizations that succeed will be those that understand that change and build operating models around it. The organizations that struggle will be those that assume liquid cooling can be managed with legacy procedures, generalized training, or traditional commissioning models. 

Process Discipline Depends on Coordination, Not Just Documentation 

Procedures matter. But in liquid cooling environments, procedures are only effective if they are supported by communication, ownership, and real-time coordination. 

My view is that process breakdowns are often less about whether documentation exists and more about whether teams are aligned before work begins. Liquid cooling systems are tightly coupled. Small issues such as pressure fluctuations, trapped air, or incomplete visibility at the point of failure can escalate quickly. In some cases, racks can be affected within minutes.  

That creates a need for clear operating discipline across multiple parties. 

Facilities teams need to understand the system. IT vendors need to understand how their equipment interacts with the cooling environment. Customers need to understand what is being monitored, who owns which boundary, and how issues will be escalated. Operators need visibility as close to the point of failure as possible so they can respond before a minor condition becomes a broader event. 

This is where static procedures fall short. A written procedure is useful, but it cannot replace a coordinated operating model. 

Effective liquid cooling operations require defined communication pathways, clear escalation protocols, known ownership boundaries, and monitoring strategies that reflect how the system behaves in the field. 
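To make "known ownership boundaries" and "clear escalation protocols" more concrete, here is a minimal Python sketch. It assumes a site keeps warning and critical limits for each monitored point and has an agreed owner for each point. Every point name, limit, and party below is hypothetical and chosen only for illustration. Real limits come from equipment and site documentation, and real escalation runs through the site's monitoring and ticketing systems, not a script like this.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    NORMAL = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Band:
    """Warning and critical limits for one monitored point."""
    warn_low: float
    warn_high: float
    crit_low: float
    crit_high: float

    def classify(self, value: float) -> Severity:
        if value < self.crit_low or value > self.crit_high:
            return Severity.CRITICAL
        if value < self.warn_low or value > self.warn_high:
            return Severity.WARNING
        return Severity.NORMAL


# Hypothetical points and limits, for illustration only.
BANDS = {
    "cdu_supply_pressure_kpa": Band(180, 260, 150, 300),
    "rack_inlet_temp_c": Band(18, 32, 15, 40),
}

# Hypothetical ownership boundary: which party owns each monitored point.
OWNER = {
    "cdu_supply_pressure_kpa": "facilities",
    "rack_inlet_temp_c": "customer",
}

# Who is notified at each severity; "owner" is resolved per point.
ESCALATION = {
    Severity.WARNING: ["owner"],
    Severity.CRITICAL: ["owner", "customer", "it_vendor"],
}


def route_alarm(point: str, value: float) -> tuple[Severity, list[str]]:
    """Classify a reading and return the parties to notify, in order, without duplicates."""
    severity = BANDS[point].classify(value)
    if severity is Severity.NORMAL:
        return severity, []
    parties = [OWNER[point] if p == "owner" else p for p in ESCALATION[severity]]
    return severity, list(dict.fromkeys(parties))
```

For example, `route_alarm("cdu_supply_pressure_kpa", 270)` returns `(Severity.WARNING, ['facilities'])`, while a reading of 310 escalates to facilities, the customer, and the IT vendor. The point of the sketch is that ownership and escalation are explicit data that all parties can review before work begins, rather than assumptions held by individual technicians.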

The Workforce Gap Is Real 

The data center industry is also facing a workforce readiness problem. 

Most data center technicians were not trained to manage fluid dynamics, water chemistry, pressure control, and liquid cooling system behavior as front-line responsibilities. These disciplines existed in data centers before, but they were not historically embedded into the day-to-day work of rack-level or data hall operations. 

At the same time, customer expectations are becoming more granular. Uptime still matters, but it is no longer the only measure of performance. In liquid-cooled environments, service expectations may extend into temperature, pressure, environmental thresholds, and other system-level conditions that require deeper technical fluency. 

The challenge is compounded by the broader labor market. Traditional talent pipelines, including military veterans and the skilled trades, are under strain. There is also no single standardized path for developing a data center operator capable of supporting liquid-cooled AI infrastructure at scale.

This raises a practical question for customers: how is your operating partner building capability?

T5’s approach is centered on hiring for aptitude and discipline, then developing system-specific expertise through structured onboarding, subject matter expert-led training, and continuous program updates. The emphasis is not on treating liquid cooling as a generic skill. It is on training teams for the systems they operate, then expanding their capability through cross-training and field feedback. 

That distinction matters. Generalized training may create familiarity. Site-specific training creates operational readiness. 

Variation Is More Than a Design Issue 

While many liquid cooling deployments are converging on direct-to-chip, single-phase systems, that does not mean the operating model is standardized.

Even within similar technical architectures, site conditions can vary significantly. One deployment may have different CDU ownership boundaries than another. One customer may define T5’s responsibility at a different point in the loop. One project may have complete backup power and fully commissioned systems. Another may be brought online in phases while infrastructure is still being completed. 

This variation is not just technical. It is contractual, operational, and organizational. 

That makes leadership fluency especially important. Field technicians need deep knowledge of the specific systems they operate. But operating leaders also need to understand how different architectures, ownership models, and deployment conditions affect risk across multiple environments. 

The larger challenge is not simply learning one liquid cooling system. It is managing variation across real-world deployments while maintaining consistent standards for safety, communication, and performance. 

The Most Overlooked Risk: Commissioning and Controls 

One of the most important risks, and one of the least discussed, is commissioning, especially controls commissioning.

Speed-to-market pressure is intense. AI customers need capacity quickly. Developers, builders, equipment providers, and operators are all being pushed to compress schedules. In that environment, mechanical systems, electrical systems, and controls may be commissioned independently instead of being fully validated as one integrated operating system. 

That is a problem. 

Liquid cooling depends on the interaction between systems. Mechanical performance, electrical availability, control logic, monitoring, alarms, and operational response all need to work together. If controls are underdeveloped or validated too late, operators can inherit a system that is technically live but not fully understood. 

The result is a prolonged live learning phase. 

During that period, operators are not just running the environment. They are also learning how the system behaves, identifying gaps in controls logic, compensating for incomplete integration, and managing customer expectations in real time. That increases risk, particularly during the first year of operations.  

For customers, this is a critical lesson. Fast deployment is valuable, but only if it does not shift unresolved commissioning risk into live operations. 
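The difference between systems that are commissioned independently and systems that are validated as one integrated whole can be illustrated with a minimal Python sketch. It assumes an integrated test is a list of injected faults, each paired with the alarm and the control action the site's sequence of operations expects. The scenario names, alarm codes, and actions are hypothetical, and `partially_commissioned` is a stand-in for a live plant in which alarms work but pump failover logic is unfinished. Actual integrated systems testing is performed on the physical plant under a formal commissioning plan.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    """One integrated test: inject a fault, state what the system should do."""
    name: str
    fault: str
    expected_alarm: str
    expected_action: str


@dataclass
class Response:
    """What the mechanical, electrical, and controls layers actually did."""
    alarms: set[str] = field(default_factory=set)
    actions: set[str] = field(default_factory=set)


def run_integrated_test(scenarios: list[Scenario],
                        system: Callable[[str], Response]) -> list[str]:
    """Return a list of gaps; an empty list means every scenario passed."""
    gaps = []
    for s in scenarios:
        r = system(s.fault)
        if s.expected_alarm not in r.alarms:
            gaps.append(f"{s.name}: no alarm {s.expected_alarm!r}")
        if s.expected_action not in r.actions:
            gaps.append(f"{s.name}: no action {s.expected_action!r}")
    return gaps


# Hypothetical scenarios; real ones come from the site's sequence of operations.
SCENARIOS = [
    Scenario("primary pump trip", "pump_trip", "PUMP_FAIL", "start_standby_pump"),
    Scenario("utility power loss", "power_loss", "POWER_LOSS", "transfer_pumps_to_ups"),
]


def partially_commissioned(fault: str) -> Response:
    """Stand-in for a site where alarms work but pump failover logic is unfinished."""
    if fault == "pump_trip":
        return Response(alarms={"PUMP_FAIL"})  # alarm fires, but no failover action
    if fault == "power_loss":
        return Response(alarms={"POWER_LOSS"}, actions={"transfer_pumps_to_ups"})
    return Response()
```

Here `run_integrated_test(SCENARIOS, partially_commissioned)` reports a single gap: the pump-trip alarm fires, but nothing starts the standby pump. Each subsystem would pass its own checks. The gap only appears when a fault is traced end to end, and this is the kind of issue that otherwise surfaces during the live learning phase.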

Operational Maturity Is a Mindset and a System 

Liquid cooling requires maturity at both the technician level and the program level. 

At the technician level, maturity means being methodical. Operators need to understand what they are doing and why they are doing it. They need to maintain discipline under pressure and return to procedure when conditions become unstable. 

At the program level, maturity means structured procedures, specialized expertise, integrated training, active playbook maintenance, and feedback loops from the field into the operating model. 

This is where many providers fall short. They rely on generalized training, thin staffing, or static documentation. That may be enough in a familiar air-cooled environment. It is not enough for liquid-cooled AI infrastructure. 

T5’s model is built around centralized technical operations support, continuously updated playbooks, system-specific training, and structured drills that help identify readiness before performance is tested in a live event.  

That matters because liquid cooling performance is not proven by design documents. It is proven in the field, under pressure, when systems behave unexpectedly and teams need to respond correctly. 

The Real Test Is What Happens After Deployment 

The industry is moving quickly, and it should. AI infrastructure demand is real, and liquid cooling is becoming essential to support the next generation of compute. 

But speed alone is not a strategy. 

As liquid cooling moves deeper into production environments, the market will increasingly separate providers that can install liquid cooling systems from those that can operate them reliably over time. 

The difference will come down to practical capabilities: commissioning discipline, controls validation, workforce training, communication, escalation, system-specific expertise, and the ability to manage live environments without losing operational control.

For customers, the question should not be limited to whether a provider can support liquid cooling. The better question is whether they have the operating model to manage the risks liquid cooling introduces. 

That is where T5’s perspective is clear. Liquid cooling reliability is not achieved by infrastructure design alone. It is built through disciplined commissioning, trained teams, integrated controls, and an operating culture that understands the stakes. 

Because in high-density AI environments, the hidden risk is not the presence of fluid. The hidden risk is assuming the system is ready before the operating model is.

About Dave Farrell 
Dave Farrell is Senior Vice President of Operations at T5 Services, where he oversees and supports data center operations across the U.S. and international markets. He brings deep experience in critical facilities management, data center operations, process improvement, commissioning, procedure development, and infrastructure reliability. Before joining T5, Dave held senior operational and engineering roles with JLL supporting Union Pacific’s critical transportation network, CBRE supporting Wells Fargo data center environments, and Schneider Electric supporting a hyperscale Microsoft data center site. He began his career in the electrical trade, with hands-on experience across commercial electrical systems, high-voltage gear, fire alarm systems, PLCs, conduit, switchgear, and data center infrastructure. Dave holds several industry credentials, including OSHA 30, OSHA 10, Accredited Tier Specialist, and UST Class A & B Operator certifications. 
