Learning Resources
Cooling problems are often invisible until they impact uptime. This page brings together practical resources to help operators understand airflow, identify cooling risks, and validate that their systems are performing as expected.
What do you want to learn?
Choose a learning path that matches your skill level and goals.
1. Understand Airflow
Level: Beginner
Learn how air actually moves through your facility—and why small airflow issues quietly undermine cooling performance.
2. Diagnose Cooling Problems
Level: Intermediate
Pinpoint the real causes behind hot spots, uneven temperatures, and airflow breakdowns before they turn into bigger issues.
3. Improve Performance
Level: Advanced
Validate what’s happening at the rack level, improve consistency, and optimize cooling without unnecessary upgrades.
Module 1: Manage Your Airflow
Level: Beginner → Intermediate
Airflow is the foundation of cooling performance, but it’s often misunderstood. This path breaks down how air actually moves through a data center—what works, what fails, and why. The goal is to build a clear mental model so operators can recognize issues early and understand that most cooling problems are airflow problems, not capacity problems.
Topics: Airflow Management | Delta-T | Containment | Uniformity
Module 1.1: Airflow Fundamentals
Airflow in a data center is not just about moving cold air—it’s about directing it with intent. Supply air is only useful if it reaches server inlets, and hot exhaust must return without mixing back into the cold aisle. In most facilities, this balance breaks down in subtle ways: gaps in racks, poor layout decisions, or simply not knowing how air behaves once it leaves the CRAC. The resources in this section walk through how air actually moves through a live environment—not in theory, but in practice. The goal is to build a clear mental model so you can see airflow problems before they show up as temperature alarms.
Start with the fundamentals.
If airflow has ever felt unclear or overly theoretical, this guide breaks it down into simple, practical concepts: how supply, return, bypass, and recirculation actually behave in a real facility.
Link: Airflow 101 Guide
Then pressure-test your environment with a simple checklist.
Use this checklist as a quick walkthrough of your data hall. It focuses on the common failure points—rack gaps, leakage, and layout issues—so you can spot problems in minutes, not hours.
Link: Keep Your Data Center Running Smoothly: A Simple Airflow Management Checklist
Many legacy data centers need to increase their IT load but believe they lack the cooling capacity to do so. This article examines a way to free up stranded capacity using the AUDIT-BUDDY system.
Link: Optimize Your Data Center: Mastering the Art of Airflow Supply!
See more articles on Airflow Management here.
Module 1.2: What is Delta-T?
Delta-T is one of the most talked-about metrics in cooling—and one of the most misunderstood. A single number rarely tells the full story, especially when it’s averaged across a room. In reality, there are multiple Delta-T relationships, each revealing something different about airflow performance, mixing, and efficiency. This section breaks down what Delta-T is actually measuring, where it can mislead you, and how to use it correctly. The goal is to move beyond surface-level metrics and start using Delta-T as a diagnostic tool, not just a number on a dashboard.
Start with the full picture.
There isn’t just one Delta-T. There are four, and each tells you something different about how your cooling system is performing. This guide gives you a clear overview of how they connect.
Link: Understanding the 4 Delta-Ts in Data Centers: A Complete Guide to Cooling Efficiency
Then focus on the server.
Server Delta-T shows how much heat is actually being removed at the rack level. It’s one of the most direct indicators of whether airflow is doing its job.
Link: What is Server Delta-T and why you should care
Next, look at the cooling unit.
Cooling unit Delta-T reflects how effectively your CRAC/CRAH is removing heat from the environment, and whether it’s operating as expected.
Link: Data Center 101: Cooling Unit Delta-T
Understand what’s returning.
Return air Delta-T helps identify mixing and inefficiencies before air reaches the cooling unit, making it a key area for improving performance.
Link: Delta-T 101: Server Exhaust to CRAC/CRAH Delta-T
Finally, validate delivery.
Supply air to server inlet Delta-T shows whether cold air is actually reaching the servers at the right temperature, where it matters most.
See all our articles on Delta-T here.
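As a rough illustration of how the four Delta-T relationships in this module fit together, the sketch below computes each one from four spot temperatures. The class, function, and key names are hypothetical, the sample readings are invented for illustration, and the sign conventions are an assumption rather than definitions taken from the linked guides.

```python
from dataclasses import dataclass


@dataclass
class Readings:
    """Four spot temperatures in degrees F (illustrative field names)."""
    crac_supply: float     # air leaving the cooling unit
    server_inlet: float    # air entering the server
    server_exhaust: float  # air leaving the server
    crac_return: float     # air arriving back at the cooling unit


def four_delta_ts(r: Readings) -> dict:
    """Return the four temperature differences, each as a positive rise."""
    return {
        # Heat picked up across the IT equipment
        "server": r.server_exhaust - r.server_inlet,
        # Heat removed across the cooling unit
        "cooling_unit": r.crac_return - r.crac_supply,
        # Temperature lost between server exhaust and the cooling unit return
        "exhaust_to_return": r.server_exhaust - r.crac_return,
        # Temperature gained between the cooling unit supply and server inlet
        "supply_to_inlet": r.server_inlet - r.crac_supply,
    }


# Invented sample readings, not measurements from any real facility
sample = Readings(crac_supply=60.0, server_inlet=68.0,
                  server_exhaust=95.0, crac_return=82.0)
deltas = four_delta_ts(sample)
for name, value in deltas.items():
    print(f"{name}: {value:.1f} F")
```

In this invented example, the non-zero supply-to-inlet value would point to warm air mixing into the supply path, and the gap between server exhaust and cooling unit return would point to cold air bypassing the servers and diluting the return stream.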
Module 1.3: Containment
Containment is one of the most effective ways to improve airflow—but only when it’s done correctly. Partial containment, gaps, or poorly sealed systems can limit or even reverse its benefits. In many environments, containment is installed but not fully optimized, leading to mixed air and inconsistent temperatures. This section focuses on what proper containment looks like in practice, how to identify weaknesses, and how to validate that it’s actually working. The goal is to determine whether your containment strategy is delivering real performance improvements.
Start with cold aisle containment.
This guide breaks down how cold aisle containment works, where it typically fails, and what to look for after installation to ensure it’s performing as expected.
Link: Cold Aisle Containment Conundrum: Common Issues and Solutions
Then understand hot aisle containment.
Hot aisle containment follows a different approach, with its own set of challenges around airflow, pressure, and heat removal. This article walks through how to manage it effectively.
Link: Mastering the Heat: Navigating the Hot Aisle Containment
Validate your installation.
Containment doesn’t stop at install. Performance depends on how well it’s sealed and maintained. This guide outlines what to check post-installation to ensure it’s working as intended.
See the impact in practice.
This case study shows how thermal surveys were used before and after containment installation to measure real performance improvements.
Link: Case Study: Evaluating Cold Aisle Containment Efficiency
Check out all our containment articles here.
Module 1.4: Uniformity
A well-performing cold aisle should be consistent from top to bottom, but many are not. Temperature variation within the same aisle is usually a sign of airflow imbalance—caused by gaps, leakage, or recirculation. The Uniformity Metric provides a simple way to measure how evenly air is distributed across a rack or aisle. Instead of relying on averages, it shows how consistent conditions actually are. This section focuses on how to use that metric and what influences it in a real environment. The goal is to give you a quick, practical way to assess cooling quality without overcomplicating the process.
Start with the metric itself.
This article introduces the Uniformity Metric and explains how it can be used as a simple, reliable indicator of cooling performance.
Link: Precision in Every Pixel: Introducing the Uniformity Evaluation Metric
Then understand what affects it.
Even with containment, small gaps (missing panels, open spaces, or airflow leakage) can reduce uniformity. This deep dive explains how these issues show up in your data and how to catch them early.
See how it’s applied in the field.
This case study shows how uniformity was used to evaluate performance before and after containment, giving a clear picture of real-world impact.
Link: Case Study: Evaluating Cold Aisle Containment Efficiency
Check out all our uniformity articles here.
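To show why an average can hide an uneven cold aisle, the sketch below compares two racks with the same mean inlet temperature but very different top-to-bottom consistency. This is not the Uniformity Metric from the linked article, whose formula is not given on this page. It is a stand-in that uses the simple spread (max minus min) and the standard deviation, and the function name and readings are invented for illustration.

```python
from statistics import mean, pstdev


def inlet_spread(temps_f):
    """Return (max - min, population standard deviation) for inlet temps.

    A stand-in consistency measure, assumed for illustration only.
    """
    if not temps_f:
        raise ValueError("need at least one reading")
    return max(temps_f) - min(temps_f), pstdev(temps_f)


# Invented inlet temperatures in degrees F at bottom, middle, and top of rack
rack_a = [70.0, 71.0, 72.0]
rack_b = [64.0, 71.0, 78.0]

for label, temps in (("rack_a", rack_a), ("rack_b", rack_b)):
    spread, deviation = inlet_spread(temps)
    print(f"{label}: mean={mean(temps):.1f} F, "
          f"spread={spread:.1f} F, stdev={deviation:.2f} F")
```

Both racks average the same temperature, yet the second has a much wider spread, which is the kind of imbalance an average alone would not reveal.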
Module 2: Diagnose Airflow
Level: Intermediate
Cooling issues rarely happen at random—they follow patterns. This path focuses on how to identify those patterns, from hot spots to airflow failures, and trace them back to their root cause. The goal is to move from reacting to symptoms to clearly understanding what’s going wrong and how to fix it.
Topics: Diagnostics | Troubleshoot Hot Spots | Overcooling | Monitoring | Beyond White Space
Module 2.1: Diagnostics
Diagnosing cooling issues starts with one thing: asking the right questions. Most problems aren’t random—they’re patterns. The challenge is knowing where to look, what to measure, and how to connect what you’re seeing to a root cause before it turns into a real issue. This section focuses on building a structured approach to diagnostics—so you can move from assumptions to real answers, quickly and confidently.
Start with the fundamentals.
If you can’t answer basic questions about your environment (temperatures, airflow behavior, load), you’re guessing. This guide outlines the core questions every operator should be able to answer before diving deeper.
Link: Questions Every Data Center Manager Should Have Answers For
Then zoom out to the system.
Cooling doesn’t operate in isolation. Power, IT load, and operations are all connected. Understanding how these pieces interact helps you avoid treating symptoms and start identifying real causes.
Next, validate with real data.
Thermal surveys provide a live view of what’s actually happening in the white space, helping uncover issues that aren’t visible through standard monitoring or alarms.
Link: Strategic Thermal Surveys: How to Elevate Data Center Sustainability for a Greener Future
For teams looking to systematize diagnostics.
Frameworks like Six Sigma can help turn one-off troubleshooting into repeatable processes, making it easier to identify, fix, and prevent issues over time.
Link: Applying Six Sigma to Data Center Cooling Optimization
Finally, see the impact.
When diagnostics are done right, they don’t just identify problems. They drive measurable outcomes, from resolving performance issues to unlocking stranded cooling capacity.
Link: How We Freed Up 25,000 CFM at a Tier 4 Colocation Facility
Module 2.2: Troubleshooting Hot Spots
Hot spots are rarely random. They tend to show up in the same places because of how air moves—or fails to move—through the room. In many cases, the root cause isn’t obvious and may not even be on the data center floor. Layout constraints, airflow imbalances, and upstream issues can all contribute to localized problems. This section focuses on how to recognize common hot spot patterns and trace them back to their source. The goal is to move from reacting to hot spots to understanding why they occur in the first place.
Start with how to respond.
If you’ve ever felt a warm aisle, dealt with a customer complaint, or responded to a temperature alarm, this guide walks through how to quickly assess and address a hot spot.
Then see what gets missed.
In this case study, everything looked normal until repeated failures revealed a hidden issue. Dynamic heat maps uncovered temperature patterns that static sensors couldn’t detect.
Link: Finding the Hot Spot: How Dynamic Heat Maps Helped Solve UPS Battery Failures
Look beyond the white space.
Not all cooling issues originate in the data hall. This case study shows how a chiller-level problem impacted overall performance, and how it was diagnosed.
Understand system-level impacts.
Changes in one part of the environment—like tile placement or airflow adjustments—can create unintended consequences elsewhere. This example shows how a cooling dispute was traced back to airflow decisions.
Check out all our troubleshooting articles here.
Module 2.3: Overcooling
Overcooling often feels like the safe choice—but it usually hides inefficiencies. Many facilities push more cold air than necessary to avoid risk, without realizing that excess airflow leads to bypass, mixing, and reduced effectiveness. This not only increases energy costs, it can also mask underlying airflow problems that never get addressed. This section focuses on how overcooling shows up in real environments, how to recognize it, and how to correct it without introducing risk. The goal is to strike the right balance between reliability and efficiency.
Start by challenging the assumption.
Lower temperatures don’t always mean safer operations. This article explains how overcooling can create new problems while hiding the real issue.
Then understand the impact on capacity.
In colocation environments, overcooling directly affects how much usable cooling capacity you actually have. This resource shows how to identify and recover that lost capacity.
Link: Combat Overcooling in Your Data Center with an AUDIT-BUDDY Thermal Audit
Validate your environment.
Use this checklist to quickly assess whether your facility is overcooled and identify the steps needed to correct it.
Link: Are You Overcooling Your Data Center? Use This Checklist to Verify
Check out all articles on overcooling here.
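One hedged way to put a number on "more cold air than necessary" is to compare the airflow being supplied with the airflow the IT load actually needs. The sketch below uses the common sensible-heat approximation for standard air near sea level (BTU/hr ≈ 1.08 × CFM × ΔT in °F). The function names, the 10 kW load, the 20 °F server Delta-T, and the 3,000 CFM supply figure are assumptions for illustration, not values from the linked articles.

```python
BTU_PER_HR_PER_WATT = 3.412
# BTU/hr per CFM per degree F; assumes standard-density air near sea level
SENSIBLE_HEAT_FACTOR = 1.08


def required_cfm(it_load_watts, server_delta_t_f):
    """Approximate airflow the IT load needs at a given server Delta-T."""
    if server_delta_t_f <= 0:
        raise ValueError("server Delta-T must be positive")
    heat_btu_per_hr = it_load_watts * BTU_PER_HR_PER_WATT
    return heat_btu_per_hr / (SENSIBLE_HEAT_FACTOR * server_delta_t_f)


def oversupply_ratio(supplied_cfm, it_load_watts, server_delta_t_f):
    """Supplied airflow divided by required airflow (1.0 means matched)."""
    return supplied_cfm / required_cfm(it_load_watts, server_delta_t_f)


# Hypothetical aisle: 10 kW of IT load, 20 F server Delta-T, 3,000 CFM supplied
demand = required_cfm(10_000, 20.0)
ratio = oversupply_ratio(3_000, 10_000, 20.0)
print(f"required airflow: {demand:.0f} CFM, oversupply ratio: {ratio:.2f}")
```

A ratio well above 1.0 would suggest that much of the supplied air is bypassing the servers. The estimate is only as good as the load and Delta-T measurements behind it, and the 1.08 factor shifts with altitude and air density.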
Module 2.4: Monitoring the White Space
Monitoring has become more important—and more complex—than ever. As power densities increase and environments evolve, many legacy systems no longer provide the visibility needed to fully understand what’s happening on the floor. Tracking temperature alone isn’t enough. Effective monitoring means knowing where to measure, what to measure, and how to interpret it across different environments. This section focuses on how to expand visibility, use the right tools, and monitor both equipment and operator conditions. The goal is to move from passive monitoring to actionable insight.
Start by getting more from your tools.
If you already have monitoring systems in place, this guide shows how to use them more effectively to improve cooling performance and prevent issues.
Link: So I Bought an AUDIT-BUDDY Years Ago, What Do I Do With It?
Then think beyond equipment.
Monitoring isn’t just about systems; it’s also about the people operating them. This article highlights how heat and environmental conditions impact staff on the floor.
Link: Staying Cool: Lessons from the Golf Course to the Data Center
Understand heat stress.
Metrics like WBGT help quantify environmental stress, especially in high-temperature or outdoor environments.
Measure what matters for safety.
The Heat Index provides another way to assess operator risk in hot aisle conditions and ensure safe working environments.
Balance performance and safety.
As temperatures increase for efficiency, monitoring becomes critical to ensure both equipment and personnel remain within safe limits.
Don’t overlook smaller spaces.
Network closets and distributed environments often lack proper monitoring, creating hidden risks if left unchecked.
Track humidity and environmental balance.
Temperature alone isn’t enough. Humidity plays a key role in both equipment performance and environmental stability.
Validate your sensors.
Monitoring is only useful if the data is accurate. This case study shows how validating sensors led to better operational decisions.
Choose the right monitoring approach.
Understanding the difference between permanent and temporary monitoring helps ensure you’re capturing the right data at the right time.
Link: Two Types of Environmental Monitors
See more articles on Monitoring here.
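For readers who want to see how the Heat Index mentioned above combines temperature and humidity into one number, the sketch below uses the Rothfusz regression published by the US National Weather Service. It is a simplified version: it omits the NWS adjustments for very dry or very humid air and the separate formula used for milder conditions, so treat it as roughly applicable only when the result is about 80 °F or higher. The function name and the sample conditions are illustrative assumptions.

```python
def heat_index_f(temp_f, rel_humidity_pct):
    """Approximate Heat Index in degrees F (Rothfusz regression, unadjusted).

    temp_f: dry-bulb air temperature in degrees F
    rel_humidity_pct: relative humidity in percent (0-100)
    """
    t, rh = temp_f, rel_humidity_pct
    return (-42.379
            + 2.04901523 * t
            + 10.14333127 * rh
            - 0.22475541 * t * rh
            - 0.00683783 * t * t
            - 0.05481717 * rh * rh
            + 0.00122874 * t * t * rh
            + 0.00085282 * t * rh * rh
            - 0.00000199 * t * t * rh * rh)


# Hypothetical working conditions: 90 F at 70% relative humidity
hi = heat_index_f(90.0, 70.0)
print(f"Heat Index: {hi:.1f} F")
```

Hot aisles are often warm but fairly dry, which is exactly where the omitted low-humidity adjustment matters, so a production calculation should follow the full NWS procedure or a site's own safety guidance.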
Module 2.5: Monitoring Beyond the White Space
Airflow problems don’t stop at the data hall. Hot spots, bypass airflow, and recirculation can show up anywhere cooling and heat interact—often in the spaces operators check the least. Chiller plants, UPS rooms, battery rooms, and electrical rooms all play a role in overall system performance, and failures here can ripple back into the white space. These environments are typically monitored at a high level, if at all, which means issues can develop quietly over time. The resources in this section focus on how airflow behaves outside the white space, where problems tend to hide, and what to look for when validating these critical areas. The goal is to expand visibility beyond the data hall and ensure the entire cooling ecosystem is performing as expected.
Start at the source.
Chiller-level issues are easy to overlook, but they can quietly reduce performance and efficiency across the entire cooling system.
Then look at critical infrastructure.
UPS environments can develop localized hot spots and airflow issues that lead to overheating and reduced equipment life.
Link: https://www.purkaylabs.com/news/upsbatteryfailure?rq=optimize%20airflow
Don’t ignore battery rooms.
Small temperature variations and hidden airflow issues can accelerate degradation and impact long-term reliability.
Check supporting spaces, like electrical rooms.
Electrical rooms often face the same airflow challenges as the white space, creating hidden risks if left unmonitored.
Link: https://www.purkaylabs.com/news/airflowmanagementoutsidethewhitespace
See more articles on Monitoring Beyond the White Space here.
Module 3: Improve Performance
Level: Advanced
Once issues are identified, the next step is proving that fixes are working and maintaining performance over time. This path focuses on validating airflow, improving consistency, and using data to drive better decisions. The goal is to create a stable, efficient system that performs reliably without unnecessary upgrades or guesswork.
Topics: Efficiency | Operations | Preventative Maintenance | Resiliency
Module 3.1: Efficiency
Improving efficiency is about using cooling effectively, not just using less of it. Many facilities focus on set-points and energy reduction without addressing airflow, which often leads to higher energy use and reduced cooling capacity. When airflow is unmanaged, cold air doesn’t reach where it’s needed, and systems work harder to compensate. This section focuses on practical ways to improve efficiency by fixing airflow first, not just adjusting settings. The goal is to create a system that is both reliable and efficient—without sacrificing performance.
Start with quick wins.
These are simple, high-impact changes that can immediately improve cooling efficiency without major investment or disruption.
Link: http://purkaylabs.com/news/5-quick-wins-for-immediate-cooling-efficiency
Then use data to guide decisions.
Small, targeted measurements can reveal where efficiency is being lost and help identify the most effective adjustments.
Link: http://purkaylabs.com/news/changetosupplytempcontrol-ka7fl
Challenge common assumptions.
Many widely accepted “best practices” in cooling are based on outdated thinking. This article breaks down common myths and what to do instead.
Link: http://purkaylabs.com/news/debunking-common-myths-about-data-center-cooling
Find hidden opportunities.
Thermal surveys are one of the fastest ways to uncover inefficiencies and identify areas for improvement across your facility.
Link: http://purkaylabs.com/news/5o23ob6e9rmkg3g2aq5ov1dttjl1re
Check out all our articles on efficiency here.
Module 3.2: Operations
Cooling performance is maintained—or lost—through daily operations. Even well-designed systems can underperform if processes aren’t clear or consistently followed. Small changes—adjusting tiles, moving equipment, shifting loads—can have unintended impacts on airflow and temperature. Effective operations require coordination, clear procedures, and an understanding of how the system behaves as a whole. This section focuses on how to manage, monitor, and maintain cooling performance in day-to-day operations. The goal is to make airflow management consistent, predictable, and repeatable.
Start with a clear framework.
This playbook outlines how to approach cooling as an integrated system, helping operators make better decisions across airflow, power, and IT load.
Then use data to operate with confidence.
With the right visibility, you can safely increase set-points without compromising reliability, improving efficiency while maintaining performance.
Understand the cost of your decisions.
Cooling has a direct impact on operating costs. This guide provides a simple way to evaluate and manage cooling expenses in a live environment.
A simple checklist for evaluating cooling cost in an operational data center.
Link: The Cost of Cooling: How to Budget Smartly for Your Data Center
Check out all our articles on operations.
Module 3.3: Preventative Maintenance
Cooling performance doesn’t stay constant. As equipment is added, layouts change, and density increases, airflow patterns shift—often without being noticed. What worked six months ago may no longer be effective today. Preventative maintenance is about catching these changes early, before they turn into hot spots, inefficiencies, or alarms. This section focuses on what to check, how often to check it, and how to make airflow validation part of your routine. The goal is to maintain consistent performance and avoid reactive fixes.
Start with a simple checklist.
Use this as a quick way to review your environment and identify early signs of airflow or cooling issues before they escalate.
Then maintain your cooling systems.
Your HVAC system plays a critical role in overall performance. This guide outlines what to monitor and maintain to keep it operating effectively.
Make validation part of your process.
Thermal assessments shouldn’t be one-off efforts. This article explains how to integrate them into your regular maintenance routine.
Link: http://purkaylabs.com/news/howimportantispreventativemaintenance
Use audits to stay ahead.
Thermal audits provide a proactive way to catch issues early and ensure your cooling system continues to perform as expected.
Check out all our preventative articles here.
Module 3.4: Resiliency
Data centers are designed with redundancy—but real-world changes make it difficult to know how much cooling capacity you actually have. As equipment is added, layouts shift, and loads increase, original design assumptions become less reliable. True resiliency comes from testing how your system performs under real conditions, not just trusting what it was designed to do. This section focuses on how to validate cooling capacity, plan for unexpected scenarios, and use data to strengthen your system over time. The goal is to ensure your environment can handle stress without failure.
Start by testing real performance.
This case study shows how cooling resiliency was evaluated in a live data center, helping quantify actual reserve capacity under real conditions.
Link: Testing Cooling System Resiliency in a Live Data Center
Then plan for the “what if.”
Resiliency starts with asking the right questions. This guide walks through a simple framework for thinking through failure scenarios and planning ahead.
Use data to stay ahead.
Operational data can be used to predict and prevent future issues. This article explores how advanced approaches can improve long-term resiliency.
Link: Using Machine Learning to Protect Your Data Center from Future Thermal Runaway Situations
See more articles on Resiliency here.
Need More Hands-On Support?
Don’t have the time to do everything yourself? As experienced cooling consultants, we help teams unlock the platform’s full potential for their specific needs.