Nov 4, 2024
ASUS Thermal Management Solutions for Data Centers
Cooling
In contemporary data centers, a significant portion of power consumption is devoted to heat dissipation. Ensuring the operational efficiency of AI data centers through effective cooling systems is a pressing challenge that ASUS, as a member of RE100, is committed to addressing.
According to the International Energy Agency (IEA), a significant portion of energy, up to 20% of total electricity in buildings globally, is used to keep data centers cool. As the demand for data processing skyrockets, the need for innovative, efficient cooling solutions also grows. Cooling is not only a technical challenge but also a critical component of sustainable operations.
ASUS provides a comprehensive array of solutions tailored for data centers and servers. We offer ready-to-deploy liquid-cooled or air-cooled computing clusters that are compatible with NVIDIA HGX H100, H200, GB200 and other solutions. Our extensive cabinet-level liquid cooling solutions, from CPU cold plates to cooling distribution units and cooling towers, are designed to reduce power consumption in data centers.
Let’s delve into various solutions that encompass advanced cooling technologies, including liquid and air-cooling systems, energy management systems, and high-efficiency designs.
Effective Air Cooling Solutions
In traditional data centers, a combination of air conditioning units and server fans is typically employed to regulate the temperature within the server room.
Advantages of Air Cooling
1. Cost-Effectiveness: Air cooling systems are generally less expensive and easier to maintain than liquid cooling systems, making them particularly suitable for small to medium-sized data centers.
2. Flexibility: Air cooling systems are easy to install and can be adjusted based on equipment configurations.
3. Universality: The combination of fans and air conditioning is a traditional cooling method applicable to most equipment.
Limitations of Air Cooling
1. Efficiency Issues: As server density increases, air cooling alone may struggle to effectively manage heat output, leading to reduced cooling efficiency and increased energy costs.
2. Environmental Impact: In extreme weather conditions, the effectiveness of air cooling systems can be limited, necessitating consideration of environmental factors in cooling strategies.
3. Noise Concerns: The operation of fans generates noise, which may be unsuitable for environments requiring quiet conditions.
Revolutionizing Data Centers With Liquid Cooling Solutions
Liquid cooling systems are divided into two main types: liquid-to-air and liquid-to-liquid. Unlike traditional air-cooling solutions, liquid cooling significantly reduces energy consumption, minimizes heat hotspots, and allows for higher power densities. As a result, data centers can achieve improved operational efficiency, lower carbon footprints, and enhanced reliability.
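To make the power-density point concrete, the sketch below compares the volume of air and of water needed to carry the same heat load, using the steady-state energy balance Q = ρ · V · c_p · ΔT. The fluid properties are approximate room-temperature textbook values, and the 10 kW load and 10 K temperature rise are illustrative assumptions rather than ASUS figures.

```python
# Approximate fluid properties near room temperature (assumed values).
AIR = {"density": 1.2, "specific_heat": 1005.0}      # kg/m^3, J/(kg*K)
WATER = {"density": 997.0, "specific_heat": 4180.0}  # kg/m^3, J/(kg*K)


def volumetric_flow_m3_per_s(heat_load_w, delta_t_k, fluid):
    """Volume flow needed to carry heat_load_w with a delta_t_k temperature rise.

    Rearranges the steady-state energy balance Q = rho * V * cp * dT for V.
    """
    if heat_load_w < 0 or delta_t_k <= 0:
        raise ValueError("heat load must be >= 0 and temperature rise must be > 0")
    return heat_load_w / (fluid["density"] * fluid["specific_heat"] * delta_t_k)


if __name__ == "__main__":
    load_w = 10_000.0  # hypothetical 10 kW rack
    rise_k = 10.0      # hypothetical 10 K temperature rise across the rack
    air = volumetric_flow_m3_per_s(load_w, rise_k, AIR)
    water = volumetric_flow_m3_per_s(load_w, rise_k, WATER)
    print(f"air:   {air:.3f} m^3/s")
    print(f"water: {water * 60_000:.1f} L/min")
    print(f"air needs roughly {air / water:.0f}x the volume flow of water")
```

Under these assumptions, water moves the same heat with a few thousand times less volume flow than air, which is the physical reason cold plates can serve denser racks than fans alone.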
1. Liquid-to-Air Solutions
Liquid-to-air solutions utilize liquid as the cooling medium, ultimately relying on fans for heat dissipation. In this configuration, a Cooling Distribution Unit (CDU) is installed in the server room, allowing for modifications without extensive reconstruction of existing racks.
In the cooling process for GPUs, a metal cold plate is mounted directly on the GPU. Heat is transferred to the coolant through tubes and a manifold. The heated liquid flows from the CDU into a dry cooler, where it is cooled before returning to the CDU, creating a continuous cycle.
For non-GPU components, the system draws in air from cold aisles, and server fans expel the warmed air into hot aisles. This hot air is then cooled by the main air conditioning system, which returns the cooled air to the cold aisles.
Considerations
- Cost-Effectiveness: Lower installation costs for existing servers.
- Compatibility: Suitable for various rack types, enhancing cooling efficiency.
Benefits
- Improved cooling efficiency, especially for high-density servers.
- Reduced energy consumption, enhancing overall operational efficiency.
Limitations
- Air circulation depends on fan operation, necessitating regular maintenance.
- Sensitive to ambient temperature, with effectiveness varying based on environmental conditions.
2. Liquid-to-Liquid Solutions
Liquid-to-liquid solutions are ideal for environments requiring high-efficiency cooling, particularly for large server clusters. These systems circulate liquid directly between chillers and servers.
In the cooling process, coolant flows from a chiller into the server cooling system, directly contacting internal heat sources. The heated liquid then returns to the chiller, maintaining a temperature around 40 degrees Celsius. In racks, cold plates are affixed to operational GPUs, with the CDU and manifold controlling liquid flow for effective heat dissipation.
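As a rough illustration of how the return temperature relates to heat load and coolant flow, the sketch below applies a steady-state energy balance to a water loop. The supply temperature, rack load, and flow rate in the usage note are hypothetical values chosen only to land near the 40 degrees Celsius mentioned above, and the coolant is assumed to be plain water.

```python
WATER_SPECIFIC_HEAT = 4180.0  # J/(kg*K), approximate value for water
WATER_DENSITY = 997.0         # kg/m^3, approximate value for water


def return_temperature_c(supply_c, heat_load_w, flow_l_per_min):
    """Estimate coolant return temperature from a steady-state energy balance.

    Assumes all of heat_load_w is absorbed by the coolant and ignores
    losses in piping and the manifold.
    """
    if flow_l_per_min <= 0:
        raise ValueError("flow must be positive")
    # Convert L/min to m^3/s, then to kg/s.
    mass_flow_kg_s = flow_l_per_min / 60_000.0 * WATER_DENSITY
    return supply_c + heat_load_w / (mass_flow_kg_s * WATER_SPECIFIC_HEAT)
```

For example, `return_temperature_c(30.0, 40_000.0, 60.0)` models a hypothetical 40 kW rack fed with 30°C water at 60 L/min. Halving the flow doubles the temperature rise, which is why the CDU and manifold flow control matter.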
Benefits
- Superior cooling performance capable of handling higher thermal loads.
- Achieves lower operational temperatures, enhancing server stability.
Limitations
- Complex system design requiring careful planning of pipeline layout and flow control.
- Higher installation and maintenance costs, with potential risks of liquid leakage necessitating robust maintenance protocols.
Liquid cooling architectures provide efficient thermal management solutions in data centers, with both liquid-to-air and liquid-to-liquid solutions offering unique advantages and challenges. Selecting the appropriate solution should consider practical requirements, budget constraints, and future scalability to ensure the efficient and stable operation of the data center.
Choosing the Right Cooling Solution for Your Data Center
With groundbreaking advancements in CPU and GPU technology, coupled with soaring data processing demands, the computational load on servers has surged, leading to a dramatic increase in heat generation. Power Usage Effectiveness (PUE) becomes critically important, serving as a key metric for evaluating energy efficiency in data centers. When selecting the most suitable cooling solution, several key factors must be carefully considered to ensure efficient operation and long-term sustainability.
1. Power Consumption: The energy demands of cooling systems directly affect operational costs and energy efficiency. Different cooling technologies—such as full liquid cooling, water cooling, and air cooling—exhibit significant variations in their power requirements. It's essential to strike a balance between initial investment and long-term operational costs.
2. Environmental Impact: The environmental implications of cooling systems cannot be overlooked. For instance, full liquid cooling systems may require more facilities and maintenance, while air cooling systems could increase noise and airflow issues. Understanding these impacts is vital for choosing the most environmentally compliant solution.
3. System Compatibility: Cooling solutions must be compatible with existing server architectures and infrastructure. When selecting a cooling solution, it's important to consider how well it integrates with current equipment to avoid additional retrofit costs and unnecessary complexity.
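PUE, the metric mentioned above, is defined as total facility energy divided by the energy delivered to IT equipment, so a value of 1.0 would mean no overhead for cooling or power distribution. The sketch below computes PUE and the facility energy implied by a given PUE. The function names and the sample figures in the usage note are illustrative assumptions, not measurements from any ASUS deployment.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be below IT energy")
    return total_facility_kwh / it_equipment_kwh


def facility_energy_kwh(it_equipment_kwh, pue_value):
    """Total facility energy implied by an IT load and a PUE value."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_equipment_kwh * pue_value


def energy_saved_kwh(it_equipment_kwh, pue_before, pue_after):
    """Facility energy saved for the same IT load when PUE improves."""
    return (facility_energy_kwh(it_equipment_kwh, pue_before)
            - facility_energy_kwh(it_equipment_kwh, pue_after))
```

As a usage example, a facility that draws 1,500 kWh to deliver 1,000 kWh to its servers has `pue(1500, 1000) == 1.5`. Moving that same IT load from a PUE of 1.5 to 1.2 would save about 300 kWh, which is one way to weigh initial investment against long-term operating cost.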
ASUS offers a diverse range of cooling solutions tailored to meet the unique needs of various data centers, complemented by extensive service support.
Cooling Technology | Advantages | Disadvantages | Performance | Cost |
Closed loop with cold-plate-water/glycol mix | Extend cooling air cooled to customer | Limited expansion of cooling 25-30% | 400-600 W @ 35°C | Moderate |
Open loop with cold-plate-water | Easy service; reduces config limitation | Limited for extreme TDP | 1500 W @ 45°C | Medium |
Open loop with cold-plate-two phase dielectric | High performance; easy service; reduces config limitation | Manage GWP of fluid; new technology at scale | 2000 W+ @ 45°C | High |
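The table above can be read as a simple lookup: given a device's heat load, which options have enough headroom? The sketch below encodes the Performance and Cost columns. It assumes the wattage figures are per-device upper bounds at the stated temperature, which the table does not spell out, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CoolingOption:
    name: str
    max_watts: Optional[float]  # None models the open-ended "2000 W+" entry
    rated_temp_c: float         # temperature quoted in the Performance column
    cost: str


# Values transcribed from the table; their interpretation is an assumption.
OPTIONS = [
    CoolingOption("Closed loop, cold plate, water/glycol mix", 600.0, 35.0, "Moderate"),
    CoolingOption("Open loop, cold plate, water", 1500.0, 45.0, "Medium"),
    CoolingOption("Open loop, cold plate, two-phase dielectric", None, 45.0, "High"),
]


def candidates(device_watts: float) -> List[CoolingOption]:
    """Return the options whose quoted capacity covers device_watts."""
    if device_watts <= 0:
        raise ValueError("device heat load must be positive")
    return [o for o in OPTIONS
            if o.max_watts is None or device_watts <= o.max_watts]


if __name__ == "__main__":
    for watts in (500.0, 1000.0, 2500.0):
        names = ", ".join(o.name for o in candidates(watts))
        print(f"{watts:.0f} W -> {names}")
```

A real selection would also weigh the advantages, disadvantages, and site constraints listed in the table. This lookup only filters on thermal capacity.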
To ensure each server performs optimally in real-world environments, ASUS offers essential verification and acceptance services. These services include rigorous checks of power, network, GPU cards, voltage, and temperature to ensure smooth operation. Comprehensive testing before handover further ensures effective real-world performance.
Learn more about ASUS total infrastructure solution: AI Server and Infrastructure Solutions