Microsoft Beneath the Surface: From Underwater Datacenters to Two-Phase Immersion Cooling


To cool datacenter servers, Microsoft turns to boiling liquid


John Roach Apr 6, 2021



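Translator's Note

Under its datacenter sustainability strategy, Microsoft is pursuing several technical routes: fuel cells, undersea datacenters, and two-phase immersion cooling. The difference between the latter two comes down to one question: do you sink the datacenter to the seabed, or bring the sea into the datacenter?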




Emails and other communications sent between Microsoft employees are literally making liquid boil inside a steel holding tank packed with computer servers at this datacenter on the eastern bank of the Columbia River.


Unlike water, the fluid inside the couch-shaped tank is harmless to electronic equipment and engineered to boil at 122 degrees Fahrenheit (50 degrees Celsius), 90 degrees Fahrenheit lower than the boiling point of water.
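
The temperatures quoted above can be cross-checked with the standard Fahrenheit-to-Celsius conversion. The short Python sketch below is illustrative only: the function and constant names are not from the article, and 212 degrees Fahrenheit is assumed as water's boiling point at sea-level pressure.

```python
# Illustrative check of the temperatures quoted in the article.
# Names are hypothetical; only the 122 F and 90 F figures come from the text.

WATER_BOILING_F = 212.0   # assumption: water at standard sea-level pressure
FLUID_BOILING_F = 122.0   # boiling point of the engineered fluid, per the article


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


fluid_boiling_c = fahrenheit_to_celsius(FLUID_BOILING_F)
gap_f = WATER_BOILING_F - FLUID_BOILING_F

print(f"Fluid boils at {fluid_boiling_c:.0f} C")
print(f"That is {gap_f:.0f} F below water's boiling point")
```

Both results agree with the article: the fluid boils at 50 degrees Celsius, 90 degrees Fahrenheit below water.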


The boiling effect, which is generated by the work the servers are doing, carries heat away from laboring computer processors. The low-temperature boil enables the servers to operate continuously at full power without risk of failure due to overheating.


Inside the tank, the vapor rising from the boiling fluid contacts a cooled condenser in the tank lid, which causes the vapor to change to liquid and rain back onto the immersed servers, creating a closed loop cooling system.


“We are the first cloud provider that is running two-phase immersion cooling in a production environment,” said Husam Alissa, a principal hardware engineer on Microsoft’s team for datacenter advanced development in Redmond, Washington.




Ioannis Manousakis, a principal software engineer with Azure (left), and Husam Alissa, a principal hardware engineer on Microsoft’s team for datacenter advanced development (right), inspect the inside of a two-phase immersion cooling tank at a Microsoft datacenter.



Moore’s Law for the datacenter



The production environment deployment of two-phase immersion cooling is the next step in Microsoft’s long-term plan to keep up with demand for faster, more powerful datacenter computers at a time when reliable advances in air-cooled computer chip technology have slowed.


For decades, chip advances stemmed from the ability to pack more transistors onto the same size chip, roughly doubling the speed of computer processors every two years without increasing their electric power demand.


This doubling phenomenon is called Moore’s Law after Intel co-founder Gordon Moore, who observed the trend in 1965 and predicted it would continue for at least a decade. It held through the 2010s and has now begun to slow.


That’s because transistor widths have shrunk to the atomic scale and are reaching a physical limit. Meanwhile, the demand for faster computer processors for high performance applications such as artificial intelligence has accelerated, Alissa noted.


To meet the need for performance, the computing industry has turned to chip architectures that can handle more electric power. Central processing units, or CPUs, have increased from 150 watts to more than 300 watts per chip, for example. Graphics processing units, or GPUs, have increased to more than 700 watts per chip. The more electric power pumped through these processors, the hotter the chips get. The increased heat has ramped up cooling requirements to prevent the chips from malfunctioning.


“Air cooling is not enough,” said Christian Belady, distinguished engineer and vice president of Microsoft’s datacenter advanced development group in Redmond. “That’s what’s driving us to immersion cooling, where we can directly boil off the surfaces of the chip.” Heat transfer in liquids, he noted, is orders of magnitude more efficient than air.


What’s more, he added, the switch to liquid cooling brings a Moore’s Law-like mindset to the whole of the datacenter. “Liquid cooling enables us to go denser, and thus continue the Moore’s Law trend at the datacenter level,” he said.



Christian Belady, distinguished engineer and vice president of Microsoft’s datacenter advanced development group, stands next to a two-phase immersion cooling tank at a Microsoft datacenter.



Lesson learned from cryptocurrency miners



Liquid cooling is a proven technology, Belady noted. Most cars on the road today rely on it to prevent engines from overheating. Several technology companies, including Microsoft, are experimenting with cold plate technology, in which liquid is piped through metal plates, to chill servers.


Participants in the cryptocurrency industry pioneered liquid immersion cooling for computing equipment, using it to cool the chips that log digital currency transactions.


Microsoft investigated liquid immersion as a cooling solution for high-performance computing applications such as AI. Among other things, the investigation revealed that two-phase immersion cooling reduced power consumption for any given server by 5% to 15%.
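
To make the 5% to 15% range concrete, the sketch below turns it into watts saved for a single server. Only the percentage range comes from the article; the function name and the 1,000-watt baseline are assumptions chosen purely for illustration and are not Microsoft figures.

```python
# Hypothetical illustration of the 5% to 15% per-server power reduction.
# The percentage range is from the article; the baseline wattage is assumed.
from typing import Tuple


def power_savings_range(baseline_watts: float,
                        low_fraction: float = 0.05,
                        high_fraction: float = 0.15) -> Tuple[float, float]:
    """Return the (minimum, maximum) watts saved for one server."""
    if baseline_watts < 0:
        raise ValueError("baseline_watts must be non-negative")
    return baseline_watts * low_fraction, baseline_watts * high_fraction


assumed_baseline_watts = 1000.0  # assumption for illustration only
low_saved, high_saved = power_savings_range(assumed_baseline_watts)
print(f"Estimated savings: {low_saved:.0f} W to {high_saved:.0f} W per server")
```

Under that assumed baseline, the reduction would fall between roughly 50 and 150 watts per server; a real estimate would need measured per-server power draw.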


The findings motivated the Microsoft team to work with Wiwynn, a datacenter IT system manufacturer and designer, to develop a two-phase immersion cooling solution. The first solution is now running at Microsoft’s datacenter in Quincy. That couch-shaped tank is filled with an engineered fluid from 3M. 3M’s liquid cooling fluids have dielectric properties that make them effective insulators, allowing the servers to operate normally while fully immersed in the fluid.


This shift to two-phase liquid immersion cooling enables increased flexibility for the efficient management of cloud resources, according to Marcus Fontoura, a technical fellow and corporate vice president at Microsoft who is the chief architect of Azure compute. For example, software that manages cloud resources can allocate sudden spikes in datacenter compute demand to the servers in the liquid cooled tanks. That’s because these servers can run at elevated power – a process called overclocking – without risk of overheating.
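
The allocation idea described above can be sketched as a simple greedy placement. This is a minimal illustration under stated assumptions, not Microsoft's resource manager: the `Server` class, the `place_spike` function, the capacity units, and the 1.2 overclock factor are all hypothetical, and every server is treated as having its full capacity free.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    immersion_cooled: bool          # True if the server sits in an immersion tank
    base_capacity: float            # work units at normal power (assumed unit)
    overclock_factor: float = 1.0   # assumed headroom when overclocked

    def burst_capacity(self) -> float:
        """Capacity during a spike; only immersion-cooled servers overclock."""
        if self.immersion_cooled:
            return self.base_capacity * self.overclock_factor
        return self.base_capacity


def place_spike(servers, spike_demand):
    """Greedily assign spike demand, preferring immersion-cooled servers.

    Returns (dict of server name -> assigned work, unplaced remainder).
    """
    assignment = {}
    remaining = spike_demand
    # Sort so immersion-cooled servers come first (False sorts before True).
    for server in sorted(servers, key=lambda s: not s.immersion_cooled):
        if remaining <= 0:
            break
        share = min(server.burst_capacity(), remaining)
        assignment[server.name] = share
        remaining -= share
    return assignment, remaining


fleet = [
    Server("air-1", immersion_cooled=False, base_capacity=100.0),
    Server("tank-1", immersion_cooled=True, base_capacity=100.0,
           overclock_factor=1.2),
]
assignment, unplaced = place_spike(fleet, spike_demand=150.0)
print(assignment, unplaced)
```

In this toy fleet the tank server absorbs most of the spike because its overclocked capacity is higher, and the air-cooled server takes only the remainder. A production scheduler would also have to weigh existing load, placement constraints, and power budgets.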


“For instance, we know that with Teams [Microsoft’s cloud meeting software] when you get to 1 o’clock or 2 o’clock, there is a huge spike because people are joining meetings at the same time,” Fontoura said. “Immersion cooling gives us more flexibility to deal with these burst-y workloads.”



Boiling liquid carries away heat generated by computer servers at a Microsoft datacenter. Microsoft is the first cloud provider to run two-phase immersion cooling in a production environment. Photo by Gene Twedt for Microsoft.



Sustainable datacenters



Adding the two-phase immersion cooled servers to the mix of available compute resources will also allow machine learning software to manage these resources more efficiently across the datacenter, from power and cooling to maintenance technicians, Fontoura added.


“We will have not only a huge impact on efficiency, but also a huge impact on sustainability because you make sure that there is not wastage, that every piece of IT equipment that we deploy will be well utilized,” he said.


Liquid cooling is also a waterless technology, which will help Microsoft meet its commitment to replenish more water than it consumes by the end of this decade.


The cooling coils that run through the tank and enable the vapor to condense are connected to a separate closed loop system that uses fluid to transfer heat from the tank to a dry cooler outside the tank’s container. Because the fluid in these coils is always warmer than the ambient air, there’s no need to spray water to condition the air for evaporative cooling, Alissa explained.


Microsoft, together with infrastructure industry partners, is also investigating how to run the tanks in ways that mitigate fluid loss and will have little to no impact on the environment. “If done right, two-phase immersion cooling will attain all our cost, reliability and performance requirements simultaneously with essentially a fraction of the energy spend compared to air cooling,” said Ioannis Manousakis, a principal software engineer with Azure.



A Microsoft team is exploring two-phase immersion cooling technology. Pictured from left to right: Dave Starkenburg, datacenter operations management, Christian Belady, distinguished engineer and vice president of Microsoft’s datacenter advanced development group, Ioannis Manousakis, principal software engineer with Azure, and Husam Alissa, principal hardware engineer on Microsoft’s team for datacenter advanced development.



‘We brought the sea to the servers’



Microsoft’s investigation into two-phase immersion cooling is part of the company’s multi-pronged strategy to make datacenters more sustainable and efficient to build, operate and maintain. For example, the datacenter advanced development team is also exploring the potential to use hydrogen fuel cells instead of diesel generators for backup power generation at datacenters.


The liquid cooling project is similar to Microsoft’s Project Natick, which is exploring the potential of underwater datacenters that are quick to deploy and can operate for years on the seabed sealed inside submarine-like tubes without any onsite maintenance by people.



Instead of an engineered fluid, the underwater datacenter is filled with dry nitrogen air. The servers are cooled with fans and a heat exchange plumbing system that pumps piped seawater through the sealed tube.


A key finding from Project Natick is that the servers on the seafloor experienced one-eighth the failure rate of replica servers in a land datacenter. Preliminary analysis indicates that the lack of humidity and corrosive effects of oxygen were primarily responsible for the superior performance of the servers underwater.


Alissa anticipates the servers inside the liquid immersion tank will experience similar superior performance. “We brought the sea to the servers rather than put the datacenter under the sea,” he said.



Ioannis Manousakis, a principal software engineer with Azure, removes a server blade from a two-phase immersion cooling tank at a Microsoft datacenter. Photo by Gene Twedt for Microsoft.



The future



If the servers in the immersion tank experience reduced failure rates as anticipated, Microsoft could move to a model where components are not immediately replaced when they fail. This would limit vapor loss as well as allow tank deployment in remote, hard-to-service locations.


What’s more, the ability to densely pack servers in the tank enables a re-envisioned server architecture that’s optimized for low-latency, high-performance applications as well as low-maintenance operation, Belady noted. Such a tank, for example, could be deployed under a 5G cellular communications tower in the middle of a city for applications such as self-driving cars.


For now, Microsoft has one tank running workloads in a hyperscale datacenter. For the next several months, the Microsoft team will perform a series of tests to prove the viability of the tank and the technology. “This first step is about making people feel comfortable with the concept and showing we can run production workloads,” Belady said.



Page updated: 2024-03-20
