Friday 1 June 2018

Air v Liquid - Part 2 Data Centre Basics

So, continuing on from the previous post, we're going to look at the basics of a data centre, and for this I'm going to cover all types for all sorts of organisations.

As a user of digital services (as covered in the previous post) I don't really care about how my digital services get to me, as long as they do. But, just because I don't care doesn't mean that someone else doesn't.

Let's go back to basics. In order for me to access a digital service, someone needs to provide 3 things. The first is a network, physical or wireless, that carries my outbound and any inbound signals to a server somewhere. The second is that destination server (or servers) to process and deal with my request, be it a financial transaction, a look at my bank balance or a view of the latest news updates. The third is some power, so that both the server and the network can run.

At the most basic level, I could use the domestic power points in my garage to power my server and connect it to the internet via my wireless connection or broadband. Is that a data centre? Well, technically it isn't, although some organisations back in the early days did precisely that!

The official definition of a data centre (for me) is the one contained within the guidance document for the EU Code of Conduct for Data Centres (Energy Efficiency), aka the EUCOC, which can be downloaded from this link.

It states: "For the purposes of the Code of Conduct, the term “data centres” includes all buildings, facilities and rooms which contain enterprise servers, server communication equipment, cooling equipment and power equipment, and provide some form of data service (e.g. large scale mission critical facilities all the way down to small server rooms located in office buildings)."

Very clear, insofar as the building, facility or room must contain servers, server communication equipment (network), power equipment and, lastly, cooling equipment, and must provide some sort of data service. Those buildings, facilities or rooms can range from large scale mission critical facilities all the way down to small server rooms located in office buildings.

Thus the EUCOC covers all types of communication cupboards, server rooms, machine rooms, mini data centres, medium data centres, telco switch locations, hyperscale data centres etc, and those belonging to all types of organisations; in fact, any organisation that operates in this increasingly digital world. The only element I haven't discussed so far is cooling.

Cooling is actually a bit of a misnomer. What we're really doing when we "cool" is rejecting heat: we carry the heat away from the servers, cool the air down and then reinject that cooled air back into the loop. There is plenty of stuff available online if you really want to know about the airflow cycle BUT...

And, it is a BIG BUT...

What are we actually rejecting the heat from?

In essence, servers generate a lot of waste heat, and if we don't manage the air flow we can get thermal problems. But where is this heat coming from?

In a server, there are 2 heat sources. The first is the processor itself (more on that later) and the second is the power supply unit, the device that converts the 220-240V AC mains power into the low DC voltages used by the motherboard and other components.

Server chipsets (the processors) undergo Computational Fluid Dynamics (CFD) modelling to channel the heat away from the core to the heatsinks. The surface of a chip can reach temperatures in excess of 140°C; by the time the heat gets to the end of the fins it can be around 50-60°C.

Most servers themselves undergo CFD modelling to determine the air flow across the heat producing components listed above within the chassis, so that the air flow is optimised as it enters through the front of the device, passes over the processor and power supply, and is then exhausted through the rear of the unit. The fans in the device actually pull the air through the unit, assisted by any positive pressure provided by the AC units.

Most servers are actually designed to operate quite happily in ambient air, and as such can operate between 5°C and 40°C, with thermal monitoring controlling the fan speed. It is only when we cluster a great many servers together, for instance in a rack, that heat problems can appear.
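To make the "thermal monitoring controls the fan speed" idea a bit more concrete, here is a minimal sketch in Python. It assumes a simple straight-line fan curve across the 5°C to 40°C range mentioned above; the function name, the 20% minimum duty and the linear shape are all my own illustrative assumptions, as real server firmware uses vendor-specific curves.

```python
def fan_speed_percent(inlet_temp_c, min_temp_c=5.0, max_temp_c=40.0,
                      min_speed=20.0, max_speed=100.0):
    """Map an inlet temperature to a fan duty cycle using a straight line.

    All defaults are illustrative assumptions, not vendor values.
    """
    if inlet_temp_c <= min_temp_c:
        return min_speed  # at or below the range: idle at minimum duty
    if inlet_temp_c >= max_temp_c:
        return max_speed  # at or above the range: fans flat out
    fraction = (inlet_temp_c - min_temp_c) / (max_temp_c - min_temp_c)
    return min_speed + fraction * (max_speed - min_speed)


for temp_c in (5, 22.5, 40):
    print(f"{temp_c}°C inlet -> {fan_speed_percent(temp_c):.0f}% fan duty")
```

The point is simply that the warmer the inlet air, the harder the fans work, which is one reason why tightly packed racks draw more fan power.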

When I teach the EU Code of Conduct for Data Centres (Energy Efficiency) I ask the students 3 questions, as follows:

1. What is the target temperature in your facility (server room etc)?
2. What is the target humidity range in your facility?
3. Why these numbers specifically?  

The answers (with some exceptions) are:

1. 18-21℃
2. 50%RH +/-5%
3. Er, we don't know!

Well, those specific numbers relate to the use of paper tapes and cards back in the 50's and 60's, and then to the 80's and 90's, when some old magnetic tapes needed cool and dry environments. (I could do a whole blog post on this!)

The thing is that these temperature and humidity ranges belong in the 20th Century. IT equipment can run at higher temperatures today, so the question is: do we need cooling, or simply heat rejection?

Well, we'll cover that in the next blog post. 

Before we do though, we'll just go through the absolute basics for a server room and, by extension, the rest of the data centre ecosystem types, as the only real differences are scale and risk profile/appetite.

Scale is, clearly, the amount of IT equipment you are using. For smaller organisations the IT estate may not be too big, whilst for the hyperscale search engines or social media platforms it may number tens of thousands, if not millions, of servers, storage units and network switches. That amount of equipment creates a great deal of heat, which needs to be managed.

The risk profile is, essentially, your own appetite for the risk of the IT going down. If it is absolutely mission critical that your IT systems stay up all the time (or, as we say in the sector, 24 hours a day, 7 days a week, 365 days a year, or 24/7/365) then you will require some duplicate systems to deal with any failure, and not just in the IT but in your power, network and cooling solutions as well. This will add cost, complexity and an increased maintenance regime to your calculations. There are certain classifications that can be applied, such as the EN50600 (ISO22237) Classes and the Uptime Institute Tiers, as well as others. I don't want to go into much detail on these at the present time, but will cover them later in this series.
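As a rough illustration of why duplicate systems matter, here is a small Python sketch of the arithmetic. The 99% availability figure is purely an assumed example (it is not taken from EN50600 or the Uptime Institute Tiers), and the parallel formula assumes failures are independent, which real power and cooling systems rarely are.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


def annual_downtime_hours(availability):
    """Expected hours of downtime per year for a given availability (0 to 1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def parallel_availability(unit_availability, units):
    """Availability of N duplicate units where any single one can carry the load.

    Assumes independent failures, which is an optimistic simplification.
    """
    return 1.0 - (1.0 - unit_availability) ** units


single = 0.99  # assumed example value for one unit
duplicated = parallel_availability(single, 2)

print(f"Single unit:     {annual_downtime_hours(single):.1f} hours down per year")
print(f"Duplicated unit: {annual_downtime_hours(duplicated):.2f} hours down per year")
```

Under those assumptions, duplicating a unit takes the expected downtime from days per year to under an hour, which is the trade you are making against the extra cost and maintenance.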

Let's take a quick look at the minimums:

Power: we'll need electricity to power the servers, networking equipment and storage solutions.
The power train is, at its most basic, a standard 13 amp socket on the wall, but if you have more than one server and you're using a rack it may be prudent to consider other options.
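To see why a single wall socket runs out quickly, here is a quick back-of-envelope sketch in Python. It assumes a nominal 230V supply, an illustrative 350W draw per server and an 80% headroom factor; the last two are my own assumptions for the example, so substitute your measured figures.

```python
def socket_capacity_watts(voltage_v=230.0, current_a=13.0):
    """Maximum power a socket can deliver: volts multiplied by amps."""
    return voltage_v * current_a


def servers_per_socket(server_watts, voltage_v=230.0, current_a=13.0,
                       headroom=0.8):
    """Whole number of servers a socket can feed, leaving some headroom.

    headroom=0.8 (use only 80% of the rating) is an assumed safety margin.
    """
    usable_watts = socket_capacity_watts(voltage_v, current_a) * headroom
    return int(usable_watts // server_watts)


print(f"Socket capacity: {socket_capacity_watts():.0f} W")
print(f"Servers at an assumed 350 W each: {servers_per_socket(350)}")
```

With these assumed numbers a 13 amp socket delivers roughly 3kW, which is only a handful of servers before you need to think about a proper power train.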

Space: computer racks come in a number of different sizes and configurations. Most prevalent today are 800mm x 800mm and approximately 1.8m high for a 42U rack. You'll need to access the front and rear of the rack with the doors open, so allow at least 1m around the rack for access. You could just use a table, and I've seen that in a number of locations!
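The space sums are easy enough to sketch in Python. This assumes the 800mm x 800mm rack and 1m all-round clearance described above, and uses the standard rack unit of 1.75 inches (44.45mm) to check the 42U height; the function names are just for illustration.

```python
RACK_UNIT_M = 0.04445  # 1U = 1.75 inches = 44.45 mm


def rack_footprint_m2(rack_width_m=0.8, rack_depth_m=0.8, clearance_m=1.0):
    """Floor area for one rack plus access clearance on every side."""
    width_m = rack_width_m + 2 * clearance_m
    depth_m = rack_depth_m + 2 * clearance_m
    return width_m * depth_m


def mounting_height_m(rack_units=42):
    """Internal mounting height of a rack (the outer frame is a little taller)."""
    return rack_units * RACK_UNIT_M


print(f"Floor area per rack: {rack_footprint_m2():.2f} m^2")
print(f"42U mounting height: {mounting_height_m():.2f} m")
```

So a single rack with sensible access space wants something close to 8 square metres of floor, far more than the rack itself occupies.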

Technically, that's it, but it may be prudent to allow the hot air to escape the room, and a standard extract ventilator fan can do the job (bear in mind though that outside air can also come in through this fan unit, so some filtration equipment may be useful).
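If you want a feel for how big that extract fan needs to be, the usual sensible-heat relationship (airflow = heat load / (air density x specific heat x temperature rise)) can be sketched in Python. The 3kW heat load and 10°C temperature rise below are assumed example values, and the air properties are approximate figures for room-temperature air at sea level.

```python
AIR_DENSITY_KG_M3 = 1.2        # approximate, room temperature at sea level
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0  # approximate specific heat of dry air


def required_airflow_m3_per_h(heat_load_w, delta_t_k):
    """Airflow needed to carry away a heat load for a given air temperature rise."""
    flow_m3_per_s = heat_load_w / (
        AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_k
    )
    return flow_m3_per_s * 3600.0  # convert per-second to per-hour


# Assumed example: 3 kW of IT load, air leaving 10 degrees warmer than it entered
flow = required_airflow_m3_per_h(3000.0, 10.0)
print(f"Required airflow: {flow:.0f} m^3/h")
```

Halve the allowed temperature rise and the required airflow doubles, which is why small rooms with domestic extract fans can struggle as the IT load grows.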

If you want cooling, then you may want a false floor to allow the cool air to surround your rack and some tiles to direct it to where it needs to be (please refer to the EUCOC section 5 cooling for the myriad of best practices about air flow direction and the containment of that air to the right place). You can get away with not having any air flow management (indeed, up until very recently, circa 2008, many sites did, and in some places they still do today), but this will come at the risk of hotspots, thermal overload failures and increased energy bills.

You'll also need some sort of cooling unit (in which case you don't really need the extract fan). The most basic of these is a standard domestic DX cooler. This unit provides cold air (it should have a control to specify the exact temperature) and rejects the heat via pipework and an external unit to the outside. Again, there are a multitude of cooling options available on the market today, some even optimised for data centres!

In the next post, I'll be looking at cooling v heat rejection.