
Building a Deep Learning Rig!

By vijay m

For a full-time deep learning practitioner who spends most of their time training neural networks, investing in a server is better than relying on cloud solutions. The main reason is cost; the second is the learning that comes from building your own rig. I would not recommend a pre-built deep learning workstation like Lambda's, as they are expensive.


“It is actually faster, cheaper, and easier to use a local deep learning machine than going for cloud instances in the long run”

Since most neural network computation happens on the GPU, the first decision to make is which GPU to buy. For most people, the free GPUs from Google Colab or Kaggle will be sufficient for training most models. However, if you are working on video or training large models, the free options may not be enough.


Prioritize GPU memory over speed


The first parameter to look at in a GPU is memory. If you work on large models, you can trade speed for memory: get an older-generation card that is slower but has more memory than the latest generation.
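To see why memory dominates the decision, it helps to sketch the training footprint of the weights alone. This is a back-of-the-envelope estimate (our assumption: float32 training with Adam-style optimizer states; activation memory, which grows with batch size, comes on top of this):

```python
def training_param_memory_gib(num_params: int) -> float:
    """Rough memory needed for a model's weights during float32 training
    with an Adam-style optimizer.

    Per parameter: 4 bytes (weights) + 4 bytes (gradients)
    + 8 bytes (two float32 optimizer moments) = 16 bytes.
    Activations are NOT included, and for video models they often dominate.
    """
    bytes_per_param = 4 + 4 + 8
    return num_params * bytes_per_param / 2**30

# A 1-billion-parameter model already needs ~15 GiB before any
# activations -- more than half of an A5000's 24 GB.
print(f"{training_param_memory_gib(1_000_000_000):.1f} GiB")
```

Even a modest increase in model size eats GPU memory quickly, which is why a slower card with more memory can be the better buy.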

“What is the right number of GPUs for a deep learning workstation? Ans: the number you can afford (or the max you can fit in your rig)”

Choosing the parts

One of the biggest problems when building a rig with multiple GPUs is heat. It is important to optimize the build to keep the GPUs cooler when they are under full load. We chose the following components, which we consider optimal for a 2-GPU build:

  • GPUs: We went with 2x Nvidia RTX A5000 (2 × 24 GB = 48 GB in total). (We recommend the A6000 instead, which has 48 GB of memory per card.)

  • CPU: Since the CPU is mainly used by the data pipelines that feed the GPUs, we chose a mid-range processor: the 12-core Ryzen 9 5900X, instead of a Threadripper.

  • RAM: 96 GB 3200 MHz HyperX Fury.

  • Motherboard: Gigabyte X570S Aorus Master, a board with a fanless chipset.

  • Case: Lian Li O11 Dynamic XL, the best case for airflow.

  • PSU: Corsair RM850 modular.

  • CPU cooler: Arctic Liquid Freezer II 360 water cooler.

  • Case fans: Arctic P12 fans.

  • SSD: WD Black SN850 PCIe Gen 4


Airflow

Airflow is very important if you have multiple GPUs in your build. The GPUs may run for days when training large models, and sustained high temperatures can shorten their lifespan. Choose a case with ample space and good airflow, and make sure no exhaust fan aligns with the GPUs' air intake. Since hot air rises, keep the exhaust fans at the top and the intake fans at the bottom or sides. Pick a CPU water cooler instead of an air cooler, so that the GPUs have breathing room above them.
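During long runs it is worth keeping an eye on the temperatures. `nvidia-smi` can report them in CSV form (e.g. `nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv,noheader`); the sketch below parses a captured sample of that output rather than querying live hardware, and the 75 °C threshold is our own choice, not an NVIDIA recommendation:

```python
# `sample` is a captured example of the CSV output of:
#   nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv,noheader
sample = """\
0, NVIDIA RTX A5000, 71
1, NVIDIA RTX A5000, 78
"""

def hot_gpus(csv_text: str, threshold_c: int = 75):
    """Return (index, name, temperature) for GPUs at or above threshold_c."""
    hot = []
    for line in csv_text.strip().splitlines():
        index, name, temp = (field.strip() for field in line.split(","))
        if int(temp) >= threshold_c:
            hot.append((int(index), name, int(temp)))
    return hot

print(hot_gpus(sample))  # the second GPU is running hot in this sample
```

Running a check like this periodically during multi-day training jobs gives early warning when a fan fails or the case airflow is not keeping up.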


Final build






Performance

When we train a Video Vision Transformer on 224x224 frames (32 frames per clip, 2x16x16 patch size), the largest batch size that fits is 8 for a float32 model and 16 for a float16 model. This is lower than what we wanted. That's why we recommend the A6000, which has 2x the memory of an A5000 (but more than 2x the cost).
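The memory pressure is easy to see from the token count alone. With non-overlapping 2x16x16 tubelet patches over a 32-frame 224x224 clip, each sample becomes thousands of tokens before it reaches the attention layers. A quick sanity check of the patch arithmetic (a sketch of the tubelet-embedding token count, not the full ViViT memory model):

```python
def vivit_num_tokens(frames: int, height: int, width: int,
                     patch_t: int, patch_h: int, patch_w: int) -> int:
    """Tokens produced by non-overlapping tubelet embedding of a video clip."""
    assert frames % patch_t == 0
    assert height % patch_h == 0 and width % patch_w == 0
    return (frames // patch_t) * (height // patch_h) * (width // patch_w)

# Our setting: 32 frames of 224x224, 2x16x16 tubelets.
tokens = vivit_num_tokens(32, 224, 224, 2, 16, 16)
print(tokens)  # 16 * 14 * 14 = 3136 tokens per clip
```

At 3136 tokens per clip, even a batch of 8 means attention over tens of thousands of tokens, which is where the A5000's 24 GB runs out.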








