HPC Cluster Architecture

A High Performance Computing (HPC) cluster needs to be architected in line with the computational needs of the business or its research projects. Implementing and operationalizing it requires planning several aspects of the cluster. Hardware planning covers the specification of the master and compute nodes based on the required FLOPS, the storage required today and in the near future, the network based on the acceptable latency, the power required, and so on.
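As a rough illustration of sizing compute nodes from a FLOPS target, the sketch below uses the usual theoretical-peak estimate (nodes x sockets x cores x clock rate x floating-point operations per cycle). The function names and all hardware figures are hypothetical and only show the arithmetic; sustained performance on real workloads is normally well below the theoretical peak.

```python
import math


def peak_flops(nodes, sockets_per_node, cores_per_socket, clock_hz, flops_per_cycle):
    """Theoretical peak FLOPS of a homogeneous set of compute nodes."""
    return nodes * sockets_per_node * cores_per_socket * clock_hz * flops_per_cycle


def nodes_needed(target_flops, sockets_per_node, cores_per_socket, clock_hz, flops_per_cycle):
    """Smallest number of identical nodes whose theoretical peak meets the target."""
    per_node = peak_flops(1, sockets_per_node, cores_per_socket, clock_hz, flops_per_cycle)
    return math.ceil(target_flops / per_node)


# Hypothetical node: 2 sockets, 16 cores per socket, 2.5 GHz, 16 FLOPs per cycle per core.
per_node = peak_flops(1, 2, 16, 2_500_000_000, 16)

# Hypothetical target: 100 TFLOPS of theoretical peak.
count = nodes_needed(100 * 10**12, 2, 16, 2_500_000_000, 16)

print(f"Peak per node: {per_node / 1e12:.2f} TFLOPS, nodes needed: {count}")
```

With these assumed figures each node peaks at 1.28 TFLOPS, so 79 nodes are needed to reach the 100 TFLOPS target. The same node count can then feed the power, storage, and network estimates.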

Basic architecture

The master node is also referred to as the frontend node, and a compute node is also referred to as a backend node. A typical architecture, in which a master node is connected to ‘n’ compute nodes, is shown below. The operating system, together with cluster management, workload management, and resource management software, enables the HPC cluster to meet its functional requirements.

Figure: HPC Cluster Basic Architecture

On the compute nodes, one Ethernet interface is connected to the cluster’s Ethernet switch. This network is considered private, that is, all traffic on this network is physically separated from the external public network (e.g., the internet).

On the frontend, at least two Ethernet interfaces are required. The interface that Linux maps to eth0 should be connected to the same Ethernet network as the compute nodes. The interface that Linux maps to eth1 should be connected to the external network (e.g., the internet or your organization’s intranet).
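The interface layout described above can be written down as a small data model and checked before any cabling is done. This is only a sketch: the `Node` class, the `check_node` function, and the node names are hypothetical, and the rules encode nothing beyond what the text states (compute nodes on the private network only; frontend with eth0 private and eth1 external).

```python
from dataclasses import dataclass, field

PRIVATE = "private"  # the cluster's internal Ethernet network
PUBLIC = "public"    # the external network (internet or intranet)


@dataclass
class Node:
    name: str
    role: str  # "frontend" or "compute"
    interfaces: dict = field(default_factory=dict)  # interface name -> network label


def check_node(node):
    """Return a list of problems; an empty list means the node matches the layout."""
    problems = []
    networks = list(node.interfaces.values())
    if node.role == "compute":
        if PRIVATE not in networks:
            problems.append(f"{node.name}: no interface on the private network")
        if PUBLIC in networks:
            problems.append(f"{node.name}: compute node exposed to the external network")
    elif node.role == "frontend":
        if node.interfaces.get("eth0") != PRIVATE:
            problems.append(f"{node.name}: eth0 should be on the private network")
        if node.interfaces.get("eth1") != PUBLIC:
            problems.append(f"{node.name}: eth1 should be on the external network")
    else:
        problems.append(f"{node.name}: unknown role {node.role!r}")
    return problems


# Hypothetical cluster: one frontend and two compute nodes, one of them miswired.
cluster = [
    Node("frontend-0", "frontend", {"eth0": PRIVATE, "eth1": PUBLIC}),
    Node("compute-0", "compute", {"eth0": PRIVATE}),
    Node("compute-1", "compute", {"eth0": PUBLIC}),
]

for node in cluster:
    for problem in check_node(node):
        print(problem)
```

In this example only `compute-1` is reported, once for missing the private network and once for being attached to the external one.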

Non-functional requirements

Non-functional requirements such as scalability, availability, and performance are important aspects of HPC cluster architecture. Scalability covers both scaling out (adding more nodes) and scaling up (adding resources to existing nodes). Availability concerns, among other things, points of failure. In the architecture above, the master (frontend) node is a single point of failure. This can be addressed by adding a second master node, at additional cost. Whether that is really required depends on the business need.
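To help weigh the cost of a second master node against the business need, the sketch below estimates availability with and without redundancy. It assumes that master nodes fail independently and that failover is instantaneous, which real deployments only approximate. The 99% figure and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def redundant_availability(single, copies):
    """Availability of `copies` redundant nodes, assuming independent failures."""
    return 1 - (1 - single) ** copies


def downtime_hours_per_year(availability):
    """Expected hours per year during which the service is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR


single_master = 0.99  # hypothetical availability of one master node
dual_master = redundant_availability(single_master, 2)

print(f"One master:  {downtime_hours_per_year(single_master):.1f} hours of downtime per year")
print(f"Two masters: {downtime_hours_per_year(dual_master):.2f} hours of downtime per year")
```

Under these assumptions a single master that is 99% available is down for about 87.6 hours per year, while two such masters reach roughly 99.99% availability, or under one hour of downtime per year. Whether that difference justifies the extra node is the business decision described above.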

