11 Hardware
CloudLab can allocate experiments on any one of several clusters: three that belong to CloudLab itself, plus several more that belong to federated projects.
Additional hardware expansions are planned, and descriptions of them can be found at https://www.cloudlab.us/hardware.php
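The node types listed in the tables below are requested by name in a CloudLab profile. As a minimal sketch, assuming the geni-lib module that CloudLab profiles normally import (geni.portal), and using m400 (one of the Utah types described below) as a stand-in for any type name in this chapter:

    # Hypothetical minimal profile: request one raw PC of a specific
    # hardware type. Replace "m400" with any type name from this chapter.
    import geni.portal as portal

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    node = request.RawPC("node1")
    node.hardware_type = "m400"   # constrain the request to this node type

    pc.printRequestRSpec(request)

The cluster itself is normally chosen when the profile is instantiated; the hardware type simply constrains which nodes can satisfy the request.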
11.1 CloudLab Utah
The CloudLab cluster at the University of Utah is being built in partnership with HP. It consists of 200 Intel Xeon E5 servers, 270 Intel Xeon-D servers, and 315 64-bit ARM servers, for a total of 6,680 cores. The cluster is housed in the University of Utah’s Downtown Data Center in Salt Lake City.
m400 | 315 nodes (64-bit ARM)
CPU  | Eight 64-bit ARMv8 (Atlas/A57) cores at 2.4 GHz (APM X-GENE)
RAM  | 64GB ECC Memory (8x 8 GB DDR3-1600 SO-DIMMs)
Disk | 120 GB of flash (SATA3 / M.2, Micron M500)
NIC  | Dual-port Mellanox ConnectX-3 10 Gb NIC (PCIe v3.0, 8 lanes)

m510 | 270 nodes (Intel Xeon-D)
CPU  | Eight-core Intel Xeon D-1548 at 2.0 GHz
RAM  | 64GB ECC Memory (4x 16 GB DDR4-2133 SO-DIMMs)
Disk | 256 GB NVMe flash storage
NIC  | Dual-port Mellanox ConnectX-3 10 Gb NIC (PCIe v3.0, 8 lanes)
There are 45 nodes in a chassis, and this cluster consists of thirteen chassis. Each chassis has two 45XGc switches; each node is connected to both switches, and each chassis switch has four 40Gbps uplinks, for a total of 320Gbps of uplink capacity from each chassis. One switch is used for control traffic, connecting to the Internet, etc. The other is used to build experiment topologies, and should be used for most experimental purposes.
All chassis are interconnected through a large HP FlexFabric 12910 switch which has full bisection bandwidth internally.
We have plans to enable some users to allocate entire chassis; when allocated in this mode, it will be possible to have complete administrator control over the switches in addition to the nodes.
In phase two we added 50 Apollo R2200 chassis each with four HPE ProLiant XL170r server modules. Each server has 10 cores for a total of 2000 cores.
xl170 | 200 nodes (Intel Broadwell, 10 core, 1 disk)
CPU  | Ten-core Intel E5-2640v4 at 2.4 GHz
RAM  | 64GB ECC Memory (4x 16 GB DDR4-2400 DIMMs)
Disk | Intel DC S3520 480 GB 6G SATA SSD
NIC  | Two dual-port Mellanox ConnectX-4 25 Gb NICs (PCIe v3.0, 8 lanes)
Each server is connected via a 10Gbps control link (Dell switches) and a 25Gbps experimental link to Mellanox 2410 switches in groups of 40 servers. Each of the five groups’ experimental switches is connected to a Mellanox 2700 spine switch at 5x100Gbps. That spine switch, in turn, interconnects with the rest of the Utah CloudLab cluster via 6x40Gbps uplinks to the HP FlexFabric 12910 switch.
A unique feature of the phase two nodes is the addition of eight ONIE bootable "user allocatable" switches that can run a variety of Open Network OSes: six Dell S4048-ONs and two Mellanox MSN2410-BB2Fs. These switches and all 200 nodes are connected to two NetScout 3903 layer-1 switches, allowing flexible combinations of nodes and switches in an experiment.
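Experiment traffic only uses this fabric if the profile connects node interfaces with a link or LAN; the control network is never part of the requested topology. A sketch under the same assumptions as the earlier example, asking for five xl170 nodes on one experiment LAN (node, interface, and LAN names are arbitrary):

    # Sketch: five xl170 nodes joined by one LAN, which CloudLab maps
    # onto the experiment switches described above.
    import geni.portal as portal

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    lan = request.LAN("lan0")
    for i in range(5):
        node = request.RawPC("node%d" % i)
        node.hardware_type = "xl170"
        lan.addInterface(node.addInterface("if0"))

    pc.printRequestRSpec(request)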
11.2 CloudLab Wisconsin
The CloudLab cluster at the University of Wisconsin is built in partnership with Cisco, Seagate, and HP. The cluster, which is in Madison, Wisconsin, has 270 servers with a total of 5,000 cores connected in a CLOS topology with full bisection bandwidth. It has 1,070 TB of storage, including SSDs on every node.
More technical details can be found at https://www.cloudlab.us/hardware.php#wisconsin
c220g1 | 90 nodes (Haswell, 16 core, 3 disks)
CPU  | Two Intel E5-2630 v3 8-core CPUs at 2.40 GHz (Haswell w/ EM64T)
RAM  | 128GB ECC Memory (8x 16 GB DDR4 1866 MHz dual rank RDIMMs)
Disk | Two 1.2 TB 10K RPM 6G SAS SFF HDDs
Disk | One Intel DC S3500 480 GB 6G SATA SSD
NIC  | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes)
NIC  | Onboard Intel i350 1Gb

c240g1 | 10 nodes (Haswell, 16 core, 14 disks)
CPU  | Two Intel E5-2630 v3 8-core CPUs at 2.40 GHz (Haswell w/ EM64T)
RAM  | 128GB ECC Memory (8x 16 GB DDR4 1866 MHz dual rank RDIMMs)
Disk | Two Intel DC S3500 480 GB 6G SATA SSDs
Disk | Twelve 3 TB 3.5" HDDs donated by Seagate
NIC  | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes)
NIC  | Onboard Intel i350 1Gb

c220g2 | 163 nodes (Haswell, 20 core, 3 disks)
CPU  | Two Intel E5-2660 v3 10-core CPUs at 2.60 GHz (Haswell EP)
RAM  | 160GB ECC Memory (10x 16 GB DDR4 2133 MHz dual rank RDIMMs)
Disk | One Intel DC S3500 480 GB 6G SATA SSD
Disk | Two 1.2 TB 10K RPM 6G SAS SFF HDDs
NIC  | Dual-port Intel X520 10Gb NIC (PCIe v3.0, 8 lanes)
NIC  | Onboard Intel i350 1Gb

c240g2 | 4 nodes (Haswell, 20 core, 8 disks)
CPU  | Two Intel E5-2660 v3 10-core CPUs at 2.60 GHz (Haswell EP)
RAM  | 160GB ECC Memory (10x 16 GB DDR4 2133 MHz dual rank RDIMMs)
Disk | Two Intel DC S3500 480 GB 6G SATA SSDs
Disk | Two 1TB HDDs
Disk | Four 3TB HDDs
NIC  | Dual-port Intel X520 10Gb NIC (PCIe v3.0, 8 lanes)
NIC  | Onboard Intel i350 1Gb
All nodes are connected to two networks:
A 1 Gbps Ethernet “control network”—this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. You should not use this network as part of the experiments you run in CloudLab.

A 10 Gbps Ethernet “experiment network”—each node has two interfaces on this network. Twelve leaf switches are Cisco Nexus C3172PQs, which have 48 10Gbps ports for the nodes and six 40Gbps uplink ports. They are connected to six spine switches (Cisco Nexus C3132Qs); each leaf has one 40Gbps link to each spine switch. Another C3132Q switch acts as a core; each spine switch has one 40Gbps link to it, and it has upstream links to Internet2.

The experiment network at Wisconsin is transitioning to HP switches in order to provide OpenFlow 1.3 support.
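Because each node has two interfaces on the experiment network, a single node can participate in two experiment LANs at once. A sketch under the same assumptions as the earlier examples (LAN and interface names are arbitrary):

    # Sketch: two c220g1 nodes, each using both of their
    # experiment-network interfaces, one per LAN.
    import geni.portal as portal

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    lanA = request.LAN("lanA")
    lanB = request.LAN("lanB")

    for i in range(2):
        node = request.RawPC("node%d" % i)
        node.hardware_type = "c220g1"
        lanA.addInterface(node.addInterface("ifA"))
        lanB.addInterface(node.addInterface("ifB"))

    pc.printRequestRSpec(request)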
Phase II added 260 new nodes, 36 with one or more GPUs:
c220g5 | 224 nodes (Intel Skylake, 20 core, 2 disks)
CPU  | Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz
RAM  | 192GB ECC DDR4-2666 Memory
Disk | One 1 TB 7200 RPM 6G SAS HD
Disk | One Intel DC S3500 480 GB 6G SATA SSD
NIC  | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes)
NIC  | Onboard Intel i350 1Gb

c240g5 | 32 nodes (Intel Skylake, 20 core, 2 disks, GPU)
CPU  | Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz
RAM  | 192GB ECC DDR4-2666 Memory
Disk | One 1 TB 7200 RPM 6G SAS HD
Disk | One Intel DC S3500 480 GB 6G SATA SSD
GPU  | One NVIDIA 12GB PCI P100 GPU
NIC  | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes)
NIC  | Onboard Intel i350 1Gb

c4130 | 4 nodes (Intel Broadwell, 16 core, 2 disks, 4 GPUs)
CPU  | Two Intel Xeon E5-2667 8-core CPUs at 3.20 GHz
RAM  | 128GB ECC Memory
Disk | Two 960 GB 6G SATA SSDs
GPU  | Four NVIDIA 16GB Tesla V100 SMX2 GPUs
11.3 CloudLab Clemson
The CloudLab cluster at Clemson University has been built in partnership with Dell. The cluster so far has 260 servers with a total of 6,736 cores, 1,272TB of disk space, and 73TB of RAM. All nodes have 10 Gbps Ethernet, and most have QDR Infiniband. It is located in Clemson, South Carolina.
More technical details can be found at https://www.cloudlab.us/hardware.php#clemson
c8220 | 96 nodes (Ivy Bridge, 20 core)
CPU  | Two Intel E5-2660 v2 10-core CPUs at 2.20 GHz (Ivy Bridge)
RAM  | 256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs)
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
NIC  | Dual-port Intel 10GbE NIC (PCIe v3.0, 8 lanes)
NIC  | QLogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)

c8220x | 4 nodes (Ivy Bridge, 20 core, 20 disks)
CPU  | Two Intel E5-2660 v2 10-core CPUs at 2.20 GHz (Ivy Bridge)
RAM  | 256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs)
Disk | Eight 1 TB 7.2K RPM 3G SATA HDDs
Disk | Twelve 4 TB 7.2K RPM 3G SATA HDDs
NIC  | Dual-port Intel 10GbE NIC (PCIe v3.0, 8 lanes)
NIC  | QLogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)

c6320 | 84 nodes (Haswell, 28 core)
CPU  | Two Intel E5-2683 v3 14-core CPUs at 2.00 GHz (Haswell)
RAM  | 256GB ECC Memory
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
NIC  | Dual-port Intel 10GbE NIC (X520)
NIC  | QLogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)
c4130 | 2 nodes (Haswell, 24 core, 2 GPUs)
CPU  | Two Intel E5-2680 v3 12-core processors at 2.50 GHz (Haswell)
RAM  | 256GB ECC Memory
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
GPU  | Two Tesla K40m GPUs
NIC  | Dual-port Intel 1GbE NIC (i350)
NIC  | Dual-port Intel 10GbE NIC (X710)
NIC  | QLogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)
There are also two storage-intensive nodes (270TB of disk each!) that should only be used if you need a huge amount of volatile storage. These nodes have only 10 Gbps Ethernet.
dss7500 | 2 nodes (Haswell, 12 core, 270TB disk)
CPU  | Two Intel E5-2620 v3 6-core CPUs at 2.40 GHz (Haswell)
RAM  | 128GB ECC Memory
Disk | Two 120 GB 6Gbps SATA SSDs
Disk | Forty-five 6 TB 7.2K RPM 6Gbps SATA HDDs
NIC  | Dual-port Intel 10GbE NIC (X520)
There are three networks at the Clemson site:
A 1 Gbps Ethernet “control network”—this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. You should not use this network as part of the experiments you run in CloudLab.

A 10 Gbps Ethernet “experiment network”—each node has one interface on this network. This network is implemented using three Force10 S6000 and three Force10 Z9100 switches. Each S6000 switch is connected to a companion Z9100 switch via a 480Gbps link aggregate.

A 40 Gbps QDR Infiniband “experiment network”—each node has one connection to this network, which is implemented using a large Mellanox chassis switch with full bisection bandwidth.
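Profiles do not have to name a cluster; the aggregate is normally selected when the experiment is instantiated. If an experiment must be bound to Clemson from within the profile (for example, because it relies on the Infiniband fabric), one possibility is to pin each node's component manager. This is a sketch only: the URN below is the one commonly used for the Clemson CloudLab aggregate, but treat it as an assumption to verify against the portal.

    # Sketch: bind a node to the Clemson aggregate from within the profile.
    # The component manager URN is an assumption; confirm it before use.
    import geni.portal as portal

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    node = request.RawPC("node0")
    node.hardware_type = "c6320"
    node.component_manager_id = "urn:publicid:IDN+clemson.cloudlab.us+authority+cm"

    pc.printRequestRSpec(request)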
Phase two added 18 Dell C6420 chassis each with four dual-socket Skylake-based servers. Each of the 72 servers has 32 cores for a total of 2304 cores.
c6420 | 72 nodes (Intel Skylake, 32 core, 2 disks)
CPU  | Two sixteen-core Intel Xeon Gold 6142 CPUs at 2.6 GHz
RAM  | 384GB ECC DDR4-2666 Memory
Disk | Two Seagate 1TB 7200 RPM 6G SATA HDs
NIC  | Dual-port Intel X710 10GbE NIC
Each server is connected via a 1Gbps control link (Dell D3048 switches) and a 10Gbps experimental link (Dell S5048 switches).
These Phase II machines do not include Infiniband.
11.4 Apt Cluster
The main Apt cluster is housed in the University of Utah’s Downtown Data Center in Salt Lake City, Utah. It contains two classes of nodes:
r320 | 128 nodes (Sandy Bridge, 8 cores)
CPU  | 1x Xeon E5-2450 processor (8 cores, 2.1 GHz)
RAM  | 16GB Memory (4 x 2GB RDIMMs, 1.6 GHz)
Disk | 4 x 500GB 7.2K SATA Drives (RAID5)
NIC  | 1GbE Dual port embedded NIC (Broadcom)
NIC  | 1 x Mellanox MX354A Dual port FDR CX3 adapter w/1 x QSA adapter

c6220 | 64 nodes (Ivy Bridge, 16 cores)
CPU  | 2 x Xeon E5-2650v2 processors (8 cores each, 2.6 GHz)
RAM  | 64GB Memory (8 x 8GB DDR-3 RDIMMs, 1.86 GHz)
Disk | 2 x 1TB SATA 3.5” 7.2K rpm hard drives
NIC  | 4 x 1GbE embedded Ethernet Ports (Broadcom)
NIC  | 1 x Intel X520 PCIe Dual port 10Gb Ethernet NIC
NIC  | 1 x Mellanox FDR CX3 Single port mezz card
All nodes are connected to three networks with one interface each:
A 1 Gbps Ethernet “control network”—this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. You should not use this network as part of the experiments you run in Apt.

A “flexible fabric” that can run at up to 56 Gbps and carries either FDR Infiniband or Ethernet. This fabric uses NICs and switches with Mellanox’s VPI technology. This means that we can, on demand, configure each port to be either FDR Infiniband or 40 Gbps (or even non-standard 56 Gbps) Ethernet. This fabric consists of seven edge switches (Mellanox SX6036G) with 28 connected nodes each. There are two core switches (also SX6036G), and each edge switch connects to both cores with a 3.5:1 blocking factor. This fabric is ideal if you need very low latency, Infiniband, or a few high-bandwidth Ethernet links.

A 10 Gbps Ethernet “commodity fabric”. On the r320 nodes, a port on the Mellanox NIC (permanently set to Ethernet mode) is used to connect to this fabric; on the c6220 nodes, a dedicated Intel 10 Gbps NIC is used. This fabric is built from two Dell Z9000 switches, each of which has 96 nodes connected to it. It is ideal for creating large LANs: each of the two switches has full bisection bandwidth for its 96 ports, and there is a 3.5:1 blocking factor between the two switches.
11.5 IG-DDC Cluster
This cluster is not owned by CloudLab, but is federated and available to CloudLab users.
This small cluster is an InstaGENI Rack housed in the University of Utah’s Downtown Data Center. It has nodes of only a single type:
dl360 | 33 nodes (Sandy Bridge, 16 cores)
CPU  | 2x Xeon E5-2450 processors (8 cores each, 2.1 GHz)
RAM  | 48GB Memory (6 x 8GB RDIMMs, 1.6 GHz)
Disk | 1 x 1TB 7.2K SATA Drive
NIC  | 1GbE 4-port embedded NIC
It has two network fabrics:
A 1 Gbps “control network”. This is used for remote access, and should not be used for experiments.
A 1 Gbps “experiment network”. Each node has three interfaces on this network, which is built from a single HP Procurve 5406 switch. OpenFlow is available on this network.