How to build a 10G, 40G, or 100G Datacenter Switch?

Grab that Silicon, solder it on PCB

Reza Toghraee

A couple of years ago I was heading the design and production of a gigabit switch. In the beginning I thought it was going to be a tough job, but once we became familiar with the components, we realized that it is a simple process. Everything starts with the silicon vendor. Once you sign the NDA, you get access to a world of information: thousands of pages of datasheets, schematic designs, PCB layouts, software specifications, drivers, etc.

Let's get back to our subject: building a switch, specifically a 10G, 40G, or 100G top-of-rack datacenter switch.

Ethernet switch hardware has a simple design. In simple terms, a switch consists of the following components:

  • Chassis (just the metal part, with your choice of colors and assembly. A pink or lipstick red would make it sexy)

  • Power supplies (two redundant units, readily available off the shelf from many factories)

  • Fans (enough to cool the system down and make some Lamborghini noise to make your product sexier)

  • Fan control PCBA (the board the fans plug into)

  • CPU PCBA (an x86, PowerPC, or ARM based processor with its RAM, flash, and PCIe, which runs the switch OS)

  • Switch main board PCBA (the board that hosts the main switch silicon, interface cages, CPLDs, and, in the case of RJ45 interfaces, PHYs; this PCB has between 16 and 22 layers)

Have a look at the switch picture below. It is a real picture of an Edge-Core AS5712, a 48 x 10G, 6 x 40G switch based on the Broadcom Trident 2.

The fat heat sink in the middle sits on top of the Broadcom Trident 2 and keeps it cool. These BGA ASICs are normally large because they have many interface pins (solder balls) underneath the package.

The CPLDs (Complex Programmable Logic Devices) are mostly Altera MAX parts and handle system startup and management of the LEDs, fans, and temperature sensors. The CPLDs are connected back to the CPU through a UART or the I2C bus. I2C (Inter-Integrated Circuit) is a serial communication protocol widely used between chips in electronic systems. Like RS485, it is a shared bus to which multiple devices can be attached, and each device has its own address.
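To make the addressing idea concrete, here is a minimal in-memory sketch of a shared bus on which each device answers only to its own address. It is not a real I2C driver: the class name `FakeI2CBus`, the addresses 0x60 and 0x48, and the register numbers are all made up for illustration. The only real-world detail is the usual 7-bit address range of 0x08 to 0x77.

```python
class FakeI2CBus:
    """In-memory stand-in for an I2C bus (illustration only, no hardware access)."""

    def __init__(self):
        # Maps a 7-bit device address to that device's register file.
        self.devices = {}

    def attach(self, address, registers=None):
        # 0x00-0x07 and 0x78-0x7F are reserved in 7-bit I2C addressing.
        if not 0x08 <= address <= 0x77:
            raise ValueError(f"address 0x{address:02x} is outside 0x08-0x77")
        if address in self.devices:
            raise ValueError(f"address 0x{address:02x} is already in use")
        self.devices[address] = dict(registers or {})

    def _select(self, address):
        # On a real bus, a missing device simply does not acknowledge.
        if address not in self.devices:
            raise OSError(f"no ACK from address 0x{address:02x}")
        return self.devices[address]

    def write_byte(self, address, register, value):
        # Registers are one byte wide, so keep only the low 8 bits.
        self._select(address)[register] = value & 0xFF

    def read_byte(self, address, register):
        # Unwritten registers read back as zero in this sketch.
        return self._select(address).get(register, 0x00)


bus = FakeI2CBus()
bus.attach(0x60, {0x01: 0x03})   # hypothetical CPLD with a preset register
bus.attach(0x48)                 # hypothetical temperature sensor
bus.write_byte(0x48, 0x00, 0x2A)
print(hex(bus.read_byte(0x60, 0x01)), hex(bus.read_byte(0x48, 0x00)))
```

Both devices share the same two wires; the address in each transaction decides which one responds.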


Block Diagram

The block diagram here shows the connectivity between the different components in the switch.

As you can see, the CPU, which here is an Intel Atom C2538, has a direct PCIe connection to the Broadcom Trident 2 chip. This connection requires a driver, which Broadcom provides through its SDK to the NOS vendors, who include it in their OS. You can also try using OF-DPA on ONL (Open Network Linux) to drive this connection and establish communication between the CPU and the ASIC.

There are three Altera MAX CPLDs on this switch. CPLD 1 controls the fans, temperature monitoring, and LEDs. It exposes an I2C interface that is connected to the CPU board.
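As a rough sketch of the kind of fan policy the software could apply through a CPLD like this one, the snippet below maps a temperature reading to a fan duty cycle and then to an 8-bit register value. The thresholds, the minimum duty, and the 8-bit PWM register are assumptions for illustration; the real values come from the platform's hardware documentation.

```python
def fan_duty_percent(temp_c, low_c=35.0, high_c=65.0, min_duty=30, max_duty=100):
    """Linear fan curve: min_duty below low_c, max_duty above high_c.

    All thresholds here are hypothetical defaults, not vendor values.
    """
    if temp_c <= low_c:
        return min_duty
    if temp_c >= high_c:
        return max_duty
    fraction = (temp_c - low_c) / (high_c - low_c)
    return round(min_duty + fraction * (max_duty - min_duty))


def duty_to_register(duty_percent):
    """Scale 0-100 percent to a hypothetical 8-bit PWM register value (0-255)."""
    duty_percent = max(0, min(100, duty_percent))
    return duty_percent * 255 // 100


for temp in (20, 50, 80):
    duty = fan_duty_percent(temp)
    print(temp, duty, duty_to_register(duty))
```

A NOS that knows the platform would write the resulting value to the fan register over I2C; one that does not know the platform has no way to do so.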

CPLDs 2 and 3 control the SFPs. Remember that this switch doesn't have any on-board PHYs and uses 10G / 1G SFPs for its interfaces. CPLDs 2 and 3 handle the SFP signals: TX Fault, TX Disable, RX Loss, and Mode. Each of them controls 24 SFPs.
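The 24-ports-per-CPLD split can be sketched as a simple lookup. The snippet assumes that ports 1 to 24 sit on CPLD 2 and ports 25 to 48 on CPLD 3, and that each signal (for example TX Fault) is exposed as a 24-bit bitmap with one bit per port. Both the ordering and the bitmap layout are assumptions; the actual register map is platform specific.

```python
PORTS_PER_CPLD = 24   # from the text: each SFP CPLD handles 24 ports
FIRST_SFP_CPLD = 2    # assumed: CPLD 2 serves the first 24 ports
TOTAL_SFP_PORTS = 48


def locate_sfp(port):
    """Return (cpld_number, bit_index) for a 1-based SFP port number."""
    if not 1 <= port <= TOTAL_SFP_PORTS:
        raise ValueError(f"port {port} is outside 1-{TOTAL_SFP_PORTS}")
    index = port - 1
    return FIRST_SFP_CPLD + index // PORTS_PER_CPLD, index % PORTS_PER_CPLD


def ports_with_flag(bitmap, cpld):
    """Decode one CPLD's 24-bit status bitmap into front-panel port numbers."""
    base = (cpld - FIRST_SFP_CPLD) * PORTS_PER_CPLD
    return [base + bit + 1
            for bit in range(PORTS_PER_CPLD)
            if (bitmap >> bit) & 1]


print(locate_sfp(1), locate_sfp(24), locate_sfp(25), locate_sfp(48))
print(ports_with_flag(0b101, 3))   # bits 0 and 2 set on CPLD 3
```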

What's the difference between different switches?

Don't they all have the same specs? So why isn't a NOS supported on all of them?

Now that you know how a switch is built, note that certain components are fixed across all switches in the same family. The switches from different ODMs (Accton, Mellanox, Quanta, Penguin, Inventec, etc.) have some pieces in common and some that differ. For example, if we look at the leaf switches from these ODMs, most are based on the Broadcom Trident 2 platform; however, they may differ in the CPU board, the CPLDs, and the commands used to read and write the devices on the I2C bus. That's why porting a NOS between platforms is not easy. You may install a NOS on a switch that is not officially supported and find that it works properly, except that the fans run at full speed all the time.

In general, the CPLD commands and I2C addresses may differ between such platforms.
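One way to picture the porting problem is a per-platform lookup table inside the NOS. In the sketch below, the platform names, I2C addresses, and register numbers are all invented; the point is only that two switches with the same ASIC can need different entries, and that a platform without an entry leaves the NOS unable to talk to the fan CPLD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformMap:
    """Board-specific details a NOS needs beyond the switch ASIC itself."""
    fan_cpld_addr: int      # I2C address of the CPLD that drives the fans
    fan_pwm_reg: int        # register that sets fan speed
    sfp_cpld_addrs: tuple   # I2C addresses of the CPLDs that manage the SFPs


# Hypothetical entries: same ASIC family, different board wiring.
PLATFORMS = {
    "vendor-a-leaf": PlatformMap(0x60, 0x0A, (0x61, 0x62)),
    "vendor-b-leaf": PlatformMap(0x33, 0x20, (0x34, 0x35)),
}


def fan_write_target(platform_name):
    """Return (i2c_address, register) for fan control, or None if not ported."""
    entry = PLATFORMS.get(platform_name)
    if entry is None:
        # Unknown board: the NOS cannot set fan speed, so the fans stay at
        # whatever the hardware does on its own (full speed in the example
        # described in the text).
        return None
    return entry.fan_cpld_addr, entry.fan_pwm_reg


for name in ("vendor-a-leaf", "vendor-b-leaf", "vendor-c-leaf"):
    print(name, fan_write_target(name))
```

Supporting a new switch then largely means filling in and validating one more entry like these, which is why official support lists are tied to specific models.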