What’s next for AI Infrastructure?

Don’t worry if some of the terms below are new to you; I have linked my glossary here.

Over the last year or so I’ve had countless meetings, read a stack of papers and watched hundreds of hours of educational videos on where AI technology is going, in particular the technology that underpins AI functionality and infrastructure. So I thought I should really try and condense all that information into a high-level post.

The Power Problem

Without doubt the biggest barrier to future AI evolution is power: the power required for the various compute operations and for moving data between chips, but also the physics involved in moving electrons across copper wires on silicon dies, not just within a GPU domain but between GPU domains comprising clusters of thousands of GPUs. So, as the old adage goes, we need to ‘do more with the same’ or ‘do the same with less’, and very smart people like those at Imec and TSMC are looking at both options. But what if eventually we could pull off the ‘double whammy’ of doing ‘more with less’, at least when it comes to AI efficiency?

So let’s take a very high-level view of where we are, where we are going in both the short and longer term, and the various trends driving it. Any one of the technologies I mention below could easily be the subject of a dedicated post in its own right, and may well be in the future, as I find this subject fascinating.

Trend 1 – Squeezing Silicon Harder

So when it comes to doing ‘more with the same’, or should I say ‘Moore with the same’ 🙂, we are approaching the practical limits of how many transistors we can fit on a die cut from a silicon wafer. Yes, I know they’ve been saying this for years, but now I’m starting to believe them: NVIDIA’s Blackwell GPU has 208 billion transistors on a 4nm process, with the Rubin and Rubin Ultra GPUs on a 3nm process planned for release in 2026/7. Worth noting that the ‘nm’ figure commonly quoted with chips no longer equates to any single physical measurement such as gate size; instead it should be understood as a ‘generational label’ indicating a new process node technology.

Beyond this, transistor optimisation technologies like FinFET and then Gate-All-Around (GAA) improve transistor density and lessen voltage leakage across the gates by surrounding the channel on all sides.

And as all real estate professionals know, when you’ve built as much as you can on a piece of land and used up all your X and Y axis space, the solution is to start building up and scale into the vertical plane. CFET does exactly that by stacking the transistor polarities (nFET and pFET) on top of each other rather than side by side as in GAA, essentially doubling the transistor capacity of a die by adding this ‘2nd floor’. As of Q4 2025 CFET is still in the early prototyping phase.

The table and diagram below compare the evolution from Planar → FinFET → GAA → CFET.

Trend 2 – Beyond Electrons: The Photonic Future

So that buys a bit more time, but then the bottleneck starts becoming the actual materials themselves: the silicon, and the copper wires that the electrons flow along. In the words of Scotty from Star Trek, ‘Ye cannae change the laws of physics’, so the very materials that are used need to be re-evaluated to maintain the momentum of progression.

Once electrons over copper become power-limited, the only path left is moving data using photons (light). Photonics largely eliminates resistive heating, delivers far greater reach, and radically reduces power per bit.

So silicon possibly makes way for alternative materials like graphene, and electrons over copper make way for photonics, as the physics of electricity with regard to speed, distance and heat generation starts to break down in favour of the properties of light. Manufacturers like NVIDIA are already adding photonic capabilities to chips like the Rubin Ultra GPU (due for release in 2027) by incorporating Co-Packaged Optics (CPO) modules adjacent to the GPUs.

NVIDIA has also introduced Quantum‑X and Spectrum‑X silicon photonic networking switches, leveraging CPO to overcome the scaling limits of traditional electrical signalling and optical transceivers in large‑scale AI deployments.

So what role do these photonic switches play? Well, let’s have a quick review of how GPUs interact with each other. At the node level, NVLink provides a high-bandwidth, low-latency mesh that connects GPUs (and CPUs) inside a single system, enabling very fast peer‑to‑peer communication and collective operations.

For systems designed for a distributed NVLink domain, like NVIDIA’s NVLxxx systems, NVLink Switches extend this fabric to interconnect GPUs across multiple nodes in a rack, ‘scaling up’ into a single, unified GPU domain with coherent, all‑to‑all connectivity. This allows massive configurations like NVIDIA’s NVL576, where 576 Rubin Ultra GPUs behave as one giant accelerator domain in a single Kyber rack.

Beyond rack scale, NVIDIA’s Quantum‑X Photonic InfiniBand switches provide photonic enabled ‘scale‑out‘ networking that links these NVLink GPU domains together across multiple racks. By integrating silicon photonics directly into the switch ASIC with CPO, Quantum-X reduces power consumption, improves signal integrity, and enables orders‑of‑magnitude scalability for multi‑rack AI fabrics compared with traditional electrical or pluggable optical approaches.

Likewise, Spectrum‑X Photonic switches bring the same CPO‑based silicon photonics to Ethernet networking, enabling cost and power efficient connectivity for AI “factories” comprising millions of GPUs. The shift from electrical signalling and discrete optics to co‑packaged photonics is critical as AI clusters grow beyond what copper or classic optics can efficiently support.

Trend 3 – Beyond Classical: The Quantum Horizon

And what about the ‘double whammy’ I mentioned above? Well, that obviously requires a paradigm shift in computing, which takes the form of quantum computing. This is no longer science fiction: quantum computers are already with us, with chips like ‘Willow’ from Google.

One of my previous blog posts covered possible options for reducing the power requirements of AI, one of which was ternary computing, where we use ‘trits’ instead of ‘bits’: a trit can have 3 states rather than the binary 2, giving 50% more states and roughly 58% more information per trit. But quantum takes that to the next level, where each bit, or ‘qubit’, can be a 1 and a 0 at the same time.

Now that sounds an impossible concept, but the way it was explained to me, which kind of made sense, is to think of a bit which can have 2 states, like a coin: it can be ‘heads’ or ‘tails’, but while that coin is spinning in the air it can be said to be both ‘heads and tails’, or in quantum terms, in ‘superposition’. Qubits are then linked to each other via a process called ‘entanglement’, which allows processing to evaluate all possible outcomes simultaneously, as every combination can be evaluated in parallel thanks to the superposition of all the qubits. Now it should be said that quantum isn’t the answer to ‘life, the universe and everything’ (that, as we all know, is 42), and you are certainly not going to run Windows on it, but it does meet a very specific requirement when it comes to the fundamentals of linear algebra, like complex addition and multiplication at enormous scale, and that is where quantum absolutely leaves traditional compute in the dust.
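To make the superposition idea a bit more concrete, here is a toy sketch (a classical simulation in Python with NumPy, nothing quantum actually running) showing how a register of n qubits is described by 2^n amplitudes all at once:

```python
import numpy as np

# Toy illustration only: a classical simulation of a qubit register.
# n qubits are described by 2**n complex amplitudes, all of which a
# quantum computer effectively "holds" at once while in superposition.

def uniform_superposition(n_qubits: int) -> np.ndarray:
    """State vector for n qubits in an equal superposition."""
    dim = 2 ** n_qubits
    return np.full(dim, 1 / np.sqrt(dim), dtype=complex)

state = uniform_superposition(3)        # 3 qubits -> 8 amplitudes
probabilities = np.abs(state) ** 2      # measurement probabilities
print(len(state), probabilities.sum())  # 8 possible outcomes, probabilities sum to 1.0
```

Add a qubit and the number of simultaneous amplitudes doubles, which is where the exponential appeal (and the difficulty) comes from.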

This has obviously caused concern that encryption methods like RSA and ECC, while nigh-on impossible to break with conventional computing power, could be broken quite easily with the simultaneous-evaluation properties of quantum. The day when all widely used public-key encryption methods could be vulnerable to commercially available quantum computing has been named ‘Q-Day’ and could be as close as 2030, which is why many vendors are now offering post-quantum cryptography (PQC) solutions.

Quantum infrastructure is very different from the silicon-and-copper infrastructure we are used to, instead using materials like aluminium (Al) or niobium (Nb), and the machines look more like chandeliers than computers.

These three trends form a pipeline of innovation: more silicon efficiency today, photonics to scale tomorrow, and quantum for the problems classical compute can’t touch.

So the answer to the question at the top of this post, ‘what’s next for AI Infrastructure?’, is ultimately a combination of them all, with ‘Quantum AI’ for specific workloads that will no doubt complement more traditional enterprise-grade AI, and several pit stops along the way utilising the various other technologies mentioned in this post.

As always exciting times ahead.

If you are exploring how to turn AI into something real, not just hype, I’ll be diving deeper into the various vendor solutions I see, architectures and deployment patterns that are actually working in practice, and yes I’m sure our friend Cisco UCS will be featuring in some of them. So follow along if you want straight talk on what delivers outcomes versus what just sounds impressive. More to come at www.ucsguru.com.


Can thinking in 3’s Solve AI’s Power Problem?

Having been a network specialist for over 35 years, I can ‘think in binary’ and make quite complex subnet/supernet calculations in my head. Years of looking at IP addresses and immediately identifying whether they are on the same network or need to route means I almost see the world in green 1s and 0s, like the character Neo in the film ‘The Matrix’.
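For the non-networkers, here is roughly what that mental check looks like if you spell it out in code, a quick sketch using Python’s standard ipaddress module with made-up addresses:

```python
import ipaddress

# The "same network, or does it need to route?" check, made explicit.
a = ipaddress.ip_address("10.1.4.23")
b = ipaddress.ip_address("10.1.7.200")
network = ipaddress.ip_network("10.1.0.0/21")   # covers 10.1.0.0 - 10.1.7.255

print(a in network and b in network)                  # True: same /21, no routing needed
print(ipaddress.ip_address("10.1.9.1") in network)    # False: this host must be routed
```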

But what if computing wasn’t built on twos at all, but instead on a 3-state architecture, where it was no longer just ‘on or off’ but also a third state between the two? So rather than just 1 or 0, we would have -1, 0 and 1: a traffic light compared to a light bulb, if you like, equating to three voltage states, low, mid and high. Adding this third state drastically increases the information carried per digit while reducing the circuitry and power required to process it.


For example, we all know a 32-bit IP address provides about 4.29 billion unique combinations of 1s and 0s (2^32), but add a third state and a 32-trit address provides about 1.85 quadrillion unique addresses (3^32).
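A quick sanity check of those numbers, as a sketch:

```python
# Back-of-the-envelope comparison of 32 binary digits vs 32 ternary digits.
bits  = 2 ** 32   # 4,294,967,296 (~4.29 billion addresses)
trits = 3 ** 32   # 1,853,020,188,851,841 (~1.85 quadrillion addresses)

print(f"{bits:,}")
print(f"{trits:,}")
print(f"ternary gives ~{trits / bits:,.0f}x more address space")  # ~431,440x
```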

But obviously changing the fundamental building block on which practically all technology is based would be a challenge, hence the much simpler solution of just extending the existing 2-state binary addressing scheme and calling it IPv6, in order to overcome IPv4 scale limitations as well as introduce additional features.

So perhaps we have missed the boat with regards to introducing a 3-state architecture as the IT standard, but there could well be a time when it makes absolute sense to adopt it within an AI pod to drastically improve performance and reduce power requirements; after all, it’s power that is generally the limiting factor on scaling out and locating AI solutions.

The concept of ternary computing is nothing new; the Soviets built a ternary computer called ‘Setun’ as a research project back in 1958, but like the Betamax or Blu-ray of its time it just didn’t get adoption against the far more popular and standard ‘VHS’ or streaming of binary-based systems, despite being superior in many ways.

Another challenge with a 3-state architecture is being able to clearly identify each state: an easy task in an ‘is it on or is it off’ world, but not so easy with three states. Factor in noise, signal variation and hardware compatibility, and the margin for error is almost non-existent. But with modern materials like Gallium Nitride (GaN) or Gallium Arsenide (GaAs) within the transistors, distinguishing those states reliably is now far more feasible.

So in the future it may be the case that the AI ‘backend’ for GPU-to-GPU communication is ternary-based within the AI pod, with the conversion to standard binary signalling done at the pod edge to maintain global connectivity standards. If you think about it, this is pretty similar to how a GPU converts CPU instructions, but at pod scale. And if having a different type of backend network that wins out because of superior performance sounds unlikely, just look up InfiniBand 😊

Closing thought: if binary gave us the Internet, could ternary give us sustainable AI?


Essential AI Terms: Your Comprehensive Glossary

I spend a lot of time explaining AI terms and acronyms in conversations, particularly network-related terms, so I decided to create a single reference post with the most common ones. If you’re starting your AI journey or just need a quick refresher, I hope this glossary helps you cut through the jargon and understand the essentials.

AI Concepts

Chatbot: A computer program that uses AI to understand and respond to human language in a conversational way, e.g. ChatGPT.

AI Agent: Can operate without human interaction and has a specific objective, e.g. booking a flight or analysing data.

Hallucination: When an AI chatbot generates incorrect or fabricated information that nonetheless sounds plausible.

ChatGPT (Generative Pre-trained Transformer): A popular chatbot.

Prompt Engineering: Intelligently crafting inputs to guide LLM responses to more accurately meet your intent, i.e. asking ChatGPT something in such a way that gives you exactly the response you were looking for.

Agentic AI: Advanced form of AI focused on autonomous decision making and goal driven behaviour with limited human intervention. Uses AI agents that can plan, adapt and execute tasks in dynamic environments.

Physical AI: AI that enables machines and robots to perceive, understand and interact with the physical world.

FP – Floating Point precision, i.e. FP4, FP8, FP16 up to FP32: the number of bits used to store each numeric value. The more bits, the higher the precision, but the slower (and more memory-hungry) the computation. The sweet spot for training is generally FP16, still accurate but with faster training time. FP8 is super fast and typically used for inferencing.
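As a rough illustration of why precision matters so much, here is some back-of-the-envelope maths for the weights of a hypothetical 70-billion-parameter model (weights only; optimiser state, activations and KV cache are extra):

```python
# Rough memory footprint of model weights at different precisions.
params = 70e9  # hypothetical 70-billion-parameter model

for name, bits in [("FP32", 32), ("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    gigabytes = params * bits / 8 / 1e9
    print(f"{name}: {gigabytes:,.0f} GB")
# FP32: 280 GB, FP16: 140 GB, FP8: 70 GB, FP4: 35 GB
```

Halve the precision and you halve the memory and bandwidth needed to move the same model around, which is exactly why low-precision formats dominate inferencing.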

Token: A small piece of text, usually a word, part of a word or punctuation, e.g. the phrase ‘AI is amazing.’ is 4 tokens: “AI”, “is”, “amazing”, “.”. AI converts all inputs and outputs to tokens, and the number of tokens determines how much text you can process and how much it costs. Tokens are then converted to ‘vectors’ to associate numeric meaning and context with the string of tokens, which are then stored in a vector database for similarity search.
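A toy illustration of the idea (a naive word/punctuation splitter; real sub-word tokenisers like BPE split text differently, but the counting principle is the same):

```python
import re

# Naive tokeniser: split text into words and punctuation marks.
def naive_tokenise(text: str) -> list[str]:
    return re.findall(r"\w+|[^\w\s]", text)

tokens = naive_tokenise("AI is amazing.")
print(tokens)       # ['AI', 'is', 'amazing', '.']
print(len(tokens))  # 4 tokens, matching the example above
```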

Vector Database: A vector database is a type of database used in AI that stores data as vectors (lists of numbers). Instead of searching for exact matches, it enables similarity search, finding items that are “close” in meaning or features. For example, if you like Product A, the system may suggest Product B because their vectors share several similarities.
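A minimal sketch of that ‘close in meaning’ idea using cosine similarity over made-up product vectors (a real vector database does this over millions of learned embeddings, with proper indexing):

```python
import numpy as np

# Toy similarity search: the nearest vector by cosine similarity is the
# item "closest in meaning or features".
def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

catalogue = {
    "Product A": np.array([0.9, 0.1, 0.3]),
    "Product B": np.array([0.8, 0.2, 0.4]),   # similar features to A
    "Product C": np.array([0.1, 0.9, 0.7]),   # quite different
}
query = catalogue["Product A"]
scores = {name: cosine(query, vec)
          for name, vec in catalogue.items() if name != "Product A"}
print(max(scores, key=scores.get))  # 'Product B' gets recommended
```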

Context Window: The maximum number of tokens an LLM can process at once.

Inferencing: Using a pre-trained AI model to generate predictions or responses based solely on its training data.
Key Point: No external updates; the model relies on internal knowledge.
Analogy: Answering a question from memory.

RAG:  Retrieval Augmented Generation.
An AI technique that combines a language model with real-time data retrieval from external sources before generating an answer.
Key Point: Adds fresh, domain-specific context to overcome model knowledge cutoff.
Analogy: Checking a reference book before answering.
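A minimal sketch of that retrieve-augment-generate flow; the embed_text, vector_search and call_llm functions below are placeholders for whichever embedding model, vector database and LLM you actually use:

```python
# Minimal RAG flow with stand-in components.

def embed_text(text: str) -> list[float]:
    # Placeholder: a real system would call an embedding model here.
    return [float(ord(c)) for c in text[:8]]

def vector_search(query_vector: list[float], top_k: int = 3) -> list[str]:
    # Placeholder: a real system would query a vector database here.
    return ["Doc snippet 1", "Doc snippet 2", "Doc snippet 3"][:top_k]

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would call the language model here.
    return f"Answer generated from a prompt of {len(prompt)} characters"

def answer_with_rag(question: str) -> str:
    context = "\n".join(vector_search(embed_text(question)))            # retrieve
    prompt = f"Use this context:\n{context}\n\nQuestion: {question}"    # augment
    return call_llm(prompt)                                             # generate

print(answer_with_rag("What is our VPN policy?"))
```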

LLM (Large Language Model): AI programs trained on massive datasets to understand and generate human language.

LRM (Large Reasoning Model): A form of LLM that has undergone reasoning-focused fine tuning to handle multi-step reasoning tasks, rather than just predicting the next word like a traditional LLM.

Fine Tuning – The process of taking a pre‑trained AI model and adjusting it with new, specific data so it performs better on a particular task. Like teaching an AI model that already knows a lot of general stuff to specialise in your subject.

CUDA: A software platform developed by NVIDIA that enables programs to run thousands of tasks in parallel on GPUs, unlocking massive speedups for computing beyond graphics.
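As a flavour of what ‘thousands of tasks in parallel’ means in practice, here is a minimal CUDA-style kernel written via the Numba library in Python (assumes an NVIDIA GPU and the numba package are available; this is my own toy example, not an NVIDIA sample):

```python
import numpy as np
from numba import cuda

# Minimal CUDA kernel via Numba: thousands of GPU threads each add one pair of elements.
@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)              # this thread's global index
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

d_a, d_b = cuda.to_device(a), cuda.to_device(b)   # copy inputs to GPU memory
d_out = cuda.device_array_like(a)                 # allocate output on the GPU

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](d_a, d_b, d_out)

print(np.allclose(d_out.copy_to_host(), a + b))   # True
```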

MCP: Model Context Protocol: MCP is a specialised protocol designed for AI models. It standardises how models connect to external tools and data sources, making it easier to provide context and structured results. Unlike APIs, which vary by service, MCP acts as a universal adapter for AI systems, ensuring consistent integration across different environments.

MoE: Mixture of Experts: An AI model design where multiple specialised “expert” networks are trained for different tasks or patterns, and a gating mechanism decides which expert(s) to activate for each input. This makes the model more efficient, since only a few experts are used at a time instead of the whole network.
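A toy sketch of the gating idea (made-up numbers, just to show that only a couple of experts fire for a given input):

```python
import numpy as np

# Toy mixture-of-experts gating: softmax over expert scores, then keep top-k.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

num_experts, top_k = 8, 2
gate_scores = np.random.randn(num_experts)    # produced by the gating network
weights = softmax(gate_scores)
chosen = np.argsort(weights)[-top_k:]         # activate only the top-2 experts

print(f"experts used: {sorted(chosen.tolist())} out of {num_experts}")
# e.g. experts used: [3, 6] out of 8 -> most of the network stays idle for this token
```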

Slurm: (Simple Linux Utility for Resource Management)
An open‑source workload manager that schedules and runs jobs on clusters of computers. It’s widely used in high‑performance computing (HPC) and AI to allocate resources, manage queues, and coordinate parallel tasks across many nodes.

BCM: NVIDIA Base Command Manager. A proprietary workload management and cluster orchestration platform from NVIDIA, built to schedule, monitor, and optimise AI and HPC jobs on GPU‑accelerated systems. It provides resource allocation, job scheduling, and Kubernetes integration, similar to Slurm but tailored for NVIDIA hardware.


Infrastructure & Hardware

Hyperscaler – Large cloud provider, e.g. AWS, Azure, GCP.

Neoscaler/Neocloud – Emerging providers, typically smaller and more agile than a hyperscaler, focused on specialised AI infrastructure at scale, e.g. CoreWeave, Lambda Labs.

RDMA (Remote Direct Memory Access): Enables direct memory access between nodes without CPU involvement.

InfiniBand: A high‑performance, low‑latency networking technology widely used in supercomputing and AI training clusters. It supports advanced features such as RDMA (Remote Direct Memory Access) and collective offload, enabling efficient data movement at scale. While InfiniBand is an open standard maintained by the InfiniBand Trade Association (IBTA), in practice the ecosystem is heavily dominated by NVIDIA, following its acquisition of Mellanox in 2020.

RoCEv2: A protocol that enables RDMA over Ethernet using IP headers for routability. It requires lossless/converged Ethernet (PFC/DCQCN) and is common in HPC and storage networks.

Ultra Ethernet: An open standard designed to evolve Ethernet for AI and HPC workloads, focusing on low latency, congestion control, and scalability. It aims to provide an Ethernet-based alternative to InfiniBand for GPU clusters.

NVLink – NVIDIA’s high-speed GPU interconnect, usually within a node.

NVSwitch – An advanced switch fabric that connects multiple NVLink-enabled GPUs inside large servers (e.g., DGX systems). Connects up to 72 GPUs in NVIDIA’s NVL72 rack-scale system.

NCCL (NVIDIA Collective Communication Library, pronounced “Nickel”) — A library developed by NVIDIA that enables fast, efficient communication between GPUs. It provides optimised routines for collective operations such as all‑reduce, broadcast, and gather, making multi‑GPU and multi‑node training in AI and HPC scalable and performant. NCCL runs over NVLink/NVSwitch or PCIe within a node, and InfiniBand or RoCEv2 between nodes, using CUDA under the hood.
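A minimal sketch of what using NCCL looks like from PyTorch’s distributed API (assumes one GPU per process and a launcher such as torchrun setting up the environment):

```python
import torch
import torch.distributed as dist

# Minimal NCCL all-reduce sketch; launch with e.g. `torchrun --nproc_per_node=8 this_script.py`.
dist.init_process_group(backend="nccl")          # NCCL handles the GPU-to-GPU transport
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

tensor = torch.ones(1024, device="cuda") * dist.get_rank()
dist.all_reduce(tensor, op=dist.ReduceOp.SUM)    # every rank ends up with the same summed tensor

if dist.get_rank() == 0:
    print(tensor[0].item())                      # sum of all rank ids, e.g. 0+1+...+7 = 28
dist.destroy_process_group()
```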

DGX GB200: This system uses the NVIDIA GB200 Superchip, which combines the Grace CPU with Blackwell (B200) GPUs.

DGX GB300: This system is based on the enhanced NVIDIA GB300 Superchip, featuring the Grace CPU with the Blackwell Ultra (B300) GPU.

NVIDIA NVL4 Board – GB200 Grace Blackwell ‘Superchip’ with 4 x Blackwell GPUs (768 GB HBM3E memory) and 2 Grace CPUs (960 GB LPDDR5X memory), connected via NVLink to perform like a single unified processor. Designed for hyperscale AI with trillions of parameters.

NVIDIA NVL72 – Essentially 18 NVL4 boards in a single rack connected via NVLink, with 72 Blackwell GPUs and 36 Grace CPUs sharing 13.8 TB of HBM3E and LPDDR5X memory and acting like a single giant GPU pool, definitely more than the sum of its parts. Liquid cooled and can draw ~120 kW.

NVIDIA NVL144 – The NVIDIA Vera Rubin NVL144 CPX contains 144 Rubin GPUs and 36 Vera CPUs in a single rack, providing 8 exaflops of AI performance, 7.5x more than NVIDIA GB300 NVL72 systems. Availability expected H2 2026.

NVIDIA Kyber NVL576 – 576 Rubin Ultra GPUs and 144 Vera CPUs in a single cohesive GPU domain over 2 racks (1 GPU Kyber rack and 1 Kyber sidecar rack for power conversion, chillers and monitoring), drawing up to 600 kW. Provides 15 exaflops at FP4 (inference) and 5 exaflops at FP8 (training). Availability expected H2 2027.

NVIDIA BasePOD: A reference architecture for building GPU‑accelerated clusters using NVIDIA DGX systems. It provides validated designs for flexible AI and HPC deployments, allowing organisations to choose networking, storage, and supporting infrastructure from different vendors.

NVIDIA SuperPOD: A turnkey AI data center solution from NVIDIA, delivering fully integrated GPU clusters at massive scale. It combines DGX systems, high‑speed networking, storage, and NVIDIA software into a complete package for enterprise and research workloads.

GPU: Graphics Processing Unit, a specialised processor originally designed for rendering graphics, now widely used in AI for its ability to perform thousands of parallel computations simultaneously. This architecture makes GPUs ideal for accelerating deep learning tasks such as training neural networks and running inference, dramatically reducing processing time compared to traditional CPUs.

DPU: Data Processing Unit: is a specialised processor designed to offload and accelerate data-centric tasks such as networking, storage, and security from the CPU. DPUs typically combine programmable cores, high-speed network interfaces, and hardware accelerators to handle packet processing, encryption, and virtualisation, improving performance and freeing up CPU resources for application workloads. (e.g., NVIDIA BlueField).

TPU: Tensor Processing Unit. A purpose-built chip by Google for AI and ML acceleration, with significant speed and efficiency gains over GPU- or CPU-based systems, but not as flexible, as it is optimised for a narrower set of operations.

NPU (Native Processing Unit) is an emerging class of processor that uses photonic (light-based) technology instead of traditional electronic signalling to perform computations. By leveraging photons, NPUs achieve ultra-high bandwidth, dramatically lower power consumption, and minimal heat generation, making them ideal for accelerating AI workloads and large-scale data processing.

CPO: Co-Packaged Optics modules use silicon photonics to integrate optical components directly alongside high-speed electronic chips, like switch ASICs or AI accelerators, within a single package.

Trainium: A custom AI accelerator ASIC (so even more specialised than a TPU) developed by Amazon Web Services (AWS) to deliver high-performance training and inference for machine learning models at lower power consumption and cost compared to general-purpose GPUs. Designed for deep learning workloads, Trainium offers optimised tensor operations and scalability for large-scale AI training in the cloud.

Smart NIC: An advanced network interface card that includes onboard processing capabilities, often using programmable CPUs or DPUs, to offload networking, security, and virtualisation tasks from the host CPU. This enables higher performance, lower latency, and improved scalability for data centre and cloud environments by handling functions such as packet processing, encryption, and traffic shaping directly on the NIC.

HBM: High Bandwidth Memory dedicated to a GPU; energy efficient, with a wide bus (up to 1024 bits) sitting directly on the same package as the GPU, minimising latency.

SXM (Server PCI Module): A GPU module mounted directly on the motherboard, offering superior performance and scalability for large-scale projects like AI training.

PCIe (Peripheral Component Interconnect Express): A standard expansion card inserted into PCIe slots; provides flexibility and wider compatibility, making it suitable for smaller-scale tasks and cost-effectiveness.

CDU – Cooling Distribution Unit is a device used in liquid‑cooled data centers to circulate coolant between the building’s central cooling system and the liquid‑cooled IT equipment. It regulates coolant temperature, pressure, and flow to ensure safe, efficient heat removal from high‑density compute systems.

Think I’ve missed any important ones? Let me know in the comments.


Time to get blogging again!

Wow! Just noticed it’s been a while since my last post. I took a blogging break during COVID and then just never got back into the habit. Well, time methinks to get back on the horse and start blogging again now we are in a new year! So watch this space!

Sorry I’ve been away so long.


Cisco HyperFlex vSphere Cluster Expansion

In this video I expand our Cisco HyperFlex cluster from 5 to 6 converged nodes.


Cisco Champions Radio: Hyperflex Gets Edgy!

Join me, Darren Williams, Daren Fulwell and Lauren Friedman as we discuss the latest innovations with Cisco HyperFlex 4.0.

Recorded for Cisco Champions Radio at Cisco Live Europe 2019 in Barcelona.

Click the image below for the podcast.

 


Cisco Intersight Setup and Configuration

I’m sure by now you have heard of Cisco Intersight. Intersight is Cisco’s SaaS offering for monitoring and managing all your Cisco UCS and HyperFlex platforms from a single cloud-based GUI. And the best part is the base licence and functionality are completely free!

This video walks you through the simple steps for setting up your Cisco Intersight account and registering your devices!

 

Let me know in the comments if you are using Cisco Intersight and how you are finding it. I, for instance, now have complete visibility of all our Cisco UCS and HyperFlex systems from my mobile phone, and it doesn’t cost a penny!


Cisco Live Europe 2019

Another great Cisco Live Europe this year. As usual, the further the week progresses the dumber I feel, as I see there are still so many topics I don’t know enough about. But as always I come away wiser and with a huge list of topics to research further, as well as ideas for labs to stand up and play with.

I will be delving deeper into some of the below topics in future posts, but in the meantime here’s a high level list of topics I found interesting.

The running theme of many of the tech sessions I attended was Anywhere: the flexibility to run workloads or extend policy anywhere you need to, regardless of whether that be within a data center, on the edge, across data centers, out to a branch, or into a public cloud or multiple clouds, all while maintaining a consistent policy model and managing it from a unified UI that abstracts the different underlying technologies.

 Cisco ACI Anywhere

Cisco announced ACI Anywhere, which essentially means being able to deploy or extend your policy and security requirements anywhere they are needed, whether on premises or in a public cloud (AWS or Azure). Most of the setup requirements for this are automated, with the result being a central user interface, the Multisite Orchestrator (MSO), from which you can create your policy and select the site or sites that you wish to deploy the policy to. All ACI-to-public-cloud construct mappings are handled automatically, with no knowledge of AWS or Azure required. I see this being of real interest to customers, and it should accelerate the adoption of Cisco ACI. Cisco also have a “Cloud First” use case for ACI Anywhere, where there is no on-premises location at all, just Cisco ACI deployed into the public cloud or clouds, normalising policy between them.

Additional enablers for ACI Anywhere are:

Remote Leaf: Allows extending the ACI fabric out to a remote location or co-lo without having to also deploy ACI spines or APICs there. As this is a physical pair of leaf switches, both bare-metal and virtual workloads are supported.

Remote Leaf

Virtual Pod: vPod is similar to Remote Leaf, however it is a software-only solution. vPod is made up of virtual spines (vSpines), virtual leafs (vLeafs) and ACI Virtual Edges (AVEs) deployed on a hypervisor infrastructure, and is thus designed for a virtual environment.

When I first saw vPod I did wonder whether this could be the first step towards being able to run Cisco ACI on non-Cisco hardware. When I asked this question, the answer was “It’s theoretically possible”.

vPod

 

Cisco Hyperflex Anywhere

Cisco also announced numerous new updates in the soon-to-be-released 4.0 code for its UCS-based hyper-converged offering, Cisco HyperFlex (Cisco HX). HyperFlex Anywhere gives the ability to deploy workloads on an HX cluster anywhere they are required, whether that be in the DC or out at the edge. Many customers have the requirement of moving data closer to the users; the fact is, the data center is no longer the center for data! HyperFlex Edge allows a 2–4 node cluster to be deployed, with no Fabric Interconnects required and the flexibility of 1 or 10Gbps connectivity. And I know what you are thinking: a 2-node cluster? How would consensus work there to prevent a split-brain scenario? Well, Cisco have thought about that, and use a VM in the cloud as part of Cisco Intersight to act as a cloud witness… clever!

This setup would give customers a significant cost saving by minimising the equipment required at the edge or remote location while providing a consistent platform and centralised management.

The deployment of a Hyperflex edge cluster can also be automated from Cisco Intersight to allow for zero touch provisioning (ZTP) from the factory to these remote locations, including incorporating SD-WAN virtual appliances if required.

The other significant updates announced with Cisco HX 4.0 were performance related.

All NVMe Node:

Cisco have partnered closely with Intel to develop the HX220c M5 All NVMe node, utilising Intel Optane caching and all-NVMe capacity drives. As we know, compared to SSDs NVMe is crazy fast, which has the potential to move the age-old choke point in any system from the drives to the I/O bus, requiring evolution in I/O or DIMM form factors.

HX NVMe

HyperFlex Acceleration Engine:

The HyperFlex Acceleration Engine is an optional PCIe I/O card which offloads the always-on compression from the CPU, freeing up more of those valuable CPU cycles for workloads.

Cisco Intersight 

Intersight is Cisco’s SaaS management portal for UCS servers and HX clusters. It automates monitoring, logging of TAC cases, and collecting and uploading logs. There are 2 licence options available: Basics and Essentials. Basics is free and gives you monitoring and automated call logging for all your UCS and HX servers. In addition, Essentials gives the capability to KVM to servers, deploy and monitor the hypervisor OS, and version-check drivers against the vendor’s HCL. There is also an on-prem Virtual Intersight appliance option for clients that for whatever reason cannot use the SaaS offering.

In many of the chats I had at Cisco Live it was repeatedly mentioned that there is a huge amount of R&D going into Intersight, with much more functionality planned, especially around orchestration and automation. So it’s well worth setting yourself up a free Intersight account and adding your Cisco UCS or HyperFlex clusters to it. You could even add UCS Platform Emulator instances to it if you just want a play for now.

Evolution of the Network Engineering role.

Over the last several years the role of the network engineer has been rapidly evolving, moving from CLI to API configuration methods and focusing on network programmability and automation of repetitive or tedious tasks. Cisco are certainly enabling this evolution with the myriad of classroom sessions and labs available around automating and orchestrating the network, and have a huge amount of free training offerings at DevNet (developer.cisco.com).

As in previous years Cisco again raised the bar with the quality of the DevNet sessions at Cisco Live, giving some great real-world examples of where automation can make such a difference.

Automating the network does not change the ‘what’, it changes the ‘how’; you still need to understand networking, automation just gives you more tools to get the same job done in a smarter, more efficient and deterministic way. It must be said, there is no single or magic recipe to automate the network; it requires consultation with the client to determine their requirements, current skill set and tooling preferences.
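To give a flavour, here is a small sketch of that ‘smarter, more deterministic’ approach using the Netmiko Python library; the device details below are hypothetical placeholders, and Ansible, Terraform or vendor APIs would achieve the same result:

```python
from netmiko import ConnectHandler

# Push the same VLAN change to a switch programmatically rather than by hand.
# Device details are placeholders; swap in your own inventory and credentials.
switch = {
    "device_type": "cisco_nxos",
    "host": "10.0.0.10",
    "username": "admin",
    "password": "REPLACE_ME",
}

vlans = [110, 120, 130]
config_lines = []
for vlan in vlans:
    config_lines += [f"vlan {vlan}", f"  name App_{vlan}"]

with ConnectHandler(**switch) as conn:
    output = conn.send_config_set(config_lines)   # apply the change deterministically
    print(output)
```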

Concepts that the Network Engineer would greatly benefit from include:

  • Understanding the requirements for a cloud native environment
  • Understanding automation tools like Ansible or Terraform
  • Understanding basic coding
  • Understanding version control (e.g. Git)

So that’s what I got up to last week, a great week in all. And save the date for Cisco Live Europe 2020, back in Barcelona, January 27-31! Hope to see you there!


VMware NSX-V Cross-VC Failover


An Epyc new addition to the UCS Family!

 

Back in February of this year I read an article in The Register announcing that Raghu Nambiar, the then chief technology officer for UCS servers, had joined AMD. I didn’t think too much of it, but when I also saw that AMD were, for the first time (in my memory), exhibiting at Cisco Live, my right eyebrow rose in a particular “Roger Moore-esque” manner, and I sensed something might well be afoot.

Some of you may well have noticed that ever since 2009 there has always been an AMD CPU server qualification policy in Cisco UCS Manager, and several years ago I did bring this up with Cisco, asking why an exclusively Intel-based product would need such a policy; to which, if memory serves, the answer at the time was “never say never”.

Well today that “prophecy” was fulfilled with the announcement of the Cisco UCS C4200 chassis, which can house up to 4 x C125 M5 server nodes which are exclusively AMD EPYC based.


C4200 containing 4 x C125 Server Nodes

Now I know what you are all probably thinking: a modular UCS server? Didn’t Cisco already try this with the M-Series, which they decided to end-of-life back in 2016? But the answer is NO! The M-Series was a completely different beast, geared around host “dis-aggregation” with larger numbers of much smaller-spec hosts built upon the lower-spec Intel Xeon E3 CPUs, with shared I/O and shared disks, not to mention the M-Series was UCSM managed only.

In contrast, the C4200/C125 M5 has the following specs:

  • C4200: 2 rack unit chassis containing up to 4 x C125 M5 server nodes
  • 24 drives per C4200, 6 dedicated to each node, 2 of those 6 can be NVMe
  • 2 x AMD EPYC 7100 Series CPUs per node, up to 32 cores each
  • Up to 2TB RAM per node
  • Up to 46.8 TB HDU per node (6 x 7.8 TB SSD)
  • 2 x 2400W PSUs
  • Optional 4th Gen VIC 10/25/40/50/100Gbps (to be released later this year)

Plus the C125 can be managed by UCS Manager, UCS Central, from the cloud with Cisco Intersight, standalone CIMC, or 3rd-party tools.

C125 M5 Rack Server Node


If there are 3 words that describe why Cisco have chosen the AMD EPYC CPU along with the modular form factor, they would be density, density and density, as it is possible to pack a whopping 128 cores per unit of rack space. The graphic below compares density volumetrics against the UCS C220 rack-mount server.


All these “speeds and feeds” stats are great, but what business requirements will these new servers address, and what particular workloads or industries will benefit from them? Well, as can be seen in the graphic below, Cisco are positioning the C125 for any compute-intensive applications or wherever an exceptional amount of compute density is required, as well as gaming/e-gaming. Interestingly, Cisco also list High Frequency Trading (HFT) and enterprise High Performance Computing (HPC) as particular use cases for the C125, markets that up until now Cisco had never actively targeted, which would explain the addition of the Open Compute Project (OCP) 2.0 mezzanine slot supporting options such as InfiniBand for ultra-low-latency networking.


Cisco UCS Portfolio

As ever with the Cisco UCS family it’s all about options and flexibility, and while there are several “all rounder” options there are definitely sweet spots for certain UCS family members. Bill Shields of Cisco has produced a nice radar diagram (below) to guide you to these sweet spots depending on which use cases you are looking to address.

As you can see, the C125 M5 wins out in the density areas, but if minimal cabling is a priority then blades are a great option, or the S3260 servers for maximum storage. The reality is that a combination of these servers may well be the best overall solution in many cases, hitting that optimised price point for each element of the solution.

 

Sharing that storage!

While Cisco have not announced any Software Defined Storage (SDS) option for the C125, I think it would also make a great hyperconverged node, and as Cisco already have the HX Data Platform in the portfolio it would make great sense to combine the two. So who knows, we may see Cisco “HyperFlex up” the C125 M5 in the future. But in the meantime there is always the option to run an SDS solution like StorMagic or VMware vSAN if that’s the way you want to go. And of course traditional NAS and SAN solutions are also very valid storage options.

Closing thoughts

For me the big differentiator of Cisco UCS has always been the management ecosystem. It is a huge plus to be able to manage hundreds of servers as easily as one. And having that management platform available on premises or from the cloud, covering the whole UCS family regardless of whether they are blades, rack mounts, modular or hyper-converged nodes, is a huge Cisco USP.

Links for further reading

For more information and data sheets on the C4200 and C125 click here

Rather than me calling out the different pros and cons of AMD vs Intel, price-per-watt and Thermal Design Power (TDP) stats etc., AnandTech do a great job of an independent “apples with apples” comparison of how the AMD EPYC CPU compares to the Intel Skylake CPU here.

As always let me know your thoughts in the comments!

Colin

 
