Cisco UCS Generation 3 First Look.

I noticed last week that Cisco have just released the eagerly awaited Cisco UCS 3.1 code “Granada” on CCO.

Whenever a major or minor code drop appears, the first thing I always do is read the Release Notes to see what new features the code supports or which bugs it addresses. Well, reading the 3.1 release notes was like reading a Christmas list. There are far too many new features and enhancements to delve into in this post, so I will just call out the ones that most caught my eye.

Cisco UCS Gen 3 Hardware

At the top of the release notes is the announcement that 3.1 code supports the soon to be released Cisco UCS Generation 3 Hardware.

With the new Gen 3 products the UCS infrastructure now supports full end to end native 40GbE.

The Generation 3 Fabric Interconnects come in 2 models: the 6332, which is a 32 port 40G Ethernet/FCoE Interconnect, and the 6332-16UP, which as its model number would suggest comes with 16 Unified ports that can run as 1/10G Ethernet/FCoE or as 4/8/16G native Fibre Channel.

The 6 x 40G Ports at the end of the Interconnects do not support breaking out to 4 x 10G ports, and are best used as 40G Network uplink ports.

The Gen 3 FI’s are a variant of the Nexus 9332 platform, and although the Application Leaf Engine (ALE) ASIC, which would allow them to act as a leaf node in a Cisco ACI Fabric, is present, that functionality is not expected until the 4th Gen FI.



Cisco 6332 Fabric Interconnect




Cisco 6332-16UP Fabric Interconnect



6300 Fabric Interconnect Comparison



Now that there is a native 40G Fabric Interconnect we obviously need a native 40G Fabric Extender / IO Module, and the UCS-IOM-2304 is it. It has 4 x native 40G Network Interfaces (NIFs) and 8 x native 40G Host Interfaces (HIFs), one 40G HIF to each blade slot. These 40G HIFs are actually made up of 4 x 10G HIFs, but with the correct combination of hardware they are combined into a single native 40G HIF.

Now as the purists will note, in network speak 1 x native 40G is not the same as 4 x 10G; it’s a speed vs. bandwidth comparison. The way I like to explain it is this: if you have a tube of factor 40 sun lotion and 4 tubes of factor 10, they are not the same thing. 4 x 10 in this case does not equal 40, you just have 4 times as much 10 🙂 It’s a similar story when comparing native 40G speed and bandwidth with 40G of bandwidth made up of 4 x 10G speed links, where any single flow still tops out at 10G. Hope that makes sense.
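To put some rough numbers on the sun lotion analogy, here is a small Python sketch. It assumes (my simplification, not something from the release notes) that a bundle of links places each individual flow on exactly one member link, as a port-channel hash normally would. The function names are purely illustrative.

```python
def aggregate_bandwidth_gbps(link_speeds_gbps):
    """Total bandwidth available when many flows are spread across all links."""
    return sum(link_speeds_gbps)


def max_single_flow_gbps(link_speeds_gbps):
    """Assumption: one flow rides one link, so it can go no faster than the fastest link."""
    return max(link_speeds_gbps)


native_40g = [40]
four_by_10g = [10, 10, 10, 10]

for name, links in (("1 x 40G", native_40g), ("4 x 10G", four_by_10g)):
    print(name,
          "| aggregate:", aggregate_bandwidth_gbps(links), "G",
          "| fastest single flow:", max_single_flow_gbps(links), "G")
```

Both options offer the same aggregate bandwidth, but only the native 40G link lets a single flow run at 40G.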


Cisco 2304 IO Module


As mentioned above, the hardware combination you have determines the speed and bandwidth options available. The three diagrams below show the three supported options.

Cisco 2304 IO Module: the three supported options

One Code to Rule Them All.

Next up is that UCSM 3.1 is a unified code release, which supports B-Series, C-Series and M-Series all within the same UCS Domain, as well as UCS Mini, all in a single code release.

HTML5 Interface.

I’m sure you have all shared my pain with Java in certain situations, so you will be glad to hear that UCSM 3.1 offers an HTML5 user interface on all supported Fabric Interconnects (6248UP, 6296UP, 6324, 6332 and 6332-16UP). 6100 FIs and UCS Gen 1 hardware are no longer supported as of UCSM 3.1(1e), so do not upgrade to 3.1 if you still have any: you will need to stay on UCSM 2.2 or earlier until you have removed all of your Gen 1 hardware! (Refer to the release notes for the full list of deprecated hardware.)

Support for UCS Mini Secondary Chassis.

I have deployed UCS Mini in various situations, mainly DMZs, specialised pods like PCI-DSS workloads, or branch offices, and while 8 blades and 7 rack mounts has always been enough, in many cases an “all blade solution” would have been preferred. Well, now UCS Mini can scale to 16 blades. You can now daisy chain a second UCS Classic chassis from the 6324 FI, allowing for an additional 8 blades. You can either use the 10G ports for the secondary chassis, or with the addition of the QSFP Scalability Port License you can break out the 40G scalability port to 4 x 10G, as shown below.


UCS Mini Secondary Chassis


Honourable Mentions.

  • There is now an Nvidia M6 GPU for the B200M4 which would provide a nice graphic intensive VDI optimized blade solution.
  • New “On Next Boot” maintenance policy that will apply any pending changes on an operating system initiated reboot.
  • Firmware Auto Install will now check the status of all vif paths before and after the upgrade and report on any difference.
  • Locator LED support for server hard-disks.

Well that about covers it in this update. I am looking forward to having a good play with this release and the new hardware when it comes out.

Until next time





CCIE Program Refresh, My thoughts.

The world of technology as we know is changing fast, faster than many predicted. This is certainly true in the data center. In the “Old days” (more than 3 years ago) most of my time was spent evangelising a particular product or adjudicating a bake off between two or more vendor platforms.

These days the infrastructure conversations tend to be far shorter, the fact is infrastructure these days is a given, and the true differentiator is how easy that infrastructure is to consume, automate and orchestrate in a cloud stack or converged solution.

Gone are the days of product led engagements (and rightfully so); these days it’s all about solution led engagements: taking a business requirement and translating that into a technical solution which truly drives business outcomes.

This solutions led approach inevitably leads to closer collaboration between teams across all elements of the cloud stack: portal developers, application developers, automation/orchestration, backup, infrastructure, storage etc.

The Human API

Just like all the above elements use various API’s to hook seamlessly into the other components of the solution, the true modern day Consultant needs “Human API’s” in their skill set to design, deliver and integrate their elements into the overall business solution.

It is unrealistic to expect a networking consultant to be an expert in all the various solutions into which the network could integrate for example OpenStack, the automation/orchestration and Cloud Management Platforms like Cisco UCS Director, or to fully understand a culture like DevOps.

What the modern day networker does need to know however, is how all these various elements interact and consume the network. They need the “Human API” into each of these technologies, which means knowing enough about the other teams, to talk a common language in order to design and implement a truly holistic and optimised solution.

With this in mind Cisco have revamped the CCIE program to address this shift in skill set and the need to align certifications much closer to these evolving job roles. Cisco have done this by adding in these “Human API’s” across all CCIE Tracks. No longer can you afford to be isolated in a particular technology track with no consideration for the bigger picture. Sure, you still need to be an expert in your chosen technology, but there is now a framework common to all tracks, covering skills that are essential for the modern day consultant or engineer.

So what’s changing?

Cisco is adding in a common framework around these evolving technologies (E.T’s) across all CCIE Tracks. This framework will make up 10% of the CCIE Written Exams, with the other 90% focused on the specific track. The Lab exams however will not include these E.T’s and remain 100% focused on the particular track.


As Cisco always give at least 6 months notice of any blueprint changes, this additional common Evolving Technologies section will come into effect in the written exams in July 2016.


CCIE Cloud?

I’m sure like me, when you see the Certification road map below, you cannot help but notice the missing box in the top right hand corner.


This has led to the speculation of an imminent CCIE Cloud track, incorporating Nexus 9000 and Cisco Application Centric Infrastructure (ACI). I for one was certainly hoping for one, in order to pursue adding a 3rd CCIE to my resume. This however is not the case, but I’m sure this news will come as a great relief to my wife.

However Nexus 9K and Cisco ACI will be added to the CCIE Data Center track in the CCIE DC 2.0 Refresh due July 2016, which makes sense, and updates the DC Lab blueprint in line with the skills required to design, implement and integrate a Cloud ready data center.

UCS Director, UCS Central, REST API, and Python are all now listed in the ‘Data Center Automation and Orchestration’ section of the Written and Lab Blueprints. Up until now these have never been covered in the CCIE DC program, but they do form a huge part of almost every discussion I have with my clients, so it is great to see they are now included, thus aligning the certification much closer to the actual modern day job role.

The Data Center Lab 2.0 exam will also add a 60min “Diagnostic” section in which no console access is given, but you need to ascertain likely causes of issues from various evidence trails like E-mails, diagrams and screenshots. This will be followed by the 7hr Troubleshooting and Configuration section. You need to achieve at least the minimum score in both sections to pass the overall exam.

Security Everywhere

Scarcely a day goes by without the press reporting a hacking attempt or a compromise in the security of a household-name company.

The fact is, as we embrace the innovation that cloud brings along with the borderless network and IoT, security can no longer be thought of as a set of products but instead it needs to be a mind set and be holistically integrated throughout the entire solution.

The threat landscape is ever changing: cyber criminals, along with deploying very advanced techniques, now have widely distributed attack surfaces to target.

This coupled with the fact that there is a huge industry shortage of trained security professionals has led Cisco to also revamp the CCNA Security certification to include all these modern security concerns, by expanding the scope of the certification to cover topics like Cloud, Web and Virtualisation Security, BYOD, ISE, Advanced malware protection as well as including the FirePOWER and FireSIGHT product portfolio.

All this begins at the associate level and rightly so as Security needs to be woven into everything that we do, security needs to be everywhere.

Final Thought

I for one welcome this announcement from Cisco, as there are an awful lot of traditional networkers out there wondering what skills they will need to stay relevant in the new world order of software defined networking and Cloud. It’s great to see Cisco addressing the needs of its core advocates and bringing them on this new and exciting journey. Once again Cisco raises the bar for industry talent at every level.



Why did “Renaming” an unused VLAN bring down my entire production management environment?

I was recently asked to investigate an issue which on the face of it sounded very odd.

We had installed a fairly large FlexPod environment running VMware NSX across a couple of datacentres.

During pre-handover testing the environment suffered a complete loss of service to the vSphere and NSX management cluster.

When I asked if anything had been done immediately prior to the outage, the only thing they could think of was that a UCS admin (to protect his identity let’s call him “Donald”) had renamed an unused VLAN, which had no VM’s in it, so was almost certainly not the cause and just a coincidence.

Hmmmm I’ve never really been one to believe in coincidences, and armed with this information, I had a pretty good hunch where to start looking.

As I suspected, both production vNICs (eth0 & eth1) of all 3 hosts in the management cluster were now showing as down/unpinned in UCS Manager.

This was obviously why the complete vSphere production and VMware NSX management environment were unreachable, as all the Management VM’s including vCenter, NSX manager along with an NSX Edge Services Gateway (ESG) that protected the management VLAN resided on these hosts, all of which had effectively just had their networking cables pulled out.

So what had happened?

As you may know you cannot rename a VLAN in Cisco UCS so Donald had deleted VLAN 126 and recreated it with the same VLAN ID but a different name (“spare” in this case). This wasn’t perceived as anything important as there were not yet any VM’s in the port-group for VLAN 126.

Donald then went into the updating vNIC template to which the 3 vSphere management hosts were bound and added in the recreated VLAN 126.

And that is when all management connectivity was lost.

The issue was that, as per best practice when using vPC’s on Cisco UCS with NSX, there were two port-channels northbound from each Fabric Interconnect. One carried all the production VLANs and connected to a pair of Nexus switches running virtual port-channels (vPC). The other was a point to point port-channel carrying the VLAN used for the layer 3 OSPF adjacency between the Nexus switches and the virtual NSX Edge Services Gateways (ESG’s), as it is not supported to run a dynamic routing adjacency over a vPC.

So obviously VLAN groups have to be used to tell UCS which uplinks carry which VLANs (just like a disjoint L2 setup).

Cisco UCS then compares the VLANs on each vNIC to those tagged on the uplinks and thus knows which uplink to pin the vNIC to.

Unfortunately as Donald found out this is an “All or nothing” deal, unless ALL of the VLANs on a vNIC exist on a single uplink that entire vNIC and ALL its associated VLANs will not come up. Or as in this case will just shut down.

So when VLAN 126 was deleted and recreated with a new name, this new VLAN did not exist on the main vPC UCS uplinks (105 & 106). Hence all hosts bound to that updating vNIC template immediately shut down all their production vNICs (eth0 & eth1), as there was no longer an uplink carrying ALL their required VLANs to which to pin. (Cisco UCS 101 really)
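The “All or nothing” rule can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not how UCS Manager implements pinning. The uplink names and every VLAN ID other than 126 are made up for illustration.

```python
def find_pinnable_uplink(vnic_vlans, uplink_vlans):
    """Return the first uplink that carries ALL of the vNIC's VLANs, or None.

    None models the vNIC going down/unpinned.
    """
    wanted = set(vnic_vlans)
    for uplink, carried in uplink_vlans.items():
        if wanted <= set(carried):
            return uplink
    return None


# Hypothetical VLAN groups: one for the vPC uplinks, one for the OSPF point to point link
uplink_vlans = {
    "vpc-uplink": {100, 110, 120},
    "ospf-uplink": {900},
}

before = find_pinnable_uplink({100, 110, 120}, uplink_vlans)
# The recreated VLAN 126 is added to the vNIC template but not to any VLAN group
after_change = find_pinnable_uplink({100, 110, 120, 126}, uplink_vlans)
# The fix: add VLAN 126 to the vPC uplink VLAN group
uplink_vlans["vpc-uplink"].add(126)
after_fix = find_pinnable_uplink({100, 110, 120, 126}, uplink_vlans)

print(before, after_change, after_fix)
```

One VLAN missing from the uplink is enough for the whole vNIC to lose its pin, which is exactly what bit Donald.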

As soon as I added the recreated VLAN to the vPC uplink VLAN group, all vNICs re-pinned, came up and connectivity was restored. (I could have also just removed this new VLAN from the vNIC template.) Either way the “All or nothing” rule was now happy.

As per best practice, the client’s user workloads and their supporting vSphere and NSX infrastructure were located on different vSphere clusters and thus were unaffected by this outage.

There are numerous ways to avoid the above issue, for example you could take out the vPC element and just have a single-homed port-channel carrying all VLANs from FI A to Nexus A and the same from FI B to Nexus B.

Or as was done in this case, the run book was updated and everyone informed that in this environment VLAN groups are in use, so a newly created VLAN must be added to the relevant VLAN group before it is added to the vNIC template.

I would like to see a feature added to UCS that changes this behaviour to perhaps only isolate the individual VLAN(s) rather than the whole vNIC, or at least a warning if the action will result in a vNIC unpinning, but I can think of a few technical reasons as to why it likely is how it is.

In a previous post, “UCS Fear the Power?”, I quoted Spider-man: “with great power comes great responsibility”.

This was certainly true in this case, that seemingly minor changes can have major effects if the full ramifications of those changes are not completely understood.

And before anyone comments: no, I am not Donald :-), but I have done this myself in the past so knew of this potential “Gotcha”.

But if this post saves just one UCS Admin from a RGE (Résumé Generating Event) then it was worthwhile!

Don’t be a Donald! and look after that datacentre of yours!






Cisco UCS Integration with Cisco ACI (Manual Method)

This video gives a complete walkthrough of configuring a Cisco UCS domain into a Cisco ACI Fabric, and then extending the Cisco ACI Policy into a VMware vSphere environment within that UCS infrastructure.

There has also been an update to this video showing the same configurations using the Wizards and Canvas available in later versions of Cisco ACI (Link below)


Have fun!



Where’s the Tyrrell gone?

Those that know me will be well aware that I’m no stranger to the inside of a saloon, and it was during one of these evenings that I got into a debate on Formula 1. Now I don’t really follow Formula 1, and don’t usually have an opinion on it, but one of my comments on this occasion was something like “I miss the good old days when the cars were all different, and I always wanted the 6 wheeler to win”.

Now some of my Formula 1 following colleagues are a little younger than myself, and had no idea what I was talking about, but with the aid of a smartphone 30 seconds later they were all amazed.

The car of course was the Tyrrell P34.








Now-a-days all cars are computer developed and optimised in wind tunnels, and not surprisingly it turns out that there is only one optimum design, which is now why all the cars look practically identical.

Sadly gone are the days of wacky ideas and cars that truly differentiated from each other.

The same can be said of IT. Not all that long ago it was easy to compare vendors, mock the “wacky” ones, and promote the features that we as independent consultants felt would truly benefit our clients.

But as with race cars, in the science of IT there is generally a single optimised method of doing or making something, to which most hardware vendors are now aligned, hence the influx of merchant silicon products and commoditised integrated systems.

In the “old days” I could often be found at a white board, chatting speeds and feeds with a client, and while I occasionally still get dragged into those discussions, the fact is if we’re at that point the client is really barking up the wrong tree.

So the question is: How can we help our customers truly differentiate themselves from their competition in a world of ever increasing IT commoditisation?

Well the answer, as in Formula 1, is in the “Variables”

Taking out of the equation the odd technical failure, the key to winning most F1 races is the skill and consistency of the driver.
The driver in F1 equates to the Software stack in IT, and that is where most solutions can really differentiate themselves.

Most of my discussions with clients these days start with topics like what they consider to be their key differentiators from their competition, and the pain points and barriers that they are currently experiencing. From these discussions we can translate the business requirements into technical solutions.

An example being, telling a client we can improve the de-duplication within their storage array, may not mean much, but telling them we can increase the number of tenants they can host on the same infrastructure by 20% may sound a lot more compelling.

All this is just another example of how the IT industry is changing for the “Traditional Networker”: the differentiation is becoming less and less in the hardware, and increasingly in the intelligence and software that consumes it.

Fun times ahead.


Cisco UCS New M4 Additions!

Following on from last week’s big announcements and the teaser on the new M4 line-up, I am pleased to say I can now post the details of that new line-up.

The new M4 servers are based on the Intel Grantley-EP Platform, incorporating the latest Intel E5-2600 v3 (Haswell EP) processors. These new processors are available with up to an incredible 18 Cores per socket and support memory speeds up to a blistering 2133MHz.

In real terms this means far denser virtualised environments and higher performing bare metal environments, which equates to fewer compute nodes required for the same job, and all the other efficiencies a reduced footprint brings.

The new models being announced today are:

New M4 line-up


The stand out details for me here are that the two new members of the C-Series Rack Mount family now come with a Modular LAN on Motherboard (mLOM), either the VIC 1227 (SFP) or the VIC 1227T (10GBaseT), which frees up a valuable PCIe 3.0 slot.

The C220M4 has an additional 2 x PCIe 3.0 Slots which could be used for additional I/O like the VIC1285 40Gb adapter or the new Gen 3 40Gb VIC1385 adapter. The PCIe slots also support Graphics Processing Units (GPU) for graphics intensive VDI solutions as well as PCIe Flash based UCS Storage accelerators.



In addition to all the goodness you get with the C220 the C240 offers 6 PCIe 3.0 Slots, 4 of which are full height, full length which should really open up the 3rd party card offerings.

Also worth noting that in addition to the standard Cisco Flexible Flash SD cards, the C240 M4 also has the option of 2 internal small form factor (SFF) SATA drives. Ideal if you want to keep a large-footprint operating system physically separate from your front facing drives.



And now for my favourite member of the Cisco UCS Server family, the trusty, versatile B200, great to see this blade get “pimped” with all the M4 goodness, as well as some great additional innovation.


So on top of the E5-2600v3 CPU’s supporting up to 18 Cores per socket, the ultra-fast 2133MHz DDR4 Memory, as well as the new 40Gb ready VIC 1340 mLOM, what I quite like about the B200M4 is the new “FlexStorage” Modular storage system.

Many of my clients love the statelessness aspects of Cisco UCS and to exploit this to the Max, most remote boot. And while none of them have ever said, “Colin it’s a bit of a waste, I’m having to pay for, and power an embedded RAID controller, when I’m not even using it”, well now they don’t have to, as the drives and storage controllers are now modular and can be ordered separately if required, or omitted completely if not.

But if after re-mortgaging your house you still can’t afford the very pricey DDR4 Memory, worry ye not, as the M3 DDR3 Blades and Rack mounts certainly aren’t going anywhere, anytime soon.

Until next time.
Take care of that Data Center of yours!


Cisco UCS: Major Announcement

Hi All
It’s finally September the 4th which means only one thing, the biggest single day of Cisco UCS announcements since the product’s launch 5 years ago.

The strapline of the launch is “Computing at every scale”, and “scale”, both large and small, is certainly the consistent message across all the new announcements.

UCS Mini

In a previous post (which can be found here) I did quite a comprehensive write up on the Cisco UCS 6324 Fabric Interconnect and Next Gen UCS Chassis, so I won’t go into the technical details again, but today Cisco officially unveil their vision for what they have now branded “UCS Mini”.

As mentioned the theme today is scale, and as we know, a significant percentage of servers in use today are outside of the Data Center. These use cases may be large companies with branch offices, retail outlets, remote data collection points or any use case where the Compute needs to be close to the demand.

And then there is another requirement where a smaller company simply wants a ready assembled and simplified “All-in-One” solution. In either case a more “non Data Center” friendly platform is required.

Cisco refer to these environments as “Edge-Scale” environments, and that is the use case that “UCS Mini” is designed for.

Cisco UCS Mini provides Enterprise Class compute power to these “Edge-Scale” environments without compromising management or visibility, as UCS Mini fully integrates with Cisco UCS Central and UCS Director.

UCS Mini

OK so that’s the UCS Mini update covered, and at any other time, I’m sure you’d agree that’s a pretty comprehensive and cool update. But in the words of Bachman Turner Overdrive, “you ain’t seen nothing yet!”

Cloud Scale Computing

Ok so we have UCS Mini extending the Data Center to the Edge, then we obviously have the UCS Data Center Core offerings which we are no doubt all familiar with.

But now, and certainly the element of the announcement that I find most interesting, comes the “Cloud-Scale” computing environment.

Cloud Scale Computing

In the Enterprise we traditionally see either a 1 to 1 relationship between a server and an application or in the case of a Hypervisor a single physical server may host many applications.

In the world of “Cloud-Scale” Computing the reverse is true: there is a single application utilising many servers. Examples of these Cloud-Scale models are Analytics, E-Gaming and eCommerce, to name but a few.

The key with these applications is to be able to scale the compute while at the same time adding minimal overhead and things you don’t necessarily need, like fans, power supplies and peripherals etc… and even elements like storage and I/O if they are not the points of constraint.

I don’t need to tell you how much of this potentially unnecessary “overhead” would be in a rack of 16 1U servers, each with redundant NICs, Fans, Power supplies and so on.

True, a Blade solution does alleviate this overhead to some degree, but it still isn’t designed specifically for the use case.

So if both C-Series and B-Series are not perfectly aligned to the task, what is?

The answer is the new M-Series Modular Servers.

M Series Modular Servers

A single M-Series M4308 Modular Chassis, can give you the same CPU density as the 16 x 1U Rack Mount servers in the example above but with a fraction of the “overhead”, allowing for true Compute Cloud-Scaling and all within a 2RU chassis.

Each 2RU M-Series Chassis can contain up to 8 front loading UCS M142 “Compute Cartridges” and each Compute Cartridge contains 2 independent Compute Nodes, each with a single Intel Xeon 4-core E3 processor and 32GB RAM (4 DIMM Slots), with no Network Adapters, no storage and no peripherals. Just raw Compute and Memory.

The Storage and I/O in the back of the Chassis is completely independent from the Compute Cartridges and acts as a shared resource available to them all. This separation is made possible by an innovation called “Cisco System Link Technology”. This server “Disaggregation” negates the usual issues of sub-optimal CPU to Storage and IO ratios and allows both to be independently scaled to the requirement.

A 3rd Generation Cisco VIC provides the internal fabric through which all components communicate as well as providing the dual 40Gb external connectivity to the Fabric Interconnects.

The 4 SSD’s allow up to 6.4TB of local storage, which is configured in a RAID group and logically divided amongst the Compute Nodes within the cartridges, which just see a locally attached SCSI LUN.

At FCS it will not be possible to mix current B and C Series UCS servers with the M-Series which will need a dedicated pair of 6200 Fabric Interconnects.

A single UCS Domain can scale to 20 M-Series Chassis along with all the associated Cartridges and Compute Nodes (Giving the grand total of 320 Supported Servers).
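For anyone who wants to check the maths, here is the M-Series density worked through in Python. Every figure comes from the paragraphs above; only the variable names are mine.

```python
CARTRIDGES_PER_CHASSIS = 8     # UCS M142 Compute Cartridges per 2RU chassis
NODES_PER_CARTRIDGE = 2        # independent Compute Nodes per cartridge
CORES_PER_NODE = 4             # single 4-core Xeon E3 per node
RAM_GB_PER_NODE = 32
MAX_CHASSIS_PER_DOMAIN = 20

nodes_per_chassis = CARTRIDGES_PER_CHASSIS * NODES_PER_CARTRIDGE
cores_per_chassis = nodes_per_chassis * CORES_PER_NODE
ram_gb_per_chassis = nodes_per_chassis * RAM_GB_PER_NODE
servers_per_domain = nodes_per_chassis * MAX_CHASSIS_PER_DOMAIN

print("Compute Nodes per 2RU chassis:", nodes_per_chassis)
print("Cores per chassis:", cores_per_chassis)
print("RAM per chassis (GB):", ram_gb_per_chassis)
print("Servers per UCS Domain:", servers_per_domain)
```

So a single 2RU chassis holds the 16 servers from the rack mount example, and 20 of them give the 320 supported servers.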

At first glance the M-Series may look a bit “Nutanixy”, however Nutanix is a “Hyper-converged” architecture rather than a “Disaggregated” architecture. What’s the difference?
Well, that’s a fun post for another day.

NB) Earlier this month Cisco did announce a deal with SimpliVity to address the “Hyper-converged” market.

A better comparison to the Cisco UCS M-Series would be the HP Moonshot and Cisco are confident that the M-Series has many advantages over Moonshot.

C3000 Rack Storage Server

Lastly but certainly not least is the Cisco C3160, a standalone server completely optimised for storage. The C3160 would be ideal to provide the complementary storage capacity for the M-Series compute nodes, but could equally provide storage to UCS B-Series Blades and UCS C-Series Rack mounts (up to 240TB per 4U server at FCS, utilising 4TB drives).

Where the M-series provides the transactional front end, the C3160 provides the storage for context and content.

Typical use cases for the C3160 in conjunction with the M-Series servers would be a Big Data type application. This combination is also well suited to an OpenStack environment, with the M-Series serving as the Compute Nodes (Nova) and the C3160 serving as the Storage node running Ceph.

The management for the C3160 is provided by a Cisco IMC, just like using a standalone C-Series today, and while I don’t know, I would think UCS Manager integration would be a great and logical future update.

All storage within the C3160 is presented and owned locally by the server (Dual E5-2600v2, with up to 256GB DDR3 RAM at FCS). A mirrored pair of SFF SSD’s is available for an operating system, which can then just farm out the storage via the protocol of choice.

A great point about the C3160 is that it is not only 4RU high but at 31.8 inches deep will fit into a standard depth rack.

C3160 Rack Server

Anyway, huge update this one, awesome job Cisco! and congratulations, and I for one am certainly looking forward to having a good play with all these new products.

And as a Teaser to next week’s official announcements of the new M4 line-up, I can give you a sneak peek below, but tune in on the 8th September, Same UCS Time, same UCS channel, when we’ll take a look under these covers as there are a few surprises lurking beneath.

New M4 line-up

Until next time
And take care of that Data Center of yours!



Unification Part 2: FCoE Demystified

As promised here is the 2nd part of the Unified Fabric post, where we get under the covers with FCoE.

The first and most important thing to clarify is that, as its name suggests, Fibre Channel over Ethernet (FCoE) still uses the Fibre Channel protocol, and as such all the higher level processes that needed to happen in a Native Fibre Channel environment (FLOGI/PLOGI etc.) still need to happen in an FCoE environment.

So having a good understanding of Native Fibre Channel operations is key. So let’s start with a quick Native Fibre Channel recap:

For the IP Networker I have put some parentheses () and corresponding IP services that can be very loosely mapped to the Fibre Channel process to aid understanding.

Native Fibre Channel

Initiators/Targets contain Host Bus Adapters (HBA’s) which in Fibre Channel terms are referred to as Node ports (N ports).

These N Ports are connected to Fabric Ports (F ports) on the Fibre Channel switches.

Fibre Channel switches are then in turn connected together via Expansion (E) Ports, or if both Switches are Cisco you have the option of also Trunking multiple Virtual SANs (VSANs) over the E ports in which case they become Trunking Expansion Ports (TE Ports).

First the initiator (server) sends out a Fabric Login (FLOGI) to the well-known address FFFFFE, which is the fabric login service on the switch it is attached to. This FLOGI presents the unique 64bit World Wide Port Name (WWPN) of the HBA (Think MAC Address) to the fabric.

Each Fabric also elects a “Principal switch”. By default the switch with the lowest priority value wins the election, with the lowest switch WWN used as the tie-breaker.
The Principal Switch is in charge of issuing the Domain ID’s to all the other switches in the Fabric.

The switch then sends the initiator back a unique 24bit routable Fibre Channel Identifier (FC_ID), also referred to as an N_Port_ID (Think IP Address). The 24bit FC_ID is expressed as 6 Hexadecimal digits.

So the basic FLOGI conversation goes something like “Here’s my unique burned in address, send me my routable address” (think DHCP).

The 24bit FC_ID is made up of 3 parts:

• The Domain ID, which is assigned by the Principal switch to the Fibre Channel switch, to which the host connects.
• The Area ID, which actually is the port number of the switch the HBA is connected to.
• The Port ID which refers to the single port address on the actual host HBA.

The above format ensures FC_ID uniqueness within the fabric.
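To make the three-part layout concrete, here is a minimal Python sketch that splits an FC_ID into its Domain, Area and Port bytes. The function name and the example value 0x0a1b2c are my own illustrations, not output from a real switch.

```python
from typing import NamedTuple


class FcId(NamedTuple):
    domain: int  # Domain ID of the switch (top 8 bits)
    area: int    # Area ID (middle 8 bits)
    port: int    # Port ID (bottom 8 bits)


def parse_fc_id(fc_id: str) -> FcId:
    """Split a 6-hex-digit FC_ID such as '0x0a1b2c' into its three bytes."""
    text = fc_id.lower()
    if text.startswith("0x"):
        text = text[2:]
    if len(text) != 6:
        raise ValueError(f"expected 6 hex digits, got {fc_id!r}")
    value = int(text, 16)
    return FcId((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


# Hypothetical FC_ID, chosen only for illustration.
example = parse_fc_id("0x0a1b2c")
print(f"domain=0x{example.domain:02x} area=0x{example.area:02x} port=0x{example.port:02x}")
```

Because every switch in the fabric has a different Domain ID, two FC_IDs can only collide if they share the same switch, area and port, which is why the format guarantees uniqueness.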

Figure 1 Fibre Channel Identifier

Once the initiator receives its FC_ID, it then sends a Port Login (PLOGI) to the well-known address FFFFFC, which registers its WWPN and assigned FC_ID with the Fibre Channel Name Server (FCNS). (Think of the FCNS like DNS.) The FCNS then returns the FC_IDs of all the Targets the initiator has been allowed to access via the Zoning policy.

Once the PLOGI is completed, the initiator starts a discovery process, to find the Logical Unit Numbers (LUNs) it has access to.

The FLOGI database is locally significant to the switch and only shows the WWPNs and FC_IDs of directly attached Initiators/Targets. The FCNS database, on the other hand, is distributed across all switches in the fabric and shows all reachable WWPNs and FC_IDs within the Fabric.

Figure 2 Native Fibre Channel Topology.

OK History lesson over.

The Fibre Channel protocol has long proven to be the best choice for block based storage (storage that appears as locally connected). FCoE simply takes all that tried and tested Fibre Channel performance and stability, and offers an alternative physical transport, in this case Ethernet.

But replacing the Fibre Channel transport did come with its challenges. The Fibre Channel physical layer creates a “lossless” medium by using buffer credits. Think of a line of people passing boxes down the line: if the next person does not have empty hands (an available buffer), they cannot receive the next box, so the flow is “paused” until the box can again be passed.
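The box-passing analogy can be sketched as a toy simulation. This is a deliberately simplified model of credit-based flow control, not the real buffer-to-buffer credit mechanism: the tick loop, the function name and the drain rate are all my own assumptions.

```python
from collections import deque


def send_with_credits(frame_count: int, credits: int, drain_every: int = 2):
    """Toy model: the sender may only transmit while it holds a credit.

    Every `drain_every` ticks the receiver empties one buffer slot and
    hands a credit back to the sender. Returns the delivered frames, the
    number of ticks the sender spent paused, and the deepest the receive
    buffer ever got.
    """
    if credits < 1 or drain_every < 1:
        raise ValueError("credits and drain_every must be at least 1")
    buffer = deque()
    delivered = []
    paused_ticks = 0
    max_depth = 0
    next_frame = 0
    tick = 0
    while len(delivered) < frame_count:
        tick += 1
        if next_frame < frame_count:
            if credits > 0:
                credits -= 1                # spend a credit to send a frame
                buffer.append(next_frame)
                next_frame += 1
                max_depth = max(max_depth, len(buffer))
            else:
                paused_ticks += 1           # no empty hands: wait, do not drop
        if tick % drain_every == 0 and buffer:
            delivered.append(buffer.popleft())
            credits += 1                    # receiver returns a credit
    return delivered, paused_ticks, max_depth


frames, paused, depth = send_with_credits(frame_count=6, credits=2)
print(frames, paused, depth)
```

The point to take away is that the receive buffer can never hold more frames than the number of credits granted, so the sender pauses rather than the receiver dropping.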

Ethernet, on the other hand, expects drops and relies on windowing by upper layer protocols in order not to overwhelm the receiver. Instead of a line of people passing a box from hand to hand, think of a conveyor belt with someone just loading boxes on it at an ever increasing speed, until they hear shouts from the other end that boxes are falling off, at which point they slow their loading rate and gradually speed up again.

So the million dollar question was how to send a “lossless” payload over a “lossy” transport.

The answer was several enhancements to the Ethernet Standard, generally and collectively referred to as Data Centre Bridging (DCB).

Fibre Channel over Ethernet

OK so now we have had a quick refresher on Native Fibre Channel, let’s walk through the same process, in the converged world.

First of all let’s get some terminology out of the way:

End Node (ENode): the end host in an FCoE network, containing the Converged Network Adapter (CNA). This could be a Server or an FCoE attached Storage Array.

Fibre Channel Forwarder (FCF): switches that understand both the Ethernet and Fibre Channel protocol stacks.

NB) An FCF is required whenever FC encapsulation/de-encapsulation is required. But as an FCoE frame is a legal tagged Ethernet frame it could be transparently forwarded over standard Ethernet switches.

The next thing to keep in mind is that Fibre Channel and Ethernet work very differently. Ethernet is an open multi-access medium, meaning that multiple devices can exist on the same segment and can all talk to each other without any additional configuration.

Fibre Channel, on the other hand, is a closed point to point medium, meaning that there should only ever be point to point links, and hosts by default cannot communicate with each other without additional configuration called Zoning (Think Access Control List).

So if you keep in mind that in an FCoE environment we are creating 2 separate logical point to point Fibre Channel Fabrics (A&B) just like you have in a native Fibre Channel environment, you should be in pretty good shape to understand what config is required.

So as explained in the Native Fibre Channel refresher above, an N Port in a Host connects to an F port in a switch, and then that switch connects to another Switch via an E port. Similarly, in the FCoE world we have a Virtual N Port (VN_Port) which connects to a Virtual F Port (VF_Port) in the FCF, and if two FCFs need to be connected together this is done with Virtual E Ports (VE_Ports).

As can also be seen in the below figure, as the FCFs are fully conversant in both Ethernet and Fibre Channel, as long as they have native FC ports configured they can quite happily have native FC Initiators and Targets connected to them.

Figure 3: Multi-Hop Fibre Channel over Ethernet Topology

So as can be seen above an FCoE Network is a collection of virtual Fibre Channel links, carried over and mapped onto an Ethernet Transport, but what makes the logical links between the VN_Ports, VF_Ports and VE_Ports? Well a few control protocols are required, collectively known as FCoE Initialisation Protocol (FIP) and it is FIP which enables the discovery and correct formation of these virtual FC links.

Under each physical FCoE Ethernet port of the FCF a virtual Fibre Channel Port (vfc) is created, and it is the responsibility of FIP to identify and create the virtual FC link.

Each virtual FC link is identified by 3 values: the MAC addresses at either end of the virtual circuit, and the FCoE VLAN ID which carries the encapsulated traffic.

Every FC encapsulated packet must use a VLAN ID dedicated and mapped to that particular VSAN. No IP data traffic can co-exist on a VLAN designated on the Nexus switch as an FCoE VLAN. If multiple VSANs are in use, a separate FCoE VLAN is required for each VSAN.
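The VLAN rules above lend themselves to a quick sanity check before you touch a switch. The sketch below is purely illustrative: the function name is hypothetical, the VLAN and VSAN numbers are made up, and it only checks the two rules stated in this post (no IP data on an FCoE VLAN, and one dedicated FCoE VLAN per VSAN).

```python
from typing import Dict, List, Set


def validate_fcoe_mapping(vlan_to_vsan: Dict[int, int], data_vlans: Set[int]) -> List[str]:
    """Return a list of rule violations for a proposed FCoE VLAN plan."""
    problems = []
    vsan_owner: Dict[int, int] = {}  # VSAN -> first FCoE VLAN mapped to it
    for vlan, vsan in sorted(vlan_to_vsan.items()):
        if vlan in data_vlans:
            problems.append(f"VLAN {vlan} carries both FCoE and IP data traffic")
        if vsan in vsan_owner:
            problems.append(
                f"VSAN {vsan} is mapped to VLAN {vsan_owner[vsan]} and VLAN {vlan}"
            )
        else:
            vsan_owner[vsan] = vlan
    return problems


# Hypothetical plan: Fabric A uses VLAN 100 / VSAN 10, Fabric B uses VLAN 200 / VSAN 20.
print(validate_fcoe_mapping({100: 10, 200: 20}, data_vlans={1, 50}))   # clean plan
print(validate_fcoe_mapping({100: 10, 50: 10}, data_vlans={50}))       # two violations
```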

As we know, Ethernet has no inherent loss prevention mechanisms, therefore an additional protocol was required in order to prevent any loss of Fibre Channel packets traversing the Ethernet links in the event of congestion. A sub protocol of the Data Centre Bridging standard called Priority Flow Control (PFC), IEEE 802.1Qbb, ensures zero packet loss by providing a link level flow control mechanism that can be controlled independently for each frame priority. It works alongside Enhanced Transmission Selection (ETS), IEEE 802.1Qaz, which enables the consistent management of QoS at the network level by providing consistent scheduling.

Fibre Channel encapsulated frames are marked with an Ethertype of 0x8906 by the CNA and thus can be correctly identified, queued and prioritised by PFC which places them in a prioritised queue with a Class of Service (CoS) value of 3 which is the default for encapsulated FC packets. FIP is identified by the Ethertype of 0x8914.
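As a rough illustration of how a device can tell these frames apart, the following Python sketch reads the 802.1Q tag and Ethertype from a raw Ethernet header. The function name and the hand-built sample frame are assumptions for the example; a real CNA or switch does this in hardware.

```python
import struct

ETHERTYPE_VLAN = 0x8100  # 802.1Q tag protocol identifier
ETHERTYPE_FCOE = 0x8906
ETHERTYPE_FIP = 0x8914


def classify(frame: bytes):
    """Return (kind, vlan_id, cos) for an 802.1Q tagged Ethernet frame."""
    if len(frame) < 18:
        raise ValueError("frame too short for a tagged Ethernet header")
    # Bytes 0-11 are the destination and source MACs; the tag follows.
    tpid, tci, ethertype = struct.unpack("!HHH", frame[12:18])
    if tpid != ETHERTYPE_VLAN:
        raise ValueError("frame is not 802.1Q tagged")
    cos = tci >> 13          # top 3 bits: priority (CoS)
    vlan_id = tci & 0x0FFF   # bottom 12 bits: VLAN ID
    kind = {ETHERTYPE_FCOE: "FCoE", ETHERTYPE_FIP: "FIP"}.get(ethertype, "other")
    return kind, vlan_id, cos


# Hand-built header: zeroed MACs, VLAN 100 (hypothetical), CoS 3, FCoE Ethertype.
sample = bytes(12) + struct.pack("!HHH", ETHERTYPE_VLAN, (3 << 13) | 100, ETHERTYPE_FCOE)
print(classify(sample))
```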

Before the FIP negotiation can start, the physical link needs to come up, this is a job for the Data Centre Bridging capabilities eXchange (DCBX) protocol, which makes use of the Link Layer Discovery Protocol (LLDP) in order to configure the CNA with the settings (PFC & ETS) as specified on the switch to which the CNA is connected.

FIP can then establish the virtual FC links between VN_Ports and VF_Ports (ENode to FCF), as well as between pairs of VE_Ports (FCF to FCF), since these are the only legal combinations supported by native Fibre Channel fabrics.

FIP first identifies the FCoE VLAN in use by the FCF and discovers the FCF itself. Once the virtual FC circuit is established, it then prompts the initialisation of FLOGI and Fabric Discovery.

The diagram below shows the FIP initialisation process. The green section is FIP, which is identified with the Ethertype 0x8914, and the yellow section is FCoE, identified with the Ethertype 0x8906.


It is also worth noting that the ENode uses different source MAC addresses for FIP and FCoE traffic. FIP traffic is sourced using the burned in address (BIA) of the CNA, whereas the FCoE traffic is sourced using a Fabric Provided MAC Address (FPMA).

FPMAs are made up from the 24 bit Fibre Channel ID (FC_ID) assigned to the CNA during the FIP FLOGI process. This 24 bit value is appended to another 24 bit value called the FCoE MAC address prefix (FC-MAP), of which there are 256 predefined values. As the FC_ID is already unique within the fabric itself, Cisco simply apply a default FC-MAP of 0E-FC-00.

Figure 4 Make-up of the Fabric Provided MAC Address (FPMA)
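The FPMA construction is simple enough to show in a few lines of Python. This is a minimal sketch using the default FC-MAP of 0E-FC-00 mentioned above; the function name and the example FC_ID are hypothetical.

```python
DEFAULT_FC_MAP = 0x0EFC00  # default FC-MAP, as described above


def build_fpma(fc_id: int, fc_map: int = DEFAULT_FC_MAP) -> str:
    """Build a Fabric Provided MAC Address: 24 bit FC-MAP followed by 24 bit FC_ID."""
    if not 0 <= fc_id <= 0xFFFFFF:
        raise ValueError("FC_ID must fit in 24 bits")
    if not 0 <= fc_map <= 0xFFFFFF:
        raise ValueError("FC-MAP must fit in 24 bits")
    mac = (fc_map << 24) | fc_id  # FC-MAP is the upper half, FC_ID the lower half
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


# Hypothetical FC_ID 0x0a1b2c, chosen only for illustration.
print(build_fpma(0x0A1B2C))
```

So if you know the FC_ID a host was given, you can work out the source MAC address to expect on its FCoE frames without looking it up.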

The fact that FIP and FCoE make use of a tagged FCoE VLAN requires that each Ethernet port configured on the FCF is configured as a Trunk port, carrying the FCoE VLAN along with any required Ethernet VLANs. If the Server only requires a single VLAN, then this VLAN should be configured as the Native VLAN on the physical Ethernet port to which the ENode connects.

Ok, it would only be right for me to include a bit on how Cisco UCS fits in to all this.

Well as we know the Cisco UCS Fabric Interconnect by default is in End Host Mode for the Ethernet side of things and in N-Port Virtualisation (NPV) mode for the storage side of things.

This basically means the Fabric Interconnect appears to the servers as a LAN and SAN switch, but appears to the upstream LAN and SAN switches as just a big server with lots of HBAs and NICs inside.

There are many reasons why these are the default values, but the main reasons are around scale, simplicity and safety. On the LAN side, having the FI in End Host Mode (EHM) prevents the possibility of bridging loops forming between the FI and the upstream LAN switch. And in the case of the SAN, as each FI is pretending to be a Host, the FI does not need a Fibre Channel Domain ID, neither does it need to participate in all the Fibre Channel Domain Services.

As can be seen from the below Figure in the default NPV mode the Cisco UCS Fabric Interconnect is basically just a proxy. Its server facing ports are Proxy F ports and its Fabric facing (uplink) ports are Proxy N ports.

Again note no FC Domain ID is required on the Fabric Interconnects.

Also note that, as we are using Unified Uplinks from the FI to the Nexus (FCF), we cannot use Virtual Port-Channels (vPCs) to carry the FCoE VLAN, as the FCoE VLAN and corresponding VSAN should only exist on a single Fabric. We could of course create an Ethernet only vPC and then have a separate Unified Uplink carrying the FCoE VLAN to the local upstream Nexus, but if you’re going to do that, you may as well just have stuck with a vPC and Native Fibre Channel combo.

As would be the case with any multi-VSAN host, the Cisco Nexus ports which are connected to the UCS FI are configured as Trunking F (TF) ports.

Figure 5 FCoE with Cisco UCS Unified Uplinks.

Well hope you found this post useful, I’ll certainly be back referencing it myself during the Storage elements of my CCIE Data Center studies, as it is certainly useful having all elements of a multi-hop FCoE environment, along with the Native Fibre Channel processes all in a single post.

Until next time.



Unification Part 1: The Rise of the Data Centre Admin.

This is the first of a 2-Part Post: Part one is a non-technical primer. Then in part two we have some fun sorting out your LLDP from your DCBX with sprinkles of ETS, covered in a PFC sauce topped off with a nice FIP cherry.


In this new world of convergence and unification, I seem to spend a lot of my time either teaching “Traditional Networkers” SAN principles and configuration, or on the other side of the coin teaching “Traditional Storage” people Networking principles and configuration.

These historically siloed teams are increasingly having to work together in order to create a holistic unified/converged network.

It is still quite common for me to get requests from clients to create separate “SAN Admin” and “LAN Admin” accounts on the same Cisco Nexus switch and enforce the privileges of each account via Role Based Access Control (RBAC). There is, by the way, absolutely nothing wrong with that, especially if both the LAN and SAN are complex environments.

However there is an ever increasing overlap and grey area between the roles of the LAN and SAN administrator, and in a world which is ever focusing on increasing efficiency, simplicity and reduction in support costs, the Role of “Data Centre Administrator” is on the rise.

I’m glad to say that I very rarely ever get dragged into debates about the validity of FCoE these days, as it has undoubtedly proven to be a “no brainer” at the edge of the network, with the significant efficiencies in reduced costs, HBA’s, switch port counts, and all the associated power and cooling reductions that go along with it.

And once the transition to FCoE on the edge is complete, you have to really ask yourself: is there any real benefit in maintaining native FC links within the network core, or would it be simpler to just bring everything under the Ethernet umbrella?

While the efficiencies and savings of a multi-hop FCoE network are not as much of a “no brainer” as they are at the edge, in my book there’s a lot to be said for just having the same flavour SFPs throughout the entire network, along with no need to allocate native FC ports in your Nexus switches or Cisco UCS Fabric Interconnects (unless you have FC only Hosts/Arrays somewhere in the network).

In all my years in IT, this topic may well be the one which contains the most abbreviations (DCB, DCBX, LLDP, PFC, ETS and FIP, to name just a few), which I think has led to the perception of complexity. However, while there is certainly a lot of clever tech going on “under the hood”, the actual configuration and business as usual tasks are quite simple.

So with all of the above in mind, Part 2 of this post will cover much of the information you need to know as the “Data Centre Admin” in order to survive in a unified Cisco Nexus Environment.


OTV doesn’t kill people, people kill people.

I was designing a Datacentre migration for one of our clients, who have two DCs 10km apart connected with some dark fibre.

Both DCs were in the south of England, but the client needed to vacate the current buildings and move both Datacentres up north (circa 300 miles / 500km away). As ever, this migration had to be done with minimal disruption, and at no point should the client be without DR, meaning we couldn’t simply turn one off, load it in a truck and drive it up north, then repeat for the other one.

The client also had a requirement to maintain 75% service in the event of a single DC going offline. Their current DCs were active/active, and each could take on 50% of the other DC’s load if required. Assuming the load is shared evenly between the two, the surviving DC carries its own half plus half of the other’s, which is 75% of the total and so meets the SLA.

Anyway, cutting a long story short, I proposed that we locate a pair of Cisco ASR 1000s in one of the southern DCs and a pair in each of the northern DCs, and use Cisco’s Overlay Transport Virtualisation (OTV) to extend the necessary VLANs between all 4 locations for the period of the migration.


As would be expected at this distance, the latency across the MPLS cloud connecting the Southern and Northern data centres (circa 20ms) was too great to vMotion the workloads, but the VMs could be powered off, cold migrated and powered back up again in the north, and by doing this intelligently DR could be maintained.

The major challenge was that there were dozens of applications and services within these DCs, some of which were latency sensitive financial applications, along with all the internal firewalling and load balancing that comes along with them.

The client, still being pretty much a Cisco Catalyst house, were unfamiliar with newer concepts like Nexus and OTV but immediately saw the benefit of this approach, as it allowed a staged migration and great flexibility, while protecting them from a lot of the issues they were historically vulnerable to, as they had traditionally extended layer 2 natively across the dark fibre between their two southern Data Centres.

Being a new technology to them, the client understandably had concerns about OTV, in particular around the potential for suboptimal traffic flows, which could cause their latency sensitive traffic to go on unnecessary “field trips” up and down the country during the migration period in which the North and South DCs were connected.

I was repeatedly asked to reassure the client about the maturity of OTV, and lost count of how many times I had to whiteboard out the intricacies of how it works, and topics like First Hop Redundancy Protocol isolation and how broadcast and multicast work over OTV.

My main message though was, “Forget about OTV, it’s a means to an end. It does what it does, and it does it very effectively, however it does not replace your brain. There are lots of other considerations to take into account.” All their concerns would have been just as valid, if not more so, if I had just run a 500km length of fibre up the country and extended L2 natively, as the client was already doing, was already comfortable with, and had accepted the risks associated with doing so.

This concept got the client thinking along the right lines, that while OTV certainly facilitated the migration approach, careful consideration as to what, when, how and the order in which workloads and services were migrated, would be the crucial factor, which actually had nothing to do with OTV at all.

The point being that an intelligent and responsible use of the technology was the critical factor, and not the technology itself.

So just remember OTV doesn’t kill people, people kill people.

Stay safe out there.
