20Gb + 20Gb = 10Gb? UCS M3 Blade I/O Explained

There comes a time when, having answered the same question often enough, I think “this obviously requires a blog post”, so I can just tell the next person who asks to go and read it.

This is such a question.

“OK, so I have a VIC 1240 mLOM on my M3 Blade which gives me 20Gb of bandwidth per Fabric, correct?”

Correct!

Cool, I also have a 2204XP IO module that gives me 20Gb of Bandwidth per Fabric to each of my blade slots, Correct?

Correct!

Fantastic, so if I use one with the other I get 20Gb of I/O per Fabric per Blade, Correct?

Wrong!

Huh?

OK, let’s grab a whiteboard marker and go!

I can really understand the confusion around this, because at first the above logic makes perfect sense. It’s only when you open the UCS kimono that you see the reason for this behaviour.

So as we all know, the M3 blades give us a nice Modular LAN on Motherboard (mLOM), which is a VIC 1240. This gives us 2 x 10Gb traces (KR ports) to each I/O Module.

We also have a spare Mezzanine adapter slot, which can be used for a Port-Expander (to effectively turn the VIC1240 into a VIC1280), for any other compatible UCS I/O Mezzanine card, or for an I/O flash module like the ioDrive2 from Fusion-io.

This Mezzanine slot also provides 2 x 10Gb Traces (KR Ports) to each IO Module.

OK, now the “issue” is that the backplane ports of the I/O Module alternate between the onboard VIC1240 and the Mezzanine slot. Taking a Blade in Slot 1 with a 2204XP I/O Module as an example: I/O Module backplane port 1 goes to the VIC1240, and port 2 goes to the Mez slot. With nothing in the Mez slot, only port 1 is usable, which is why you only get 10Gb of usable I/O per Fabric with this combination.

I’m not sure why Cisco did not trace I/O Module ports 1 and 2 to the mLOM and 3 and 4 to the Mez. I guess the way they have done it allows you to always have access to the Mez slot, even when using a 2204XP I/O Module (as mentioned above, the Mez slot can be used for other cards, not just CNAs).

So as you can see, when using a 2204XP and the VIC1240 with no Mez adapter, only one of the two 10Gb traces actually matches up. (See Below) 

B200M3 VIC 1240, No MEZ, 2204XP

OK, so how do you get your extra bandwidth? Well, one of two ways: either add a mezzanine adapter, or use the 2208XP I/O Module, or both.

If you were using a 2208XP I/O Module with your VIC 1240, backplane port 1 on the I/O Module goes to the VIC1240, port 2 goes to the Mez slot, port 3 goes to the VIC1240 and port 4 goes to the Mez slot. So as you can see, this combination does give you the two 10Gb traces to your VIC1240.

2 B200M3 VIC1240 no Mez 2208
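
To make the alternating trace layout concrete, here is a minimal Python sketch of the mapping described above. It is only an illustration, not anything UCS Manager exposes: the function and variable names are my own, and `mez_populated` is assumed to mean a networking adapter (Port-Expander or VIC1280) sits in the Mez slot.

```python
# Illustrative model only: names are assumptions, not a Cisco API.
TRACE_GBPS = 10

# Backplane ports each I/O Module presents to one blade slot.
IOM_PORTS_PER_SLOT = {"2204XP": 2, "2208XP": 4}


def trace_target(port: int) -> str:
    """Odd backplane ports go to the mLOM (VIC 1240), even ports to the Mez slot."""
    return "mLOM" if port % 2 == 1 else "Mez"


def usable_gbps_per_fabric(iom: str, mez_populated: bool) -> int:
    """Add up the 10Gb traces that land on a populated adapter for one blade."""
    ports = range(1, IOM_PORTS_PER_SLOT[iom] + 1)
    live = [p for p in ports if trace_target(p) == "mLOM" or mez_populated]
    return len(live) * TRACE_GBPS


for iom in IOM_PORTS_PER_SLOT:
    for mez in (False, True):
        label = "Mez populated" if mez else "no Mez"
        print(iom, label, usable_gbps_per_fabric(iom, mez), "Gb per fabric")
```

With a 2204XP and no Mez adapter only port 1 lands on anything, which is the 10Gb result that causes the confusion in the first place.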

The other combinations of modules and resulting bandwidth are explained below.

For clarity only the backplane ports of the I/O Module that map to Blade slot 1 are shown.

2204XP Combinations

3 B200M3 with 2204XP

2208XP Combinations

4 B200M3 with 2208XP

Note that while the resulting bandwidth may be the same with certain combinations, the hardware-based port-channels are different. The more ports there are in the same port-channel, the more evenly traffic can be distributed.

Also bear in mind that when using the port-expander, UCS Manager sees the VIC1240 and the port-expander as a single VIC.

If a VIC1280 is used in conjunction with the VIC1240, they are completely independent adapters, and are treated as such by UCS Manager.

As ever, comments most welcome.

About ucsguru

Principal Consultant and Cisco UCS Subject Matter Expert
This entry was posted in General.

26 Responses to 20Gb + 20Gb = 10Gb? UCS M3 Blade I/O Explained

  1. Renato Nodarse says:

    This worries me, as today I have B200M2 and B230M2 blades with 2204XP and a 2-link chassis discovery (the FIs had limited licensed ports, so I couldn’t take advantage of the full 4 links off the bat). My setup is 2 chassis, 2 FI 6120s, 2 x 2204XP in each chassis, 4 B200M2 and 4 B230M2, but soon I will need to add more blades and I doubt I can get them with the Cisco Palo M81KR. Based on your post, will I not be able to take advantage of it if I bump up the FI licenses and move to 4 links?

    • ucsguru says:

      Hi Renato

      This post certainly shouldn’t worry you; it should just make you aware of exactly how much potential bandwidth you will have given a certain combination of VICs, Mez cards and I/O Modules.

      Also bear in mind that here I am talking about the usable bandwidth to the Blade, NOT between the FI and the I/O Module, which is a completely different conversation.

      The main consideration with your FI to IOM links would be that you do not have an immediate bottleneck from the server to the FI.

      When using one of the higher bandwidth VIC adapters you should be port-channeling your FI to IOM links (not the default setting) to at least the usable bandwidth of your VIC. So in your case, as you have the 2204XP I/O Modules, the most you can get out of any of your servers will be 20Gb per fabric (which would require the VIC1280 in your M2 Blades). With your M81KRs you obviously still only get 10Gb per fabric. (I say “only”; this is still way more than enough for most use cases.)

      You are right in saying that your new blades likely will not come with M81KRs (the VIC1280 is the same cost, so why would you want them anyway?). You will still have a choice of certain M2 Servers and M3 Servers, and this is where you have to choose the model that best suits your needs.

      If you go M3 then this post will help you understand that if you want access to the full 20Gb of I/O per server, per fabric possible with your 2204XP I/O Modules, then you should also order the port-expander Mez card.

      From experience, the 2 links between your FEX and FI will likely be sufficient. If you channel them you will have a single 20Gb pipe per fabric that all the blades in the chassis will map to. And when your new blades come with the VIC1280, or the VIC1240 with the Port-Expander, the 20Gb per fabric to each blade will have the potential (albeit shared) use of the 20Gb of I/O between the IOM and the FI.

      Hope this all makes sense, if not fire back.
      Colin

  2. Richard says:

    Would I be correct in saying that if you were going for ASIC redundancy, aside from bandwidth, you would need the 1240 and 1280? Or are the port groups of each individual adapter resilient?

    • ucsguru says:

      Hi Richard
      Great question, and one I did consider covering in the main post, but thought it would detract from the main point, so really glad you asked it here.
      You are right that if you want full VIC redundancy then yes, you will need two VICs which are each capable of operating without the other. A port-expander, for example, will not function if its VIC1240 fails.

      With regards to port group redundancy, I would be very, very surprised if the loss of one port group’s ASIC had any effect on the other port group. (Perhaps one of my Cisco employee readers can confirm.)

      My view is this: I have now installed what must be (quick calculation) over 1000 VICs over the last 4 years and have never had one fail (maybe I’m just lucky). Also, redundancy is generally in force at a higher layer anyway (VMware HA/FT, or clustering for bare metal workloads, etc.).

      But if having Mezzanine redundancy within a Blade is a requirement in your environment then go for it; that’s one of the reasons the option is there. You just have to bear in mind that there would be a bit of additional planning to do, i.e. create two vNICs, use a placement policy to put one on the VIC 1240 and one on the VIC1280 (different vCons), and then team them at the operating system level.

      Thanks again for the question.
      Colin

  3. tony says:

    Hi

    I have my FIs connected directly to the core Nexus 7k. In addition, we have an iSCSI appliance that is also connected to the 7k. In order for UCS servers to use jumbo frames to the appliance, we have enabled jumbo frame MTU 9216 end to end.

    on the ucs, we have defined the mtu on the iscsi vnics to 9216
    we have a qos policy for iscsi that is set at platinum with a cos of 5

    on the 7k, we have mtu 9216 on the iscsi appliance ports as well as the FI ports.

    Do I need to setup qos policies on the 7k for return traffic?

    thanks

    • ucsguru says:

      Hi Tony. As I’m sure you know, setting the MTU size on your CoS 5 traffic on the UCS only affects egress traffic and not return traffic. So you would need to ensure that traffic from your iSCSI appliance has the MTU set and supported all the way to the UCS and is marked with CoS 5. Alternatively, you could set the MTU on the UCS best effort class to 9216, which would then match all return traffic regardless of CoS value.

      I always use the old ping -l (mtu size) -f (Don’t Fragment) to confirm my MTU is fully supported end to end, i.e. from the UCS Blade: ping (address of iSCSI Appliance) -l 9000 -f

      If you get a reply you’re good; if you get a message saying something like “Packet needs to be fragmented but DF set” then you know something in the path is not configured for jumbo frames correctly.
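
One detail worth keeping in mind with the ping test above: the -l value is the ICMP payload size, and the IPv4 and ICMP headers add another 28 bytes on top of it. The sketch below just does that arithmetic, assuming plain IPv4 with no IP options; the function names are made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header with no options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(ip_mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet of this MTU."""
    return ip_mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def fits_unfragmented(payload: int, ip_mtu: int) -> bool:
    """True if a ping with this payload can cross a hop with this IP MTU."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES <= ip_mtu


for mtu in (1500, 9000, 9216):
    print("IP MTU", mtu, "-> largest payload", max_ping_payload(mtu))
```

So whether a given -l value gets through with Don’t Fragment set depends on the smallest IP MTU anywhere along the path, including the sending interface itself.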

      Regards
      Colin

  4. danno81 says:

    Hi,

    Has anyone had oversubscription issues so far with the 2204XP or 2208XP? In the end there is a 4:1 oversubscription between backplane and fabric ports (16:4 or 32:8). Most of my customers don’t have such immense throughput requirements, but I’m curious if someone has seen such scenarios.

    • ucsguru says:

      Hi Dan
      I certainly haven’t. Most of my clients don’t touch the sides of a 2204XP yet, let alone a 2208XP, even in a fully populated Chassis.
      Colin

  5. Kevin says:

    Great detail on this, thanks, but it doesn’t explain why port 1 can’t talk to port 32 on the internal connections of the FEX. By design, it seems like they should. Also, how is the traffic handled within the FEX? If 32 internal ports are connected and only ports 1 and 4 externally are connected, is there any control, or is it like pumping 32 gallons of water through 2 spigots?

    • ucsguru says:

      Hi Kevin
      By design the FEX cannot switch traffic within it. The traffic has to go up to the Fabric Interconnect (Controlling Bridge) to be switched at layer 2.
      (This is the way FEX Technology works, aka 802.1Qbh, now called 802.1BR.)

      So if a Blade in Slot 1 (using HIF port 1 on the FEX) needs to have an East/West conversation with a Blade in Slot 8 (using HIF port 32 on the FEX) then that is fine, but it would be switched at L2 within the FI (if same VLAN and same Fabric), or be sent to the upstream LAN switch to be routed (if this is an L3 conversation) or L2 switched across fabrics within the same VLAN.

      Hope that makes sense.

      Re the second part of your question, it is a simple case of contention ratios: divide the number of Host Interfaces (HIFs) you are using by the number of Network Interfaces (NIFs) you are using and you have your maximum contention ratio. So if you have a Chassis full of 8 Blades, all with VIC1280s or VIC1240s with Port-Expanders, and you are using the 2208XP FEX with all 8 Network Interfaces, then you are correct: you would have 32 x 10Gb ports all using the 8 x 10Gb uplinks (i.e. a 4:1 ratio).

      The reality though is that not all 32 x 10Gb ports will be running at full line rate at the same time, or even close to it, so a 4:1 or even an 8:1 ratio will generally not be a cause of contention.
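
The contention calculation above is simple enough to sketch in a few lines of Python. This is only the HIF/NIF division described in the reply; the function names are hypothetical, and it assumes every HIF and NIF in use runs at the same 10Gb speed.

```python
from fractions import Fraction


def contention_ratio(hifs_in_use: int, nifs_in_use: int) -> Fraction:
    """Maximum contention: host-facing ports divided by uplinks to the FI."""
    if nifs_in_use <= 0:
        raise ValueError("at least one network interface must be in use")
    return Fraction(hifs_in_use, nifs_in_use)


def as_ratio(hifs_in_use: int, nifs_in_use: int) -> str:
    """Format the reduced ratio in the usual N:M style."""
    ratio = contention_ratio(hifs_in_use, nifs_in_use)
    return f"{ratio.numerator}:{ratio.denominator}"


print(as_ratio(32, 8))  # full chassis on a 2208XP, all 8 uplinks cabled
print(as_ratio(32, 4))  # same chassis with only 4 uplinks cabled
```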

      Regards
      Colin

  6. Dom says:

    Thanks Colin, another very useful post

  7. linux vps says:

    I was wondering if you ever thought of changing the page layout of your blog? It’s very well written; I love what you’ve got to say. But maybe you could add a little more in the way of content so people could connect with it better. You’ve got an awful lot of text for only having one or two pictures. Maybe you could space it out better?

    • ucsguru says:

      Hi, thanks for the comment.
      And yes, especially the “Ask the Guru” section, which has developed into a bit of a Monster :-)
      But as ever, it’s having the time (and perhaps skill :-)). I spend an hour or so a day answering questions, after my day job.
      Regards
      Colin

  8. Jason says:

    Are there a minimum number of required vnics / vhbas to take advantage of any given configuration? For instance; if I have 20Gb of bandwidth per fabric do I need 4 vnics to take advantage of the bandwidth (two to each fabric)?

    • ucsguru says:

      Hi Jason
      Each vNIC you create appears to the O/S as one 10Gbs NIC, and each vHBA appears as an 8Gbs HBA (it uses up to 10Gbs though if using FCoE).

      So you certainly would need to create a vNIC on Fabric A and a vNIC on Fabric B and team them in the O/S Active/Active to have 20Gbs of available bandwidth, while at the same time having protection against a Fabric failure.

      Now back when I did some testing on this (about 12 months ago), when I teamed 2 vNICs (regardless of which Fabric I put them on) I did indeed manage to see throughputs in the range of 17Gbs (I’ll try and dig out my jperf screenshots), but when I added a 3rd vNIC it didn’t give me much extra, perhaps another 1-2Gbs. Although I think this may have been more down to my ability to generate enough traffic.

      In general 10Gbs is more than enough for most of my use cases. A typical ESXi setup, for example, would be:

      1 Pair of vNICS for MGMT (Active/Standby) 10Gbs usable
      1 vNIC with Fabric Failover for vMotion 10Gbs usable
      1 Pair of vNICs for vDS/N1kv Uplinks Active/Active 20Gbs usable

      Regards
      Colin

      • Jason says:

        Thank you for the reply and this blog! I have a follow up question specifically on network side of bandwidth. I understand you need to team at least one vnic on the A side and one vnic on the B side to achieve 20Gb. If I have a hardware configuration that is capable of 20Gb per side (a 2208XP per side and 1240 w/ extender, etc) do I need 4 vnics teamed together to get to 40Gb total?

        I inherited a domain that has a b440 with dual 1280s but the chassis only has 2204XPs. I believe there is 40Gb bandwidth per side available in this configuration because of the full width blade. Is this correct? Someone created 4 vnics (2 on A, 2 on B, and all teamed together) to get 40Gb of network bandwidth total. Is this a correct setup or could the same 40Gb bandwidth be achieved with 2 vnics (1 on A, 1 on B, and teamed together)? I guess my specific question is: are the 10Gb vmnics within an ESXi host limited to 10Gb or since they are logical nics to begin with are they actually capable of greater bandwidth?

        I haven’t seen usage yet to justify this setup but it’s what I inherited…

      • ucsguru says:

        Hi Jason

        A B440 with 2 x VIC1280s, used in conjunction with a 2204XP, will give you 20Gbs per server, per fabric, per adapter. So with the 2 adapters you will get 40Gbs per server, per fabric, i.e. 80Gbs per server in total.
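
As a rough check of those numbers, here is the same arithmetic as a small Python sketch. The lane counts per I/O Module are taken from this thread (2 lanes lit with a 2204XP, all 4 with a 2208XP); the names are illustrative only.

```python
TRACE_GBPS = 10
FABRICS = 2

# 10Gb lanes lit per VIC1280, per fabric, for each I/O Module model.
LANES_PER_ADAPTER_PER_FABRIC = {"2204XP": 2, "2208XP": 4}


def per_adapter_per_fabric(iom: str) -> int:
    """Bandwidth of one adapter's hardware port-channel to one fabric."""
    return LANES_PER_ADAPTER_PER_FABRIC[iom] * TRACE_GBPS


def per_server_per_fabric(iom: str, adapters: int) -> int:
    """All adapters in the blade, one fabric."""
    return per_adapter_per_fabric(iom) * adapters


def per_server_total(iom: str, adapters: int) -> int:
    """All adapters in the blade, both fabrics."""
    return per_server_per_fabric(iom, adapters) * FABRICS


for iom in LANES_PER_ADAPTER_PER_FABRIC:
    print(iom, per_adapter_per_fabric(iom),
          per_server_per_fabric(iom, adapters=2),
          per_server_total(iom, adapters=2))
```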

        As you know, when a VIC1280 is used in conjunction with a 2204XP, that lights up 2 of the 4 10Gb lanes (these 2 x 10Gb lanes are port-channeled in hardware into a “uif” port). The vNIC just sees the logical channel, so it will show up as a 20Gbs NIC to the OS.

        But remember that any one FLOW is limited to 10Gbs. Each FLOW from the vNIC will be distributed across this hardware port-channel but has to land on a single 10Gb trace, hence any single FLOW is limited to 10Gbs. But as there will hopefully be numerous flows from the vNIC, they will be distributed over the 2 x 10Gbs lanes.

        So in answer to your question: in this setup a single vNIC should be capable of using all available bandwidth, as long as your traffic is equally hashable across both traces. (vMotion, for example, would be considered a single flow and therefore not “sub-hashable”.)
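
The “one flow, one lane” behaviour is easier to see with a toy model. The Python sketch below is purely illustrative: the real hash used by the adapter hardware is not described in this post, so a CRC32 over made-up flow fields stands in for it, and every name here is an assumption.

```python
import zlib
from collections import Counter

LANE_GBPS = 10  # each backplane trace is a 10Gb lane


def lane_for_flow(flow: tuple, lanes: int) -> int:
    """Pick a lane from a hash of the flow identifiers (stand-in hash only)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % lanes


def lanes_in_use(flows: list, lanes: int) -> Counter:
    """Count how many flows land on each lane of the port-channel."""
    return Counter(lane_for_flow(flow, lanes) for flow in flows)


def ceiling_gbps(flows: list, lanes: int) -> int:
    """Upper bound on throughput: only lanes carrying a flow contribute."""
    return len(lanes_in_use(flows, lanes)) * LANE_GBPS


# A single vMotion-style flow always hashes to the same lane.
single = [("10.0.0.1", "10.0.0.2", 6, 50000, 8000)]
print(ceiling_gbps(single, lanes=2))

# Many flows with different source ports have a chance to use both lanes.
many = [("10.0.0.1", "10.0.0.2", 6, 50000 + i, 5001) for i in range(64)]
print(lanes_in_use(many, lanes=2))
```

However many packets the single flow sends, it never gets more than one lane, which is why a multi-stream test is needed to see the full port-channel bandwidth.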

        Remember you have the above “twice” as you have 2 adapters, i.e. 40Gbs available per adapter, 80Gbs per server (using your 2204XP; using a 2208XP would of course double that).

        So if you have 2 vNICs on your first VIC1280 (1 on A and 1 on B) and 2 vNICs on your second VIC1280 (1 on A and 1 on B) you have theoretical access to your full 80Gbs of bandwidth. But as mentioned, it depends on how “hashable” your traffic is whether you will actually get that throughput. You may also want to confirm they are indeed split across both your adapters (different vCons).

        You can obviously use a load tool like iperf/jperf etc. to test and confirm this.

        Or if you want to see what your actual load is with your live production flows you can attach directly to the adapters and see the load on each of these port-channels between the adapter and the Host interface (hif) on the FEX. These port-channeled 10Gb traces between the adapter and IOM on each Fabric are referred to as uifs (uif 0 goes to IOM 1 and uif 1 goes to IOM 2).

        Anyway perhaps getting a bit too deep now, I have been contemplating whether or not to do a comprehensive blog post on how all the above works and how to confirm and test each logical and physical element in the chain.

        I think now is probably the time.

        Anyway hope that clears everything up for you.

        Regards
        Colin

  9. John says:

    Hi Colin,
    First thanks for taking the time to create all this material. I’ve been working with a setup recently where each blade has both the VIC 1240 and 1280, with 2x 2208s and 6248up, with 8 links to each, and I’ve been trying to find the best config for max throughput.
    Right now, there are 8vNICs per blade that are spread across the fabrics and adapters, but It seems like I’m only saturating one of the 10Gb links per port channel (even when running netperf with multiple threads, multiple IPs).

    In other articles I have read, you discuss that each mezz card (or mLOM) shows up as one vCon, however each blade here shows 4 (and I assume that this is because there are 4x 20Gb port groups that actually exist here). In the graphics above, you reference that the traces coming out of the IOM don’t go sequentially to each interface, and instead go:
    Trace | VIC
    1 | 1240
    2 | 1280
    3 | 1240
    4 | 1280

    Is this also true for the 4 vCons that I’m seeing?

    To date I had been placing my 4 vNICS for Fab-A on vCon 1 & 2, and the 4 for Fab-B on vCon 3&4, and I’m able to peak just over 40Gbps unidirectional between blades, but I understand that I should be able to get so much more, I just can’t figure out how!

    Am I wrong with my assumption on the vCon placement? or should I be neglecting 3 & 4 altogether?

    Thanks,

    John

    • ucsguru says:

      Hi John

      Thanks for the question.

      The vCons do not relate to the traces but rather are assigned to the physical Adapters in the server.

      vCons will be assigned differently depending on how many Adapters you have in your server.

      By default (Round Robin)
      If you have 1 Adapter card then vCons 1-4 will all be assigned to that single adapter.
      If you have 2 Adapters then vCon 1&3 will be assigned to Adapter 1 and vCon 2&4 will be assigned to Adapter 2.
      If you have 3 Adapters then vCon1 is assigned to Adapter 1, vCon2 & vCon4 are assigned to Adapter 2 and vCon3 gets assigned to Adapter 3.

      You can change the above behaviour from Round Robin to Linear ordered in which case vCons are assigned in order I.e.

      1 Adapter vCons 1-4
      2 Adapters vCon 1&2 to Adapter1 and vCon 3&4 to Adapter2
      3 Adapters vCon 1 to Adapter1, vCon 2 to Adapter2 and vCon 3&4 to Adapter3
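
The two placement schemes listed above can be written down as a simple lookup. This Python sketch only encodes the assignments exactly as stated in this reply; it is not how UCS Manager represents them, and the names are hypothetical.

```python
# vCon -> adapter assignments as listed above, keyed by mode and adapter count.
VCON_PLACEMENT = {
    "round-robin": {
        1: {1: 1, 2: 1, 3: 1, 4: 1},
        2: {1: 1, 2: 2, 3: 1, 4: 2},
        3: {1: 1, 2: 2, 3: 3, 4: 2},
    },
    "linear": {
        1: {1: 1, 2: 1, 3: 1, 4: 1},
        2: {1: 1, 2: 1, 3: 2, 4: 2},
        3: {1: 1, 2: 2, 3: 3, 4: 3},
    },
}


def adapter_for_vcon(mode: str, adapters: int, vcon: int) -> int:
    """Return the adapter number a vCon maps to for the given mode."""
    return VCON_PLACEMENT[mode][adapters][vcon]


def vcons_on_adapter(mode: str, adapters: int, adapter: int) -> list:
    """List the vCons that land on one adapter, in ascending order."""
    table = VCON_PLACEMENT[mode][adapters]
    return sorted(v for v, a in table.items() if a == adapter)


print(vcons_on_adapter("round-robin", adapters=2, adapter=1))
print(vcons_on_adapter("linear", adapters=2, adapter=1))
```

So with two adapters, putting Fabric A vNICs on vCons 1 and 2 spreads them across both adapters under Round Robin, but keeps them on the first adapter under Linear.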

      If you are maxing out at 10Gb on a vNIC that is functioning at 20Gb then I would confirm that you have your FI to IOM links configured as a port-channel (not the default) in the Global Policy.

      Without it, a single vNIC will get pinned to a single FI to IOM link and thereby be limited to 10Gb. Also remember that once these links are port-channeled, each flow from the vNIC can be distributed over different links within the port-channel, but each FLOW is still limited to 10Gb, so you are correct to ensure your netperf is set for multiple flows.

      Hope that clears things up for you.
      Regards
      Colin

  10. Mike says:

    Hi Colin,
    Excellent explanation, and very easy to understand. Can you explain to this newbie how you were able to determine this? Maybe I need to look into the configuration guide?

  11. Brian K. says:

    Good article on back plane ports. What about frontplane, for example with a 2208 using four connections to a single FI, would you just use ports 1-4, or should you connect ports 1-2, and 5-6 ? Does it make any difference in redundancy or performance? TY.

  12. Tony says:

    Hello

    What happens if I use only 2 of the physical uplinks from the 2208xp iom to the fabric interconnect instead of 4 ? I’m using b200 m3 with the Vic 1240 only

    Will I have less bandwidth for my blades?

  13. Tony says:

    Hi

    So if I use a Vic 1240 with no mezzanine card. That will give me 40gb to my blade right? 20gb to fab A and 20gb to fabB.

    But is this 40gb actually active or is 20gb active and 20gb passive?

    I have the FIs connected to 2 Nexus 7k in a vPC

    According to the documentation for vpc it should be all active right?

    How will I see 40gb on a windows 2012 os?

    Will Each nic to each fab will be 20gb? And if I team ,2 nics(fab a and fab b) that will be 40 gb in the os level for each pair of vnics?

    Thank you

    • ucsguru says:

      Hi Tony
      Both fabrics are Active/Active. If you want, you can have a vNIC in Fab A and one in Fab B and team them at OS level.

      If you have a VIC1240 with no Mez then using a 2204XP will give you 10Gb per fabric, while using the 2208XP will give you 20Gb per fabric. (See my previous post entitled “20 + 20 = 10 M3 IO explained” as to why this is.)

      To see a single 40Gb NIC in the OS would require a 2208XP with a VIC1240 with Port expander.

      Regards
      Colin

  14. Pingback: Cisco UCS Mini Bundles | Justin's IT Blog
