Part 2 of 3 Video Series: RDMA over Converged Ethernet Version 2 (RoCEv2)
Data Center Architectures Part 2: RDMA as the data center storage fabric
RDMA over Converged Ethernet, version 2 (RoCEv2) offers the ideal option for boosting data center efficiency, reducing overall complexity, and increasing data delivery performance. Juniper’s Arun Gandhi and Michal Styszynski discuss why Juniper is proposing this new type of architecture for data centers.
You’ll learn
Whether RoCEv2 can coexist with other types of flows that are not RDMA
The benefits Juniper QFX Series switches offer specifically for RoCEv2
How to use RDMA with RoCEv2 as the data center storage fabric
Who is this for?
Host
Arun Gandhi
Guest speakers
Michal Styszynski
Transcript
00:00 [Music]
00:12 Arun: Thanks for joining me for the second session on data center architectures. In the first session we discussed IP fabrics becoming the de facto standard in many tier 2 cloud providers and telco clouds, and BGP unnumbered, specified in RFC 5549, becoming increasingly popular in the data center IP fabric ecosystem. We also compared BGP unnumbered with eBGP and discussed how it's easier to enable than the IGPs.
00:43 Today I'd like to pivot our discussion to storage fabrics. As we all know, in today's world of ever-increasing data, the speed and reliability with which this data gets transferred is critical to the effective use of the information. An interconnect based on Remote Direct Memory Access, or RDMA, offers the ideal option for boosting data center efficiency, reducing overall complexity, and increasing data delivery performance. I have the pleasure of chatting with Michal here again to shed more light on the technologies used in data centers. Michal, thanks for joining me once again.
01:26 Michal: Hi Arun, thanks for the invitation. Hi everybody.
01:30 Arun: So Michal, the first question I have for you is this: we recently introduced Remote Direct Memory Access over Converged Ethernet version 2, or RoCEv2 as we call it. With this, are we proposing a new type of architecture for data centers?
01:49 Michal: Arun, as a matter of fact, in the case of RoCEv2, version 2 of the standard, we can use the three-stage or five-stage IP fabric architecture, the same as for any compute node traffic. The thing that is specific to RoCEv2 is that we need to make sure the IP fabric infrastructure doesn't introduce any situations where storage traffic is lost in the middle of the fabric. So on top of the three-stage and five-stage architectures we're adding specific capabilities on our switches to make sure the traffic is always delivered without any frame loss, and delivered at very low latency.
02:33 Beyond the traditional three-stage and five-stage designs, it's worth mentioning that for RoCEv2 these architectures just make perfect sense, because the volumes of data inside the data center ecosystem keep growing; they're growing every month. With an IP fabric architecture we can increase the bandwidth of the fabric much more easily than with traditional architectures.
03:05 The last point regarding RoCEv2 architectures, outside of the three-stage and five-stage IP fabric architectures, is the back-to-back collapsed use case, where RoCEv2 runs from the SmartNIC to the top-of-rack switch and then, when it needs to communicate with any other SmartNIC, it simply uses the back-to-back link. For very small infrastructures, for example in edge computing, we can use just two nodes connected back to back, run the RoCEv2-specific features between the two switches, and obviously have the RoCEv2 stack on the SmartNICs. So we are pretty much consistent with the industry standards for IP Clos fabrics, but additionally we have these smaller use cases with collapsed spines.
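As a rough back-of-the-envelope illustration of why a leaf/spine IP fabric grows bandwidth so easily, here is a minimal Python sketch. It is not from the video; the port counts and speeds are assumptions chosen only for the example.

```python
# Rough capacity estimate for a 3-stage (leaf/spine) IP fabric.
# All values are illustrative assumptions, not Juniper recommendations.

def clos_capacity(leafs, spines, server_ports_per_leaf,
                  uplink_speed_gbps, server_speed_gbps):
    """Return (server_count, oversubscription) for a simple leaf/spine fabric
    where every leaf has one uplink to every spine."""
    servers = leafs * server_ports_per_leaf
    uplink_bw_per_leaf = spines * uplink_speed_gbps        # one link to each spine
    downlink_bw_per_leaf = server_ports_per_leaf * server_speed_gbps
    oversubscription = downlink_bw_per_leaf / uplink_bw_per_leaf
    return servers, oversubscription

# Example: 8 leafs, 4 spines, 32 x 100GbE server ports per leaf, 400GbE uplinks.
servers, ratio = clos_capacity(leafs=8, spines=4,
                               server_ports_per_leaf=32,
                               uplink_speed_gbps=400,
                               server_speed_gbps=100)
print(f"{servers} servers, {ratio:.1f}:1 oversubscription")
# Growing capacity is mostly a matter of adding spines (more uplink bandwidth)
# or adding leafs (more server ports), without redesigning the fabric.
```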
03:54 Arun: Awesome, this is great. The other question I have is: can RoCEv2 coexist with other types of flows that are not RDMA?
04:06 Michal: Exactly. On these three-stage, five-stage, or collapsed architectures we can also run regular compute traffic for our customers. If a customer needs some other type of traffic, it can still use the same infrastructure. The only thing is that we need to make sure the RoCEv2 RDMA traffic gets some privilege compared to the other types of traffic, which potentially use TCP. In the case of RoCEv2 we use UDP, so we need to make sure this traffic always gets the right treatment on the IP fabric.
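For context on the encapsulation being discussed: a RoCEv2 packet carries the InfiniBand transport payload over UDP (well-known destination port 4791) over IP, which is why a standard IP fabric can forward it like any other flow. Here is a minimal Scapy sketch of that framing; the DSCP value, addresses, and payload placeholder are assumptions for illustration only.

```python
from scapy.all import Ether, IP, UDP, Raw

# Illustrative values only: DSCP 26 and the addresses are assumptions,
# not a recommended configuration.
DSCP = 26          # traffic class the fabric will prioritize for RDMA
ECT = 0b10         # ECN-Capable Transport, so switches can mark instead of drop

pkt = (
    Ether()
    / IP(src="10.0.0.1", dst="10.0.0.2", tos=(DSCP << 2) | ECT)
    / UDP(sport=49152, dport=4791)     # 4791 = well-known RoCEv2 UDP port
    / Raw(load=bytes(12))              # placeholder for the InfiniBand BTH + payload
)

pkt.show()
```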
04:41 Arun: So from what I understand then, Michal, the InfiniBand payload inside the RoCEv2 frame sits behind UDP, so this traffic can be handled as regular traffic by the IP fabric. But are there any other benefits that the Juniper QFX5220 switches offer specifically for RoCEv2?
05:02 Michal: A good point, Arun. As a matter of fact, outside of the performance of the 5220s in terms of low latency, which is very important to make sure we write very fast to the remote end hosts, we add two principal ingredients that are defined in Annex 17 of the standard: explicit congestion notification, and priority flow control on the IP DSCP.
05:34 When it comes to explicit congestion notification, a switch such as the 5220 you mentioned can continue to send these RoCEv2 frames while informing the destination host's NIC card that there is congestion, in case congestion actually occurs. The end host that receives the traffic then takes an action and sends that information back to the originator of the traffic, which is another SmartNIC card on a server. So the originator learns about the congestion and slows down a little, reducing the rates that were initially used to write, for example, data directly into the memory of the server.
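The feedback loop Michal describes (switch marks, receiver notifies, sender slows down) can be sketched roughly as follows. The queue threshold, the rate cut, and the recovery step are simplified assumptions, not the actual congestion-control parameters of any NIC or switch.

```python
# Toy model of the RoCEv2 ECN feedback loop.
# Numbers (queue threshold, rate cut, recovery step) are illustrative only.

class Switch:
    def __init__(self, ecn_threshold_pkts=100):
        self.queue = 0
        self.ecn_threshold = ecn_threshold_pkts

    def forward(self, pkt):
        self.queue += 1
        if self.queue > self.ecn_threshold:
            pkt["ecn"] = "CE"            # mark the frame instead of dropping it
        return pkt

class ReceiverNIC:
    def on_packet(self, pkt, sender):
        if pkt.get("ecn") == "CE":
            sender.on_cnp()              # notify the originator of the congestion

class SenderNIC:
    def __init__(self, line_rate_gbps=100.0):
        self.rate = line_rate_gbps

    def on_cnp(self):
        self.rate *= 0.5                 # back off when congestion is reported

    def on_quiet_period(self):
        self.rate = min(self.rate * 1.1, 100.0)   # slowly recover toward line rate

sender, receiver, switch = SenderNIC(), ReceiverNIC(), Switch()
switch.queue = 150                        # pretend the egress queue is already deep
marked = switch.forward({"ecn": "ECT"})
receiver.on_packet(marked, sender)
print(f"sender rate after congestion feedback: {sender.rate} Gbps")   # 50.0
```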
06:23 And then the second mechanism is priority flow control. Compared to the legacy priority flow control, which operated entirely at Layer 2 on the 802.1p bits from the QoS point of view, in the case of RoCEv2 we are actually using priority flow control on the IP DSCP. So these two mechanisms, first the explicit congestion notification and then the second one, which acts on every segment of the architecture, are two important features to make sure the different types of traffic can coexist while we always give priority to the RDMA traffic, so that it can properly write data to, or read data from, the server.
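To make the difference concrete: with PFC over DSCP the switch classifies lossless traffic by the IP DSCP value rather than the 802.1p VLAN bits, and pauses only that priority queue when its buffer fills. A minimal sketch of that classification and pause decision follows; the DSCP value, queue number, and threshold are assumptions for the example.

```python
# Toy illustration of DSCP-based priority flow control (PFC).
# The DSCP value, queue numbers and thresholds are assumptions for the example.

DSCP_TO_PRIORITY = {26: 3}        # e.g. map the RDMA traffic class to lossless queue 3
XOFF_THRESHOLD_KB = {3: 80}       # per-priority buffer level that triggers a pause

def classify(dscp):
    """Pick the egress priority queue from the IP DSCP field (not 802.1p)."""
    return DSCP_TO_PRIORITY.get(dscp, 0)     # everything else goes to best effort

def pfc_pause(priority, buffer_kb):
    """Pause only the congested lossless priority; best-effort queues keep flowing."""
    threshold = XOFF_THRESHOLD_KB.get(priority)
    return threshold is not None and buffer_kb >= threshold

rdma_queue = classify(26)
print(pfc_pause(rdma_queue, buffer_kb=96))   # True  -> send a pause frame for queue 3 only
print(pfc_pause(classify(0), buffer_kb=96))  # False -> best-effort TCP traffic is never paused
```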
07:11 Arun: Excellent. Thank you, Michal, for a very interesting discussion on RoCEv2. It sure has been a phenomenal session for me; I've been learning a lot, and I'm sure our viewers have too. So for all the viewers, please stay tuned for our next video. Thank you.
07:31 [Music]