
We have released a press release announcing the official launch of the Multicore Packet Processing Forum.

Please read the press release here, including quotes from the Forum's main contributors.

By Kin-Yip Liu, Director, Customer Solutions Architecture at Cavium Networks

Until only a few years ago, advancements in networks and networking equipment focused on moving more data across the networks at much higher line rates. Networks were primarily pipes for transporting data and translators between different networking protocols. Today, when we look at current and next-generation networks, we demand intelligence. What does this mean? Why are networks becoming intelligent, and how?

Three key attributes describe intelligent networks: application-aware, content-aware, and secure.

  • Intelligent networks are application-aware. Intelligent networking equipment processes and prioritizes traffic differently based on application type, such as voice, video, and data.
  • Intelligent networks are content-aware. Intelligent networking equipment inspects and processes the contents of a packet so that the network can apply policies, or prioritization rules, for routing and transformation.
  • Intelligent networks are secure. Intelligent networking equipment secures connectivity and provides perimeter protection from Layer 3 to Layer 7 using secure protocols, access control, policy enforcement, and IDS/IPS techniques.
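
As a toy illustration of application-aware prioritization (real equipment relies on deep packet inspection and heuristics, not just ports; the mappings and class names below are illustrative assumptions):

```python
# Toy sketch: classify traffic by well-known destination port and
# assign a priority class. Real application awareness uses DPI and
# heuristics; this port table is an illustrative assumption.

# Hypothetical priority classes, lower number = served first
PRIORITY = {"voice": 0, "video": 1, "data": 2}

# Assumed port-to-application mapping for the sketch
PORT_APP = {
    5060: "voice",   # SIP signalling
    554: "video",    # RTSP streaming
    80: "data",      # HTTP
    25: "data",      # SMTP
}

def classify(dst_port: int) -> str:
    """Map a destination port to an application class (default: data)."""
    return PORT_APP.get(dst_port, "data")

def priority(dst_port: int) -> int:
    """Return the scheduling priority for a destination port."""
    return PRIORITY[classify(dst_port)]

# Voice traffic is scheduled ahead of bulk data
assert priority(5060) < priority(25)
```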

Why must networks become intelligent? Today, networks carry far more than just data, and network management and security go well beyond simply opening and closing ports. Many applications use networks as the medium for communicating digitized information with various attributes and priorities. Information like voice and streamed video requires a different quality of service than email or file transfers. Large file transfers should not head-of-line block other, real-time traffic. Some information is compressed and/or encrypted.

In addition, networks need to be managed. For example, IT can define policies on bandwidth usage based on application type, user, time of day, geography, etc., and can enforce these policies on network usage. A service provider can analyze the underlying protocols and application types of the data moving across its networks and ensure that the proper level of quality of service, and the corresponding service charge, is applied. For example, a service provider can detect that certain data corresponds to a voice-over-IP application, provide the right level of quality, and charge for it.

Policy enforcement also applies from a security perspective. For example, can a given user access a given kind of information? Should certain information be blocked from leaving or entering the network? Networks must become intelligent, and we as users demand it.

In order to be intelligent, networking equipment must expand beyond the traditional Layer 2 to Layer 4 processing and provide the performance and capability to process up to Layer 7.

Networking equipment works with packets. At the level of individual packets, networking equipment needs to work through potentially tunneled traffic and reassemble individual IP packets into individual Layer 4 flows. It then normalizes the flows, which may be based on TCP, UDP, or other Layer 4 protocols. At this point, the equipment must continue processing up the protocol layers to detect the protocol and application behind the data. In addition to using information like ports and addresses, this process requires checking the higher-layer data against known protocol formats and signatures. Moreover, it may use information like bit rates and traffic patterns of the flow being processed, comparing them against heuristics of potentially matching protocols and applications.
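
As a minimal illustration of the first step, flow classification starts by extracting a 5-tuple key from each packet; the sketch below handles only plain IPv4 carrying TCP or UDP, and none of the tunneling, reassembly, or normalization described above:

```python
import struct

def five_tuple(packet: bytes):
    """Extract the (src_ip, dst_ip, proto, src_port, dst_port) flow key
    from a raw IPv4 packet carrying TCP or UDP. Returns None for other
    protocols. A minimal sketch: no options validation, no reassembly,
    no tunnel handling."""
    ihl = (packet[0] & 0x0F) * 4   # IPv4 header length in bytes
    proto = packet[9]              # 6 = TCP, 17 = UDP
    if proto not in (6, 17):
        return None
    src_ip = packet[12:16]
    dst_ip = packet[16:20]
    # TCP and UDP both start with source and destination ports
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return (src_ip, dst_ip, proto, src_port, dst_port)
```

In real equipment this key would index a flow table so that all packets of a Layer 4 flow reach the same processing context (and the same core).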

Once the networking equipment detects the underlying application, protocol, and corresponding data being communicated, it can enforce the right policies, provide the right level of quality of service, and perform relevant transformations on the data as required.

Intelligent networking equipment requires high-performance, low-power, programmable processors. Developing such processor solutions has been Cavium Networks’ mission and core competency.

By Marc DeVinney, VP of Engineering at Interphase Corporation

Multicore processors have become an essential technology in next-generation telecommunications network infrastructure because of their ability to scale to meet the growing requirements of wire-speed gigabit packet processing. Applications driving the use of multicore packet processors include wireless access platforms such as the ASN Gateway in WiMAX networks, the GGSN in 3G networks, and Packet Access Gateways in LTE-based networks; security appliances; secure traffic processing in IMS nodes such as the x-CSCF functions; VPN and firewall appliances; Wi-Fi and WiMAX access concentrators; and DPI solutions.

GTP is a protocol used for encapsulating packet data for exchange between the SGSN and GGSN elements in 3GPP UMTS WCDMA and GPRS wireless networks and between the RNC and SGSN in the 3GPP UMTS WCDMA wireless network. GTP allows multi-protocol packets to be tunneled through the GPRS Backbone between GPRS Support Nodes over the Gn and Gp interfaces and also between SGSN and UTRAN over the Iu interface (only in the case of UMTS). In the user plane, GTP uses a tunneling mechanism, GTP-U, to provide a service for carrying user data packets. This protocol is used by SGSNs and GGSNs in the UMTS/GPRS Backbone and by Radio Network Controllers in the UTRAN.

Interphase Corporation recently released a GTP-U fast path module for use in GPRS and 3G network elements requiring very high performance packet data transfer. It was developed in compliance with the GPRS and UMTS 3GPP specifications, and enhancements are planned to support extensions for LTE protocol processing between an eNodeB and a Packet Data Access Gateway. The GTP-U compliant protocol was implemented as a component module of the 6WINDGate framework and relies on the IPv4/IPv6 forwarding and L2 fast path modules and the SDS Linux module stack in the 6WINDGate framework to provide a complete GTP-U processing function for use within a network element. GTP tunnel creation, modification, and deletion are supported. A GTP tunnel has two modes: buffering (inactive) and active. All T-PDUs arriving for encapsulation while a tunnel is in inactive mode are buffered, waiting for a GTP state modification. GTP tunnel statistics are also supported.
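
As a rough illustration of the mechanism described above, the following sketch packs a minimal 8-byte GTPv1-U header (version 1, no optional fields) and models the two tunnel modes; it is a toy model, not Interphase's implementation:

```python
import struct

GTPU_TPDU = 0xFF  # message type for a G-PDU carrying a T-PDU

def gtpu_encap(teid: int, tpdu: bytes) -> bytes:
    """Prepend a minimal 8-byte GTPv1-U header (no optional fields):
    flags 0x30 = version 1, protocol type GTP; the length field counts
    the payload only. A sketch of the encapsulation, not a full codec."""
    return struct.pack("!BBHI", 0x30, GTPU_TPDU, len(tpdu), teid) + tpdu

class GtpTunnel:
    """Toy model of the two tunnel modes: T-PDUs arriving while the
    tunnel is inactive are buffered; activating the tunnel flushes
    them as encapsulated packets. Illustrative only."""
    def __init__(self, teid: int):
        self.teid = teid
        self.active = False
        self.buffer = []
        self.tx_packets = 0   # a simple tunnel statistic

    def send(self, tpdu: bytes):
        if not self.active:           # buffering (inactive) mode
            self.buffer.append(tpdu)
            return []
        self.tx_packets += 1
        return [gtpu_encap(self.teid, tpdu)]

    def activate(self):
        """Switch to active mode and flush buffered T-PDUs."""
        self.active = True
        out = [self.send(t)[0] for t in self.buffer]
        self.buffer.clear()
        return out
```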

Along with this supported software module, Interphase offers customization and optimization services that give customers and channel partners a market advantage by leveraging turn-key packet accelerator products in the exploding IP-based communications infrastructure market. This GTP example shows that solutions now exist in the multicore software ecosystem to build complete solutions that decrease our customers' time-to-market and reduce the cost of integrating critical building-block functions.

By Vincent Jardin – 6WIND CTO

I read here and there that it is not possible to build an efficient, portable solution for multicore packet processing. In other words, that only the chipset provider can deliver packet processing, for the following reasons:

  1. My CPU is so complex that only I – the CPU vendor – can master it!
  2. No other company has the skills to develop properly on my CPU in my environment.
  3. Networking protocols are so complex and so deeply integrated with the processor that I am the only one able to develop them.
  4. Since I, the CPU provider, deliver excellent performance benchmarks on basic networking protocols, I am the most qualified to develop complex protocols.

The history of software is full of examples that simply demonstrate these assumptions are not true:

  • How do we get efficient compilers when they were not developed from scratch by the CPU vendor?
  • How do we get efficient operating systems when they were not developed by the CPU’s architects?
  • How do we get efficient 3D games that were not developed by the GPU’s architects?

In fact, it is just the standard cycle for new technologies. A new technology needs to be integrated into an ecosystem. At the very beginning, the ecosystem is led by the provider of the technology. Then, when the technology becomes popular, the ecosystem takes the lead. All the software mentioned above was developed thanks to a strong ecosystem sponsored by the CPU vendors, simply because a CPU vendor cannot bear all the costs of software development alone.

So, why would it be different for multicore and especially for packet processing engines?

At 6WIND, we of course think that developing an efficient portable solution for multicore packet processing is possible… How can it be done knowing the Fast Path part is developed outside the OS?

First of all, the software architecture has to be designed to be portable. An abstraction API layer (we call it FPN, the Fast Path Networking SDK) has to be defined and used by all the Fast Path modules.
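
As an illustration of this kind of abstraction layer (all names below are hypothetical, not 6WIND's actual FPN API), a portable fast path module can be written against a small interface that each platform backend implements:

```python
# Hypothetical sketch of an FPN-style abstraction: fast path modules
# are written against a small portable interface, and each multicore
# platform supplies its own backend wrapping its hardware accelerators
# (crypto engines, HW queues, inter-core rings). Illustrative only.

from abc import ABC, abstractmethod

class FpnBackend(ABC):
    """Per-platform implementation of the portable fast path services."""
    @abstractmethod
    def send_to_core(self, core: int, pkt: bytes) -> None: ...
    @abstractmethod
    def crypto_encrypt(self, key: bytes, data: bytes) -> bytes: ...

class SoftwareBackend(FpnBackend):
    """Pure-software fallback, usable on any platform (in the spirit
    of testing on a plain PC, as with a 'Virtual Fast Path')."""
    def __init__(self):
        self.queues = {}
    def send_to_core(self, core, pkt):
        self.queues.setdefault(core, []).append(pkt)
    def crypto_encrypt(self, key, data):
        # stand-in for a hardware crypto engine: XOR with the key
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def fast_path_module(backend: FpnBackend, pkt: bytes):
    """A fast path module only talks to the portable API, so the same
    code runs unchanged on every supported backend."""
    backend.send_to_core(1, backend.crypto_encrypt(b"\x01", pkt))
```

The design point is that the module above never touches platform-specific registers or queues directly, so porting means writing a new backend, not rewriting the protocols.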

Then a dedicated process has to be defined to effectively port the software:

  1. Study the CPU in detail.
  2. Do a development breakdown for this CPU.
  3. Do the porting using the FPN SDK; all of the CPU's hardware accelerators have to be integrated into the FPN SDK (crypto engines, hardware queues for QoS, inter-core communication).
  4. Validate the port (see below).
  5. Profile your code (check all the profiling and debugging counters, count the instructions, inspect the assembly code from the compilers) and read the CPU specs again, with the help of the CPU vendor, in order to understand what you observe.
  6. Validate again…

Validation is of course important. A real packet processing engine integrates a very large number of complex protocols, and both protocol behaviour and performance have to be tested as a whole. The best solution is to have a robot that periodically tests all the protocols of your protocol engines.

It is also very important to develop specific tools to speed up the development and validation process. At 6WIND, we developed the Virtual Fast Path concept, which runs the Fast Path in user land on a PC or in a QEMU environment.

Once a Fast Path module is available under the Virtual Fast Path, the validation robot automatically checks on every platform that it cross-compiles and that it runs without any regressions (protocol or performance).
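
The regression check at the heart of such a robot can be sketched as follows; the data shapes, names, and the 10% performance margin are illustrative assumptions, not 6WIND's actual tooling:

```python
# Toy sketch of a validation robot's core loop: re-run each protocol
# test, then compare against stored "golden" results, flagging both
# protocol regressions (wrong output) and performance regressions
# (slower than the reference by more than a margin).

def check_regressions(results, golden, perf_margin=0.10):
    """results: {test_name: (output, seconds)}; golden: same shape.
    Returns the list of regressed test names."""
    failures = []
    for name, (out, secs) in results.items():
        ref_out, ref_secs = golden[name]
        if out != ref_out:                         # protocol regression
            failures.append(name)
        elif secs > ref_secs * (1 + perf_margin):  # performance regression
            failures.append(name)
    return failures
```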

Using this development process, we have successfully ported our software to market-leading multicore platforms. A classic question concerns the performance penalty of a portable solution compared to a per-CPU-optimized one. Our answer is:

  • There can be a performance penalty on very simple protocols.
  • This penalty becomes negligible as soon as packet processing integrates more complex protocols, because the software architecture itself matters more than low-level optimizations.
  • Portability gives the end user a much more flexible solution.
  • Compared to low-level optimizations, a generic solution scales better when protocols are stacked because it avoids redesign.
  • It is better to get generic packet processing software from a provider that guarantees its evolution, because that is its core business.

By Eric Carmes – 6WIND Founder and CEO

In this Forum, we’ve discussed convergence. The IP convergence is finally on its way for telecoms and the enterprise. Does it mean that convergence is driving telecoms into an IT model? Does it mean that common platforms will be available to address both requirements?

Let us review the different technologies that could be used for these common platforms:

  • 10G Ethernet to transport information
  • Carrier-grade operating systems, with Linux as the predominant solution among others
  • Standard hierarchical hardware architectures such as AdvancedTCA or BladeCenter
  • Multicore processor technology to process 10G+ traffic
  • Middleware, including high-performance packet processing, networking stacks, HA frameworks, network management, etc.
  • Virtualization to share cores between packet processing and applications as well as between applications

In theory, it should be easy to build either a router or a server for enterprise or ISP applications by selecting and configuring the right pieces of this puzzle. All the IT giants are following the same strategy – only the tactics differ:

  • Cisco is adding AXP (Application eXtension Platform) running on Linux to their existing chassis and promoting FCoE (Fibre Channel over Ethernet)
  • Dell is partnering with Juniper to integrate a packet processing oriented OS to address both server and router markets
  • HP’s ProCurve Intel-based equipment is targeting the networking market
  • And so on

If we come back to the main topic of the Forum, a single solution for packet processing will not address all the requirements. The standard operating systems used today in the server market cannot sustain the expected traffic performance. Thus, a specific software architecture is required to offload packet processing from the OS. Depending on the level of performance required, several implementations of network offload engines can be proposed:

  • Dedicated packet processing front-ends using mezzanine cards in an ATCA or BladeCenter chassis – the packet processing software executes in a lightweight environment to benefit fully from the performance of multicore processors
  • For less-demanding applications, cores can be shared between packet processing and applications. Virtualization techniques make it possible to run different execution environments, such as a micro-kernel for packet processing and a standard OS for networking middleware and applications

Beyond performance, both solutions provide an interesting migration path to multicore. Existing applications can be reused even if they have not yet been ported to multicore, and they can share a single offload engine if load-balancing features are provided.

Of course, packet processing is becoming a critical element of the architecture and it should also provide high availability capabilities.