772 lines
30 KiB
ReStructuredText
772 lines
30 KiB
ReStructuredText
|
.. SPDX-License-Identifier: GPL-2.0+
|
|||
|
|
|||
|
=================================================================
|
|||
|
Linux Base Driver for the Intel(R) Ethernet Controller 700 Series
|
|||
|
=================================================================
|
|||
|
|
|||
|
Intel 40 Gigabit Linux driver.
|
|||
|
Copyright(c) 1999-2018 Intel Corporation.
|
|||
|
|
|||
|
Contents
|
|||
|
========
|
|||
|
|
|||
|
- Overview
|
|||
|
- Identifying Your Adapter
|
|||
|
- Intel(R) Ethernet Flow Director
|
|||
|
- Additional Configurations
|
|||
|
- Known Issues
|
|||
|
- Support
|
|||
|
|
|||
|
|
|||
|
Driver information can be obtained using ethtool, lspci, and ifconfig.
|
|||
|
Instructions on updating ethtool can be found in the section Additional
|
|||
|
Configurations later in this document.
|
|||
|
|
|||
|
For questions related to hardware requirements, refer to the documentation
|
|||
|
supplied with your Intel adapter. All hardware requirements listed apply to use
|
|||
|
with Linux.
|
|||
|
|
|||
|
|
|||
|
Identifying Your Adapter
|
|||
|
========================
|
|||
|
The driver is compatible with devices based on the following:
|
|||
|
|
|||
|
* Intel(R) Ethernet Controller X710
|
|||
|
* Intel(R) Ethernet Controller XL710
|
|||
|
* Intel(R) Ethernet Network Connection X722
|
|||
|
* Intel(R) Ethernet Controller XXV710
|
|||
|
|
|||
|
For the best performance, make sure the latest NVM/FW is installed on your
|
|||
|
device.
|
|||
|
|
|||
|
For information on how to identify your adapter, and for the latest NVM/FW
|
|||
|
images and Intel network drivers, refer to the Intel Support website:
|
|||
|
https://www.intel.com/support
|
|||
|
|
|||
|
SFP+ and QSFP+ Devices
|
|||
|
----------------------
|
|||
|
For information about supported media, refer to this document:
|
|||
|
https://www.intel.com/content/dam/www/public/us/en/documents/release-notes/xl710-ethernet-controller-feature-matrix.pdf
|
|||
|
|
|||
|
NOTE: Some adapters based on the Intel(R) Ethernet Controller 700 Series only
|
|||
|
support Intel Ethernet Optics modules. On these adapters, other modules are not
|
|||
|
supported and will not function. In all cases Intel recommends using Intel
|
|||
|
Ethernet Optics; other modules may function but are not validated by Intel.
|
|||
|
Contact Intel for supported media types.
|
|||
|
|
|||
|
NOTE: For connections based on Intel(R) Ethernet Controller 700 Series, support
|
|||
|
is dependent on your system board. Please see your vendor for details.
|
|||
|
|
|||
|
NOTE: In systems that do not have adequate airflow to cool the adapter and
|
|||
|
optical modules, you must use high temperature optical modules.
|
|||
|
|
|||
|
Virtual Functions (VFs)
|
|||
|
-----------------------
|
|||
|
Use sysfs to enable VFs. For example::
|
|||
|
|
|||
|
#echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs #enable VFs
|
|||
|
#echo 0 > /sys/class/net/$dev/device/sriov_numvfs #disable VFs
|
|||
|
|
|||
|
For example, the following instructions will configure PF eth0 and the first VF
|
|||
|
on VLAN 10::
|
|||
|
|
|||
|
$ ip link set dev eth0 vf 0 vlan 10
|
|||
|
|
|||
|
VLAN Tag Packet Steering
|
|||
|
------------------------
|
|||
|
Allows you to send all packets with a specific VLAN tag to a particular SR-IOV
|
|||
|
virtual function (VF). Further, this feature allows you to designate a
|
|||
|
particular VF as trusted, and allows that trusted VF to request selective
|
|||
|
promiscuous mode on the Physical Function (PF).
|
|||
|
|
|||
|
To set a VF as trusted or untrusted, enter the following command in the
|
|||
|
Hypervisor::
|
|||
|
|
|||
|
# ip link set dev eth0 vf 1 trust [on|off]
|
|||
|
|
|||
|
Once the VF is designated as trusted, use the following commands in the VM to
|
|||
|
set the VF to promiscuous mode.
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
For promiscuous all:
|
|||
|
#ip link set eth2 promisc on
|
|||
|
Where eth2 is a VF interface in the VM
|
|||
|
|
|||
|
For promiscuous Multicast:
|
|||
|
#ip link set eth2 allmulticast on
|
|||
|
Where eth2 is a VF interface in the VM
|
|||
|
|
|||
|
NOTE: By default, the ethtool priv-flag vf-true-promisc-support is set to
|
|||
|
"off",meaning that promiscuous mode for the VF will be limited. To set the
|
|||
|
promiscuous mode for the VF to true promiscuous and allow the VF to see all
|
|||
|
ingress traffic, use the following command::
|
|||
|
|
|||
|
#ethtool -set-priv-flags p261p1 vf-true-promisc-support on
|
|||
|
|
|||
|
The vf-true-promisc-support priv-flag does not enable promiscuous mode; rather,
|
|||
|
it designates which type of promiscuous mode (limited or true) you will get
|
|||
|
when you enable promiscuous mode using the ip link commands above. Note that
|
|||
|
this is a global setting that affects the entire device. However,the
|
|||
|
vf-true-promisc-support priv-flag is only exposed to the first PF of the
|
|||
|
device. The PF remains in limited promiscuous mode (unless it is in MFP mode)
|
|||
|
regardless of the vf-true-promisc-support setting.
|
|||
|
|
|||
|
Now add a VLAN interface on the VF interface::
|
|||
|
|
|||
|
#ip link add link eth2 name eth2.100 type vlan id 100
|
|||
|
|
|||
|
Note that the order in which you set the VF to promiscuous mode and add the
|
|||
|
VLAN interface does not matter (you can do either first). The end result in
|
|||
|
this example is that the VF will get all traffic that is tagged with VLAN 100.
|
|||
|
|
|||
|
Intel(R) Ethernet Flow Director
|
|||
|
-------------------------------
|
|||
|
The Intel Ethernet Flow Director performs the following tasks:
|
|||
|
|
|||
|
- Directs receive packets according to their flows to different queues.
|
|||
|
- Enables tight control on routing a flow in the platform.
|
|||
|
- Matches flows and CPU cores for flow affinity.
|
|||
|
- Supports multiple parameters for flexible flow classification and load
|
|||
|
balancing (in SFP mode only).
|
|||
|
|
|||
|
NOTE: The Linux i40e driver supports the following flow types: IPv4, TCPv4, and
|
|||
|
UDPv4. For a given flow type, it supports valid combinations of IP addresses
|
|||
|
(source or destination) and UDP/TCP ports (source and destination). For
|
|||
|
example, you can supply only a source IP address, a source IP address and a
|
|||
|
destination port, or any combination of one or more of these four parameters.
|
|||
|
|
|||
|
NOTE: The Linux i40e driver allows you to filter traffic based on a
|
|||
|
user-defined flexible two-byte pattern and offset by using the ethtool user-def
|
|||
|
and mask fields. Only L3 and L4 flow types are supported for user-defined
|
|||
|
flexible filters. For a given flow type, you must clear all Intel Ethernet Flow
|
|||
|
Director filters before changing the input set (for that flow type).
|
|||
|
|
|||
|
To enable or disable the Intel Ethernet Flow Director::
|
|||
|
|
|||
|
# ethtool -K ethX ntuple <on|off>
|
|||
|
|
|||
|
When disabling ntuple filters, all the user programmed filters are flushed from
|
|||
|
the driver cache and hardware. All needed filters must be re-added when ntuple
|
|||
|
is re-enabled.
|
|||
|
|
|||
|
To add a filter that directs packet to queue 2, use -U or -N switch::
|
|||
|
|
|||
|
# ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
|
|||
|
192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
|
|||
|
|
|||
|
To set a filter using only the source and destination IP address::
|
|||
|
|
|||
|
# ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
|
|||
|
192.168.10.2 action 2 [loc 1]
|
|||
|
|
|||
|
To see the list of filters currently present::
|
|||
|
|
|||
|
# ethtool <-u|-n> ethX
|
|||
|
|
|||
|
Application Targeted Routing (ATR) Perfect Filters
|
|||
|
--------------------------------------------------
|
|||
|
ATR is enabled by default when the kernel is in multiple transmit queue mode.
|
|||
|
An ATR Intel Ethernet Flow Director filter rule is added when a TCP-IP flow
|
|||
|
starts and is deleted when the flow ends. When a TCP-IP Intel Ethernet Flow
|
|||
|
Director rule is added from ethtool (Sideband filter), ATR is turned off by the
|
|||
|
driver. To re-enable ATR, the sideband can be disabled with the ethtool -K
|
|||
|
option. For example::
|
|||
|
|
|||
|
ethtool –K [adapter] ntuple [off|on]
|
|||
|
|
|||
|
If sideband is re-enabled after ATR is re-enabled, ATR remains enabled until a
|
|||
|
TCP-IP flow is added. When all TCP-IP sideband rules are deleted, ATR is
|
|||
|
automatically re-enabled.
|
|||
|
|
|||
|
Packets that match the ATR rules are counted in fdir_atr_match stats in
|
|||
|
ethtool, which also can be used to verify whether ATR rules still exist.
|
|||
|
|
|||
|
Sideband Perfect Filters
|
|||
|
------------------------
|
|||
|
Sideband Perfect Filters are used to direct traffic that matches specified
|
|||
|
characteristics. They are enabled through ethtool's ntuple interface. To add a
|
|||
|
new filter use the following command::
|
|||
|
|
|||
|
ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
|
|||
|
dst-port <port> action <queue>
|
|||
|
|
|||
|
Where:
|
|||
|
<device> - the ethernet device to program
|
|||
|
<type> - can be ip4, tcp4, udp4, or sctp4
|
|||
|
<ip> - the ip address to match on
|
|||
|
<port> - the port number to match on
|
|||
|
<queue> - the queue to direct traffic towards (-1 discards matching traffic)
|
|||
|
|
|||
|
Use the following command to display all of the active filters::
|
|||
|
|
|||
|
ethtool -u <device>
|
|||
|
|
|||
|
Use the following command to delete a filter::
|
|||
|
|
|||
|
ethtool -U <device> delete <N>
|
|||
|
|
|||
|
Where <N> is the filter id displayed when printing all the active filters, and
|
|||
|
may also have been specified using "loc <N>" when adding the filter.
|
|||
|
|
|||
|
The following example matches TCP traffic sent from 192.168.0.1, port 5300,
|
|||
|
directed to 192.168.0.5, port 80, and sends it to queue 7::
|
|||
|
|
|||
|
ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
|
|||
|
src-port 5300 dst-port 80 action 7
|
|||
|
|
|||
|
For each flow-type, the programmed filters must all have the same matching
|
|||
|
input set. For example, issuing the following two commands is acceptable::
|
|||
|
|
|||
|
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
|
|||
|
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
|
|||
|
|
|||
|
Issuing the next two commands, however, is not acceptable, since the first
|
|||
|
specifies src-ip and the second specifies dst-ip::
|
|||
|
|
|||
|
ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
|
|||
|
ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
|
|||
|
|
|||
|
The second command will fail with an error. You may program multiple filters
|
|||
|
with the same fields, using different values, but, on one device, you may not
|
|||
|
program two tcp4 filters with different matching fields.
|
|||
|
|
|||
|
Matching on a sub-portion of a field is not supported by the i40e driver, thus
|
|||
|
partial mask fields are not supported.
|
|||
|
|
|||
|
The driver also supports matching user-defined data within the packet payload.
|
|||
|
This flexible data is specified using the "user-def" field of the ethtool
|
|||
|
command in the following way:
|
|||
|
|
|||
|
+----------------------------+--------------------------+
|
|||
|
| 31 28 24 20 16 | 15 12 8 4 0 |
|
|||
|
+----------------------------+--------------------------+
|
|||
|
| offset into packet payload | 2 bytes of flexible data |
|
|||
|
+----------------------------+--------------------------+
|
|||
|
|
|||
|
For example,
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
... user-def 0x4FFFF ...
|
|||
|
|
|||
|
tells the filter to look 4 bytes into the payload and match that value against
|
|||
|
0xFFFF. The offset is based on the beginning of the payload, and not the
|
|||
|
beginning of the packet. Thus
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
flow-type tcp4 ... user-def 0x8BEAF ...
|
|||
|
|
|||
|
would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the
|
|||
|
TCP/IPv4 payload.
|
|||
|
|
|||
|
Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload.
|
|||
|
Thus to match the first byte of the payload, you must actually add 4 bytes to
|
|||
|
the offset. Also note that ip4 filters match both ICMP frames as well as raw
|
|||
|
(unknown) ip4 frames, where the payload will be the L3 payload of the IP4 frame.
|
|||
|
|
|||
|
The maximum offset is 64. The hardware will only read up to 64 bytes of data
|
|||
|
from the payload. The offset must be even because the flexible data is 2 bytes
|
|||
|
long and must be aligned to byte 0 of the packet payload.
|
|||
|
|
|||
|
The user-defined flexible offset is also considered part of the input set and
|
|||
|
cannot be programmed separately for multiple filters of the same type. However,
|
|||
|
the flexible data is not part of the input set and multiple filters may use the
|
|||
|
same offset but match against different data.
|
|||
|
|
|||
|
To create filters that direct traffic to a specific Virtual Function, use the
|
|||
|
"action" parameter. Specify the action as a 64 bit value, where the lower 32
|
|||
|
bits represents the queue number, while the next 8 bits represent which VF.
|
|||
|
Note that 0 is the PF, so the VF identifier is offset by 1. For example::
|
|||
|
|
|||
|
... action 0x800000002 ...
|
|||
|
|
|||
|
specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of
|
|||
|
that VF.
|
|||
|
|
|||
|
Note that these filters will not break internal routing rules, and will not
|
|||
|
route traffic that otherwise would not have been sent to the specified Virtual
|
|||
|
Function.
|
|||
|
|
|||
|
Setting the link-down-on-close Private Flag
|
|||
|
-------------------------------------------
|
|||
|
When the link-down-on-close private flag is set to "on", the port's link will
|
|||
|
go down when the interface is brought down using the ifconfig ethX down command.
|
|||
|
|
|||
|
Use ethtool to view and set link-down-on-close, as follows::
|
|||
|
|
|||
|
ethtool --show-priv-flags ethX
|
|||
|
ethtool --set-priv-flags ethX link-down-on-close [on|off]
|
|||
|
|
|||
|
Viewing Link Messages
|
|||
|
---------------------
|
|||
|
Link messages will not be displayed to the console if the distribution is
|
|||
|
restricting system messages. In order to see network driver link messages on
|
|||
|
your console, set dmesg to eight by entering the following::
|
|||
|
|
|||
|
dmesg -n 8
|
|||
|
|
|||
|
NOTE: This setting is not saved across reboots.
|
|||
|
|
|||
|
Jumbo Frames
|
|||
|
------------
|
|||
|
Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
|
|||
|
to a value larger than the default value of 1500.
|
|||
|
|
|||
|
Use the ifconfig command to increase the MTU size. For example, enter the
|
|||
|
following where <x> is the interface number::
|
|||
|
|
|||
|
ifconfig eth<x> mtu 9000 up
|
|||
|
|
|||
|
Alternatively, you can use the ip command as follows::
|
|||
|
|
|||
|
ip link set mtu 9000 dev eth<x>
|
|||
|
ip link set up dev eth<x>
|
|||
|
|
|||
|
This setting is not saved across reboots. The setting change can be made
|
|||
|
permanent by adding 'MTU=9000' to the file::
|
|||
|
|
|||
|
/etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL
|
|||
|
/etc/sysconfig/network/<config_file> // for SLES
|
|||
|
|
|||
|
NOTE: The maximum MTU setting for Jumbo Frames is 9702. This value coincides
|
|||
|
with the maximum Jumbo Frames size of 9728 bytes.
|
|||
|
|
|||
|
NOTE: This driver will attempt to use multiple page sized buffers to receive
|
|||
|
each jumbo packet. This should help to avoid buffer starvation issues when
|
|||
|
allocating receive packets.
|
|||
|
|
|||
|
ethtool
|
|||
|
-------
|
|||
|
The driver utilizes the ethtool interface for driver configuration and
|
|||
|
diagnostics, as well as displaying statistical information. The latest ethtool
|
|||
|
version is required for this functionality. Download it at:
|
|||
|
https://www.kernel.org/pub/software/network/ethtool/
|
|||
|
|
|||
|
Supported ethtool Commands and Options for Filtering
|
|||
|
----------------------------------------------------
|
|||
|
-n --show-nfc
|
|||
|
Retrieves the receive network flow classification configurations.
|
|||
|
|
|||
|
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6
|
|||
|
Retrieves the hash options for the specified network traffic type.
|
|||
|
|
|||
|
-N --config-nfc
|
|||
|
Configures the receive network flow classification.
|
|||
|
|
|||
|
rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r...
|
|||
|
Configures the hash options for the specified network traffic type.
|
|||
|
|
|||
|
udp4 UDP over IPv4
|
|||
|
udp6 UDP over IPv6
|
|||
|
|
|||
|
f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
|
|||
|
n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
|
|||
|
|
|||
|
Speed and Duplex Configuration
|
|||
|
------------------------------
|
|||
|
In addressing speed and duplex configuration issues, you need to distinguish
|
|||
|
between copper-based adapters and fiber-based adapters.
|
|||
|
|
|||
|
In the default mode, an Intel(R) Ethernet Network Adapter using copper
|
|||
|
connections will attempt to auto-negotiate with its link partner to determine
|
|||
|
the best setting. If the adapter cannot establish link with the link partner
|
|||
|
using auto-negotiation, you may need to manually configure the adapter and link
|
|||
|
partner to identical settings to establish link and pass packets. This should
|
|||
|
only be needed when attempting to link with an older switch that does not
|
|||
|
support auto-negotiation or one that has been forced to a specific speed or
|
|||
|
duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
|
|||
|
and higher cannot be forced. Use the autonegotiation advertising setting to
|
|||
|
manually set devices for 1 Gbps and higher.
|
|||
|
|
|||
|
NOTE: You cannot set the speed for devices based on the Intel(R) Ethernet
|
|||
|
Network Adapter XXV710 based devices.
|
|||
|
|
|||
|
Speed, duplex, and autonegotiation advertising are configured through the
|
|||
|
ethtool utility.
|
|||
|
|
|||
|
Caution: Only experienced network administrators should force speed and duplex
|
|||
|
or change autonegotiation advertising manually. The settings at the switch must
|
|||
|
always match the adapter settings. Adapter performance may suffer or your
|
|||
|
adapter may not operate if you configure the adapter differently from your
|
|||
|
switch.
|
|||
|
|
|||
|
An Intel(R) Ethernet Network Adapter using fiber-based connections, however,
|
|||
|
will not attempt to auto-negotiate with its link partner since those adapters
|
|||
|
operate only in full duplex and only at their native speed.
|
|||
|
|
|||
|
NAPI
|
|||
|
----
|
|||
|
NAPI (Rx polling mode) is supported in the i40e driver.
|
|||
|
For more information on NAPI, see
|
|||
|
https://wiki.linuxfoundation.org/networking/napi
|
|||
|
|
|||
|
Flow Control
|
|||
|
------------
|
|||
|
Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
|
|||
|
receiving and transmitting pause frames for i40e. When transmit is enabled,
|
|||
|
pause frames are generated when the receive packet buffer crosses a predefined
|
|||
|
threshold. When receive is enabled, the transmit unit will halt for the time
|
|||
|
delay specified when a pause frame is received.
|
|||
|
|
|||
|
NOTE: You must have a flow control capable link partner.
|
|||
|
|
|||
|
Flow Control is on by default.
|
|||
|
|
|||
|
Use ethtool to change the flow control settings.
|
|||
|
|
|||
|
To enable or disable Rx or Tx Flow Control::
|
|||
|
|
|||
|
ethtool -A eth? rx <on|off> tx <on|off>
|
|||
|
|
|||
|
Note: This command only enables or disables Flow Control if auto-negotiation is
|
|||
|
disabled. If auto-negotiation is enabled, this command changes the parameters
|
|||
|
used for auto-negotiation with the link partner.
|
|||
|
|
|||
|
To enable or disable auto-negotiation::
|
|||
|
|
|||
|
ethtool -s eth? autoneg <on|off>
|
|||
|
|
|||
|
Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
|
|||
|
on your device, you may not be able to change the auto-negotiation setting.
|
|||
|
|
|||
|
RSS Hash Flow
|
|||
|
-------------
|
|||
|
Allows you to set the hash bytes per flow type and any combination of one or
|
|||
|
more options for Receive Side Scaling (RSS) hash byte configuration.
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
# ethtool -N <dev> rx-flow-hash <type> <option>
|
|||
|
|
|||
|
Where <type> is:
|
|||
|
tcp4 signifying TCP over IPv4
|
|||
|
udp4 signifying UDP over IPv4
|
|||
|
tcp6 signifying TCP over IPv6
|
|||
|
udp6 signifying UDP over IPv6
|
|||
|
And <option> is one or more of:
|
|||
|
s Hash on the IP source address of the Rx packet.
|
|||
|
d Hash on the IP destination address of the Rx packet.
|
|||
|
f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
|
|||
|
n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
|
|||
|
|
|||
|
MAC and VLAN anti-spoofing feature
|
|||
|
----------------------------------
|
|||
|
When a malicious driver attempts to send a spoofed packet, it is dropped by the
|
|||
|
hardware and not transmitted.
|
|||
|
NOTE: This feature can be disabled for a specific Virtual Function (VF)::
|
|||
|
|
|||
|
ip link set <pf dev> vf <vf id> spoofchk {off|on}
|
|||
|
|
|||
|
IEEE 1588 Precision Time Protocol (PTP) Hardware Clock (PHC)
|
|||
|
------------------------------------------------------------
|
|||
|
Precision Time Protocol (PTP) is used to synchronize clocks in a computer
|
|||
|
network. PTP support varies among Intel devices that support this driver. Use
|
|||
|
"ethtool -T <netdev name>" to get a definitive list of PTP capabilities
|
|||
|
supported by the device.
|
|||
|
|
|||
|
IEEE 802.1ad (QinQ) Support
|
|||
|
---------------------------
|
|||
|
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
|
|||
|
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
|
|||
|
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
|
|||
|
allow L2 tunneling and the ability to segregate traffic within a particular
|
|||
|
VLAN ID, among other uses.
|
|||
|
|
|||
|
The following are examples of how to configure 802.1ad (QinQ)::
|
|||
|
|
|||
|
ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
|
|||
|
ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
|
|||
|
|
|||
|
Where "24" and "371" are example VLAN IDs.
|
|||
|
|
|||
|
NOTES:
|
|||
|
Receive checksum offloads, cloud filters, and VLAN acceleration are not
|
|||
|
supported for 802.1ad (QinQ) packets.
|
|||
|
|
|||
|
VXLAN and GENEVE Overlay HW Offloading
|
|||
|
--------------------------------------
|
|||
|
Virtual Extensible LAN (VXLAN) allows you to extend an L2 network over an L3
|
|||
|
network, which may be useful in a virtualized or cloud environment. Some
|
|||
|
Intel(R) Ethernet Network devices perform VXLAN processing, offloading it from
|
|||
|
the operating system. This reduces CPU utilization.
|
|||
|
|
|||
|
VXLAN offloading is controlled by the Tx and Rx checksum offload options
|
|||
|
provided by ethtool. That is, if Tx checksum offload is enabled, and the
|
|||
|
adapter has the capability, VXLAN offloading is also enabled.
|
|||
|
|
|||
|
Support for VXLAN and GENEVE HW offloading is dependent on kernel support of
|
|||
|
the HW offloading features.
|
|||
|
|
|||
|
Multiple Functions per Port
|
|||
|
---------------------------
|
|||
|
Some adapters based on the Intel Ethernet Controller X710/XL710 support
|
|||
|
multiple functions on a single physical port. Configure these functions through
|
|||
|
the System Setup/BIOS.
|
|||
|
|
|||
|
Minimum TX Bandwidth is the guaranteed minimum data transmission bandwidth, as
|
|||
|
a percentage of the full physical port link speed, that the partition will
|
|||
|
receive. The bandwidth the partition is awarded will never fall below the level
|
|||
|
you specify.
|
|||
|
|
|||
|
The range for the minimum bandwidth values is:
|
|||
|
1 to ((100 minus # of partitions on the physical port) plus 1)
|
|||
|
For example, if a physical port has 4 partitions, the range would be:
|
|||
|
1 to ((100 - 4) + 1 = 97)
|
|||
|
|
|||
|
The Maximum Bandwidth percentage represents the maximum transmit bandwidth
|
|||
|
allocated to the partition as a percentage of the full physical port link
|
|||
|
speed. The accepted range of values is 1-100. The value is used as a limiter,
|
|||
|
should you chose that any one particular function not be able to consume 100%
|
|||
|
of a port's bandwidth (should it be available). The sum of all the values for
|
|||
|
Maximum Bandwidth is not restricted, because no more than 100% of a port's
|
|||
|
bandwidth can ever be used.
|
|||
|
|
|||
|
NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions
|
|||
|
per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says
|
|||
|
"add vsi failed for VF N, aq_err 16". To workaround the issue, enable less than
|
|||
|
64 virtual functions (VFs).
|
|||
|
|
|||
|
Data Center Bridging (DCB)
|
|||
|
--------------------------
|
|||
|
DCB is a configuration Quality of Service implementation in hardware. It uses
|
|||
|
the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
|
|||
|
different priorities that traffic can be filtered into. It also enables
|
|||
|
priority flow control (802.1Qbb) which can limit or eliminate the number of
|
|||
|
dropped packets during network stress. Bandwidth can be allocated to each of
|
|||
|
these priorities, which is enforced at the hardware level (802.1Qaz).
|
|||
|
|
|||
|
Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
|
|||
|
802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only
|
|||
|
and can accept settings from a DCBX capable peer. Software configuration of
|
|||
|
DCBX parameters via dcbtool/lldptool are not supported.
|
|||
|
|
|||
|
NOTE: Firmware LLDP can be disabled by setting the private flag disable-fw-lldp.
|
|||
|
|
|||
|
The i40e driver implements the DCB netlink interface layer to allow user-space
|
|||
|
to communicate with the driver and query DCB configuration for the port.
|
|||
|
|
|||
|
NOTE:
|
|||
|
The kernel assumes that TC0 is available, and will disable Priority Flow
|
|||
|
Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
|
|||
|
enabled when setting up DCB on your switch.
|
|||
|
|
|||
|
Interrupt Rate Limiting
|
|||
|
-----------------------
|
|||
|
:Valid Range: 0-235 (0=no limit)
|
|||
|
|
|||
|
The Intel(R) Ethernet Controller XL710 family supports an interrupt rate
|
|||
|
limiting mechanism. The user can control, via ethtool, the number of
|
|||
|
microseconds between interrupts.
|
|||
|
|
|||
|
Syntax::
|
|||
|
|
|||
|
# ethtool -C ethX rx-usecs-high N
|
|||
|
|
|||
|
The range of 0-235 microseconds provides an effective range of 4,310 to 250,000
|
|||
|
interrupts per second. The value of rx-usecs-high can be set independently of
|
|||
|
rx-usecs and tx-usecs in the same ethtool command, and is also independent of
|
|||
|
the adaptive interrupt moderation algorithm. The underlying hardware supports
|
|||
|
granularity in 4-microsecond intervals, so adjacent values may result in the
|
|||
|
same interrupt rate.
|
|||
|
|
|||
|
One possible use case is the following::
|
|||
|
|
|||
|
# ethtool -C ethX adaptive-rx off adaptive-tx off rx-usecs-high 20 rx-usecs \
|
|||
|
5 tx-usecs 5
|
|||
|
|
|||
|
The above command would disable adaptive interrupt moderation, and allow a
|
|||
|
maximum of 5 microseconds before indicating a receive or transmit was complete.
|
|||
|
However, instead of resulting in as many as 200,000 interrupts per second, it
|
|||
|
limits total interrupts per second to 50,000 via the rx-usecs-high parameter.
|
|||
|
|
|||
|
Performance Optimization
|
|||
|
========================
|
|||
|
Driver defaults are meant to fit a wide variety of workloads, but if further
|
|||
|
optimization is required we recommend experimenting with the following settings.
|
|||
|
|
|||
|
NOTE: For better performance when processing small (64B) frame sizes, try
|
|||
|
enabling Hyper threading in the BIOS in order to increase the number of logical
|
|||
|
cores in the system and subsequently increase the number of queues available to
|
|||
|
the adapter.
|
|||
|
|
|||
|
Virtualized Environments
|
|||
|
------------------------
|
|||
|
1. Disable XPS on both ends by using the included virt_perf_default script
|
|||
|
or by running the following command as root::
|
|||
|
|
|||
|
for file in `ls /sys/class/net/<ethX>/queues/tx-*/xps_cpus`;
|
|||
|
do echo 0 > $file; done
|
|||
|
|
|||
|
2. Using the appropriate mechanism (vcpupin) in the vm, pin the cpu's to
|
|||
|
individual lcpu's, making sure to use a set of cpu's included in the
|
|||
|
device's local_cpulist: /sys/class/net/<ethX>/device/local_cpulist.
|
|||
|
|
|||
|
3. Configure as many Rx/Tx queues in the VM as available. Do not rely on
|
|||
|
the default setting of 1.
|
|||
|
|
|||
|
|
|||
|
Non-virtualized Environments
|
|||
|
----------------------------
|
|||
|
Pin the adapter's IRQs to specific cores by disabling the irqbalance service
|
|||
|
and using the included set_irq_affinity script. Please see the script's help
|
|||
|
text for further options.
|
|||
|
|
|||
|
- The following settings will distribute the IRQs across all the cores evenly::
|
|||
|
|
|||
|
# scripts/set_irq_affinity -x all <interface1> , [ <interface2>, ... ]
|
|||
|
|
|||
|
- The following settings will distribute the IRQs across all the cores that are
|
|||
|
local to the adapter (same NUMA node)::
|
|||
|
|
|||
|
# scripts/set_irq_affinity -x local <interface1> ,[ <interface2>, ... ]
|
|||
|
|
|||
|
For very CPU intensive workloads, we recommend pinning the IRQs to all cores.
|
|||
|
|
|||
|
For IP Forwarding: Disable Adaptive ITR and lower Rx and Tx interrupts per
|
|||
|
queue using ethtool.
|
|||
|
|
|||
|
- Setting rx-usecs and tx-usecs to 125 will limit interrupts to about 8000
|
|||
|
interrupts per second per queue.
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 125 \
|
|||
|
tx-usecs 125
|
|||
|
|
|||
|
For lower CPU utilization: Disable Adaptive ITR and lower Rx and Tx interrupts
|
|||
|
per queue using ethtool.
|
|||
|
|
|||
|
- Setting rx-usecs and tx-usecs to 250 will limit interrupts to about 4000
|
|||
|
interrupts per second per queue.
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 250 \
|
|||
|
tx-usecs 250
|
|||
|
|
|||
|
For lower latency: Disable Adaptive ITR and ITR by setting Rx and Tx to 0 using
|
|||
|
ethtool.
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
# ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 0 \
|
|||
|
tx-usecs 0
|
|||
|
|
|||
|
Application Device Queues (ADq)
|
|||
|
-------------------------------
|
|||
|
Application Device Queues (ADq) allows you to dedicate one or more queues to a
|
|||
|
specific application. This can reduce latency for the specified application,
|
|||
|
and allow Tx traffic to be rate limited per application. Follow the steps below
|
|||
|
to set ADq.
|
|||
|
|
|||
|
1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
|
|||
|
The shaper bw_rlimit parameter is optional.
|
|||
|
|
|||
|
Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set
|
|||
|
to 1Gbit for tc0 and 3Gbit for tc1.
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
# tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
|
|||
|
queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
|
|||
|
max_rate 1Gbit 3Gbit
|
|||
|
|
|||
|
map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
|
|||
|
sets priorities 0-3 to use tc0 and 4-7 to use tc1)
|
|||
|
|
|||
|
queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
|
|||
|
16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total
|
|||
|
number of queues for all tcs is 64 or number of cores, whichever is lower.)
|
|||
|
|
|||
|
hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new new hardware
|
|||
|
offload mode in mqprio that makes full use of the mqprio options, the
|
|||
|
TCs, the queue configurations, and the QoS parameters.
|
|||
|
|
|||
|
shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
|
|||
|
Totals must be equal or less than port speed.
|
|||
|
|
|||
|
For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network
|
|||
|
monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
|
|||
|
|
|||
|
2. Enable HW TC offload on interface::
|
|||
|
|
|||
|
# ethtool -K <interface> hw-tc-offload on
|
|||
|
|
|||
|
3. Apply TCs to ingress (RX) flow of interface::
|
|||
|
|
|||
|
# tc qdisc add dev <interface> ingress
|
|||
|
|
|||
|
NOTES:
|
|||
|
- Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
|
|||
|
- ADq is not compatible with cloud filters.
|
|||
|
- Setting up channels via ethtool (ethtool -L) is not supported when the
|
|||
|
TCs are configured using mqprio.
|
|||
|
- You must have iproute2 latest version
|
|||
|
- NVM version 6.01 or later is required.
|
|||
|
- ADq cannot be enabled when any the following features are enabled: Data
|
|||
|
Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband
|
|||
|
Filters.
|
|||
|
- If another driver (for example, DPDK) has set cloud filters, you cannot
|
|||
|
enable ADq.
|
|||
|
- Tunnel filters are not supported in ADq. If encapsulated packets do
|
|||
|
arrive in non-tunnel mode, filtering will be done on the inner headers.
|
|||
|
For example, for VXLAN traffic in non-tunnel mode, PCTYPE is identified
|
|||
|
as a VXLAN encapsulated packet, outer headers are ignored. Therefore,
|
|||
|
inner headers are matched.
|
|||
|
- If a TC filter on a PF matches traffic over a VF (on the PF), that
|
|||
|
traffic will be routed to the appropriate queue of the PF, and will
|
|||
|
not be passed on the VF. Such traffic will end up getting dropped higher
|
|||
|
up in the TCP/IP stack as it does not match PF address data.
|
|||
|
- If traffic matches multiple TC filters that point to different TCs,
|
|||
|
that traffic will be duplicated and sent to all matching TC queues.
|
|||
|
The hardware switch mirrors the packet to a VSI list when multiple
|
|||
|
filters are matched.
|
|||
|
|
|||
|
|
|||
|
Known Issues/Troubleshooting
|
|||
|
============================
|
|||
|
|
|||
|
NOTE: 1 Gb devices based on the Intel(R) Ethernet Network Connection X722 do
|
|||
|
not support the following features:
|
|||
|
|
|||
|
* Data Center Bridging (DCB)
|
|||
|
* QOS
|
|||
|
* VMQ
|
|||
|
* SR-IOV
|
|||
|
* Task Encapsulation offload (VXLAN, NVGRE)
|
|||
|
* Energy Efficient Ethernet (EEE)
|
|||
|
* Auto-media detect
|
|||
|
|
|||
|
Unexpected Issues when the device driver and DPDK share a device
|
|||
|
----------------------------------------------------------------
|
|||
|
Unexpected issues may result when an i40e device is in multi driver mode and
|
|||
|
the kernel driver and DPDK driver are sharing the device. This is because
|
|||
|
access to the global NIC resources is not synchronized between multiple
|
|||
|
drivers. Any change to the global NIC configuration (writing to a global
|
|||
|
register, setting global configuration by AQ, or changing switch modes) will
|
|||
|
affect all ports and drivers on the device. Loading DPDK with the
|
|||
|
"multi-driver" module parameter may mitigate some of the issues.
|
|||
|
|
|||
|
TC0 must be enabled when setting up DCB on a switch
|
|||
|
---------------------------------------------------
|
|||
|
The kernel assumes that TC0 is available, and will disable Priority Flow
|
|||
|
Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
|
|||
|
enabled when setting up DCB on your switch.
|
|||
|
|
|||
|
|
|||
|
Support
|
|||
|
=======
|
|||
|
For general information, go to the Intel support website at:
|
|||
|
|
|||
|
https://www.intel.com/support/
|
|||
|
|
|||
|
or the Intel Wired Networking project hosted by Sourceforge at:
|
|||
|
|
|||
|
https://sourceforge.net/projects/e1000
|
|||
|
|
|||
|
If an issue is identified with the released source code on a supported kernel
|
|||
|
with a supported adapter, email the specific information related to the issue
|
|||
|
to e1000-devel@lists.sf.net.
|