.. SPDX-License-Identifier: GPL-2.0+

=================================================================
Linux Base Driver for Intel(R) Ethernet Adaptive Virtual Function
=================================================================

Intel Ethernet Adaptive Virtual Function Linux driver.
Copyright(c) 2013-2018 Intel Corporation.

Contents
========

- Overview
- Identifying Your Adapter
- Additional Features and Configurations
- Known Issues/Troubleshooting
- Support

Overview
========

This file describes the iavf Linux Base Driver. This driver was formerly
called i40evf.

The iavf driver supports the virtual function devices listed below and
can only be activated on kernels running the i40e or newer Physical Function
(PF) driver compiled with CONFIG_PCI_IOV. The iavf driver requires
CONFIG_PCI_MSI to be enabled.

The guest OS loading the iavf driver must support MSI-X interrupts.

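As a rough illustration of the requirements above, the sketch below checks a
kernel configuration file for CONFIG_PCI_IOV and CONFIG_PCI_MSI. The helper
name check_iavf_kconfig and the default path /boot/config-<release> are
assumptions: not every distribution installs the configuration there.
CONFIG_PCI_IOV matters for the kernel running the PF driver, CONFIG_PCI_MSI
for the kernel loading iavf.

```shell
# Hypothetical helper: report whether the options named above are set in a
# kernel config file. The default path is an assumption, not a guarantee.
check_iavf_kconfig() {
    cfg="${1:-/boot/config-$(uname -r)}"
    for opt in CONFIG_PCI_IOV CONFIG_PCI_MSI; do
        # Both options are boolean, so only "=y" counts as enabled.
        if grep -q "^${opt}=y" "$cfg" 2>/dev/null; then
            echo "${opt}: enabled"
        else
            echo "${opt}: not set"
        fi
    done
}
```

Calling check_iavf_kconfig with no argument inspects the running kernel's
config file if it exists at the assumed path; pass another path to check a
different kernel.
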
Identifying Your Adapter
========================

The driver in this kernel is compatible with devices based on the following:

* Intel(R) XL710 X710 Virtual Function
* Intel(R) X722 Virtual Function
* Intel(R) XXV710 Virtual Function
* Intel(R) Ethernet Adaptive Virtual Function

For the best performance, make sure the latest NVM/FW is installed on your
device.

For information on how to identify your adapter, and for the latest NVM/FW
images and Intel network drivers, refer to the Intel Support website:
http://www.intel.com/support

Additional Features and Configurations
======================================

Viewing Link Messages
---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set the console log level to eight by entering the following::

    # dmesg -n 8

NOTE:
  This setting is not saved across reboots.

ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/

Setting VLAN Tag Stripping
--------------------------
If you have applications that require Virtual Functions (VFs) to receive
packets with VLAN tags, you can disable VLAN tag stripping for the VF. The
Physical Function (PF) processes requests issued from the VF to enable or
disable VLAN tag stripping. Note that if the PF has assigned a VLAN to a VF,
then requests from that VF to set VLAN tag stripping will be ignored.

To enable/disable VLAN tag stripping for a VF, issue the following command
from inside the VM in which you are running the VF::

    # ethtool -K <if_name> rxvlan on/off

or alternatively::

    # ethtool --offload <if_name> rxvlan on/off

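Because the PF may ignore the request, it can help to read the setting back
after changing it. The sketch below wraps the command above and then prints
the rx-vlan-offload line reported by ethtool -k. The function name
set_vf_vlan_stripping is hypothetical; run it as root inside the VM.

```shell
# Hypothetical wrapper around "ethtool -K <if_name> rxvlan on/off".
set_vf_vlan_stripping() {
    ifname="$1"; state="$2"
    case "$state" in
        on|off) ;;
        *) echo "usage: set_vf_vlan_stripping <if_name> on|off" >&2
           return 1 ;;
    esac
    # Request the change, then show the state the driver actually reports.
    ethtool -K "$ifname" rxvlan "$state" &&
        ethtool -k "$ifname" | grep rx-vlan-offload
}
```

For example, set_vf_vlan_stripping eth0 off requests that tags be left in
place on eth0 (an example interface name). If the reported state does not
match the request, the PF has likely assigned a VLAN to the VF.
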
Adaptive Virtual Function
-------------------------
Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to
adapt to changing feature sets of the physical function driver (PF) with which
it is associated. This allows system administrators to update a PF without
having to update all the VFs associated with it. All AVFs have a single common
device ID and branding string.

AVFs have a minimum set of features known as "base mode," but may provide
additional features depending on what features are available in the PF with
which the AVF is associated. The following are base mode features:

- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
  for Tx/Rx
- i40e descriptors and ring format
- Descriptor write-back completion
- 1 control queue, with i40e descriptors, CSRs and ring format
- 5 MSI-X interrupt vectors and corresponding i40e CSRs
- 1 Interrupt Throttle Rate (ITR) index
- 1 Virtual Station Interface (VSI) per VF
- 1 Traffic Class (TC), TC0
- Receive Side Scaling (RSS) with 64 entry indirection table and key,
  configured through the PF
- 1 unicast MAC address reserved per VF
- 16 MAC address filters for each VF
- Stateless offloads - non-tunneled checksums
- AVF device ID
- HW mailbox is used for VF to PF communications (including on Windows)

IEEE 802.1ad (QinQ) Support
---------------------------
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
allow L2 tunneling and the ability to segregate traffic within a particular
VLAN ID, among other uses.

The following are examples of how to configure 802.1ad (QinQ)::

    # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
    # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371

Where "24" and "371" are example VLAN IDs.

NOTES:
  Receive checksum offloads, cloud filters, and VLAN acceleration are not
  supported for 802.1ad (QinQ) packets.

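The two commands above follow a fixed pattern: the outer 802.1ad interface is
created on the parent device, and the inner 802.1Q interface is created on the
outer one. As a sketch, the hypothetical helper qinq_cmds below only prints
the commands for a given parent and tag pair, including the ip link set ... up
calls that the example above leaves out, so they can be reviewed before being
run as root.

```shell
# Hypothetical helper: print (not run) the commands for a two-tag stack.
# Arguments: parent interface, outer (802.1ad) ID, inner (802.1Q) ID.
qinq_cmds() {
    parent="$1"; outer="$2"; inner="$3"
    echo "ip link add link ${parent} ${parent}.${outer} type vlan proto 802.1ad id ${outer}"
    echo "ip link add link ${parent}.${outer} ${parent}.${outer}.${inner} type vlan proto 802.1Q id ${inner}"
    echo "ip link set ${parent}.${outer} up"
    echo "ip link set ${parent}.${outer}.${inner} up"
}

# Same parent and IDs as the example above.
qinq_cmds eth0 24 371
```

Printing rather than executing is a deliberate choice here: the generated
interface names grow with each tag, and Linux limits interface names to 15
characters, so it is worth checking the names first.
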
Application Device Queues (ADq)
-------------------------------
Application Device Queues (ADq) allows you to dedicate one or more queues to a
specific application. This can reduce latency for the specified application,
and allow Tx traffic to be rate limited per application. Follow the steps below
to set up ADq.

Requirements:

- The sch_mqprio, act_mirred and cls_flower modules must be loaded
- The latest version of iproute2
- If another driver (for example, DPDK) has set cloud filters, you cannot
  enable ADq
- Depending on the underlying PF device, ADq cannot be enabled when the
  following features are enabled:

  + Data Center Bridging (DCB)
  + Multiple Functions per Port (MFP)
  + Sideband Filters

1. Create traffic classes (TCs). A maximum of 8 TCs can be created per
   interface. The shaper bw_rlimit parameter is optional.

   Example: Sets up two TCs, tc0 and tc1, with 16 queues each and max Tx rate
   set to 1Gbit for tc0 and 3Gbit for tc1.

   ::

       # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 \
             queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit \
             min_rate 1Gbit 2Gbit max_rate 1Gbit 3Gbit

   map: priority mapping for up to 16 priorities to TCs (e.g. map 0 0 0 0 1 1 1 1
   sets priorities 0-3 to use tc0 and 4-7 to use tc1)

   queues: for each TC, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
   16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. The maximum
   total number of queues for all TCs is 64 or the number of cores, whichever
   is lower.)

   hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new hardware
   offload mode in mqprio that makes full use of the mqprio options, the
   TCs, the queue configurations, and the QoS parameters.

   shaper bw_rlimit: for each TC, sets minimum and maximum bandwidth rates.
   Totals must be equal to or less than port speed.

   For example: min_rate 1Gbit 3Gbit. Verify the bandwidth limit using network
   monitoring tools such as ifstat or sar -n DEV [interval] [number of samples].

   NOTE:
     Setting up channels via ethtool (ethtool -L) is not supported when the
     TCs are configured using mqprio.

2. Enable HW TC offload on the interface::

       # ethtool -K <interface> hw-tc-offload on

3. Apply TCs to the ingress (RX) flow of the interface::

       # tc qdisc add dev <interface> ingress

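The queues argument in step 1 is a list of <num queues>@<offset> pairs in
which each TC starts where the previous one ends. As a sketch of that
arithmetic, the hypothetical helper mqprio_queues below builds the argument
from per-TC queue counts. It assumes the TCs are packed back to back from
offset 0, as in the example, and does not check the 64-queue or core-count
limit.

```shell
# Hypothetical helper: turn per-TC queue counts into the mqprio "queues"
# argument, placing each TC directly after the previous one.
mqprio_queues() {
    offset=0
    out=""
    for count in "$@"; do
        # Append "<count>@<offset>", separated by a space after the first.
        out="${out}${out:+ }${count}@${offset}"
        offset=$((offset + count))
    done
    echo "$out"
}

mqprio_queues 16 16    # prints: 16@0 16@16
```

The result can be substituted into the command from step 1, for example
queues $(mqprio_queues 16 16).
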
NOTES:
  - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory
  - ADq is not compatible with cloud filters
  - Setting up channels via ethtool (ethtool -L) is not supported when the TCs
    are configured using mqprio
  - The latest version of iproute2 is required
  - NVM version 6.01 or later is required
  - ADq cannot be enabled when any of the following features are enabled: Data
    Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband
    Filters
  - If another driver (for example, DPDK) has set cloud filters, you cannot
    enable ADq
  - Tunnel filters are not supported in ADq. If encapsulated packets do arrive
    in non-tunnel mode, filtering will be done on the inner headers. For
    example, for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a
    VXLAN encapsulated packet and the outer headers are ignored, so the inner
    headers are matched.
  - If a TC filter on a PF matches traffic over a VF (on the PF), that traffic
    will be routed to the appropriate queue of the PF, and will not be passed
    on to the VF. Such traffic will end up getting dropped higher up in the
    TCP/IP stack as it does not match PF address data.
  - If traffic matches multiple TC filters that point to different TCs, that
    traffic will be duplicated and sent to all matching TC queues. The hardware
    switch mirrors the packet to a VSI list when multiple filters are matched.

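After the steps above, the resulting state can be inspected with the same
tools. The sketch below is one way to do so; the function name show_adq_state
is hypothetical, and the exact output format depends on the installed ethtool
and iproute2 versions.

```shell
# Hypothetical helper: show the offload flag from step 2 and the qdiscs
# created in steps 1 and 3 for one interface.
show_adq_state() {
    ifname="$1"
    ethtool -k "$ifname" | grep hw-tc-offload
    tc qdisc show dev "$ifname"
}
```

For example, show_adq_state eth0 (an example interface name) should list
hw-tc-offload as on and show both the mqprio root qdisc and the ingress qdisc
if all three steps succeeded.
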
Known Issues/Troubleshooting
============================

Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device
---------------------------------------------------------------------------------
If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700
series based device, the VF slaves may fail when they become the active slave.
If the MAC address of the VF is set by the PF (Physical Function) of the
device, when you add a slave, or change the active-backup slave, Linux bonding
tries to sync the backup slave's MAC address to the same MAC address as the
active slave. Linux bonding will fail at this point. This issue will not occur
if the VF's MAC address is not set by the PF.

Traffic Is Not Being Passed Between VM and Client
-------------------------------------------------
You may not be able to pass traffic between a client system and a
Virtual Machine (VM) running on a separate host if the Virtual Function
(VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled
on the VF. Note that this situation can occur in any combination of client,
host, and guest operating system. For information on how to set the VF to
trusted mode, refer to the section "VLAN Tag Packet Steering" in this
readme document. For information on setting spoof checking, refer to the
section "MAC and VLAN anti-spoofing feature" in this readme document.

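For orientation, both settings mentioned above are per-VF attributes that
iproute2 can change from the host. The sketch below wraps the two ip link
commands; the function names are hypothetical, and p5p1 and VF index 0 in the
usage note are placeholders. Whether to trust a VF or relax spoof checking is
a security decision for the administrator, not something this sketch
recommends.

```shell
# Hypothetical helpers, run on the host as root. "pf" is the PF interface
# name, "vf" is the VF index, and "state" is on or off.
vf_set_trust() {
    pf="$1"; vf="$2"; state="$3"
    ip link set dev "$pf" vf "$vf" trust "$state"
}

vf_set_spoofchk() {
    pf="$1"; vf="$2"; state="$3"
    ip link set dev "$pf" vf "$vf" spoofchk "$state"
}
```

For example, vf_set_trust p5p1 0 on marks VF 0 as trusted. Running
ip link show dev p5p1 afterwards lists each VF together with its current
trust and spoof checking state.
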
Do not unload port driver if VF with active VM is bound to it
-------------------------------------------------------------
Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
Once the VM shuts down, or otherwise releases the VF, the unload command will
complete.

Using four traffic classes fails
--------------------------------
Do not try to reserve more than three traffic classes in the iavf driver. Doing
so will fail to set any traffic classes and will cause the driver to write
errors to stdout. Use a maximum of three traffic classes to avoid this issue.

Multiple log error messages on iavf driver removal
--------------------------------------------------
If you have several VFs and you remove the iavf driver, several instances of
the following errors are written to the log::

    Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
    Unable to send the message to VF 2 aq_err 12
    ARQ Overflow Error detected

Virtual machine does not get link
---------------------------------
If the virtual machine has more than one virtual port assigned to it, and those
virtual ports are bound to different physical ports, you may not get link on
all of the virtual ports. The following command may work around the issue::

    # ethtool -r <PF>

Where <PF> is the PF interface in the host, for example: p5p1. You may need to
run the command more than once to get link on all virtual ports.

MAC address of Virtual Function changes unexpectedly
----------------------------------------------------
If a Virtual Function's MAC address is not assigned in the host, then the VF
(virtual function) driver will use a random MAC address. This random MAC
address may change each time the VF driver is reloaded. You can assign a static
MAC address in the host machine. This static MAC address will survive
a VF driver reload.

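A static address is typically assigned from the host with ip link. The sketch
below wraps that command and rejects strings that are not six colon-separated
hex octets. The function name set_vf_mac is hypothetical; the interface name,
VF index, and address in the usage note are placeholders.

```shell
# Hypothetical helper, run on the host as root: assign a static MAC
# address to VF number "vf" of PF interface "pf".
set_vf_mac() {
    pf="$1"; vf="$2"; mac="$3"
    # Accept only the form xx:xx:xx:xx:xx:xx with hex digits.
    if ! echo "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "invalid MAC address: $mac" >&2
        return 1
    fi
    ip link set dev "$pf" vf "$vf" mac "$mac"
}
```

For example, set_vf_mac p5p1 0 00:11:22:33:44:55 assigns that address to VF 0
of p5p1. As noted in the bonding issue above, a PF-assigned MAC address can
interfere with bonding of VFs, so choose accordingly.
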
Driver Buffer Overflow Fix
--------------------------
The fix to resolve CVE-2016-8105, referenced in Intel SA-00069
https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html
is included in this and future versions of the driver.

Multiple Interfaces on Same Ethernet Broadcast Network
------------------------------------------------------
Due to the default ARP behavior on Linux, it is not possible to have one system
on two IP networks in the same Ethernet broadcast domain (non-partitioned
switch) behave as expected. All Ethernet interfaces will respond to IP traffic
for any IP address assigned to the system. This results in unbalanced receive
traffic.

If you have multiple interfaces in a server, turn on ARP filtering by
entering::

    # echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

NOTE:
  This setting is not saved across reboots. The configuration change can be
  made permanent by adding the following line to the file /etc/sysctl.conf::

    net.ipv4.conf.all.arp_filter = 1

Alternatively, install the interfaces in separate broadcast domains
(either in different switches or in a switch partitioned to VLANs).

Rx Page Allocation Errors
-------------------------
'Page allocation failure. order:0' errors may occur under stress.
This is caused by the way the Linux kernel reports this stressed condition.

Support
=======
For general information, go to the Intel support website at:

https://support.intel.com

or the Intel Wired Networking project hosted by SourceForge at:

https://sourceforge.net/projects/e1000

If an issue is identified with the released source code on the supported kernel
with a supported adapter, email the specific information related to the issue
to e1000-devel@lists.sf.net