170 lines
6.3 KiB
ReStructuredText
170 lines
6.3 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0-only
|
|
.. Copyright (C) 2020 Google LLC.
|
|
|
|
===========================
|
|
BPF_MAP_TYPE_CGROUP_STORAGE
|
|
===========================
|
|
|
|
The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fix-sized
|
|
storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
|
|
attach to cgroups; the programs are made available by the same Kconfig. The
|
|
storage is identified by the cgroup the program is attached to.
|
|
|
|
The map provide a local storage at the cgroup that the BPF program is attached
|
|
to. It provides a faster and simpler access than the general purpose hash
|
|
table, which performs a hash table lookups, and requires user to track live
|
|
cgroups on their own.
|
|
|
|
This document describes the usage and semantics of the
|
|
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors was changed in
|
|
Linux 5.9 and this document will describe the differences.
|
|
|
|
Usage
|
|
=====
|
|
|
|
The map uses key of type of either ``__u64 cgroup_inode_id`` or
|
|
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::
|
|
|
|
struct bpf_cgroup_storage_key {
|
|
__u64 cgroup_inode_id;
|
|
__u32 attach_type;
|
|
};
|
|
|
|
``cgroup_inode_id`` is the inode id of the cgroup directory.
|
|
``attach_type`` is the the program's attach type.
|
|
|
|
Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
|
|
When this key type is used, then all attach types of the particular cgroup and
|
|
map will share the same storage. Otherwise, if the type is
|
|
``struct bpf_cgroup_storage_key``, then programs of different attach types
|
|
be isolated and see different storages.
|
|
|
|
To access the storage in a program, use ``bpf_get_local_storage``::
|
|
|
|
void *bpf_get_local_storage(void *map, u64 flags)
|
|
|
|
``flags`` is reserved for future use and must be 0.
|
|
|
|
There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
|
|
can be accessed by multiple programs across different CPUs, and user should
|
|
take care of synchronization by themselves. The bpf infrastructure provides
|
|
``struct bpf_spin_lock`` to synchronize the storage. See
|
|
``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
|
|
|
|
Examples
|
|
========
|
|
|
|
Usage with key type as ``struct bpf_cgroup_storage_key``::
|
|
|
|
#include <bpf/bpf.h>
|
|
|
|
struct {
|
|
__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
|
|
__type(key, struct bpf_cgroup_storage_key);
|
|
__type(value, __u32);
|
|
} cgroup_storage SEC(".maps");
|
|
|
|
int program(struct __sk_buff *skb)
|
|
{
|
|
__u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
|
|
__sync_fetch_and_add(ptr, 1);
|
|
|
|
return 0;
|
|
}
|
|
|
|
Userspace accessing map declared above::
|
|
|
|
#include <linux/bpf.h>
|
|
#include <linux/libbpf.h>
|
|
|
|
__u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
|
|
{
|
|
struct bpf_cgroup_storage_key = {
|
|
.cgroup_inode_id = cgrp,
|
|
.attach_type = type,
|
|
};
|
|
__u32 value;
|
|
bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
|
|
// error checking omitted
|
|
return value;
|
|
}
|
|
|
|
Alternatively, using just ``__u64 cgroup_inode_id`` as key type::
|
|
|
|
#include <bpf/bpf.h>
|
|
|
|
struct {
|
|
__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
|
|
__type(key, __u64);
|
|
__type(value, __u32);
|
|
} cgroup_storage SEC(".maps");
|
|
|
|
int program(struct __sk_buff *skb)
|
|
{
|
|
__u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
|
|
__sync_fetch_and_add(ptr, 1);
|
|
|
|
return 0;
|
|
}
|
|
|
|
And userspace::
|
|
|
|
#include <linux/bpf.h>
|
|
#include <linux/libbpf.h>
|
|
|
|
__u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
|
|
{
|
|
__u32 value;
|
|
bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
|
|
// error checking omitted
|
|
return value;
|
|
}
|
|
|
|
Semantics
|
|
=========
|
|
|
|
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
|
|
per-CPU variant will have different memory regions for each CPU for each
|
|
storage. The non-per-CPU will have the same memory region for each storage.
|
|
|
|
Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
|
|
for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
|
|
that uses the map. A program may be attached to multiple cgroups or have
|
|
multiple attach types, and each attach creates a fresh zeroed storage. The
|
|
storage is freed upon detach.
|
|
|
|
There is a one-to-one association between the map of each type (per-CPU and
|
|
non-per-CPU) and the BPF program during load verification time. As a result,
|
|
each map can only be used by one BPF program and each BPF program can only use
|
|
one storage map of each type. Because of map can only be used by one BPF
|
|
program, sharing of this cgroup's storage with other BPF programs were
|
|
impossible.
|
|
|
|
Since Linux 5.9, storage can be shared by multiple programs. When a program is
|
|
attached to a cgroup, the kernel would create a new storage only if the map
|
|
does not already contain an entry for the cgroup and attach type pair, or else
|
|
the old storage is reused for the new attachment. If the map is attach type
|
|
shared, then attach type is simply ignored during comparison. Storage is freed
|
|
only when either the map or the cgroup attached to is being freed. Detaching
|
|
will not directly free the storage, but it may cause the reference to the map
|
|
to reach zero and indirectly freeing all storage in the map.
|
|
|
|
The map is not associated with any BPF program, thus making sharing possible.
|
|
However, the BPF program can still only associate with one map of each type
|
|
(per-CPU and non-per-CPU). A BPF program cannot use more than one
|
|
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
|
|
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.
|
|
|
|
In all versions, userspace may use the the attach parameters of cgroup and
|
|
attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
|
|
APIs to read or update the storage for a given attachment. For Linux 5.9
|
|
attach type shared storages, only the first value in the struct, cgroup inode
|
|
id, is used during comparison, so userspace may just specify a ``__u64``
|
|
directly.
|
|
|
|
The storage is bound at attach time. Even if the program is attached to parent
|
|
and triggers in child, the storage still belongs to the parent.
|
|
|
|
Userspace cannot create a new entry in the map or delete an existing entry.
|
|
Program test runs always use a temporary storage.
|