=======================================
The padata parallel execution mechanism
=======================================

:Last updated: for 2.6.36

Padata is a mechanism by which the kernel can farm work out to be done in
parallel on multiple CPUs while retaining the ordering of tasks. It was
developed for use with the IPsec code, which needs to be able to perform
encryption and decryption on large numbers of packets without reordering
those packets. The crypto developers made a point of writing padata in a
sufficiently general fashion that it could be put to other uses as well.

The first step in using padata is to set up a padata_instance structure for
overall control of how tasks are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name,
                                         const struct cpumask *pcpumask,
                                         const struct cpumask *cbcpumask);

'name' simply identifies the instance.

The pcpumask describes which processors will be used to execute work
submitted to this instance in parallel. The cbcpumask defines which
processors are allowed to be used as the serialization callback processor.
The workqueue wq is where the work will actually be done; it should be
a multithreaded queue, naturally.

To allocate a padata instance with the cpu_possible_mask for both
cpumasks, this helper function can be used::

    struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq);

Note: Padata maintains two kinds of cpumasks internally: the user-supplied
cpumasks, submitted by padata_alloc/padata_alloc_possible, and the 'usable'
cpumasks. The usable cpumasks are always a subset of the active CPUs in the
user-supplied cpumasks; these are the cpumasks padata actually uses. It is
therefore legal to supply a cpumask to padata that contains offline CPUs.
Once an offline CPU in the user-supplied cpumask comes online, padata
will start using it.

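The relationship between the two kinds of cpumask can be modelled in plain
userspace C. The sketch below is not kernel code: it assumes CPUs are
represented as bits in a uint64_t rather than a struct cpumask, and the names
toy_cpumask_t, usable_mask() and mask_is_usable() are hypothetical helpers
invented for this illustration. It only shows that offline CPUs in the
user-supplied mask are ignored until they become active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: bit n set means CPU n is in the mask (not struct cpumask). */
typedef uint64_t toy_cpumask_t;

/* The usable mask is the intersection of the user-supplied and active masks. */
toy_cpumask_t usable_mask(toy_cpumask_t user, toy_cpumask_t active)
{
	return user & active;
}

/* An empty intersection means there is no CPU left to run on. */
bool mask_is_usable(toy_cpumask_t user, toy_cpumask_t active)
{
	return usable_mask(user, active) != 0;
}
```

With a user-supplied mask of 0x0F (CPUs 0 to 3) and only CPUs 0 and 1 active,
usable_mask() yields 0x03; once CPU 2 becomes active the result grows to 0x07
without the user-supplied mask changing.
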
There are functions for enabling and disabling the instance::

    int padata_start(struct padata_instance *pinst);
    void padata_stop(struct padata_instance *pinst);

These functions set or clear the "PADATA_INIT" flag; if that flag is not
set, other functions will refuse to work. padata_start returns zero on
success (flag set) or -EINVAL if the padata cpumask contains no active CPU
(flag not set). padata_stop clears the flag and blocks until the padata
instance is unused.

The list of CPUs to be used can be adjusted with these functions::

    int padata_set_cpumasks(struct padata_instance *pinst,
                            cpumask_var_t pcpumask,
                            cpumask_var_t cbcpumask);
    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);
    int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask);
    int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask);

Changing the CPU masks is an expensive operation, though, so it should not
be done with great frequency.

Both cpumasks of a padata instance can be changed at once with
padata_set_cpumasks by specifying the cpumasks for parallel execution
(pcpumask) and for the serial callback function (cbcpumask).
padata_set_cpumask changes just one of the cpumasks; here cpumask_type is
either PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL, and cpumask specifies the
new cpumask to use. To add or remove a single CPU from one of the cpumasks,
use padata_add_cpu/padata_remove_cpu; cpu specifies the CPU to add or
remove, and mask is either PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL.

Users interested in padata cpumask changes can register with the padata
cpumask change notifier::

    int padata_register_cpumask_notifier(struct padata_instance *pinst,
                                         struct notifier_block *nblock);

To unregister from that notifier::

    int padata_unregister_cpumask_notifier(struct padata_instance *pinst,
                                           struct notifier_block *nblock);

The padata cpumask change notifier reports changes to the usable
cpumasks, i.e. the subset of active CPUs in the user-supplied cpumask.

Padata calls the notifier chain with::

    blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
                                 notification_mask,
                                 &pd_new->cpumask);

Here cpumask_change_notifier is the registered notifier, notification_mask
is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL and cpumask is a pointer
to a struct padata_cpumask that contains the new cpumask information.

Actually submitting work to the padata instance requires the creation of a
padata_priv structure::

    struct padata_priv {
        /* Other stuff here... */
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done. Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided. Those functions will
be called in the process of getting the work done, as we will see
momentarily.

The submission of work is done with::

    int padata_do_parallel(struct padata_instance *pinst,
                           struct padata_priv *padata, int cb_cpu);

The pinst and padata structures must be set up as described above; cb_cpu
specifies which CPU will be used for the final callback when the work is
done; it must be in the current instance's CPU mask. The return value from
padata_do_parallel() is zero on success, indicating that the work is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being
in that CPU mask or about the instance not running.

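A caller has to decide what to do with each of these return values. The
following userspace sketch shows one possible way to classify them; the enum
submit_action and the function classify_submit_result() are hypothetical
names invented here, and the suggested reactions are an assumed caller-side
policy rather than anything padata prescribes.

```c
#include <errno.h>

/* Hypothetical caller-side policy; padata itself does not define these. */
enum submit_action {
	SUBMIT_IN_PROGRESS,	/* 0: work was accepted and is in progress */
	SUBMIT_RETRY_LATER,	/* -EBUSY: the CPU mask is being changed */
	SUBMIT_FIX_CONFIG,	/* -EINVAL: bad cb_cpu or instance not running */
	SUBMIT_UNEXPECTED,	/* any value the text above does not describe */
};

/* Map a padata_do_parallel()-style return value to a caller reaction. */
enum submit_action classify_submit_result(int err)
{
	switch (err) {
	case 0:
		return SUBMIT_IN_PROGRESS;
	case -EBUSY:
		return SUBMIT_RETRY_LATER;
	case -EINVAL:
		return SUBMIT_FIX_CONFIG;
	default:
		return SUBMIT_UNEXPECTED;
	}
}
```

Keeping the classification in one place means that the submission path only
has to switch on a small set of outcomes, whatever reaction the caller
eventually chooses for each one.
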
Each task submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple tasks. parallel() runs with
software interrupts disabled and thus cannot sleep. The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

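The container_of() pattern can be tried out in userspace. The sketch below
makes several assumptions: struct padata_priv is a stand-in that holds only
the two documented callbacks, container_of() is a simplified version of the
kernel macro, and struct my_job, my_parallel() and run_job_once() are
hypothetical names. The callback is invoked by hand, where in the kernel
padata would call it after padata_do_parallel().

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel structure: only the two documented callbacks. */
struct padata_priv {
	void (*parallel)(struct padata_priv *padata);
	void (*serial)(struct padata_priv *padata);
};

/* Simplified userspace version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical job type embedding padata_priv next to its own data. */
struct my_job {
	int input;
	int result;
	struct padata_priv padata;
};

/* parallel() callback: recover the enclosing job and do the work. */
void my_parallel(struct padata_priv *padata)
{
	struct my_job *job = container_of(padata, struct my_job, padata);

	job->result = job->input * 2;
}

/* Drives the callback by hand, standing in for padata's own invocation. */
int run_job_once(int input)
{
	struct my_job job;

	memset(&job, 0, sizeof(job));	/* zero the structure first */
	job.input = input;
	job.padata.parallel = my_parallel;

	job.padata.parallel(&job.padata);
	return job.result;
}
```

Because the callback receives only the padata_priv pointer, everything else
the job needs travels in the enclosing structure, which is why embedding is
the natural layout.
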
Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the task from this point. The work
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes. When a task does complete, parallel() (or
whatever function actually finishes the job) should inform padata of the
fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure. That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that tasks are completed in the order in which they were
submitted.

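The ordering guarantee can be illustrated with a small userspace model. This
is a sketch of the idea only, not of how padata implements it: struct
toy_reorder and toy_do_serial() are hypothetical names, tasks are assumed to
be numbered 0, 1, 2, ... in submission order, and a single flat array
replaces whatever queueing the kernel code really uses.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Toy reorder buffer; must be zero-initialised before use. */
struct toy_reorder {
	bool done[MAX_TASKS];	/* done[i]: task number i has finished */
	size_t next;		/* next task number allowed to run serial() */
	int log[MAX_TASKS];	/* order in which serial() was "called" */
	size_t logged;		/* number of valid entries in log[] */
};

/*
 * Model of padata_do_serial(): mark task seq as finished, then release
 * every task that is now at the head of the submission order.
 */
void toy_do_serial(struct toy_reorder *r, size_t seq)
{
	if (seq >= MAX_TASKS)
		return;
	r->done[seq] = true;

	while (r->next < MAX_TASKS && r->done[r->next]) {
		r->log[r->logged++] = (int)r->next;	/* serial() runs here */
		r->next++;
	}
}
```

If tasks 0, 1 and 2 are submitted in that order and task 2 finishes first,
nothing is released until task 0 completes; finishing task 1 afterwards then
releases both task 1 and task 2, in that order.
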
The one remaining function in the padata API should be called to clean up
when a padata instance is no longer needed::

    void padata_free(struct padata_instance *pinst);

This function will busy-wait while any remaining tasks are completed, so it
might be best not to call it while there is work outstanding.