===============================================
The irq_domain interrupt number mapping library
===============================================

The current design of the Linux kernel uses a single large number
space where each separate IRQ source is assigned a different number.
This is simple when there is only one interrupt controller, but in
systems with multiple interrupt controllers the kernel must ensure
that each one gets assigned non-overlapping allocations of Linux
IRQ numbers.

The number of interrupt controllers registered as unique irqchips
shows a rising tendency: for example, subdrivers of different kinds
such as GPIO controllers avoid reimplementing the callback
mechanisms of the IRQ core system by modelling their interrupt
handlers as irqchips, i.e. in effect cascading interrupt controllers.

Here the interrupt number loses all correspondence to
hardware interrupt numbers: whereas in the past, IRQ numbers could
be chosen so they matched the hardware IRQ line into the root
interrupt controller (i.e. the component actually firing the
interrupt line to the CPU), nowadays this number is just a number.

For this reason we need a mechanism to separate controller-local
interrupt numbers, called hardware IRQs (hwirqs), from Linux IRQ
numbers.

The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of
IRQ numbers, but they don't provide any support for reverse mapping of
the controller-local IRQ (hwirq) number into the Linux IRQ number
space.

The irq_domain library adds mapping between hwirq and IRQ numbers on
top of the irq_alloc_desc*() API. Using an irq_domain to manage the
mapping is preferred over interrupt controller drivers open coding
their own reverse mapping scheme.

irq_domain also implements translation from an abstract irq_fwspec
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
be easily extended to support other IRQ topology data sources.

irq_domain usage
================

An interrupt controller driver creates and registers an irq_domain by
calling one of the irq_domain_add_*() or irq_domain_create_*() functions
(each mapping method has a different allocator function, more on that later).
The function will return a pointer to the irq_domain on success. The caller
must provide the allocator function with an irq_domain_ops structure.

In most cases, the irq_domain will begin empty without any mappings
between hwirq and IRQ numbers. Mappings are added to the irq_domain
by calling irq_create_mapping(), which accepts the irq_domain and a
hwirq number as arguments. If a mapping for the hwirq doesn't already
exist then it will allocate a new Linux irq_desc, associate it with
the hwirq, and call the .map() callback so the driver can perform any
required hardware setup.

Once a mapping has been established, it can be retrieved or used via a
variety of methods:

- irq_resolve_mapping() returns a pointer to the irq_desc structure
  for a given domain and hwirq number, and NULL if there was no
  mapping.
- irq_find_mapping() returns a Linux IRQ number for a given domain and
  hwirq number, and 0 if there was no mapping.
- irq_linear_revmap() is now identical to irq_find_mapping(), and is
  deprecated.
- generic_handle_domain_irq() handles an interrupt described by a
  domain and a hwirq number.

Note that irq domain lookups must happen in contexts that are
compatible with an RCU read-side critical section.

The irq_create_mapping() function must be called *at least once*
before any call to irq_find_mapping(); otherwise the descriptor will
not be allocated.
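
The create-then-look-up contract described above can be sketched in
plain user-space C. This is a toy model and not kernel code: the names
toy_domain, toy_create_mapping(), toy_find_mapping() and the small
pair array used for storage are hypothetical. They only mimic the
behaviour this section describes for irq_create_mapping() and
irq_find_mapping().

```c
#include <stddef.h>

#define TOY_MAX_MAPPINGS 8

/* Hypothetical stand-in for the driver's .map() callback. */
typedef int (*toy_map_fn)(unsigned int virq, unsigned long hwirq);

struct toy_mapping {
	unsigned long hwirq;
	unsigned int virq;
};

struct toy_domain {
	struct toy_mapping maps[TOY_MAX_MAPPINGS];
	size_t count;		/* number of established mappings */
	unsigned int next_virq;	/* next free IRQ number; 0 means "none" */
	toy_map_fn map;		/* called once for every new mapping */
};

/* Returns the IRQ number for hwirq, or 0 if there is no mapping. */
static unsigned int toy_find_mapping(const struct toy_domain *d,
				     unsigned long hwirq)
{
	for (size_t i = 0; i < d->count; i++)
		if (d->maps[i].hwirq == hwirq)
			return d->maps[i].virq;
	return 0;
}

/*
 * Returns the existing mapping if there is one. Otherwise picks a new
 * IRQ number, calls .map() for hardware setup and records the
 * association. Returns 0 on failure.
 */
static unsigned int toy_create_mapping(struct toy_domain *d,
				       unsigned long hwirq)
{
	unsigned int virq = toy_find_mapping(d, hwirq);

	if (virq)
		return virq;
	if (d->count == TOY_MAX_MAPPINGS)
		return 0;

	virq = d->next_virq;
	if (d->map && d->map(virq, hwirq) != 0)
		return 0;	/* setup failed, nothing is recorded */

	d->next_virq++;
	d->maps[d->count].hwirq = hwirq;
	d->maps[d->count].virq = virq;
	d->count++;
	return virq;
}
```

In this model a lookup before the first toy_create_mapping() call for a
hwirq returns 0, and repeated creation calls for the same hwirq return
the same number.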

If the driver has the Linux IRQ number or the irq_data pointer, and
needs to know the associated hwirq number (such as in the irq_chip
callbacks) then it can be directly obtained from irq_data->hwirq.

Types of irq_domain mappings
============================

There are several mechanisms available for reverse mapping from hwirq
to Linux irq, and each mechanism uses a different allocation function.
Which reverse map type should be used depends on the use case. Each
of the reverse map types is described below:

Linear
------

::

    irq_domain_add_linear()
    irq_domain_create_linear()

The linear reverse map maintains a fixed size table indexed by the
hwirq number. When a hwirq is mapped, an irq_desc is allocated for
the hwirq, and the IRQ number is stored in the table.

The Linear map is a good choice when the maximum number of hwirqs is
fixed and relatively small (roughly fewer than 256). The advantages of
this map are fixed time lookup for IRQ numbers, and irq_descs are only
allocated for in-use IRQs. The disadvantage is that the table must be
as large as the largest possible hwirq number.
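
The trade-off of a linear reverse map can be illustrated with a small
user-space sketch. It assumes a table with one slot per possible hwirq
in which 0 means "unmapped"; the toy_linear_* names are hypothetical
and are not the kernel's implementation.

```c
#include <stdlib.h>

/* Toy linear reverse map: one slot per possible hwirq, 0 is "unmapped". */
struct toy_linear_map {
	unsigned int size;	/* number of hwirqs the table covers */
	unsigned int *virq;	/* virq[hwirq] holds the IRQ number */
};

/* Allocate a zeroed table for hwirqs 0..size-1. Returns 0 on success. */
static int toy_linear_init(struct toy_linear_map *m, unsigned int size)
{
	m->virq = calloc(size, sizeof(*m->virq));
	m->size = m->virq ? size : 0;
	return m->virq ? 0 : -1;
}

/* Store the IRQ number for hwirq; fails if hwirq is beyond the table. */
static int toy_linear_set(struct toy_linear_map *m, unsigned long hwirq,
			  unsigned int virq)
{
	if (hwirq >= m->size)
		return -1;
	m->virq[hwirq] = virq;
	return 0;
}

/* Fixed time lookup: one bounds check and one array access. */
static unsigned int toy_linear_find(const struct toy_linear_map *m,
				    unsigned long hwirq)
{
	return hwirq < m->size ? m->virq[hwirq] : 0;
}

static void toy_linear_free(struct toy_linear_map *m)
{
	free(m->virq);
	m->virq = NULL;
	m->size = 0;
}
```

The table is paid for up front in toy_linear_init(), whether or not
the slots are ever used, which is why this layout only suits small,
fixed hwirq ranges.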

irq_domain_add_linear() and irq_domain_create_linear() are functionally
equivalent, except that the first argument is different: the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction 'struct fwnode_handle'.

The majority of drivers should use the linear map.

Tree
----

::

    irq_domain_add_tree()
    irq_domain_create_tree()

The irq_domain maintains a radix tree map from hwirq numbers to Linux
IRQs. When a hwirq is mapped, an irq_desc is allocated and the
hwirq is used as the lookup key for the radix tree.

The tree map is a good choice if the hwirq number can be very large
since it doesn't need to allocate a table as large as the largest
hwirq number. The disadvantage is that hwirq to IRQ number lookup is
dependent on how many entries are in the table.
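
To show why a tree suits sparse, very large hwirq numbers, here is a
user-space sketch. Note the substitution: a plain binary search tree
stands in for the kernel's radix tree to keep the example short, and
all toy_tree_* names are hypothetical. Only the property of interest is
preserved, namely that memory is spent per mapped hwirq rather than per
possible hwirq.

```c
#include <stdlib.h>

/* One node per mapped hwirq; unmapped hwirqs cost no memory. */
struct toy_node {
	unsigned long hwirq;	/* lookup key */
	unsigned int virq;	/* IRQ number stored for this key */
	struct toy_node *left, *right;
};

/* Insert or update a mapping. Returns 0 on success, -1 if out of memory. */
static int toy_tree_insert(struct toy_node **root, unsigned long hwirq,
			   unsigned int virq)
{
	while (*root) {
		if (hwirq == (*root)->hwirq) {
			(*root)->virq = virq;
			return 0;
		}
		root = hwirq < (*root)->hwirq ? &(*root)->left
					      : &(*root)->right;
	}

	*root = calloc(1, sizeof(**root));
	if (!*root)
		return -1;
	(*root)->hwirq = hwirq;
	(*root)->virq = virq;
	return 0;
}

/* Walk down from the root; the cost depends on the entries stored. */
static unsigned int toy_tree_find(const struct toy_node *node,
				  unsigned long hwirq)
{
	while (node) {
		if (hwirq == node->hwirq)
			return node->virq;
		node = hwirq < node->hwirq ? node->left : node->right;
	}
	return 0;	/* no mapping */
}

static void toy_tree_free(struct toy_node *node)
{
	if (!node)
		return;
	toy_tree_free(node->left);
	toy_tree_free(node->right);
	free(node);
}
```

Mapping hwirq 0x80000000 here allocates a single node, whereas a
linear table would need more than two billion slots to reach it.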

irq_domain_add_tree() and irq_domain_create_tree() are functionally
equivalent, except that the first argument is different: the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction 'struct fwnode_handle'.

Very few drivers should need this mapping.

No Map
------

::

    irq_domain_add_nomap()

The No Map mapping is to be used when the hwirq number is
programmable in the hardware. In this case it is best to program the
Linux IRQ number into the hardware itself so that no mapping is
required. Calling irq_create_direct_mapping() will allocate a Linux
IRQ number and call the .map() callback so that the driver can program
the Linux IRQ number into the hardware.

Most drivers cannot use this mapping, and it is now gated on the
CONFIG_IRQ_DOMAIN_NOMAP option. Please refrain from introducing new
users of this API.

Legacy
------

::

    irq_domain_add_simple()
    irq_domain_add_legacy()
    irq_domain_create_simple()
    irq_domain_create_legacy()

The Legacy mapping is a special case for drivers that already have a
range of irq_descs allocated for the hwirqs. It is used when the
driver cannot be immediately converted to use the linear mapping. For
example, many embedded system board support files use a set of #defines
for IRQ numbers that are passed to struct device registrations. In that
case the Linux IRQ numbers cannot be dynamically assigned and the legacy
mapping should be used.

As the name implies, the \*_legacy() functions are deprecated and only
exist to ease the support of ancient platforms. No new users should be
added. The same goes for the \*_simple() functions when their use
results in the legacy behaviour.

The legacy map assumes a contiguous range of IRQ numbers has already
been allocated for the controller and that the IRQ number can be
calculated by adding a fixed offset to the hwirq number, and
vice versa. The disadvantage is that it requires the interrupt
controller to manage IRQ allocations and it requires an irq_desc to be
allocated for every hwirq, even if it is unused.
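
The fixed-offset arithmetic behind the legacy map is simple enough to
write out. The sketch below is a user-space illustration under stated
assumptions: the toy_legacy_* names and the struct layout are
hypothetical, and 0 is used as the "no IRQ" value as elsewhere in this
document.

```c
/*
 * Toy legacy map: hwirqs [first_hwirq, first_hwirq + size) correspond
 * to IRQ numbers [first_irq, first_irq + size).
 */
struct toy_legacy_map {
	unsigned int first_irq;
	unsigned long first_hwirq;
	unsigned int size;
};

/* hwirq -> IRQ number: add the fixed offset. Returns 0 if out of range. */
static unsigned int toy_legacy_to_irq(const struct toy_legacy_map *m,
				      unsigned long hwirq)
{
	if (hwirq < m->first_hwirq || hwirq - m->first_hwirq >= m->size)
		return 0;
	return m->first_irq + (unsigned int)(hwirq - m->first_hwirq);
}

/* IRQ number -> hwirq: subtract the same offset. Returns 0 on success. */
static int toy_legacy_to_hwirq(const struct toy_legacy_map *m,
			       unsigned int irq, unsigned long *hwirq)
{
	if (irq < m->first_irq || irq - m->first_irq >= m->size)
		return -1;
	*hwirq = m->first_hwirq + (irq - m->first_irq);
	return 0;
}
```

No table or tree is consulted in either direction, which is only
possible because the whole IRQ range was reserved in advance.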

The legacy map should only be used if fixed IRQ mappings must be
supported. For example, ISA controllers would use the legacy map for
mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ
numbers.

Most users of legacy mappings should use irq_domain_add_simple() or
irq_domain_create_simple(), which will use a legacy domain only if an IRQ
range is supplied by the system and will otherwise use a linear domain
mapping. The semantics of this call are such that if an IRQ range is
specified then descriptors will be allocated on-the-fly for it, and if no
range is specified it will fall through to irq_domain_add_linear() or
irq_domain_create_linear(), which means *no* irq descriptors will be
allocated.

A typical use case for simple domains is where an irqchip provider
is supporting both dynamic and static IRQ assignments.

In order to avoid ending up in a situation where a linear domain is
used and no descriptor gets allocated, it is very important to make sure
that the driver using the simple domain calls irq_create_mapping()
before any irq_find_mapping(), since the latter will actually work
for the static IRQ assignment case.

irq_domain_add_simple() and irq_domain_create_simple(), as well as
irq_domain_add_legacy() and irq_domain_create_legacy(), are functionally
equivalent, except that the first argument is different: the
irq_domain_add_*() variants accept an Open Firmware specific
'struct device_node', while the irq_domain_create_*() variants accept
a more general abstraction 'struct fwnode_handle'.

Hierarchy IRQ domain
--------------------

On some architectures, there may be multiple interrupt controllers
involved in delivering an interrupt from the device to the target CPU.
Let's look at a typical interrupt delivery path on x86 platforms::

    Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU

There are three interrupt controllers involved:

1) IOAPIC controller
2) Interrupt remapping controller
3) Local APIC controller

To support such a hardware topology and make the software architecture
match the hardware architecture, an irq_domain data structure is built
for each interrupt controller and those irq_domains are organized into a
hierarchy. When building the irq_domain hierarchy, the irq_domain nearer
to the device is the child and the irq_domain nearer to the CPU is the
parent. So a hierarchy structure as below will be built for the example
above::

    CPU Vector irq_domain (root irq_domain to manage CPU vectors)
            ^
            |
    Interrupt Remapping irq_domain (manage irq_remapping entries)
            ^
            |
    IOAPIC irq_domain (manage IOAPIC delivery entries/pins)

There are four major interfaces to use hierarchy irq_domain:

1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
   controller related resources to deliver these interrupts.
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
   related resources associated with these interrupts.
3) irq_domain_activate_irq(): activate interrupt controller hardware to
   deliver the interrupt.
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
   to stop delivering the interrupt.

The following changes are needed to support hierarchy irq_domain:

1) a new field 'parent' is added to struct irq_domain; it's used to
   maintain irq_domain hierarchy information.
2) a new field 'parent_data' is added to struct irq_data; it's used to
   build hierarchy irq_data to match hierarchy irq_domains. The irq_data
   is used to store irq_domain pointer and hardware irq number.
3) new callbacks are added to struct irq_domain_ops to support hierarchy
   irq_domain operations.
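
The relationship between the 'parent' and 'parent_data' links can be
sketched in user-space C. The structures below are deliberately
reduced, hypothetical models (toy_hier_domain, toy_hier_data) that keep
only the fields named in the list above; they are not the kernel's
struct irq_domain or struct irq_data.

```c
#include <stddef.h>

/* Toy counterpart of an irq_domain with its 'parent' link. */
struct toy_hier_domain {
	const char *name;
	struct toy_hier_domain *parent;	/* NULL for the root domain */
};

/* Toy counterpart of irq_data with its 'parent_data' link. */
struct toy_hier_data {
	unsigned long hwirq;			/* local to 'domain' */
	struct toy_hier_domain *domain;
	struct toy_hier_data *parent_data;	/* same IRQ, one level up */
};

/* Count the levels an interrupt crosses from 'data' up to the root. */
static unsigned int toy_hierarchy_depth(const struct toy_hier_data *data)
{
	unsigned int depth = 0;

	for (; data; data = data->parent_data)
		depth++;
	return depth;
}

/* Returns 1 if the irq_data chain mirrors the irq_domain chain. */
static int toy_hierarchy_consistent(const struct toy_hier_data *data)
{
	for (; data; data = data->parent_data) {
		const struct toy_hier_domain *up =
			data->parent_data ? data->parent_data->domain : NULL;

		if (data->domain->parent != up)
			return 0;
	}
	return 1;
}
```

For the x86 example, one such chain would hold three entries (IOAPIC,
interrupt remapping, CPU vector), each with the hwirq number that is
meaningful to its own controller.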

With support of hierarchy irq_domain and hierarchy irq_data ready, an
irq_domain structure is built for each interrupt controller, and an
irq_data structure is allocated for each irq_domain associated with an
IRQ. Now we could go one step further to support stacked (hierarchy)
irq_chip. That is, an irq_chip is associated with each irq_data along
the hierarchy. A child irq_chip may implement a required action by
itself or by cooperating with its parent irq_chip.

With stacked irq_chip, an interrupt controller driver only needs to deal
with the hardware managed by itself and may ask for services from its
parent irq_chip when needed. So we could achieve a much cleaner
software architecture.
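
A minimal user-space sketch of the stacked irq_chip idea follows. The
toy_chip and toy_chip_data types and the single 'mask' operation are
hypothetical simplifications; the kernel offers helpers in the same
spirit, such as irq_chip_mask_parent(), but the code below does not
reproduce their implementation.

```c
#include <stddef.h>

struct toy_chip_data;

/* Toy irq_chip: one operation is enough to show the delegation. */
struct toy_chip {
	void (*mask)(struct toy_chip_data *data);
};

/* Toy per-level irq_data: one per controller along the hierarchy. */
struct toy_chip_data {
	const struct toy_chip *chip;
	struct toy_chip_data *parent_data;
	int masked;	/* stands in for this controller's mask register */
};

/* Ask the parent chip for the same service, if there is a parent. */
static void toy_mask_parent(struct toy_chip_data *data)
{
	struct toy_chip_data *parent = data->parent_data;

	if (parent && parent->chip && parent->chip->mask)
		parent->chip->mask(parent);
}

/* Root chip: implements the action entirely by itself. */
static void toy_root_mask(struct toy_chip_data *data)
{
	data->masked = 1;
}

/* Child chip: touches only its own state, then asks the parent. */
static void toy_child_mask(struct toy_chip_data *data)
{
	data->masked = 1;
	toy_mask_parent(data);
}

static const struct toy_chip toy_root_chip = { .mask = toy_root_mask };
static const struct toy_chip toy_child_chip = { .mask = toy_child_mask };
```

Neither chip knows how the other one's hardware works: the child only
follows parent_data and calls whatever 'mask' operation it finds there.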

For an interrupt controller driver to support hierarchy irq_domain, it
needs to:

1) Implement irq_domain_ops.alloc and irq_domain_ops.free.
2) Optionally implement irq_domain_ops.activate and
   irq_domain_ops.deactivate.
3) Optionally implement an irq_chip to manage the interrupt controller
   hardware.
4) There is no need to implement irq_domain_ops.map and
   irq_domain_ops.unmap; they are unused with hierarchy irq_domain.

Hierarchy irq_domain is in no way x86 specific, and is heavily used to
support other architectures, such as ARM, ARM64 etc.

Debugging
=========

Most of the internals of the IRQ subsystem are exposed in debugfs by
turning CONFIG_GENERIC_IRQ_DEBUGFS on.