Queue sysfs files
=================

This text file details the queue files that are located in the sysfs tree
for each block device. Note that stacked devices typically do not export
any settings, since their queue merely functions as a remapping target.
These files are the ones found in the /sys/block/xxx/queue/ directory.

Files denoted with a RO postfix are read-only and the RW postfix means
read-write.

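As an illustration of the directory layout described above, the following
Python sketch reads one queue file for a given device. The helper name
read_queue_attr and its sysfs_root parameter are assumptions made for this
example and are not part of any kernel interface.

```python
from pathlib import Path


def read_queue_attr(device, name, sysfs_root="/sys/block"):
    """Return the contents of <sysfs_root>/<device>/queue/<name> as text."""
    path = Path(sysfs_root) / device / "queue" / name
    # sysfs values end with a newline; strip it for easier comparison.
    return path.read_text().strip()


# Example call (assumes a block device named "sda" exists):
#   read_queue_attr("sda", "logical_block_size")
```

The sysfs_root parameter exists only so the helper can be pointed at a
different directory, for example when testing without real hardware.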
add_random (RW)
---------------
This file allows the disk entropy contribution to be turned off. The
default value of this file is '1' (on).

dax (RO)
--------
This file indicates whether the device supports Direct Access (DAX),
used by CPU-addressable storage to bypass the pagecache. It shows '1'
if true, '0' if not.

discard_granularity (RO)
------------------------
This shows the size of the internal allocation of the device in bytes, if
reported by the device. A value of '0' means the device does not support
discard functionality.

discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_hw_bytes parameter is set by the device driver to the
maximum number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A
discard_max_hw_bytes value of 0 means that the device does not support
discard functionality.

discard_max_bytes (RW)
----------------------
While discard_max_hw_bytes is the hardware limit for the device, this
setting is the software limit. Some devices exhibit large latencies when
large discards are issued. Setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.

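To make the effect of a lower discard_max_bytes concrete, the sketch below
splits one large discard into pieces that each respect the software limit.
It is only an illustration of the idea; the function name split_discard is
hypothetical and the code does not reproduce the kernel's actual splitting
logic.

```python
def split_discard(total_bytes, discard_max_bytes):
    """Split a discard of total_bytes into chunks no larger than the limit."""
    if discard_max_bytes <= 0:
        # Assumption for this sketch: a non-positive limit is not usable.
        raise ValueError("discard limit must be positive")
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, discard_max_bytes)
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```

With a limit of 4 bytes, a 10 byte discard becomes three operations of 4, 4
and 2 bytes. A smaller limit means more, but shorter, operations.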
hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

io_poll (RW)
------------
When read, this file shows whether polling is enabled (1) or disabled
(0). Writing '0' to this file will disable polling for this device.
Writing any non-zero value will enable this feature.

io_poll_delay (RW)
------------------
If polling is enabled, this controls what kind of polling will be
performed. It defaults to -1, which is classic polling. In this mode,
the CPU will repeatedly ask for completions without giving up any time.
If set to 0, a hybrid polling mode is used, where the kernel will attempt
to make an educated guess at when the IO will complete. Based on this
guess, the kernel will put the process issuing IO to sleep for an amount
of time, before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it will be more efficient.
If set to a value larger than 0, the kernel will put the process issuing
IO to sleep for this amount of microseconds before entering classic
polling.

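The three cases of io_poll_delay described above can be summarised in a
small Python sketch. The function name is hypothetical, and rejecting
values below -1 is an assumption of this example rather than documented
behaviour.

```python
def describe_io_poll_delay(value):
    """Map an io_poll_delay value to the polling mode described in the text."""
    if value == -1:
        return "classic polling"
    if value == 0:
        return "hybrid polling with kernel-estimated sleep"
    if value > 0:
        return f"sleep {value} usec, then classic polling"
    # Assumption: anything below -1 is treated as invalid in this sketch.
    raise ValueError(f"unsupported io_poll_delay value: {value}")
```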
iostats (RW)
------------
This file is used to control (on/off) the iostats accounting of the
disk.

logical_block_size (RO)
-----------------------
This is the logical block size of the device, in bytes.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
When read, this file shows the maximum number of integrity segments, as
set by the block layer, that the hardware controller can handle.

max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.

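The constraint between max_sectors_kb and max_hw_sectors_kb can be checked
before writing a new value. The sketch below is a minimal example under the
assumption that only positive values make sense; the function name is
hypothetical.

```python
def is_valid_max_sectors_kb(requested_kb, max_hw_sectors_kb):
    """Return True if requested_kb does not exceed the hardware limit."""
    # Assumption: a request size of zero or less is never meaningful.
    return 0 < requested_kb <= max_hw_sectors_kb
```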
max_segments (RO)
-----------------
Maximum number of segments of the device.

max_segment_size (RO)
---------------------
Maximum segment size of the device.

minimum_io_size (RO)
--------------------
This is the smallest preferred IO size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved in merging IO
requests in the block layer. By default (0) all merges are enabled. When
set to 1 only simple one-hit merges will be tried. When set to 2 no merge
algorithms will be tried (including one-hit or more complex tree/hash
lookups).

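For reference, the three nomerges settings described above can be written
as a lookup table. This is only a restatement of the text in Python; the
names NOMERGES_MODES and describe_nomerges are made up for this example.

```python
# Values and meanings taken from the nomerges description.
NOMERGES_MODES = {
    0: "all merges enabled",
    1: "only simple one-hit merges",
    2: "no merges",
}


def describe_nomerges(value):
    """Return the merge policy for a nomerges value, or raise ValueError."""
    try:
        return NOMERGES_MODES[value]
    except KeyError:
        raise ValueError(f"unknown nomerges value: {value}") from None
```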
nr_requests (RW)
----------------
This controls how many requests may be allocated in the block layer for
read or write requests. Note that the total allocated number may be twice
this amount, since it applies only to reads or writes (not the accumulated
sum).

To avoid priority inversion through request starvation, a request
queue maintains a separate request pool per cgroup when
CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
per-block-cgroup request pool. In other words, if there are N block
cgroups, each request queue may have up to N request pools, each
independently regulated by nr_requests.

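The nr_requests description implies a simple upper bound on allocated
requests per queue: twice nr_requests for each request pool. The sketch
below just performs that arithmetic; the function name is hypothetical and
the result is a bound derived from the text, not a value the kernel
reports.

```python
def max_allocated_requests(nr_requests, request_pools=1):
    """Upper bound on allocated requests: reads plus writes, per pool."""
    # Factor 2: the limit applies to reads and to writes separately.
    return 2 * nr_requests * request_pools
```

For example, with nr_requests set to 128 and three block cgroups, up to
2 * 128 * 3 = 768 requests could be allocated for one queue.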
optimal_io_size (RO)
--------------------
This is the optimal IO size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of the device, in bytes.

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read-ahead for filesystems on this block
device.

rotational (RW)
---------------
This file is used to state whether the device is of rotational type or
non-rotational type.

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
CPU "group" that originally submitted the request. For some workloads this
provides a significant reduction in CPU cycles due to caching effects.

For storage configurations that need to maximize distribution of completion
processing, setting this option to '2' forces the completion to run on the
requesting CPU (bypassing the "group" aggregation logic).

scheduler (RW)
--------------
When read, this file will display the current and available IO schedulers
for this block device. The currently active IO scheduler will be enclosed
in [] brackets. Writing an IO scheduler name to this file will switch
control of this block device to that new IO scheduler. Note that writing
an IO scheduler name to this file will attempt to load that IO scheduler
module, if it isn't already present in the system.

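The bracket convention used by the scheduler file is easy to parse. The
following sketch assumes the file contains scheduler names separated by
whitespace; the function name and the sample string in the usage note are
illustrative only.

```python
def parse_scheduler(text):
    """Return (active, available) from the contents of the scheduler file."""
    active = None
    available = []
    for name in text.split():
        # The active scheduler is the one enclosed in [] brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available
```

Given the sample contents "noop deadline [cfq]", this returns "cfq" as the
active scheduler and all three names as the available list.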
write_cache (RW)
----------------
When read, this file will display whether the device has write back
caching enabled or not. It will return "write back" for the former
case, and "write through" for the latter. Writing to this file can
change the kernel's view of the device, but it doesn't alter the
device state. This means that it might not be safe to toggle the
setting from "write back" to "write through", since that will also
eliminate cache flushes issued by the kernel.

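A script that inspects write_cache only needs to distinguish the two
strings named above. The sketch below does that and rejects anything else;
the function name is an assumption of this example.

```python
def is_write_back(text):
    """Return True for "write back", False for "write through"."""
    value = text.strip()
    if value == "write back":
        return True
    if value == "write through":
        return False
    raise ValueError(f"unexpected write_cache contents: {value!r}")
```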
write_same_max_bytes (RO)
-------------------------
This is the number of bytes the device can write in a single write-same
command. A value of '0' means write-same is not supported by this
device.

wb_lat_usec (RW)
----------------
If the device is registered for writeback throttling, then this file shows
the target minimum read latency. If this latency is exceeded in a given
window of time (see wb_window_usec), then the writeback throttling will start
scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.

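The special values accepted by wb_lat_usec can be summarised as follows.
This Python sketch only restates the description above; the function name
is hypothetical, and treating other negative values as errors is an
assumption of this example.

```python
def describe_wb_lat_write(value):
    """Describe the effect of writing value to wb_lat_usec."""
    if value == 0:
        return "disable writeback throttling"
    if value == -1:
        return "reset to the default setting"
    if value > 0:
        return f"set target latency to {value} usec"
    # Assumption: negative values other than -1 are treated as invalid here.
    raise ValueError(f"unsupported wb_lat_usec value: {value}")
```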
throttle_sample_time (RW)
-------------------------
This is the time window over which blk-throttle samples data, in
milliseconds. blk-throttle makes decisions based on the samples. A lower
time means cgroups have smoother throughput, but higher CPU overhead. This
exists only when CONFIG_BLK_DEV_THROTTLING_LOW is enabled.

Jens Axboe <jens.axboe@oracle.com>, February 2009