I think I forgot about holes
  File with hole demo
The bio structure ("block I/O")
  Picture: https://lwn.net/Articles/736534/

I/O Schedulers:
  Minimize head movement by merging and sorting requests
  Independent of the process scheduler
  - Process makes a request
  - I/O scheduler gets around to it
  Merge if requests are for very close sectors
  Sort by proximity to the head
  - Without reversing head direction if possible
  - Like an elevator
  - Sometimes called the I/O elevator

The Linus Elevator:
  - Default in 2.4, replaced in 2.6
  - New requests are checked against all others for merging
  - Merge onto the front OR back of an existing request, depending on which end is adjacent
  - If no merge is possible, insert at the sorted point in the queue
  - If there is no good sorted point, add to the end of the queue
  - If the tail is old, new requests go at the end regardless
    + To prevent indefinite starvation

The Deadline I/O Scheduler:
  - Addresses writes starving reads
  - Requests have an expiration time
    + 500 ms for reads, 5 seconds for writes
  - An extra queue for each kind, sorted by expiration time
  - If a request at the head of an extra queue expires, requests are serviced from that queue (i.e. by expiration time)
  - Otherwise, requests are serviced from the sorted queue as usual

The Anticipatory I/O Scheduler:
  - A heavy write load may perform badly on the Deadline scheduler
  - Waits a few ms after performing a read before resuming writes
    + In case the application needs something else nearby
  - Keeps track of per-process access patterns to decide how long to wait

Completely Fair Queuing (CFQ) I/O Scheduler:
  - Each process has its own queue
  - Merging is performed, across queues (duplicates, references)
  - Round-robin, taking a few requests from each queue in turn
  - Generally works well in non-pathological cases
  - Originally intended for multimedia
  - The default I/O scheduler as of the book's publication

Noop I/O Scheduler:
  - Merges requests, but doesn't do ANY sorting
  - Simpler than the others
  - Intended for flash memory only (no head position to worry about)

The Page Cache:
  The page cache caches pages of files in main memory
    (Data might also end up in L1, L2, or L3; remember, those are a hardware thing)
    Remember the time scales: milli, micro, nano
  The block device contains the backing store
  Cache miss = access to a page not in the cache
  Cache hit = access to a page that is in the cache
  1 disk block *might* be 1 page
    Could also be different from that
  They're pages of files

Read caching: Simple

Write caching:
  Simple: Just update the block device, invalidate the cached page
  Write-through: Update both
  Write-back (or copy-back, or write-behind):
    Update the cache, mark the updated pages dirty
    Update the block device when we get around to it
  Note that by "write" we mean putting a write request into the block device's queue

Cache Eviction:
  Nothing voluntarily leaves the cache!
  Ideal strategy: remove the page that won't be needed for the longest time in the future
  LRU: pretty good generally
    Remember: we're not deleting anything forever
    Even in my house, it's a decent metric
  Two-list (what Linux actually does):
    "inactive" and "active" lists
    First access: the page goes onto the inactive list
    More accesses move pages onto the active list
    If the active list gets too big, pages move back to inactive
    Pages move by reference here, not actual copying
    (a toy sketch of the idea follows below)
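In case it helps, here is a toy sketch in C of the two-list idea above: a first touch lands a page on the inactive list, a second touch promotes it to the active list, an oversized active list demotes its oldest page, and eviction always takes the oldest inactive page. Everything here (MAX_PAGES, ACTIVE_MAX, access_page, the trace in main) is invented for illustration; the real kernel code is far more involved.

/* Toy sketch of the two-list eviction idea, with a tiny fixed-size "cache".
 * All names and sizes are invented for illustration. */
#include <stdio.h>

#define MAX_PAGES  8    /* total pages the toy cache can hold          */
#define ACTIVE_MAX 4    /* how big the active list is allowed to grow  */

enum which { NONE, INACTIVE, ACTIVE };

struct page {
    long id;            /* stand-in for (file, offset)                 */
    enum which list;    /* which list the page is currently on         */
    long stamp;         /* insertion order, for "oldest" decisions     */
};

static struct page pages[MAX_PAGES];
static long tick;

static struct page *find(long id)
{
    for (int i = 0; i < MAX_PAGES; i++)
        if (pages[i].list != NONE && pages[i].id == id)
            return &pages[i];
    return NULL;
}

static struct page *oldest(enum which list)
{
    struct page *best = NULL;
    for (int i = 0; i < MAX_PAGES; i++)
        if (pages[i].list == list && (!best || pages[i].stamp < best->stamp))
            best = &pages[i];
    return best;
}

static int count(enum which list)
{
    int n = 0;
    for (int i = 0; i < MAX_PAGES; i++)
        if (pages[i].list == list)
            n++;
    return n;
}

/* "Touch" a page: this function is the whole two-list policy. */
static void access_page(long id)
{
    struct page *p = find(id);

    if (p && p->list == ACTIVE) {
        printf("page %ld: hit, already active\n", id);
        return;
    }
    if (p) {                            /* hit on the inactive list    */
        p->list = ACTIVE;               /* second access: promote      */
        p->stamp = tick++;
        printf("page %ld: promoted to active\n", id);
        if (count(ACTIVE) > ACTIVE_MAX) {
            struct page *d = oldest(ACTIVE);  /* active list too big:  */
            d->list = INACTIVE;               /* demote its oldest     */
            d->stamp = tick++;
            printf("page %ld: demoted to inactive\n", d->id);
        }
        return;
    }

    /* miss: grab a free slot, or evict the oldest inactive page       */
    p = count(NONE) ? oldest(NONE) : oldest(INACTIVE);
    if (p->list == INACTIVE)
        printf("page %ld: evicted\n", p->id);
    p->id = id;
    p->list = INACTIVE;                 /* first access: inactive only */
    p->stamp = tick++;
    printf("page %ld: miss, added to inactive\n", id);
}

int main(void)
{
    long trace[] = { 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2 };
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        access_page(trace[i]);
    return 0;
}

On this trace, pages 1 and 2 get touched twice, end up on the active list, and survive, while the one-touch pages cycle through the inactive list and get evicted (3, then 4) as new pages arrive.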
Structures:
  A cached page comes from a file
  A file may be stored in a variety of ways
  The address_space struct stores information about what a given file has in the cache
  radix tree: Did you cover these in 311? I haven't covered them in 311
  trie:
    A trie doesn't store keys in the nodes
    Think of storing a dictionary for spell check
    Wikipedia has pictures of this stuff
  The radix piece: merge single children into their parents
    Might eliminate most of the nodes
    Can be more branchy than binary

Flusher threads and Writeback:
  Dirty pages get written back:
    When we need more free memory and want to evict a dirty page
    When data is older than some threshold
    On sync() or fsync()
  Laptop mode:
    Flush if the disk spins up for some other reason
    Set writeback behaviour to be slow
    I didn't look up whether this is still a thing with SSDs
    Don't run out of battery!
  The tunables are in /proc/sys/vm
  There are multiple flusher threads these days
    For multiple hard drives

If you don't want to use the page cache, use O_DIRECT (sketch below)
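Since O_DIRECT just came up: a minimal sketch of a direct write, assuming Linux and a filesystem that supports direct I/O. The file name and the 4096-byte alignment are illustrative guesses; the real alignment requirement depends on the device's logical block size.

/* Minimal sketch of bypassing the page cache with O_DIRECT.
 * Assumes Linux and a filesystem that allows direct I/O. */
#define _GNU_SOURCE             /* O_DIRECT is Linux-specific */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 4096              /* assumed alignment; a guess, not gospel */

int main(void)
{
    void *buf;

    /* O_DIRECT wants the buffer, file offset, and length aligned,
     * so a plain malloc() buffer generally won't do */
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) {
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 'x', BLOCK);

    int fd = open("direct_demo.dat",
                  O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");         /* some filesystems refuse O_DIRECT */
        return 1;
    }

    /* this write is queued for the block device without being cached */
    ssize_t n = write(fd, buf, BLOCK);
    if (n < 0)
        perror("write");
    else
        printf("wrote %zd bytes directly\n", n);

    close(fd);
    free(buf);
    return 0;
}

The point of posix_memalign() here is that direct I/O pushes the block-alignment bookkeeping, which the page cache normally hides, back onto the application.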