Rethinking multi-grain timestamps
One of the significant features added to the mainline kernel during the 6.6 merge window was multi-grain timestamps, which allow the kernel to selectively store file modification times with higher resolution without hurting performance. Unfortunately, this feature also caused some surprising regressions, and was quickly ushered back out of the kernel as a result. It is instructive to look at how this feature went wrong, and how the developers involved plan to move forward from here.
Filesystems maintain a number of timestamps to record when each file was modified, had its metadata changed, or was accessed (though access-time updates are often turned off for performance reasons). The resolution of these timestamps is relatively coarse, measured in milliseconds; that is usually good enough for users of that information. In certain cases, though, higher resolution is needed; a prominent case is serving files via NFS. Modern NFS protocols allow clients to cache file contents aggressively for performance, but those caches must be discarded when the underlying file is modified. One way of informing clients of modifications is through the modification timestamp, but that only works if the resolution of the timestamp is fine enough that two changes made in quick succession yield different values.
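The cache-invalidation problem can be illustrated with a small model. The sketch below is not kernel code; the function names are hypothetical, and the 4ms granularity is an assumed value chosen only to match the "measured in milliseconds" description above. It shows that a client comparing modification times cannot detect a second write that lands within the same timestamp tick as the first.

```python
MS = 1_000_000  # nanoseconds per millisecond


def quantize(t_ns, granularity_ns):
    """Round a time in nanoseconds down to the timestamp granularity."""
    return t_ns - (t_ns % granularity_ns)


def missed_update(first_write_ns, second_write_ns, granularity_ns):
    """Return True if a client that cached the file after the first write
    would see an unchanged mtime after the second write."""
    cached_mtime = quantize(first_write_ns, granularity_ns)
    current_mtime = quantize(second_write_ns, granularity_ns)
    return cached_mtime == current_mtime


# Two writes 0.8ms apart, both inside the same (assumed) 4ms tick.
coarse_miss = missed_update(4_100_000, 4_900_000, 4 * MS)
# The same two writes with nanosecond-granularity timestamps.
fine_miss = missed_update(4_100_000, 4_900_000, 1)
```

With the coarse granularity, both writes map to the same stored timestamp, so the client keeps its stale cache; with nanosecond granularity the two values differ and the change is visible.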
In theory, recording timestamps at higher resolutions is straightforward, as long as filesystems have space for the extra data. The very property that makes higher-resolution timestamps useful is also a problem, though: a low-resolution timestamp will change relatively infrequently, but a timestamp that changes more often must be written back to the filesystem more often. That can increase I/O rates, especially for filesystems that perform journaling, where each metadata update must go through the journal as well. The cost of increased resolution is significant, which is especially problematic since the higher-resolution data will almost never be used.
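To get a rough feel for that cost, the following sketch counts how many writes in a burst actually change the stored timestamp, since each change is one that must eventually be written back. It is a simplified model under stated assumptions: the function name is hypothetical, the 4ms coarse granularity and the write pattern (one write every 100µs for one second) are invented for illustration, and real filesystems batch and delay write-back in ways not modeled here.

```python
def timestamp_changes(write_times_ns, granularity_ns):
    """Count the writes that change the stored (quantized) timestamp."""
    changes = 0
    last_stamp = None
    for t in write_times_ns:
        stamp = t - (t % granularity_ns)  # round down to the granularity
        if stamp != last_stamp:
            changes += 1
            last_stamp = stamp
    return changes


# Assumed workload: 10,000 writes, one every 100 microseconds.
writes = range(0, 1_000_000_000, 100_000)

coarse_changes = timestamp_changes(writes, 4_000_000)  # assumed 4ms ticks
fine_changes = timestamp_changes(writes, 1)            # nanosecond ticks
```

In this model the coarse timestamp changes 250 times over the second, while the fine-grained one changes on every one of the 10,000 writes, a forty-fold increase in timestamp updates for the same workload.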