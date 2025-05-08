Martin Lau gave a talk in the BPF track of the 2025 Linux Storage, Filesystem, Memory-Management, and BPF Summit about a performance problem plaguing the networking subsystem, and some potential ways to fix it. He works on BPF programs that need to store socket-local data; amid other improvements to the networking and BPF subsystems, retrieving that data has become a noticeable bottleneck for his use case. His proposed fix prompted a good deal of discussion about how the data should be laid out.

One day, Lau said, Yonghong Song showed him an instruction-level profile of some kernel code from the networking subsystem. Two instructions in particular were much hotter than it seemed they should be. In bpf_sk_storage_get() (which looks up socket-local data for a BPF program), the inline function bpf_local_storage_lookup() needs to dereference two pointers in order to retrieve the user data associated with a given socket. As it turned out, both of those pointer indirections were causing expensive cache misses.
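
The shape of that lookup can be illustrated with a small user-space sketch. This is not the kernel's code: the sketch_* structures and function below are hypothetical, heavily simplified stand-ins, assumed here only to show why two dependent pointer loads can each miss the cache when the socket, its storage object, and the user data live in separate allocations.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; the kernel's real
 * structures are larger and laid out differently. */
struct sketch_storage_data {
	long value;                          /* the user data a BPF program wants */
};

struct sketch_local_storage {
	struct sketch_storage_data *data;    /* separate allocation */
};

struct sketch_sock {
	/* ... many unrelated socket fields would live here ... */
	struct sketch_local_storage *storage; /* separate allocation */
};

/* Follow two pointers to reach the user data. Each step touches memory
 * that is probably not on the same cache line as the previous object,
 * and the second load cannot start until the first one completes. */
static long *sketch_storage_lookup(struct sketch_sock *sk)
{
	struct sketch_local_storage *storage;
	struct sketch_storage_data *data;

	if (!sk || !sk->storage)
		return NULL;

	storage = sk->storage;
	data = storage->data;   /* first indirection: read the storage object */
	if (!data)
		return NULL;

	return &data->value;    /* second indirection happens when the caller reads this */
}
```

A caller would build a struct sketch_sock whose storage pointer leads to a struct sketch_storage_data, then read the result of sketch_storage_lookup(). A socket with no storage attached yields NULL.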