Kernel and Graphics (Technical)
-
Kernel Space
-
Daniel Lemire ☛ Checking whether an ARM NEON register is zero
Your phone probably runs on 64-bit ARM processors. These processors are ubiquitous: they power the Nintendo Switch, cloud servers at both Amazon AWS and Microsoft Azure, fast laptops, and so forth. ARM processors have powerful special instructions called ARM NEON, which provide a specific type of parallelism called single instruction, multiple data (SIMD). For example, you can add sixteen 8-bit values to sixteen other 8-bit values using one instruction.
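Here is a minimal C sketch of one way to do the check. It assumes an AArch64 compiler that provides `<arm_neon.h>`. The sketch reinterprets the 128-bit register as four 32-bit lanes and takes their horizontal maximum with `vmaxvq_u32` (an AArch64-only intrinsic). The register is all-zero exactly when that maximum is zero. This is one valid approach, not necessarily the variant the post measures as fastest. The names `is_zero_u8x16` and `bytes16_are_zero` are hypothetical, and a scalar fallback keeps the sketch compilable on non-ARM machines.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* AArch64 path: view the 16 bytes as four 32-bit lanes and take the
   horizontal maximum (one UMAXV instruction). Zero max <=> all bytes zero. */
static bool is_zero_u8x16(uint8x16_t v) {
    return vmaxvq_u32(vreinterpretq_u32_u8(v)) == 0;
}

/* Hypothetical helper: load 16 bytes into a NEON register and test it. */
bool bytes16_are_zero(const uint8_t bytes[16]) {
    return is_zero_u8x16(vld1q_u8(bytes));
}
#else
/* Portable fallback (not NEON): OR the two 64-bit halves and compare
   with zero, so the sketch still builds and runs off ARM. */
bool bytes16_are_zero(const uint8_t bytes[16]) {
    uint64_t lo, hi;
    memcpy(&lo, bytes, 8);
    memcpy(&hi, bytes + 8, 8);
    return (lo | hi) == 0;
}
#endif
```

Reducing to a single scalar with a horizontal instruction avoids comparing each of the sixteen lanes separately. The post discusses other reductions that may be cheaper on some cores.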
-
CNX Software ☛ Disabling VT-d improves defective chip maker Intel Arc GPU Linux performance on Meteor Lake and newer SoCs
In this post, I’ll check whether disabling VT-d virtualization support may improve the performance of the defective chip maker Intel Arc GPU in recent Meteor Lake or Lunar Lake SoCs, using a Khadas Mind Maker Kit with a defective chip maker Intel Core Ultra 7 258V CPU and defective chip maker Intel Arc 140V graphics running Ubuntu 24.10. A few days ago, I read a post on Phoronix about defective chip maker Intel publishing tips to improve the performance of defective chip maker Intel GPUs in Linux: keep the system updated with the latest kernel and Mesa versions.
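VT-d can usually be switched off in firmware setup, and the Linux kernel also accepts an `intel_iommu=off` boot parameter. This excerpt does not say which route the post took. The C function below is a small sketch for the case where you want to confirm that the boot parameter is in effect. It scans a kernel command line, such as the contents of `/proc/cmdline`, for that option, and the name `cmdline_disables_vtd` is hypothetical. The function handles comma-separated values and lets a later `on` override an earlier `off`. That is an approximation of the kernel's in-order parameter handling, not a faithful reimplementation.

```c
#include <stdbool.h>
#include <string.h>

/* Whitespace that separates kernel command-line tokens (note that
   /proc/cmdline ends with a newline). */
static bool is_sep(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: returns true if the command line turns VT-d off
   via intel_iommu=off. Assumption: the last on/off seen wins, and a line
   with no intel_iommu= token returns false even though firmware or the
   kernel's build-time default may still leave VT-d disabled. */
bool cmdline_disables_vtd(const char *cmdline) {
    static const char key[] = "intel_iommu=";
    const size_t key_len = sizeof key - 1;
    bool disabled = false;
    const char *p = cmdline;

    while (*p != '\0') {
        while (is_sep(*p)) p++;                  /* skip separators */
        const char *tok = p;
        while (*p != '\0' && !is_sep(*p)) p++;   /* find end of token */
        size_t tok_len = (size_t)(p - tok);

        if (tok_len > key_len && strncmp(tok, key, key_len) == 0) {
            /* walk the comma-separated options in the value */
            const char *opt = tok + key_len;
            while (opt < p) {
                const char *end = opt;
                while (end < p && *end != ',') end++;
                size_t n = (size_t)(end - opt);
                if (n == 3 && strncmp(opt, "off", 3) == 0) disabled = true;
                else if (n == 2 && strncmp(opt, "on", 2) == 0) disabled = false;
                opt = (end < p) ? end + 1 : end;
            }
        }
    }
    return disabled;
}
```

Options such as `igfx_off` are deliberately ignored here, since they do not switch the whole IOMMU off.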
-
-
Graphics Stack
-
[Old] Matt Pharr ☛ Swallowing the elephant (part 1)
Building BVHs is the only meaningful computational task that happens during scene parsing; everything else is essentially just deserializing shape and material descriptions. Knowing how much time was spent on BVH construction gave a sense of how (in)efficient the system was: what’s left is roughly 30 minutes to parse 29 GB of data, or about 16.5 MB/s. Well-optimized JSON parsers, which perform essentially the same task, seem to run at the rate of 50-200 MB/s, which validates the sense that there’s room for improvement.
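Pharr's throughput figure is easy to reproduce if his GB and MB are read as binary units. In that case, 29 × 1024 MiB over 1,800 s gives 16.5, whereas decimal units would give about 16.1. The small C helpers below redo that division and use hypothetical names. They also estimate how long the same 29 GB would take at the 50-200 MB/s JSON-parser rates he cites, treating those rates as MiB/s for simplicity.

```c
/* Hypothetical helpers for the back-of-the-envelope numbers above.
   Assumption: "GB" and "MB" are binary units (GiB, MiB). */
#define MIB_PER_GIB 1024.0

/* Average throughput in MiB/s for `gib` GiB processed in `seconds`. */
double mib_per_s(double gib, double seconds) {
    return gib * MIB_PER_GIB / seconds;
}

/* Time in seconds to process `gib` GiB at a sustained `mib_s` MiB/s. */
double seconds_at(double gib, double mib_s) {
    return gib * MIB_PER_GIB / mib_s;
}
```

For example, `mib_per_s(29.0, 1800.0)` is about 16.5. Also, `seconds_at(29.0, 50.0)` and `seconds_at(29.0, 200.0)` come to about 594 s (roughly 10 minutes) and 148 s (about 2.5 minutes), versus the roughly 30 minutes measured.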
To better understand where the time was going, I ran pbrt under the Linux perf tool, which I’d never used before but which seemed like it would do the trick. I did have to tell it to actually use the DWARF debugging information to get call graphs with function names (--call-graph dwarf), and had to dial down the sampling frequency from the default 4000 samples per second to 100 (-F 100) so I didn’t get 100 GB trace files. With that, things were lovely, and I was pleasantly surprised that the perf report tool had a nice curses interface.
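A back-of-the-envelope estimate shows why the sampling rate matters so much with `--call-graph dwarf`. In that mode, perf copies a chunk of the user stack into every sample (8192 bytes by default, per the perf-record documentation), so file size grows roughly linearly with `-F`. The sketch below assumes one continuously busy thread over a 30-minute run, and the function name `dwarf_stack_bytes` is hypothetical. It also ignores register dumps and other perf.data metadata, so treat it as a rough lower bound.

```c
#include <stdint.h>

/* Default user-stack bytes copied per sample by `perf record --call-graph dwarf`. */
#define DWARF_STACK_DUMP_BYTES 8192u

/* Hypothetical estimate of raw stack-dump bytes for one continuously busy
   thread sampled at freq_hz for `seconds`. Ignores register dumps,
   record headers and other perf.data metadata (assumption). */
uint64_t dwarf_stack_bytes(uint64_t freq_hz, uint64_t seconds) {
    uint64_t samples = freq_hz * seconds;
    return samples * DWARF_STACK_DUMP_BYTES;
}
```

Under those assumptions, 4000 Hz for 1,800 s gives 7.2 million samples and about 59 GB (55 GiB) of stack dumps per busy thread. At 100 Hz the figures drop to 180,000 samples and about 1.5 GB, a 40× reduction. The higher figure is the same order of magnitude as the 100 GB files mentioned above.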
-