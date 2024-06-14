This past month is the culmination of several months of preparatory cleanup work in the FEX JIT. The new register allocator (RA) has landed in FEX, and it is significantly better than our previous one. Our prior implementation was meant to be a temporary solution when FEX first started as a project and, as with most temporary code, it became permanent. It was excessively slow: in the best case it ran in quadratic time, and in the worst case it could take INFINITE time, which resulted in significant stutters or hangs. The new implementation by Alyssa runs in two passes in linear time, significantly improving performance and also removing a ton of bad design decisions from the first implementation.

In addition to the new RA, we also have a bunch of little optimizations spread around that improve performance all over the place. One of the bigger performance improvements for people with new hardware is enabling the AFP and RPRES extensions if supported. Apple supports these in their latest SoC and the newer Cortex cores also support them. This improves scalar SSE performance by quite a bit. We won’t dive into these too much, but the various optimizations improved performance by 2% to 12% in testing. We’re marching ever closer to running applications at near-native speeds.
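For readers curious what the "if supported" check for AFP and RPRES can look like, here is a minimal sketch of detecting them on arm64 Linux through the kernel's hwcap bits. This is not FEX's actual detection code. The names `ScalarFpFeatures`, `DecodeHwcap2`, and `QueryScalarFpFeatures` are hypothetical, and the bit positions are copied from the Linux arm64 `HWCAP2_AFP` and `HWCAP2_RPRES` definitions as we understand them, so verify them against your kernel headers.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP2
#endif

// Assumed bit positions of HWCAP2_AFP and HWCAP2_RPRES in the Linux arm64
// hwcap ABI. Redefined here so the sketch also compiles on other hosts.
constexpr unsigned long kHwcap2Afp = 1UL << 20;
constexpr unsigned long kHwcap2Rpres = 1UL << 21;

// Hypothetical result type: which scalar floating-point extensions exist.
struct ScalarFpFeatures {
  bool afp;    // FEAT_AFP: alternate floating-point behavior
  bool rpres;  // FEAT_RPRES: higher-precision reciprocal estimates
};

// Pure decoding step, kept separate so it can be checked on any host.
constexpr ScalarFpFeatures DecodeHwcap2(unsigned long hwcap2) {
  return ScalarFpFeatures{(hwcap2 & kHwcap2Afp) != 0,
                          (hwcap2 & kHwcap2Rpres) != 0};
}

// Asks the kernel which features the current CPU exposes to userspace.
// On anything other than arm64 Linux this reports both as unsupported.
inline ScalarFpFeatures QueryScalarFpFeatures() {
#if defined(__linux__) && defined(__aarch64__)
  return DecodeHwcap2(getauxval(AT_HWCAP2));
#else
  return ScalarFpFeatures{false, false};
#endif
}
```

A JIT could call `QueryScalarFpFeatures()` once at startup and pick the faster scalar SSE code paths only when the corresponding flag is set, falling back to the generic sequences otherwise.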