date 2024-07-05
FEX 2407 Tagged ... with AVX! – FEX-Emu

posted by Roy Schestowitz on Jul 05, 2024



Computers traditionally perform one operation at a time. The hardware decodes an instruction, evaluates the operation on a single pair of numbers, and repeats for the next instruction. In mathematical terms, the instructions operate on scalars.

That design leaves performance on the table.
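As a rough illustration, here is the scalar model written as an ordinary C++ loop. The function name `add_scalar` is hypothetical and is not code from FEX. Each iteration adds exactly one pair of numbers, so adding n elements costs n separate additions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar style (hypothetical example): every loop iteration performs one
// addition on a single pair of numbers, so n elements take n add operations.
std::vector<float> add_scalar(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];  // one add on one pair of scalars
    }
    return out;
}
```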

Many programs repeat one operation many times with different data. Modern instruction sets exploit that repetition. A single “vector” instruction can operate on multiple pieces of data at once. Programs will perform the same amount of arithmetic overall, but there are fewer instructions to decode and the arithmetic is more predictable. That enables more efficient hardware.
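The portable C++ sketch below models that idea. `Vec4`, `vadd`, `Result`, and `add_vectorized` are hypothetical names, and the four lanes are added with an ordinary loop rather than a real SIMD instruction, so the code only counts how many “instructions” a vector machine would issue. It is not how FEX or any particular instruction set implements vector operations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-lane vector type standing in for a SIMD register.
// Real hardware adds all lanes in one instruction; here the lanes are
// looped over so the model stays portable.
constexpr std::size_t kLanes = 4;
using Vec4 = std::array<float, kLanes>;

// Models one "vector add" instruction: adds four pairs at once.
Vec4 vadd(const Vec4& x, const Vec4& y) {
    Vec4 r{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) r[lane] = x[lane] + y[lane];
    return r;
}

struct Result {
    std::vector<float> sum;
    std::size_t instructions;  // number of modeled add instructions issued
};

Result add_vectorized(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    Result res{std::vector<float>(a.size()), 0};
    std::size_t i = 0;
    // Main loop: one vector add per full group of four elements.
    for (; i + kLanes <= a.size(); i += kLanes) {
        Vec4 x, y;
        for (std::size_t lane = 0; lane < kLanes; ++lane) {  // model the loads
            x[lane] = a[i + lane];
            y[lane] = b[i + lane];
        }
        Vec4 r = vadd(x, y);
        for (std::size_t lane = 0; lane < kLanes; ++lane) res.sum[i + lane] = r[lane];
        ++res.instructions;
    }
    // Tail: leftover elements fall back to one scalar add each.
    for (; i < a.size(); ++i) {
        res.sum[i] = a[i] + b[i];
        ++res.instructions;
    }
    return res;
}
```

For ten elements, `add_vectorized` issues two vector adds plus two scalar adds for the leftover tail: four instructions instead of ten, while still performing all ten additions.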

A “scalar” instruction adds a pair of numbers; a “vector” instruction adds multiple pairs. How many pairs? That is, what length is the vector?

That’s a design trade-off. Increasing the vector length decreases the number of instructions we need to execute while increasing the hardware cost. Supporting large vectors efficiently requires a large register file and many arithmetic logic units. Besides, there are diminishing returns past a certain vector length.
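One simple way to see the diminishing returns is to count instructions. The sketch below uses a hypothetical helper, `instructions_needed`, that charges one instruction per group of `len` lanes and assumes any partial final group is handled by a single masked or padded instruction. It ignores hardware cost entirely and only shows how the instruction count responds to the vector length.

```cpp
#include <cstddef>
#include <cstdio>

// Instructions needed to add n pairs with vectors of `len` lanes, assuming
// (hypothetically) that a partial final group costs one masked/padded instruction.
constexpr std::size_t instructions_needed(std::size_t n, std::size_t len) {
    return (n + len - 1) / len;  // ceiling division
}

// Prints how many instructions each doubling of the vector length saves.
void print_savings(std::size_t n) {
    for (std::size_t len = 1; len < 64; len *= 2) {
        std::size_t before = instructions_needed(n, len);
        std::size_t after = instructions_needed(n, len * 2);
        std::printf("%2zu -> %2zu lanes: %4zu -> %4zu instructions (saves %zu)\n",
                    len, len * 2, before, after, before - after);
    }
}
```

For n = 1024, each doubling from 1 up to 64 lanes saves 512, 256, 128, 64, 32, and then 16 instructions. The absolute savings halve every time, while the hardware needed to support the wider vectors keeps growing.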

