Tagged PDF funsies

posted by Roy Schestowitz on Dec 27, 2023



HTML was originally designed as a file format that contains only the logical structure of a document. End users could format it in whatever way suited them best. For example, people with reading disabilities could make the text bigger or even use a screen reader. As time went on, web site developers wanted pixel-perfect control over the layout on end user machines, whether this made sense or not. This led to the invention of a side channel to control layout. Since HTML was not originally designed for visual design, this led to an impedance mismatch that caused a lot of work and head-scratching to make it work. There is no "proper" solution, so problems persist to this day.

PDF was originally designed as a file format for pixel-perfect layout of graphics on every conceivable device. In this way people could be sure that their design was not randomly mangled along the way. As time went on, people wanted to make PDF documents more broadly usable, for example to be able to copy and paste text out of them and to expose the logical structure of the document to end users, to the benefit of, for example, people with disabilities. This led to the invention of a side channel to describe structure. Since PDF was not originally designed for semantic content, this led to an impedance mismatch that caused a lot of work and head-scratching to make it work. There is no "proper" solution, so problems persist to this day.

