GNU/Linux and Free Software Leftovers
GNU/Linux
-
LWN ☛ Fighting the AI scraperbot scourge
There are many challenges involved with running a web site like LWN. Some of them, such as finding the courage to write for people who know more about the subject matter than we do, simply come with the territory we have chosen. But others show up as an unwelcome surprise; the ongoing task of fending off bots determined to scrape the entire Internet to (seemingly) feed into the insatiable meat grinder of AI training is certainly one of those. Readers have, at times, expressed curiosity about that fight and how we are handling it; read on for a description of a modern-day plague.
Training the models for the generative AI systems that, we are authoritatively informed, are going to transform our lives for the better requires vast amounts of data. The most prominent companies working in this area have made it clear that they feel an unalienable entitlement to whatever data they can get their virtual hands on. But that is just the companies that are being at least slightly public about what they are doing. With no specific examples to point to, I nonetheless feel quite certain that, for every company working in the spotlight, there are many others with model-building programs that they are telling nobody about. Strangely enough, these operations do not seem to talk to each other or share the data they pillage from sites across the net.
The LWN content-management system contains over 750,000 items (articles, comments, security alerts, etc.) dating back to the adoption of the "new" site code in 2002. Our archives also still hold everything we published in the more than four years we operated before that change. In addition, the mailing-list archives contain many hundreds of thousands of emails. All told, if you are overcome by an irresistible urge to download everything on the site, you are going to have to generate a vast amount of traffic to obtain it all. If you somehow feel the need to do this download repeatedly, just in case something changed since yesterday, your traffic will be multiplied accordingly. Factor in some unknown number of others doing the same thing, and it can add up to an overwhelming amount of traffic.
LWN is not served by some massive set of machines just waiting to keep the scraperbots happy. The site is, we think, reasonably efficiently written, and is generally responsive. But when traffic spikes get large enough, the effects will be felt by our readers; that is when we start to get rather grumpier than usual. And it is not just us; this problem has been felt by maintainers of resources all across our community and beyond.
-
Applications
-
Ubuntu Handbook ☛ NetBeans 25 Released! PHP 8.4 Support & Parallel Test with Gradle
Apache NetBeans announced its new 25 release a few days ago. Here are the new features and an installation guide for Ubuntu. NetBeans 25 was released after two release candidates.
-
Distributions and Operating Systems
-
Debian Family
-
DEV Community ☛ Divine Attah-Ohiemi: Re-styling Debian's Download Page
Main points from this blog post: [...]
-
Free, Libre, and Open Source Software
-
Linuxiac ☛ VLC Celebrates 20 Years by Sending Videos to the Moon
Believe it or not, to celebrate its 20th anniversary, VLC—the renowned media player used by millions worldwide—has announced a truly out-of-this-world project: sending user-submitted videos to the Moon.
This ambitious endeavor, called VLC Lunar Time Capsule, will hitch a ride on Griffin, the lander from NASA’s Artemis program and the first commercial lunar flight slated for late 2025.
-
Libre Arts ☛ librearts Weekly recap — 23 February 2025
This has been a very CAD-flavored week. Highlights: new releases of Tahoma2D, JupyterCAD, KiCAD; first builds of Inkscape with CMYK-capable exporting are available; FreeCAD Project Association announces 2025 grant program.
-
Standards/Consortia
-
APNIC ☛ Measuring DNS root servers under change
The Domain Name System (DNS) is an enormous, hierarchical, and distributed database. At the top of this hierarchy, the root zone, served by the root servers, provides the (logical) starting point for all name resolutions. As most Internet applications rely on the DNS, the resilience of these root servers is critical for the functioning of the Internet.
Luckily, the root server system (RSS) is a prime example of a resilient system. This resilience is achieved through diversity and redundancy measures, such as: [...]
-