It Was Tweets

[Image: Birds flock around the Space Shuttle in 1988. Photo: NASA.]
Claude helped me work it out. It was swarms gobbling tweets.
A few months ago I realized that I had stuff scattered all over the web and I wanted it in one place, especially in the age of AI grabbing and compressing the world's data into a universal blob. I want my own little brain castle, you know? I want to own my blob.
At the same time, LLMs mean that once-prohibitive tasks, like converting a 25-year-old messy XML database, or downloading a bunch of newsletters from many different websites and cleaning them up, or setting up HTTPS, or configuring a dev server, or setting up caching, or producing RSS, are far more manageable—you don't need a framework or a CMS. Major efforts are reduced to tasks.
I didn’t intend to launch a living blog again. That's a side effect of me trying to build a good hierarchical content manager for my personal archive. In doing so I accidentally built a CMS. That's vibe coding! Once I saw that text box I had to type into it. Then I had to debug it on mobile and make it work when the subway goes underground. And now I have a blog.
Most of my external work is in the Publications section, and in there are my many terrible tweets. Apparently 23 thousand of them—50 thousand if you count replies (I don't post those here, but I leave them in the database).
Wait, I said, could that be a miscount? That's too many tweets. It can't be that many.
And Claude said: “Peak Twitter years (2013-2017) averaged 15-23 tweets/day. That's high but not impossible if you were actively replying and engaging. The count looks legitimate - you just tweeted a lot!”
You little LLM son of a bitch with your exclamation points.
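The annoying part is that the arithmetic holds up. A quick sanity check, using Claude's numbers (rough figures, not real stats):

```python
# Sanity-checking Claude's claim: five "peak Twitter years" at
# 15-23 tweets a day, versus ~50k tweets counting replies.
days = 5 * 365          # 2013-2017, ignoring leap days
low, high = 15 * days, 23 * days
print(f"{low:,} to {high:,} tweets in the peak years alone")
# -> 27,375 to 41,975 tweets in the peak years alone
# Add the slower years on either side and 50k is entirely plausible.
```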
There are maybe 10 thousand other things in this website of Theseus—like blog posts, podcasts, articles, images, photos, and newsletter posts, plus Bluesky. From my point of view I was building a personal archive that knitted together the rich tapestry of my own written, cultural, and corporate output in a way that would be accessible and permanent—eventually I could even move all the code and data into the Internet Archive.
Analyzing the logs offers a different perspective. From the point of view of the modern web, I've smashed open an enormous piñata and dropped 60,000 pieces of candy into the middle of the room, and now every bot is trying to grab every single piece as quickly as possible before anyone else. It doesn't matter what the candy even tastes like. The modern web consists of giant companies jamming their faces into your data like Pacino with his mountain of cocaine in Scarface. In this case, it's a whole swarm of tiny Pacinos—a vast range of IPs downloading everything they could, moving across all the tweets, orchestrated by some hidden system. And since I hadn't turned on any kind of limits, they just kept going, slamajamming my database like monkeys with hammers. I have everything well-indexed, so they could get dozens of pages a second: multiple actors grabbing tens of thousands of URLs as fast as they could. Data orcs.
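For what it's worth, spotting a swarm like this takes no special tooling. A minimal sketch of the kind of log-counting I mean, assuming a standard combined-format access log (the filename is made up):

```python
# Tally requests per client IP from an access log.
# The path and log format are illustrative assumptions.
from collections import Counter

hits = Counter()
with open("access.log") as log:
    for line in log:
        ip = line.split(" ", 1)[0]  # first field is the client IP
        hits[ip] += 1

# A swarm shows up as many distinct IPs, each with a huge count.
for ip, count in hits.most_common(20):
    print(f"{count:8d}  {ip}")
```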
This is how you learn. With the help of a bot (probably fed by the same process) I turned on rate-limiting—nothing serious, just enough to keep the site available to humans and slow the bots down a bit. I made my peace with being spidered decades ago; I just don't want to spend all my time cleaning up. Détente is the only option. You can't win. It's a billion nanothieves.
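The actual limiting happens at the web server, but the idea is simple enough to sketch in application code. A per-IP token bucket, with made-up numbers; this is an illustration of the approach, not my configuration:

```python
# A minimal per-IP token-bucket rate limiter. Each client earns
# RATE tokens per second up to a small BURST; one request costs
# one token. Numbers are illustrative, not my real limits.
import time
from collections import defaultdict

RATE, BURST = 2.0, 10.0

buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow(ip: str) -> bool:
    b = buckets[ip]
    now = time.monotonic()
    # Refill tokens for the time elapsed since the last request.
    b["tokens"] = min(BURST, b["tokens"] + (now - b["last"]) * RATE)
    b["last"] = now
    if b["tokens"] >= 1.0:
        b["tokens"] -= 1.0
        return True
    return False  # caller should respond 429 Too Many Requests
```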
This explains why I'd log into the server and every letter I typed took a full second to show up on screen. I've also read that OpenStreetMap is getting similarly swarmed. The ridiculous thing is that they make the data entirely open and easy to download. But the nature of data now is smash-and-grab, zero compensation, go, go, go. All you can do is slow them down and protect your little corner.