Archiving note

The web is working hard to batten down the hatches. It’s hard to get Reddit into my RSS reader these days, and I’m always re-authing to read feeds. Meanwhile LinkedIn is harder and harder to scrape as it gets more locked down. What’s wild is that I wanted to import my LinkedIn posts into this personal archive (i.e. this website), and LinkedIn makes that close to impossible to automate. So I told the LLM to start using a real browser, then had it set up VNC on the server so I could get into a bare Chromium instance. I logged in, did the captcha, and then watched it explore my LinkedIn profile. The dates are all relative (“2 weeks ago”), and I’m sure it’s copying and pasting some things incorrectly, so there’s stuff to resolve there.

We spent all those years building a data-driven web, but the user was always caught in the crossfire when companies locked down their APIs; it was really hard to save your own stuff, or make custom apps, and so forth. The platforms like the lock-in, but they need the users, and it’s always a balance. For now, though, I’m treating LinkedIn as data: it’s indistinguishable from a human, copying and pasting from a server in New Jersey acting on my behalf. And it’s truly not abusing anything or doing anything wrong; I’m literally watching it spider a bit, and it will pull about 100 things.
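On the relative dates: one way to resolve a label like “2 weeks ago” is to record the day the page was scraped and turn the label into a range of possible dates rather than pretending it is exact. Here is a rough Python sketch of that idea. The function name `resolve_relative_date` and the day counts for months and years are my own assumptions, and the only label shape it handles is the “N units ago” form quoted above; LinkedIn may well use other forms that this does not cover.

```python
import re
from datetime import date, timedelta

# Approximate day counts per unit. Months and years are rough averages
# (an assumption for this sketch, not anything LinkedIn documents).
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}

# Matches labels such as "2 weeks ago" or "1 month ago".
_PATTERN = re.compile(r"(\d+)\s*(day|week|month|year)s?\s+ago", re.IGNORECASE)


def resolve_relative_date(label, scraped_on):
    """Return (earliest, latest) dates a relative label could refer to.

    "2 weeks ago" means at least 2 full weeks but fewer than 3, so the
    result is a range. Returns None when the label is not recognized.
    """
    match = _PATTERN.search(label)
    if match is None:
        return None
    count = int(match.group(1))
    unit_days = _UNIT_DAYS[match.group(2).lower()]
    latest = scraped_on - timedelta(days=count * unit_days)
    earliest = scraped_on - timedelta(days=(count + 1) * unit_days - 1)
    return earliest, latest


if __name__ == "__main__":
    # The scrape date here is made up for illustration.
    print(resolve_relative_date("2 weeks ago", date(2025, 6, 15)))
```

Storing the range (or the raw label plus the scrape date) keeps the archive honest about what is actually known, and the posts can be tightened up later if an exact timestamp turns up elsewhere.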
