Most recent

Real Editors Ship

tl;dr: needs editing.

Terry Richardson came over, got naked, and took pictures of my cat.

There's a useful dialogue making the rounds. A man named Tom Taylor wrote about shipping product (I picked that up off Waxy Links). He's shipped good work himself, and makes the point that getting stuff out the door is a noble thing. I agree with that. Here's the closing graf:

And the next time someone produces an antenna with a weak spot, or a sticky accelerator, you're more likely to feel their pain, listen to their words and trust their actions than the braying media who have never shipped anything in their lives.

Now as for this—well, as Taylor later pointed out (and I love blogs that point out when someone disagrees with them—very classy), this bothered his friend Bobbie Johnson. Johnson wrote:

Tom's assessment is that these people (let's call them critics) have never shipped anything and therefore can't understand what they're talking about. I'd suggest the opposite is in fact the case: the trouble is that media ships constantly, and therefore becomes inured to the difficulties and delicacies of launching a product of any size or scale.

I agree with that even more. (Well, mostly; they may be inured to the difficulties, or maybe just not impressed by them.) And I think the discussion between Taylor and Johnson brings up a good point. I remember when I used to write for All Things Considered, my editor there sent me a few pictures from the whiteboard they used to put together the show. It changed constantly throughout the day; they kept a webcam trained on it (this was a few years ago; maybe they use websockets and node.js now). There were an insane number of variables that went into creating that big hunk of nightly audio: Recordings created months ago or two hours ago; people working together in a dozen time zones; contracts, permissions, fact-checking. It had to fit together technically; it had to be transmitted efficiently at a high bitrate to maintain quality (but may be sped up or slowed down to the limits of Fourier transforms); it had to be edited to match certain durations; it had to have a certain consistency and flow; and so on. It requires the human equivalent of map-reduce to manage it. And they—meaning editors and producers—managed a release every night, with 12 million users.

People often think that editors are there to read things and tell people "no." Saying "no" is a tiny part of the job. Editors are first and foremost there to ship the product without getting sued. They order the raw materials—words, sounds, images—mill them to approved tolerances, and ship. No one wrote a book called Editors: Get Real and Ship or suggested that publishers use agile; they don't live in a "culture" of shipping, any more than we live in a culture of breathing. It's just that not shipping would kill the organism. This is not to imply that you hit every sub-deadline, that certain projects don't fail, that things don't suck. I failed plenty, myself. It just means that you ship. If it's too hard to ship or you don't want to deal with it, you quit or get fired.

I recently left zineland and did a bunch of freelance work and hooboy do people not know how to ship. A three-year project that yielded only a 90-second page load; or $1.5 million down the drain with only a few microsites to show. And I've started to find myself going, God, these projects need editors. Editors are really valuable, and, the way things are going, undervalued. These are people who are good at process. They think about calendars, schedules, checklists, and get freaked out when schedules slip. Their jobs are to aggregate information, parse it, restructure it, and make sure it meets standards. They are basically QA for language and meaning.

But can they deal with character encoding issues when the parser breaks? Not really. They're often luddites of the kind that calls the mouse a clicker, even the young ones. That said, I think there're weird content times afoot. Google just acquired MetaWeb, which is not user-generated as much as user-edited content. (C.f. the Shakespeare page). Wolfram Alpha is purely about curating data sources and then calculating atop the restructured data. Wikipedia growth is slowing, but editing and tagging continue; the infoboxes are a wealth of semantic data. Meanwhile F——b—— and Tw—— (I can't bear to write those words again) continue to dump forth information by the gallon, now tagging their core objects with all manner of extra metadata. Everything is being knit together in all sorts of ways. User-generated content is still king, because it generates page views and inculcates membership (the concept of the subscription being dead, the concept of the membership being ascendant) but user-edited content is of increasing importance because of what I call, having just made it up, "the Barnes & Noble problem."

Until I was about 26 almost everything I wanted to read was in Barnes & Noble. Eventually they had less and less of what I wanted. Now B&N's a place I go before a movie, and I get my books anywhere else. I'm increasingly having B&N moments with full text search à la Google. It's just not doing the job; you have to search, then search, then search again, often within the sites themselves. The web is just too big, and Google really only can handle a small part of it. It's not anybody's fault. It's a hard, hard problem.

Remember when everyone was into the idea that Google is a media company, back in 2008 when YouTube was two? Google is not really a media company as much as a medium company. Google creates forms—i.e. structured ways of representing data—and then populates them with search results. They're the best at that. Google doesn't do the best job making it easy to edit the nodes in every case (they can when they want, though—it's easy to edit in Gmail, or upload a video), or even particularly want you to edit much of their data. Knol being the exception that proves that Knol is kind of eh. And I haven't checked in on Orkut (the 65th largest website in the world) in quite a while.

Now, though, they've bought ITA (a very interesting company that has had tons of weird database stuff going on for a while) and Metaweb. So clearly structured—meaning edited, meaning user-edited—data is now going to be a big part of the web. There are going to be all kinds of new slots and tabs and links and nodes. And whether the users want this or not, it looks like they're going to get it, and the state of NLP being what it is, not to mention NPC, humans will need to be involved. Unfortunate but true. (Then again I've been off in the high wilderness for five years; I have no clue what people think in Mountain View. I could just be blowing more smoke.)

The Semantic Web is basically the edited web, for some very nerdy take on editing. Which implies editors. Facebook has gone turtles all the way down. Django, Rails, and other frameworks make it possible to build custom-structured-and-semantic data acquisition tools with very little pain; Django's admin, in particular, is optimized for exactly that sort of thing. Solr and related technologies make it possible to search through that structured information. And nearest to my heart there's an insane glut of historical data, texts, and so forth, billions of human, historical, textual objects to come online from the millennia before the web. Plus a gaggle of history bloggers trying to contextualize it (the history bloggers are the best bloggers out there—but that's for a different day). Dealing with the glut—and we must deal with this glut, because what is more important than sorting all human endeavor into folders?—will require all manner of editing, writing, commissioning, contextualizing, and searching. (Take a look at Lapham's Quarterly to see one very successful approach, using paper and ink.) Fortunes will be made! Not mine, of course, because I lack the qualities that money likes, but someone's. History is big business.

I see three problems with my idea. First, editors and journalists are mostly luddites, as already noted, and they don't really hang out in places where you might think to hire them. (I think the Awl should have a jobs board; that would be perfect.) But I think this one can be solved: even my most technically mystified editor pals could be trained to use Freebase Gridworks. Add to that the willingness to schedule the living shit out of everything, the ability to see patterns, a total dedication to shipping, and willingness to say "no," and you start to have this very interesting source of power inside your organization, especially given the changes coming in web content, where you need structure and connections in order to play with others. Editors can help you play nice. And they actually do understand standards, at least conceptually. If you tell them the line needs to end with a semicolon they will end it with a semicolon. Words into Type and ISO 8879 are of similar complexity.

Second problem: most editors want to be editing for print or broadcast, not for the web, which is still seen as slumming it. But that said more and more of the big-deal journalism is about aggregating data. Which means that more and more journalists are getting exposed to thinking in grids and bulk-editing and so forth. Or at least getting interns to do it for them. Which is interesting. Also, getting fired or taking a buyout helps people gain perspective on what they like doing; there's that.

Third problem: I've worked on various big content engagements, and I've talked to a number of people with more big-content experience than me. And people agree that big orgs, even if they now have content problems, won't hire editors, or enough editors, to manage their content. Think: museums, non-profits, giant corporations, government. I get very sadpanda when I see someone spend $500K plus deployment, development, and licensing costs on a Java EE-based multilingual platform incorporating a JSR-283 repository with a custom workflow/process approval engine. Because they could build out something for about 20 percent of that (or sometimes 1/2 a percent of that), and hire a few editors to wrangle the content. The content, were it approached strategically, could be of far higher quality: better SEO, more durable, consistent voice, vetted for legal compliance, primed for re-use. And you can make an end-run around workflow if you add versioning and reversion capability to your text fields (like Wikipedia), give most users the ability to edit, and give the editor full revert and publish privileges. Most CMSes are parasitic technologies dedicated to preserving the cultural and hierarchical status quo of their hosts no matter the cost, literally. People hear me whine about this and they say: Our case is different; we need to have a system that sends out seven thousand "todo" emails per day. And I grieve for the spirit of Work, killed by her evil child, Workflow.
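For the curious, here is roughly what that end-run around workflow could look like. This is a minimal sketch, not anybody's actual CMS: the class names (Revision, VersionedText) and the idea of passing in a plain set of editor names are assumptions made up for illustration. Anyone can edit, every edit is kept, and only an editor can revert or publish.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Revision:
    author: str
    text: str


@dataclass
class VersionedText:
    """A text field that appends revisions instead of overwriting (hypothetical design)."""
    editors: Set[str]                       # users allowed to revert and publish
    revisions: List[Revision] = field(default_factory=list)
    published_index: Optional[int] = None   # which revision readers currently see

    def edit(self, user: str, text: str) -> int:
        """Any user may edit; the new text becomes the latest revision."""
        self.revisions.append(Revision(user, text))
        return len(self.revisions) - 1

    def _require_editor(self, user: str) -> None:
        if user not in self.editors:
            raise PermissionError(f"{user} is not an editor")

    def revert(self, user: str, index: int) -> int:
        """Editors only: copy an earlier revision forward as the newest one."""
        self._require_editor(user)
        return self.edit(user, self.revisions[index].text)

    def publish(self, user: str, index: int) -> None:
        """Editors only: choose which revision goes live."""
        self._require_editor(user)
        if not 0 <= index < len(self.revisions):
            raise IndexError("no such revision")
        self.published_index = index

    def live_text(self) -> Optional[str]:
        """Return the published text, or None if nothing has shipped yet."""
        if self.published_index is None:
            return None
        return self.revisions[self.published_index].text
```

Nothing is ever deleted, so a bad edit costs one revert rather than a round of approval emails. The whole "workflow" is two permission checks.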

That's it. This of course is already too long because I don't have an editor. Sadly. But to summarize: Good conversation between Taylor and Johnson. Editors ship. There's no place to hire the nerdier ones because the Awl won't set up a job board. That's sad. The web is changing and it needs more editors. Do not dispute me. I love you. Goodbye.



Parka

My friend wore a green parka. She is, like I now am, self-employed, and called me this afternoon using Skype, which I can already see, a few weeks into my new career, is going to be a problem. Behind her a cat moved, rendered as a set of small animated blocks, like something made of Scrabble tiles. "That green parka," I said. "Let me ask you a question about it."

She waved her arm to point to herself, and to the parka. That caused a problem, a stutter in the system, and then we were both trying to speak at once:

"The parka?" "—Kay—"
"—Ahead—" "—Go—"
"—So—" "—Yeah—"
"—That—" "—Parka—"

We were silent for a while, waiting.

"Go ahead," she said. "Ask your question."

This is the era for brief, frequent pauses. Pinwheels, little watches. FOUC. Vi.Me.O. The future arrives in five-second bundles, but then for the next ten seconds you're back in the past.

The 80s was the last truly futuristic decade. Skinny ties. Power, Corruption & Lies. Tass Times in Tonetown. Something about constant nuclear threat and Neuromancer. After that we kind of caught up with the future. Before, well, the future in the 70s was much goofier. Filtered cigarettes. R2D2. Kitchen appliances. People kept coming up with new kinds of magnetic tape, and new ways to change vinyl records.

I wonder if when we look back at this month of iPad if we'll think what an amazing moment to have lived through, or if it will be like some guy with sideburns telling your dad about the reel-to-reel player in his carpeted van.

.  .  .  .  .  

This man I know once took me out on his sailboat and, long story, but I had to bring the boat around alongside another boat using a rope. He said to me, as I did this: "Listen. You can't go too slow. There is no such thing as too slow. You can only go too fast." And I thought about that for a long time. It's a nice thing to think about, on the weekends, if you have a sailboat.

.  .  .  .  .  

In my novel nervous teenagers go to startup school in abandoned skyscrapers. (I like to say "In my novel..." a lot, instead of writing. I also like to organize my text-conversion pipeline. My latest idea is to port the novel to org-mode.)

.  .  .  .  .  

I want to live in a historically awesome moment. What if in the map of time this is one of the small towns? What if this is someplace we drive through to get somewhere interesting? If right now turns out to be nowhere? Then again have you messed with spatial search in Solr? Right now is turning out to be everywhere.

.  .  .  .  .  

I met an Amish inventor once. Everything he worked on turned out buggy.

.  .  .  .  .  

"My question is," I said, when the pauses settled, "is how many days in a row have you worn that parka?"

My freelancer friend thought for a moment.

"Actually," she said, "that's a very good question."

.  .  .  .  .  

Internet connections mostly fail on users 50-64-years-old, March 12, 2009, IT Facts:
All demographic groups are about equally likely to have certain devices fail them, though seniors who own cell phones are significantly less likely than younger cell phone owners to have problems with their cell phones. Just 18% of cell phone owners 65 years old and older reported that their cell phones had failed in the past year, while 26% of 50-64 year olds, 33% of 30-49 year olds and 30% of 18-29 year olds reported cell phone problems. Seniors are not as exclusively reliant on their cell phones as younger owners, and so they may have less wear and tear on their phones than do younger users who are more likely to experience cell phone failure.

We got the landline back in the new apartment. I can't tell you how happy that made me. I call people on it all the time. It's like we're in the same room. Getting older.



I'm on a Panel at SxSW

I'll be in Austin at the interactive slice of SxSW (Where screencasts come alive!) for a few days starting this Wednesday. Mostly I'll be wandering around with a churro in my hand, muttering, but I'll also be on a panel, moderated by Jeffrey Zeldman and featuring Erin Kissane, Lisa Holton, and Mandy Brown. It's called "New Publishing and Web Content" and it's about teaching a meerkat to drive... cuddling... releasing a super-plague... new publishing and web content. I'm working the door.

Here's information from Jeffrey regarding the panel, and an interview with Jeffrey about, among other things, the panel. Panel! (As a side-effect of this actual in-the-flesh attendance I'm sorry to say I'm not doing six-word reviews this year. I have not the proper strength.)

I've never been to SxSW before. It surprises some people when I tell them that. It also surprises people when I cry or vomit, or get into bed with them well after all the other guests have gone home. But I've never had a job where they want to spend money to send me places to learn things. I think that's a very NYC thing; ideas and talent are supposed to come to us, preferably kneeling and begging, not the other way around. This approach is why the finance and publishing industries are enjoying such great years.

So I bought myself a ticket on a jet, and if you see me say hello. I look like this as of a few weeks ago. (Caveat: The device I use to keep my head molded into a cube shape may not be allowed by TSA rules.)

If there are any webhatchets or resentments or awkwardnesses left over from the old days, I apologize. Let's just bury those and be nice. I have nothing left in me for across-the-room awkward twinges but lots of room for niceness.



Elsewhere: Just Like Heaven

I wrote a Non-Expert for TheMorningNews.org, called "Just Like Heaven":

Question: Is there afterlife —Matt

Answer: If you ever need to make your own Grand Canyon, start with a river and lift up the earth. As the ground rises the river will carry some of it away. Wait seven million years, at which point tourists will come. Some will see eons of erosion at work; others will believe that, a mere 4,500 years back, God dragged His fingernail across the desert. Like the group of evangelical-Christian creationists that rafted through....

And it goes on from there...



But melts just like a little girl

Bob Dylan plans to release a collection of familiar yuletide tunes... with proceeds of the album to benefit hunger-relief charities... —"Sleigh, Lady, Sleigh: Bob Dylan to Release Christmas Album," Dave Itzkoff, the New York Times



Panel/Unicode table for you

So I've been out of it for a little while longer than I'd hoped. And I'm back here, like the world's worst ex-boyfriend, to ask for a small favor. I want to ask you to go over to Jeffrey Zeldman's website to read about a panel on which I could, should all go well, appear in March at SXSW, along with some nice people. If you're interested in it go ahead and vote for it.

Since I knew I was going to ask you for something I figured I should make you something nice. Here is a simple Unicode browser for people who like looking at characters; you can click on the number below each character to visit its Wikipedia page. Surprisingly many symbols have their own pages.

There may already be something like it out there, but I couldn't find anything quite like it, and I keep spending time poking around Unicode on Wikipedia and various other sites and finding it hard to get a sense of the whole range of options available.

There's a lot of good stuff up around 9,000. I think my favorite character, however, is ␙, #9241, the SYMBOL FOR END OF MEDIUM.
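If you'd rather poke around from a terminal than a browser, Python's standard unicodedata module can do the lookup. This is only a sketch of the idea, not the code behind the page; the function names (label, browse) are invented for the example.

```python
import unicodedata


def label(code_point: int) -> str:
    """Format a decimal code point the way the Unicode charts write it."""
    return f"U+{code_point:04X}"


def browse(start: int, stop: int):
    """Yield (decimal, character, name) for each named character in [start, stop)."""
    for code_point in range(start, stop):
        char = chr(code_point)
        name = unicodedata.name(char, None)  # None for unassigned or unnamed code points
        if name is not None:
            yield code_point, char, name


if __name__ == "__main__":
    # A small window around the neighborhood mentioned above.
    for decimal, char, name in browse(9216, 9248):
        print(decimal, label(decimal), char, name)
```

The decimal numbers printed in the first column are the same ones the browser shows under each character, so 9241 lines up with U+2419.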

It's hacky--doesn't work in IE7. Otherwise it seems to roll along. It's all on one page (HTML/CSS/JavaScript) and under the GPL/MIT license, so if you have any big ideas go to town.



Been a while

I've been working on something over at the dayjob. (Although I'm writing this at 2:36 AM from the office, so not just dayjob.) I tell you because it's fun, and it's free to use.

I went out the other day with some XML folks, old hands. We talked about ISO 8879, which I once photocopied in its entirety, and old issues of Creative Computing, and about Ted Nelson. I said, now that I have gained experience in key web technologies Django and SOLR, I feel I have the experimental platform I need to implement a new version of Ftrain with a new kind of story, entitled “Lost Dogs, or, the Unhappy Town.” The person I told this to, you could tell he was not buying this. He said, “I am not buying this.” There was a definite sense of trains leaving stations, boats leaving docks, bicycles unracking, respectively blowing whistles or tooting horns or tinkling bells. A part of me turned into birds and fluttered away, a flock heading to sea. They were dragging a whale. I thought, well, shit, I guess I better think about that.



Learning to Fear the Semantic Web

Zotero is an open-sourced bibliography-management tool that runs inside Firefox-based browsers (see screencast). It helps you keep track of your research. I've enjoyed using it as I work on writing projects. From the about page:

Zotero is a production of the Center for History and New Media at George Mason University. It is generously funded by the United States Institute of Museum and Library Services, the Andrew W. Mellon Foundation, and the Alfred P. Sloan Foundation.

Nice! Except today, a good bit after the fact, I learned of a peculiar lawsuit that information and news giant Thomson Reuters Inc. filed last month against the makers of Zotero. From the website of The Chronicle of Higher Education, October 3, 2008, by Jeffrey R. Young (links added):

Thomson Reuters Inc. sued George Mason University in a Virginia court this month, arguing that a free software tool made by the university makes improper use of the company’s EndNote citation software....

Thomson Reuters argues that the latest release of George Mason’s software, which can import files created by EndNote and turn them into files that can be used and shared online using Zotero, “is willfully and intentionally destroying Thomson’s customer base for the EndNote software.” The company seeks $10-million in damages for each year the university has offered the software and to stop the university from distributing versions of Zotero that can convert EndNote files.

One person who commented on the lawsuit is Michael Feldstein, who writes a blog about online learning. He posted the following on October 5:

Apparently, the Zotero team did create their own style format and is crowd-sourcing the creation of import styles. As you can see from this Zotero developer discussion thread, the developers considered and explicitly rejected supporting the redistribution of Thomson-supplied EndNote conversion files. In fact, while Zotero can read EndNote style files, it specifically does not convert them into Zotero’s own format, in large part to discourage the redistribution (deliberately or accidentally) of Thomson-created files. What the import feature does facilitate is (a) users who have already licensed EndNote and want to migrate to Zotero can use the EndNote styles that they have already paid for, and (b) Zotero users can take advantage of the EndNote import styles that individual journal publishers (as opposed to Thomson itself) make available for the convenience of their subscribers. These uses strike me as totally within bounds.

(More is available from the Disruptive Library Technology Jester blog.)

Given my biases this lawsuit seems like an anachronistic, hamfisted attempt to block competition. While as a programmer I love being able to adapt open-source software to my particular needs, I use a mix of closed-source and open-source software without many qualms. That said, non-standard, closed-source document formats are awful stuff that block competition between software vendors and, worse, waste god-awful amounts of my time. If you wish to dispute me on this then come to my office tomorrow to help me, over the course of several hours, yank a magazine's-worth of text out of Quark XPress, using a mix of applications and balky emacs macros. (Imagine if you could take back all the time spent wrangling closed, proprietary document formats. You could finish Perl 6; you could probably write it in Arc.)

I'm not an EndNote user and I don't like to borrow trouble (which is why I've been avoiding this blog; blogging is a great way to borrow trouble). But not only does this lawsuit invoke the dread specter of legally-enforced proprietary data formats, it raises questions about Thomson Reuters's legal attitude towards the data produced by its other software offerings, including, in this case, a piece of software called OpenCalais.

OpenCalais is a web-based application that consumes text and returns special Semantic Web-style metadata that you can use to do interesting, Semantic Web-style things, like: create topic pages, improve search, or enhance local taxonomies. It has a Facebook group and its website features both video of straight-talking bearded coders and a creatively borrowed terms of service statement:

We based these Terms of Service under those released by Automattic under a Creative Commons Sharealike license. Thanks to Automattic and WordPress.com for sharing.

I have a quarter-million-page corpus at work and I'm looking for simple, inexpensive ways to enhance it, so I've followed the development of their platform for some time: joining the Facebook group, signing up for an account, and using their free endpoint for testing (go ahead and give it a spin). My grand, entirely unrealized plan was to include a direct hook to OpenCalais in our content management system. The OpenCalais team seem trustworthy, progressive, and smart, and committed to openness. But, at least for now, the lawsuit against Zotero has scared me off using the product.

This despite, as pointed out by the Panlibus blog at Talis, in a post on OpenCalais as it relates to the Zotero lawsuit, the following statement from the OpenCalais folk:

We want to make all the world’s content more accessible, interoperable and valuable. Some call it Web 2.0, Web 3.0, the Semantic Web or the Giant Global Graph—we call our piece of it Calais.

So why am I overreacting? Well, that “our piece of it” bit is a little tricky, but I think I get what they mean, and the EndNote people and the OpenCalais people are in different parts of a very large organization and working on different projects with different goals. But the parent company is the same, and, professionally I feel required to overreact, because in every situation (as editor, coder, designer, and so forth) I to my great regret must always concern myself with liability.

I hate that part of my job. From worrying about copyright and fair use, to questioning whether we can reuse art or prose from our own archives, to sending out cease and desists—it all fills me with gloom and despair, the sense of being a culpable cog in a lumbering legal machine. It's the opposite of creative, interesting work, but if you get something wrong the consequences can be dire, so worrying about getting sued is something that has to be done, every day, even on the subway. I'm worried about getting sued right now, sitting here, typing this. If you've had someone threaten you with a lawsuit, you know the sort of fear and second-guessing it engenders. Even if I am certain that I have followed every ethical and legal guideline, it's an instant panic attack to see the words “contacting a lawyer” or “liable for damages” in an email; it leads to second-guessing, and I know there will be phone calls, meetings, and several months of followups to comply with the needs of insurers. If I can see the shadow of a lawsuit anywhere I am obligated to shine a light upon it and freak out at least a little; otherwise I'm not doing my job.

And that's what's going on here. This recent lawsuit against George Mason/Zotero immediately brought to mind a scenario: Thomson Reuters maintains control over the taxonomy, the thesaurus, of terms used in OpenCalais, and they do the indexing of content to associate that content with terms. The use pattern I was considering was as follows:

  1. Create text within a content management system;
  2. Send that text to OpenCalais;
  3. Store the metadata it returns;
  4. Over time, use aggregated metadata, integrated with our existing ~80,000 subjects, to create a local taxonomy for faceted search and automatically-compiled topic pages, along with other interesting interfaces;
  5. Share as much of the taxonomy as possible as downloadable RDF;
  6. Make sure to provide links back to OpenCalais wherever possible, on their terms, as defined in their Terms of Service (TOS) document.
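To make the shape of that plan concrete, here is a rough sketch of steps 1 through 5. It deliberately does not call OpenCalais: stub_tagger is a hypothetical stand-in that recognizes two hard-coded words, and the GUIDs, the example.org base URI, and every function name are assumptions invented for illustration. Step 6 (linking back on their terms) is not modeled. The one design choice worth noting is that each tag carries a source field, so externally generated terms can always be told apart from local ones.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


@dataclass(frozen=True)
class Tag:
    term: str    # human-readable label
    guid: str    # identifier supplied by the tagging service
    source: str  # provenance: which service produced this tag


def stub_tagger(text: str) -> List[Tag]:
    """Hypothetical stand-in for step 2; a real service call would go here."""
    known = {"zotero": "guid-001", "endnote": "guid-002"}  # made-up GUIDs
    words = {word.strip(".,;").lower() for word in text.split()}
    return [Tag(word, known[word], "stub") for word in sorted(words) if word in known]


def tag_corpus(docs: Dict[str, str], tagger: Callable[[str], List[Tag]]) -> Dict[str, List[Tag]]:
    """Steps 1-3: send each document's text out and store what comes back."""
    return {doc_id: tagger(text) for doc_id, text in docs.items()}


def build_taxonomy(stored: Dict[str, List[Tag]], local_subjects: Iterable[str]) -> Counter:
    """Step 4: count external terms alongside the subjects we already have."""
    counts = Counter({subject.lower(): 0 for subject in local_subjects})
    for tags in stored.values():
        counts.update(tag.term for tag in tags)
    return counts


def to_ntriples(stored: Dict[str, List[Tag]], base: str = "http://example.org/subject/") -> List[str]:
    """Step 5: one label triple per distinct external term, keyed by its GUID."""
    labels = {tag.guid: tag.term for tags in stored.values() for tag in tags}
    return [f'<{base}{guid}> <{SKOS_PREF_LABEL}> "{term}" .'
            for guid, term in sorted(labels.items())]
```

Fed two toy documents, tag_corpus stores the tags per document, build_taxonomy folds them in with local subjects, and to_ntriples writes out the shareable part. Whether that shareable part is yours to share is the question the rest of this post worries about.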

That's probably not a big deal. I doubt anyone would even notice. But... is it at all possible, conceivable, even a tiny bit that at some point in the future Thomson Reuters could claim that we were misusing their data in step (4), above? From the TOS:

If you syndicate, publish or otherwise transmit any content containing, enhanced by or derived from Calais-generated metadata you will use your best efforts to incorporate the correct Calais-provided Globally Unique Identifier [GUID] in that content.

It seems straightforward, but that “best efforts....” The truth is, I don't really know exactly what they mean there. Also from the TOS:

You will not use any metadata or GUIDs produced by Calais to create a metadata retrieval service similar to Calais.

And could they claim that we were somehow creating a derivative work without permission and distributing it in step (5)?

I would say, based on my far-from-authoritative reading of the TOS, and given the suit against George Mason University, there is now a precedent; that is, it is within the realm of possibility that if I passed thousands of web pages through OpenCalais and decided to adapt the resultant format for my own use in a way that Thomson Reuters disliked, I could get a fat letter from some lawyer someday demanding damages, accusing me of creating a derivative work based on their proprietary taxonomy, in violation of their terms.

I'm not saying it's likely; I'm not saying I'm right; I'm not even saying that Thomson Reuters would be legally or ethically wrong to sue for damages. I would bet $10,000 right now against my fears coming to pass. But IANAL, which is exactly my problem here. And this is not a call to boycott anything, nor an attempt to get personalized service out of OpenCalais, where the developers are doing some very fine Semantic Web-bootstrapping work. I know Thomson Reuters could give a damn about me, and in that they are justified—I'm just another API key hash in their database, and even if I upgraded to their for-pay service I'd never represent more than a balance-sheet rounding error.

My only purpose in writing today is to point out how a lawsuit can have unintended chilling effects, at least for me. We're in a remarkable downturn, and people are being told to “get real or go home.” One way corporations get “real” is to sue the living shit out of everything that blinks. It's probably a good time to review the terms of service for all of your critical software to make sure you're in compliance; I wonder if a lot of Web 2.0 mashup decentralized goodwill is going to go to good-faith heaven as companies under financial strain start to look closely at their patent portfolios and vendor agreements, and decide that printing out lawsuits is even cheaper than deploying to EC2. And now that the “Semantic Web,” or “Web 3.0,” or the “Linked Data Web,” or the “Web of Really, That's How to Query Over an rdf:Bag?” or whatever they're calling it, is viable enough that you can't shrug off legal worries—now that the Semantic Web is no longer just a research project, if someone owns the taxonomy you're using and changes it up on you, what rights do you have in the matter? Who owns the GUIDs? Your honor, I just wanted to build a hierarchy of topic pages. I never meant to hurt nobody. And so forth.

To summarize: working in web publishing, I have a healthy fear of lawsuits bordering on the insanely paranoid; and I wish it were not so, but that is now part of the job, as the web of ideas has given way to the web of pricks; and finally, actions speak louder than Creative Commons-licensed terms of service. You can still get handed a subpoena while you're riding the Cluetrain.

Now that I got the fear, do I want to go to the effort to (1) educate a few people in management, none of whom would have great interest in the subject except as a soporific, about the far-fetched risks of using externally-generated taxonomies to organize our content; and do I (2) want to spend a number of hours in the near future educating myself over the completely nebulous rights issues connected to taxonomies, linking, and file formats, thus taking even more time away from code and prose to give it to the law; and do I possibly even (3) want to allocate the budget to work with a lawyer on taxonomy-related issues? All the while knowing that I'm overreacting and that this is probably pointless?

Not really. I'd rather let other people do that and read the judges' opinions. Let deeper pockets set the precedent; what I do want to do is port the CMS to Django, an open-source web framework published by a foundation, get the search into Solr, also published by a foundation, and introduce hierarchy to the 80,000 subjects we already have indexed. I'm just going to put OpenCalais away for a while and start looking at DBpedia again, then see how that whole Zotero suit works out over the next few months or decades.

In one way, this is all great because I love the Semantic Web to the point of stupidity—to the point of building a custom content management system entirely based on alpha-level technology using RDF for storage, creating a framework even slower than Rails. So I'm grateful to Zotero for taking the brunt of the lawsuit, because it gave me reason to take off my rose-tinted Linked Data goggles, and made me aware that all of my planned Semantic Web taxonomy-sharing fun could come crashing down if I don't carefully track the provenance of every one of my triples, erring always on the side of raving terror.

Know what else is great? Now, finally, ten years on, I know that the Semantic Web is real and viable, because I'm afraid I'll get sued for using it. That's the true measure of a maturing technology—eat it, Gartner hype cycle.

I believe, as in don't-get-him-started, that taxonomy-driven interactive editorial is essential to the future of the web, and thus to storytelling and narrative in general. Clearly a great deal of money is being spent by major companies in pursuit of the golden triple: It appears the AP is working on taxonomy tools, and Rupert Murdoch's Dow Jones has Synaptica and publishes a cute taxonomy cookbook. A number of other companies are out there, building massive thesauri and indexing tools, hacking parsers and coding semantic disambiguators like mad, banging their heads against pronouns. There will be many, many competitors seeking to add their own structure to our increasingly Web-content-driven reality, and we will, if we use their services, find ourselves beholden to their methods of indexing, with all manner of legal compliance and copyright issues as yet untested in courts. Creating good, broad, world-describing taxonomies is extraordinarily expensive, because reality is large, and these companies will need to strike a balance between sharing their work and protecting it, so I imagine this will be a subject I'll revisit, professionally, many times over the next few decades (barring complete societal breakdown, or a personal spiritual awakening that allows me to stop thinking about this sort of thing).

Such questions could keep a librarian up at night, staring at the wall, petting his or her sleek gray cat Otlet and wondering what, for instance, a political campaign looks like when all of the news and columns are automatically classified before being published. Competition, he or she might conclude, must be encouraged between these platforms; there must be a free, and yet somehow regulated (perhaps by the W3C, or preferably by an organization with a more attractive website), market of taxonomies—you can't have people claiming to own concepts conjoined to unique identifiers, can you? Can you? You probably can? Oh.

But there's likely no reason to worry; and I am just borrowing trouble; and maybe the Semantic Web won't matter that much after all. Even if taxonomies do become increasingly important in our web of linked data, thank God we live in a society with an enlightened understanding of intellectual property, and that we can trust the tiny handful of organizations that control the world's supply of news, as they become software providers as well as content providers, to do the right thing when it comes to serving the needs of a wider populace, in a culture that would rather foster dialogue, discussion, and mutually beneficial resolutions than use the ugly, blunt tool of potentially profitable lawsuits. I'm sure—really, I am—that mine is an overreaction. And onward, to progress.


Fixed


Ftrain.com

Ftrain.com is the website of Paul Ford and his pseudonyms.

See also: Gary Benchley, Rock Star, a novel; Harper's Magazine; NPR's All Things Considered; The Morning News.

© 1974-2007 Paul Ford

Recent

Real Editors Ship, by Paul Ford. tl;dr: needs editing. (July 20)

Parka. (April 21)

I'm on a Panel at SxSW. (March 8)

Elsewhere: Just Like Heaven. (January 11)

But melts just like a little girl. (August 26)

Panel/Unicode table for you. (August 21)

Been a while. (February 16)

Learning to Fear the Semantic Web, by Paul Ford. (October 15)

Fixed. (September 18)

NYU. (September 18)

Also. (September 11)

Steering Wheel. (September 11)

I never told you because I was kind of out of it for a while there but. (April 1)

Sasquatch. (March 26)

Over There. (March 24)

Signs. (March 21)

Eloquence Personified. (March 20)

Note. I wonder what the poor folks are doing tonight. (March 20)

The Wind Chest, by Paul Ford. (March 18)

Six-Word Reviews of 763 SXSW Mp3s. (March 13)
