@felixsalmon @gabrielsnyder Well... # of documents (11.5MM), total corpus size in words, # of proper names/nouns, # of links