@simonw @magnetikonline I find I'm increasing the speed of exploration every time I create a subset of the data, but I can always refer back to the core data object to fetch more, and can add indexes on the JSON too. If it needs to graduate to a "real" DB, NBD. Pleasant way to work with large corpora.
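
A rough sketch of that workflow, for illustration only. The post doesn't name a database, so this assumes SQLite through Python's built-in `sqlite3` module, with the JSON functions available in the SQLite build. The table names (`docs`, `alice_docs`), the `body` column, and the sample records are all made up.

```python
import json
import sqlite3

# Assumption: SQLite as the store. The "core data object" is modeled here
# as a hypothetical table `docs` with one raw JSON document per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Made-up sample records standing in for a large corpus.
records = [
    {"author": "alice", "year": 2019, "text": "first note"},
    {"author": "bob", "year": 2021, "text": "second note"},
    {"author": "alice", "year": 2021, "text": "third note"},
]
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps(r),) for r in records],
)

# "Indexes on the JSON": an expression index over one extracted field.
conn.execute(
    "CREATE INDEX idx_docs_author ON docs (json_extract(body, '$.author'))"
)

# "Create a subset": a small table holding only the rows and fields
# currently being explored.
conn.execute(
    """
    CREATE TABLE alice_docs AS
    SELECT id, json_extract(body, '$.year') AS year
    FROM docs
    WHERE json_extract(body, '$.author') = 'alice'
    """
)

# "Refer back to the core data object to fetch more": join the subset
# back to the raw JSON and pull a field that was left out of the subset.
rows = conn.execute(
    """
    SELECT s.id, s.year, json_extract(d.body, '$.text') AS text
    FROM alice_docs AS s
    JOIN docs AS d ON d.id = s.id
    ORDER BY s.id
    """
).fetchall()

for row in rows:
    print(row)
```

The subset stays small and quick to query, while the raw JSON in `docs` remains the source of truth for any field that later turns out to matter.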