Castanet: automatically generating a browsing structure for a collection

A debate seems to come up whenever the folks in charge of organizing digital collections get together. Standardized schemas such as the Library of Congress Subject Headings may be annoying to read, outdated, and rigidly hierarchical, but they’re better than tags in one way: they’re organized.

And while that might be okay for library books, museum archives, and other lovingly-put-together collections, what do we do with other, more organic collections like the images on Flickr, all the Twitter posts you’re trying to follow, or the thousands of recipes on allrecipes.com? One could imagine sitting down and creating a classification schema that would encompass each of these domains, but that would ignore the impossible task of then assigning every post, image, or recipe a place in the organization.

Enter Castanet (research papers here), a tool that automatically creates browsing structures from whatever metadata or data happens to be in a collection. Of course, as with the topic-modeling tool LDA (an emerging favorite for humanities researchers wishing to exploit natural language processing technologies, examples here and here), the results aren’t perfect, but they’re actually not a bad place to start. Castanet automatically carves a sub-structure out of the hierarchical concept dictionary WordNet (http://wordnet.princeton.edu) and matches each item in the collection to one or more appropriate places within that hierarchy. Then, after some automated trimming and flattening, the result is a hierarchical browsing system.
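To make the carve, match, and flatten steps concrete, here is a minimal Python sketch. It is a simplified illustration, not Castanet’s actual implementation: the HYPERNYMS dictionary is a tiny made-up stand-in for WordNet, and the function names (hypernym_path, build_tree, compress) and the flattening rule (drop any node that has a single child and no items of its own) are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical toy stand-in for WordNet's hypernym links (child -> parent).
HYPERNYMS = {
    "dog": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore",
    "carnivore": "mammal", "mammal": "animal", "animal": "entity",
    "oak": "tree", "tree": "plant", "plant": "entity",
}
KNOWN_TERMS = set(HYPERNYMS) | set(HYPERNYMS.values())


def hypernym_path(term):
    """Return the chain of terms from the root down to `term`."""
    path = [term]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path[::-1]


def build_tree(items):
    """Carve a sub-structure: merge the hypernym paths of every known tag.

    Returns (children, members), where children maps a node to its child
    nodes and members maps a node to the ids of items tagged with it.
    """
    children = defaultdict(set)
    members = defaultdict(set)
    for item_id, tags in items.items():
        for tag in tags:
            if tag not in KNOWN_TERMS:
                continue  # tag is not in the toy dictionary; skip it
            path = hypernym_path(tag)
            for parent, child in zip(path, path[1:]):
                children[parent].add(child)
            members[tag].add(item_id)
    return children, members


def compress(node, children, members):
    """Flatten: remove nodes with exactly one child and no items of their own.

    Returns a nested (name, item_ids, subtrees) tuple.
    """
    kids = [compress(k, children, members)
            for k in sorted(children.get(node, ()))]
    if len(kids) == 1 and not members.get(node):
        return kids[0]
    return (node, sorted(members.get(node, ())), kids)


if __name__ == "__main__":
    # Hypothetical collection: item id -> free-form tags.
    items = {"img1": ["dog", "oak"], "img2": ["cat"], "img3": ["dog", "sunset"]}
    print(compress("entity", *build_tree(items)))
```

In this toy run, the long chain entity > animal > mammal > carnivore collapses to a single "carnivore" category holding "dog" and "cat", which is the kind of trimming that keeps the resulting browsing hierarchy shallow. The tag "sunset" is simply ignored because the toy dictionary does not contain it.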

It can be used with any kind of metadata. Last summer, for example, I used the algorithm to create this category system for the Flickr Commons images, using only the image tags (the link sometimes doesn’t work; if so, check out Castanet on other collections). The category system isn’t ideal: the names of the categories are a little weird at times, and a curator might want to organize the items differently. But those operations (renaming, moving, reclassifying) are much easier than manually creating the hierarchy in the first place.

While browsing structures are a lot less sexy than automated topic models, I think that Castanet could reduce the time cost of creating nice browsing interfaces for otherwise hard-to-navigate digital collections, a step towards making them easier to use for the humanities researchers who depend on them.
