Tuesday, January 18, 2011

Culturomics, a portmanteau blending the words “culture” and “genomics”

The noun culturomics is a portmanteau of the words culture and genomics. Whereas genomics studies sequences of DNA base pairs, culturomics studies how the meaning and frequency of word sequences (n-grams) in written documents change over time. John Bohannon describes how Jean-Baptiste Michel and Erez Lieberman Aiden came up with the term [1]: “In a nod to data-intensive genomics, Michel and Lieberman Aiden call this nascent field ‘culturomics’.”
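To make "word sequences (n-grams)" concrete, here is a minimal sketch that splits a sentence into its n-grams. It assumes naive lowercase, whitespace-only tokenization; the function name `ngrams` is hypothetical and this is not how any real digitization pipeline tokenizes text.

```python
from collections import Counter


def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return all contiguous word sequences of length n.

    Assumption: naive tokenization (lowercase, split on whitespace).
    """
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "the cat sat on the mat"
print(ngrams(sentence, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(Counter(ngrams(sentence, 1))[("the",)])  # 2
```

A 1-gram is a single word, a 2-gram a pair of adjacent words, and so on; counting them is the raw material for the frequency curves discussed below.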

This nascent field has already been used to investigate cultural trends and linguistic phenomena in a large corpus of digitized English-language texts published between 1800 and 2000 [2,3]. The culture of culturomics can be followed via tweets from the Cultural Observatory [4]. And if you want to track the evolution of your favorite phrase or an unusual word, you are invited to use the Ngram Viewer [5]. But there is no absolute guarantee that your favorite book has been scanned, digitized and n-grammatized!
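Tracking a phrase over time boils down to computing, for each year, how often it occurs relative to all n-grams of the same length published that year. The sketch below does this for a tiny, hypothetical toy corpus (`toy_corpus` and `yearly_frequency` are illustrative names, not part of any real API). It is a simplification in the spirit of the Ngram Viewer's plotted percentages, not Google's actual pipeline.

```python
from collections import Counter


def ngrams(words: list[str], n: int) -> list[tuple[str, ...]]:
    """Contiguous word sequences of length n from a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def yearly_frequency(corpus: dict[int, list[str]], phrase: str) -> dict[int, float]:
    """Relative frequency of `phrase` per year: its count divided by the
    total number of n-grams of the same length in that year's texts.
    Assumption: naive lowercase, whitespace tokenization."""
    target = tuple(phrase.lower().split())
    n = len(target)
    if n == 0:
        raise ValueError("phrase must contain at least one word")
    result = {}
    for year, documents in sorted(corpus.items()):
        counts = Counter()
        for doc in documents:
            counts.update(ngrams(doc.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Hypothetical toy corpus: year -> texts published that year.
toy_corpus = {
    1900: ["the telegraph arrived", "a letter by telegraph"],
    2000: ["an email arrived", "the telegraph museum opened"],
}
print(yearly_frequency(toy_corpus, "telegraph"))
# 1900: 2/7 ≈ 0.286, 2000: 1/7 ≈ 0.143
```

Normalizing by the yearly total matters because far more text is published in some years than in others; raw counts alone would mostly reflect corpus size.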

References and n-gram exploration
