The U.S. Library of Congress announced earlier today that they would be acquiring the entire archive of tweets from Twitter. Not surprisingly, the announcement was accompanied by a tweet whose viral retweeting might explain why their blog is down at the time of this writing.
The Library of Congress says that they will be acquiring all tweets, going right back to the first one from co-founder Jack Dorsey in March 2006. (Note: If the Library of Congress blog containing the announcement is still down, see LOC’s Facebook Page for more information, or go directly to their note on Facebook.) As Ars Technica amusingly points out, this includes all those tweets about your bout with “crocodile flu,” along with any griping, whining, and trash talking, all now officially archived for history.
A press release is to follow this announcement, so it’s not yet clear what the technical details of the archive are, how often it’ll be updated, whether it’ll include social graph info, or whether it’ll offer interface features similar to Google Search’s new Twitter replay feature, which lets you filter Twitter search results by keyword and timeline. As with Google’s offering, this LOC archive has immense value for social researchers, and possibly even for fine-tuning predictive analysis models, such as the one two HP Labs researchers built to determine the box office potential of a theatrically released movie. Hopefully, too, this archiving by LOC, Google and others will reduce some of the load on Twitter’s site, which is renowned for showing its “fail whale” image when it’s unavailable.
This Twitter archive is not the only digital asset that the Library of Congress has. In fact, they’ve accumulated nearly 170TB (terabytes) of “web-based information, including legal blogs, websites of candidates for national office, and websites for Members of Congress.” A side project is the National Digital Information Infrastructure and Preservation Program (NDIIPP) at digitalpreservation.gov.