Last month I attended the Digital Strategies for Heritage (DISH) conference in Rotterdam. This conference brought together representatives of ‘GLAM’ institutions (the slightly inappropriate acronym of Galleries, Libraries, Archives and Museums) from across Europe and America to discuss the need to adapt to the ever-accelerating changes in technology.
In her keynote presentation, Amber Case, a self-confessed digital philosopher, asked: Have we all become cyborgs? Case raised some interesting questions about our use of, and subsequent reliance on, mobile technology, and how our virtual existence has effectively become not just an extension, but also a parallel of our physical one.
The growing use of devices such as the smartphone and tablet computer does make me wonder how the work of the National Library will be affected in the future. Many smartphone ‘apps’ use proprietary formats, and more often than not, cloud (online) data storage. This has the potential to cause a number of problems for institutions like the National Library, who generally like to deal in open standards and physical items.
Then I started thinking about the social networking websites we all seem to use. Very few have static content that can be harvested – they’re dynamically created, user driven and ever changing. So how exactly do we go about archiving a virtual existence, such as a Twitter feed or a Facebook wall? Social networks can undoubtedly contain valuable information that could be of real significance in the future, not just as a record of an individual, but also as a record of social history, news and changing trends. However, to access and disseminate the content we are often reliant on the platforms it was created on. We can’t necessarily rely on sites such as Twitter and Facebook to manage their own archives; with the pace of change so high, they may not even be here next year, never mind in 50 years. The doom mongers among us might say that despite the apparent accessibility of information in the modern world, we’re on the edge of a digital dark age.
While travelling home on the train pondering these thoughts, I stumbled upon a news item (on my iPad, via Twitter – obviously) detailing an agreement between the Library of Congress in Washington D.C. and Twitter, to archive every ‘tweet’ ever made. Phew, that’s a relief! But that’s also a significant amount of data; many billions of ‘tweets’, and increasing at the rate of around 200 million a day from its user base of around 300 million. Scary stuff, especially when you consider that this only includes ‘tweets’ that were public in the first place, and not those that were restricted to the authors’ authorised followers.
Facebook is an even scarier prospect. It has a slightly more complex way of dealing with privacy, which presents another challenge to archiving its content. Facebook also has over double the users of Twitter – 800 million (and counting), who are not just restricted to 140-character status updates, but also upload photographs, videos and private messages. Facebook have recently built another data store the size of a football pitch to house the many petabytes of data created by their users; in fact they had to build an extension to it before it was even finished just to keep up!
However, before anybody starts panicking about turning the National Library’s car park into a multi-storey data centre, it may be worthwhile putting this all into perspective. In the future you may have no need to actually drive to the library to access our collections; through the power of digitisation they’ll be right there in the palm of your hand.
Scott Waby

The topic of indexing/preserving everything digital is very interesting. The question is not only whether it can be done, but also whether it should apply to everything. I can see the value of preserving some Twitter streams, e.g. in the future they will show how society reacted to a particular topic. However, purely on a personal level, I cannot hope to read all the tweets from the modest number of people I follow. I almost felt overwhelmed by the amount of information out there, needing to be read, but then I read an interesting point of view from someone who said they approach tweets like conversations in a room. If you join a room where there’s lots of people talking in different groups, you don’t run round trying to go to each group asking them what they’ve been talking about for the last half hour, hour etc., but that’s what I was trying to do on Twitter. It’s not quite the same as preserving our digital heritage, but I wonder if we need to address what we collect, although how that is decided I don’t know! Sorry for the long personal ramblings!
Thanks, Scott, for a thought-provoking piece. I wonder whether, confronted with this problem of how we should collect and store vast quantities of miscellaneous popular data like the content of Facebook and Twitter (assuming we do want to collect it – and maybe my use of the word data betrays my position), we should not pay a visit to GCHQ: they certainly collect all this stuff – and analyse it. That would save the BL, NLS and NLW a whole lot of trouble post-eLD Regulation: outsourcing is the answer …