Five years on, people are really struggling to archive your tweets

Archiving every tweet ever sent is not an easy task

140 characters doesn't sound like much, but when half a billion tweets are sent daily it's almost impossible to keep up. That's the conclusion of a researcher tracking the progress of an attempt by the US Library of Congress to archive the entirety of Twitter.

The Library made the pledge in 2010, back when the microblogging network was a smaller fish. Even then, there was already a four-year backlog of 21 billion tweets - each with more than 100 pieces of metadata accompanying its 140 characters.

According to Michael Zimmer, who has published a paper on the library's progress, merely storing the tweets is just the first part of the problem. Providing a useful means of retrieval is another tricky task - according to the library's most recent update in January 2013, searching through even the 21-billion-strong backlog takes 24 hours.

Further challenges

Then there's the question of how public the archive should be - should any information be censored or restricted? Is it even ethical to hold the archive in the first place, when Twitter is generally seen as a conversational medium? Should people be allowed to opt out?

These are all questions that the library is wrestling with, and it has yet to come up with satisfactory answers. "The many policy challenges — of access, restrictions, privacy, and control — remain largely unresolved," writes Zimmer. "Sufficiently addressing these policy concerns will, undoubtedly, result in further technical and practical challenges."

Duncan Geere is TechRadar's science writer. Every day he finds the most interesting science news and explains why you should care. You can read more of his stories here, and you can find him on Twitter under the handle @duncangeere.
