'The ultimate mission is free knowledge'

Arthur Richards is a software engineer for Wikipedia

How does Wikipedia handle a whopping 700,000 donations from 150 countries?

Linux Format magazine caught up with Arthur Richards of the Wikimedia Foundation to find out.

Linux Format: Last year the Wikimedia Foundation raised $16m, and the average donation was about $20...

Arthur Richards: That's right – we had more than 700,000 contributions, and we doubled the revenue from the previous year's funding. There was a massive change in how we went about doing the fundraiser. It was a big year – pretty exciting, lots of changes.

LXF: What was the most successful strategy for getting people to donate cash?

AR: Jimmy. Just... Jimmy. It's so funny – we called it something like the 'founder appeal'. There's something about that voice, having someone of authority, a substantial figurehead...

We actually tried doing some messaging and banners with Sue Gardner, executive director of the Wikimedia Foundation, and those performed pretty well but it was never as successful as Jimmy. Hopefully this year, things will be a little bit different, and we won't be relying so heavily on the Jimmy Wales banners.

One of our primary goals is to find something that will actually beat Jimmy – it's really starting to suffer from Jimmy fatigue! He was the butt of so many internet jokes, and people were just so sick of looking at his mug, right?

LXF: We used to joke about it in the office...

AR: We did too!

LXF: We were impressed that he seemed to speak about 40 languages. If you viewed the image alone it'd say like, jimmy_en.jpg, but you could change it to fr.jpg or de.jpg and it'd change the language every time.

AR: Yeah! This year we started testing a little bit earlier. One of our first banner tests that out-performed Jimmy just slightly was our Brandon appeal. Brandon Harris is a designer and software engineer who works for the Wikimedia Foundation – he has a very distinctive look and a wonderful way of speaking.

We got some great images of him, and we also hired a bunch of storytellers for this year's fundraiser, a team of three, who are going around and trying to analyse what works and what doesn't work in all of our messaging. They try to get stories from people involved in the Wikimedia movement, from staffers and people in the community, and try to shape them into messages we can use.

LXF: That's really creative...

AR: Yeah, and I think it really paid off. We sat down and interviewed Brandon for an hour or so and, using his own words, we managed to craft an appeal that really works. So, we think we can identify things that mean we don't have to rely so heavily on Jimmy this year.

LXF: Do you think other free software projects could learn from what you've done?

AR: I think so. After every test that we do, and throughout the fundraiser, we publish all of the information that we gather. So, you can go online and look at all the statistics, graphs and written explanations.

In a lot of ways we're unique because we have such a gigantic community – we're global, we've been around for a long time and we have lots of recognition. But, at the same time, I think some of the techniques that we used for the fundraiser could be adopted by other people to help them be more effective.

LXF: What's a typical Wikimedia box? A generic rack-mount box running LAMP (Linux, Apache, MySQL, PHP)?

AR: Yep, pretty much.

LXF: Multiplied by 10,000?

AR: Despite Wikipedia being one of the top-five most visited websites in the world, we have nowhere near the kind of server infrastructure that Google or Facebook has – we're essentially running a LAMP stack. We joke that we use LAMP on steroids because we use caching really aggressively.

We're really fortunate because more than 90% of the requests for Wikipedia articles are read-only – if we can load all of this content into a cache in memory, it doesn't take much to actually serve that up. Very little of our traffic is actually stuff that has to be written to a database, or that has to go through serious processing. So we can use lots of little tricks and clustering technologies, but all of it is totally open source.
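To illustrate the point about read-only traffic, here is a minimal sketch of a read-through cache in Python. It is not Wikimedia's actual caching layer, which the interview does not describe in detail. The class `ReadThroughCache` and the function `render_page` are hypothetical names invented for this example, and `render_page` merely stands in for the expensive database and PHP work.

```python
class ReadThroughCache:
    """Tiny in-memory read-through cache (illustrative only)."""

    def __init__(self, render):
        self._render = render   # slow path: hypothetical page renderer
        self._store = {}        # article title -> rendered page
        self.hits = 0
        self.misses = 0

    def get(self, title):
        # Serve from memory when possible; fall back to rendering once.
        if title in self._store:
            self.hits += 1
            return self._store[title]
        self.misses += 1
        page = self._render(title)
        self._store[title] = page
        return page

    def invalidate(self, title):
        # Called when an article is edited, so the next read re-renders it.
        self._store.pop(title, None)


def render_page(title):
    # Stand-in for the expensive work (assumption, not real MediaWiki code).
    return f"<h1>{title}</h1>"


cache = ReadThroughCache(render_page)
for _ in range(9):
    cache.get("Reptile")        # nine reads of the same article
cache.invalidate("Reptile")     # one edit
cache.get("Reptile")            # first read after the edit
print(cache.hits, cache.misses)
```

With mostly read-only traffic, nearly every request is answered from memory, and only the first read after an edit pays the full rendering cost.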

LXF: A few years ago, when making the DVD for our magazine, we wanted to put some kind of Wikipedia snapshot on the disc – a subset of articles. We found a few random database dumps, but nothing official...

AR: I'm working on that. There's a small suite of tools that exists on the system called Toolserver, which is just a server that some of our community members have access to, for bots and scripts to mine data from Wikipedia.

Someone has written a tool that basically goes through all the Wikipedia articles and parses the assessment data that exists for a lot of them. You won't see assessments on every article, only on those that are very carefully watched by a certain project or group.

So you can go into this tool and say: I want to see all the B-or-above rated articles about reptiles, and get back a whole big list. Then you can go through and carefully select specific revisions, and ultimately take those articles and export them to a CSV file. You can feed that file through a crazy home-grown system that some guy has, which will turn it into an openZIM file. That's a highly compressed data storage format that you can then load into an openZIM reader, and basically have Wikipedia at your fingertips.
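The selection step described above can be sketched roughly as follows. This is not the Toolserver tool itself. The sample articles, their ratings and revision IDs, the column names and the helper functions are all hypothetical. The only thing taken from Wikipedia is the ordering of the assessment grades (FA, A, GA, B, C, Start, Stub).

```python
import csv
import io

# Wikipedia's article assessment grades, best first.
GRADES = ["FA", "A", "GA", "B", "C", "Start", "Stub"]


def at_least(rating, floor):
    """True if `rating` is as good as or better than `floor`."""
    return GRADES.index(rating) <= GRADES.index(floor)


def select(articles, topic, floor="B"):
    """Keep articles on `topic` rated `floor` or above."""
    return [a for a in articles
            if a["topic"] == topic and at_least(a["rating"], floor)]


def to_csv(rows):
    """Write title and revision ID for each selected article as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "revision_id"],
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data: ratings and revision IDs are made up.
articles = [
    {"title": "Komodo dragon", "topic": "reptiles", "rating": "GA", "revision_id": 1001},
    {"title": "Gecko", "topic": "reptiles", "rating": "C", "revision_id": 1002},
    {"title": "Tuatara", "topic": "reptiles", "rating": "B", "revision_id": 1003},
    {"title": "Axolotl", "topic": "amphibians", "rating": "FA", "revision_id": 1004},
]

selected = select(articles, "reptiles", floor="B")
csv_text = to_csv(selected)
print(csv_text)
```

The resulting CSV is the kind of hand-off file the interview mentions. Converting it to openZIM is a separate step that this sketch does not attempt.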

You can search through articles, but it's read-only at the moment. I'm actually mentoring a Google Summer of Code student who's taking those tools on the Toolserver, and porting them over to a MediaWiki extension, which will then allow people to build their own collections of articles.

Ideally, once it's actually out there as an extension, other people will be able to pick it up and expand it. We'd like people to be able to build their own custom libraries of Wikipedia articles – specific revisions and the like that they can then share with other people.

You could then take someone else's collection, amend it to make it bigger or smaller, apply certain filters to it – such as making it child-safe for instance – and ultimately be able to export those groupings of articles into some kind of offline format. That's the long-term vision.

LXF: What area really interests you in the future of Wikimedia?

AR: I'm not sure... I like it all. One of the cool things about working for the Wikimedia Foundation is it's like being a kid in a candy store. There's so much you can do, and so much that needs to be done, and not that many people doing it. You get to explore and touch lots of different aspects.

LXF: I guess that's why you get involved with a project like this in the first place – you know it's not going to be like working on a production line.

AR: Exactly. At the end of the day, the ultimate mission is free knowledge – you can do anything to further that, and that's the passion that drives almost everybody who's involved in it.

--------------------------------------------------------------------------------------------------

First published in Linux Format Issue 153
