AlphaGo Zero AI teaches itself to play Go better than any human, or other AI, ever

DeepMind, Google’s team specializing in machine learning and artificial intelligence, has created an AI called AlphaGo Zero that can teach itself the Chinese strategy game Go. Not only that, it teaches itself so effectively that it can beat the previous iteration of AlphaGo, the one that beat the world’s best human players.

The previous AlphaGo was taught to play by feeding it data on how the best human players in the world played their moves, effectively giving it a compendium of the world’s best play.

AlphaGo Zero, however, learnt completely differently, according to the Guardian: it was given the rules of Go and left to its own devices. Unsurprisingly, it started by making some pretty foolish and ill-advised moves, but it quickly learnt which moves were more likely to lead to victory and which to failure.

Carrot or the stick?

This approach is called reinforcement learning, and it sounds pretty similar to how we humans learn, just with massive computational power behind it. And that’s what makes all the difference. AlphaGo Zero was able to go from complete amateur to grandmaster in a matter of days.
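To get a feel for the idea, here is a minimal sketch of learning by self-play on a far simpler game than Go. It is not DeepMind's method: AlphaGo Zero pairs a deep neural network with a tree search, while this toy uses a plain lookup table (tabular Q-learning) on a made-up stand-in game in which two players take turns removing one to three stones from a pile and whoever takes the last stone wins. The function names, the game and every parameter value below are assumptions chosen purely for illustration.

```python
import random


def legal_moves(stones):
    """Moves allowed in the toy game: take 1, 2 or 3 stones, never more than remain."""
    return [m for m in (1, 2, 3) if m <= stones]


def train(start_stones=10, episodes=20000, alpha=0.5, epsilon=0.2, seed=0):
    """Learn move values purely from self-play (hypothetical toy setup).

    q[(stones, move)] estimates the final result for the player making
    `move` with `stones` left: close to +1 for a win, -1 for a loss.
    """
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        # Each practice game starts from a random pile size.
        stones = rng.randint(1, start_stones)
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                # Occasionally try a random move to keep exploring.
                move = rng.choice(moves)
            else:
                # Otherwise play the move that currently looks best.
                move = max(moves, key=lambda m: q.get((stones, m), 0.0))
            remaining = stones - move
            if remaining == 0:
                # Taking the last stone wins the game.
                target = 1.0
            else:
                # The opponent moves next, so their best outcome is our worst.
                target = -max(q.get((remaining, m), 0.0)
                              for m in legal_moves(remaining))
            old = q.get((stones, move), 0.0)
            # Nudge the estimate towards what was just observed.
            q[(stones, move)] = old + alpha * (target - old)
            stones = remaining
    return q


def best_move(q, stones):
    """Return the move the learned table currently rates highest."""
    return max(legal_moves(stones), key=lambda m: q.get((stones, m), 0.0))


if __name__ == "__main__":
    table = train()
    for pile in range(1, 11):
        print(pile, "stones left -> take", best_move(table, pile))
```

Nobody tells the program the trick to this game, which is to always leave your opponent a multiple of four stones. It starts out with an empty table, plays badly, and the win and loss signals alone should push it towards that strategy. Go is vastly larger, which is why the real system needs a neural network rather than a table.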

What’s even more interesting is that removing humans from the equation may actually have helped the learning process. AlphaGo Zero was using more complex moves before simpler ones, and by day three it had even started generating moves that hadn’t been seen before.

In a YouTube video, Professor David Silver, lead researcher for AlphaGo, explains why it is so exciting that AlphaGo Zero learns tabula rasa, that is, from a blank slate.

Writing in science journal Nature, DeepMind CEO Demis Hassabis said: “It discovers some best plays, [moves called] josekis, and then it goes beyond those plays and finds something even better. You can see it rediscovering thousands of years of human knowledge.”

When pitted against the 2015 version of AlphaGo, AlphaGo Zero won 100 out of 100 games. But for Hassabis and the team, getting really good at Go isn’t the end goal: “For us, AlphaGo wasn’t just about winning the game of Go, it was also a big step for us towards building these general-purpose algorithms.”

And that means an algorithm that can actually help in a number of real-world applications. The team envision a world in the not-too-distant future where AlphaGo (or its equivalent) will be able to work as a medical assistant. In fact, AlphaGo Zero is now working on figuring out how proteins fold, one of the major scientific challenges of our time.

Andrew London is a writer at Velocity Partners. Prior to Velocity Partners, he was a staff writer at Future plc.
