Play the Wikipedia Game: Get to Philosophy!


It's no secret that I love Python and that I love Wikipedia. So when my former teacher and collaborator Michel Paul told me about this game, I had to see if I could merge the two.

Take any random article on Wikipedia, click the first link in the main text that is not in parentheses or italicized, and repeat on each new page: you will eventually end up on the Philosophy page.

Sounds simple enough. Michel learned about it from the webcomic XKCD: if you hover over the comic, the mouseover text reveals the Wikipedia Game. Not to be outdone, Wikipedia has its own page about the Get to Philosophy game, where you learn that this works for approximately 93% of articles and that the rest end up in a two-page loop. There are variations of the game for multiple players, or to see how many clicks it takes to find Jesus, and so on. I feel this says something deeper about the nature of knowledge, or perhaps about our culture, and perhaps that is why I am drawn to the problem.
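The rule and the loop behavior can be sketched as a short walk. This is a minimal sketch under my own assumptions, not the program further down: it takes a hypothetical first_link function that maps an article title to the title of its first qualifying link (faked here with a small hand-made dictionary, not real Wikipedia data), and it stops either at Philosophy or as soon as a page repeats, which is how a loop shows up.

```python
def walk_to_philosophy(start, first_link, target="Philosophy", max_steps=100):
    """Follow first links from start until target, a repeated page, or a dead end.

    first_link is any function mapping a title to the next title, or to None
    when a page has no qualifying link. Returns (path, outcome) where outcome
    is "reached", "loop", "dead end", or "gave up".
    """
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_steps):
        if current == target:
            return path, "reached"
        nxt = first_link(current)
        if nxt is None:
            return path, "dead end"
        path.append(nxt)
        if nxt in seen:  # a page we already visited: we are stuck in a cycle
            return path, "loop"
        seen.add(nxt)
        current = nxt
    return path, "reached" if current == target else "gave up"


# Hypothetical toy link graph, NOT real Wikipedia data.
toy_links = {
    "Banana": "Fruit",
    "Fruit": "Botany",
    "Botany": "Science",
    "Science": "Knowledge",
    "Knowledge": "Philosophy",
    "Page A": "Page B",
    "Page B": "Page A",  # a two-page loop
}

print(walk_to_philosophy("Banana", toy_links.get))
# → (['Banana', 'Fruit', 'Botany', 'Science', 'Knowledge', 'Philosophy'], 'reached')
print(walk_to_philosophy("Page A", toy_links.get))
# → (['Page A', 'Page B', 'Page A'], 'loop')
```

Keeping a set of visited titles is what turns "the rest end up in a loop" into something the program can detect instead of running forever; max_steps is only a safety net.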

So I rolled up my sleeves and gave it a shot. I always want to improve my skills, and this project would help me get better at parsing data and at fetching HTML from the Internet with Python. I am not usually interested in typical programming puzzles; there needs to be a context, and this one provided it.

I had a basic working program after an hour or so, but I hit a stumbling block with parentheses. When I came across the England article, which has a pronunciation key and parentheses near the start, my algorithm would not work. I tried to adjust it and even considered some regular expressions, but I have to throw in the towel for the moment. I found that the XKCD wiki has a page where Ryan Elmquist has posted a beautiful script that shows you the jumps. I hope he will be kind enough to help me see what I am missing.
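One possible way around the parentheses problem, sketched here under my own assumptions rather than taken from Ryan Elmquist's script or any official rule, is to walk the HTML with Python 3's built-in html.parser and keep a running count of open parentheses in the visible paragraph text. A link only counts if it starts while that count is zero, it is not inside <i> or <em> tags, and its href is an ordinary /wiki/ article with no colon (which skips pages like Help:IPA). The class FirstLinkFinder, the function first_article_link, and the sample HTML are hypothetical; real Wikipedia markup has more wrinkles (infoboxes, hatnotes, article titles that genuinely contain a colon) that this does not handle.

```python
from html.parser import HTMLParser


class FirstLinkFinder(HTMLParser):
    """Find the first /wiki/ article link in paragraph text that is not
    inside parentheses or italics. A simplified sketch, not a full solution."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paren_depth = 0    # "(" minus ")" seen in the current paragraph's text
        self.italic_depth = 0   # how many <i>/<em> tags are currently open
        self.first_link = None  # article title from the href, once found

    def handle_starttag(self, tag, attrs):
        if self.first_link is not None:
            return  # already done; ignore the rest of the page
        if tag == "p":
            self.in_paragraph = True
            self.paren_depth = 0
        elif tag in ("i", "em"):
            self.italic_depth += 1
        elif tag == "a" and self.in_paragraph:
            href = dict(attrs).get("href") or ""
            title = href[len("/wiki/"):].split("#")[0]
            # Only count plain article links: no parentheses or italics around
            # them, and no namespace prefix such as "Help:" or "File:".
            if (href.startswith("/wiki/") and title and ":" not in title
                    and self.paren_depth == 0 and self.italic_depth == 0):
                self.first_link = title

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False
        elif tag in ("i", "em") and self.italic_depth > 0:
            self.italic_depth -= 1

    def handle_data(self, data):
        # Parentheses are counted only in visible paragraph text, so a "(" inside
        # an href such as /wiki/Mercury_(planet) never affects the count.
        if self.in_paragraph:
            for ch in data:
                if ch == "(":
                    self.paren_depth += 1
                elif ch == ")" and self.paren_depth > 0:
                    self.paren_depth -= 1


def first_article_link(html):
    """Return the first qualifying article title in html, or None."""
    finder = FirstLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.first_link


# Made-up snippet in the style of a page that opens with a pronunciation key.
sample = (
    '<p><b>England</b> (<a href="/wiki/Help:IPA/English">pronunciation</a>, '
    'from <i><a href="/wiki/Old_English">Englaland</a></i>) is a '
    '<a href="/wiki/Countries_of_the_United_Kingdom">country</a> that is part '
    'of the <a href="/wiki/United_Kingdom">United Kingdom</a>.</p>'
)
print(first_article_link(sample))  # → Countries_of_the_United_Kingdom
```

Counting parentheses in the parsed text rather than in the raw HTML is the key design choice: a regular expression over the raw page also sees the parentheses inside href values like Mercury_(planet), which is one reason regex attempts tend to break.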

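Usual disclaimer follows: I am not a professional programmer; I pick it up as I go, so if you dislike my code, help me make it better. I love to learn. Note that the code is written for Python 2. In Python 3, urllib2 was merged into urllib.request, so change line 1 to "import urllib.request as urllib2" and decode the downloaded page (which arrives as bytes) with .decode('utf-8') before searching it.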

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

newURL = '' #URL for a random article
#newURL = ''

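def getNextURL(newURL): #Finds the first link/title (broken on pages like England that have parentheses near the start)
    infile = opener.open(newURL) #Download the page, sending the User-agent header set above
    page = infile.read() #The raw HTML as one string
    mainP = page[page.find('<p>'):page.find('<p>')+500] #Keep the first 500 characters after the first <p> tag (the main body)
    start = mainP.find('<a href="/wiki/') + len('<a href="/wiki/') #Where the first article title begins
    newPage = mainP[start:mainP.find('"', start)] #The title runs up to the closing quote of the href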
    return newPage

newPage = getNextURL(newURL)

counter = 0 #Keeps track of the jumps

if newPage == 'Philosophy':
    print("The random page chosen links straight to the Philosophy page. Isn't the universe cool?")
else:
    print("We begin our journey on the " + newPage + " page.")
    while newPage != 'Philosophy':
        newURL = '' + newPage #Creates the next link to go to based upon the first link
        newPage = getNextURL(newURL)
        print('Now jumping to the ' + newPage + ' page.')
        counter += 1

print('It took %d jumps to get to the Philosophy page on Wikipedia. Thanks Michel for the puzzle!' % counter)
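For readers on Python 3, here is a minimal sketch of the fetching side of the program above using urllib.request. The base URL (English Wikipedia's Special:Random page), the helper names article_url, title_from_url and fetch_html, and the User-Agent string are my assumptions, not part of the original code; these helpers only build URLs and download text, leaving the link finding to whatever parser you prefer.

```python
import urllib.parse
import urllib.request

# Assumptions: English Wikipedia, and a placeholder User-Agent string you
# should replace with something that identifies your script.
WIKI_BASE = "https://en.wikipedia.org/wiki/"
RANDOM_URL = WIKI_BASE + "Special:Random"
USER_AGENT = "PhilosophyGameSketch/0.1 (educational script)"


def article_url(title):
    """Build an article URL from a plain (not percent-encoded) title."""
    return WIKI_BASE + urllib.parse.quote(title)


def title_from_url(url):
    """Recover the plain article title from a /wiki/ URL."""
    path = urllib.parse.urlparse(url).path
    return urllib.parse.unquote(path.split("/wiki/", 1)[-1])


def fetch_html(url):
    """Download url and return (final_url, html_text).

    Special:Random answers with a redirect, so final_url tells you which
    article you actually landed on.
    """
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        # In Python 3, read() returns bytes, so decode before searching the text.
        return response.url, response.read().decode(charset)
```

For example, fetch_html(RANDOM_URL) returns the URL of whichever article Special:Random picked along with its HTML, and title_from_url turns that URL into the starting title. Titles taken straight from href attributes are already percent-encoded, so pass them through urllib.parse.unquote before handing them to article_url, or they will be encoded twice.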

As I said, the program runs well enough, but it fails when it comes across a page that does not play nice with how I find the first link. Hopefully the Python/CS community will come to my rescue, teach me something, and help me solve the problem! I post this for education and entertainment, and I hope it inspires you to keep learning and solving puzzles no matter how old you are.

Subscribe to BrokenAirplane for all of the relevant technology and education news/information.