PAS #2 – Web Scraping

I’ve spent a lot of time over the last few weeks learning how to code using Python.  I learned about lists, dictionaries, functions, libraries, and modules.  A couple of days ago I decided I knew enough to start writing my stock trading program.  The first part I decided to implement is a way to pull stock quotes off the internet.

Pulling stock quotes off the internet turned out to be more of a pain than I had prepared for.  When I was learning Python I studied something called an API, which stands for application programming interface.  APIs are used by websites to allow programmers to use their services in the applications they develop.  I planned to use Google to look up stock prices, pulling the information from pages like this: http://www.google.com/finance?q=NASDAQ:GOOG.

So far my program is able to pull the HTML code from that website and save it to a document on my computer.  That was the easy part.  Now I must figure out a way to parse the HTML code and pull out all of the numbers I am interested in.  I think this is going to be pretty difficult, because Google’s source code is hard to read.  You can see for yourself by going to the link I posted above, right-clicking on the background of the site, and selecting ‘view source code’ or your browser’s equivalent.  If you were able to get that to work you should understand why this next part is going to be difficult.  Read my blog next week to see what solution I came up with (hopefully I find one!).
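For anyone following along, here is a minimal sketch of what the two steps described above (download and save the page, then dig numbers out of the HTML) could look like using only Python's standard library.  It is not the program from this post.  The function names (fetch_html, save_html, extract_numbers), the TextCollector class, and the output file name are all made up for illustration, and the parsing approach (collect the visible text, then pick out anything that looks like a number) is just one simple assumption about how to start.  It ignores plus and minus signs and makes no attempt to tell the stock price apart from every other number on the page.

```python
import re
import urllib.request
from html.parser import HTMLParser

# The link from the post. The page behind it may have changed or moved.
URL = "http://www.google.com/finance?q=NASDAQ:GOOG"


def fetch_html(url):
    """Download a page and return its HTML as text."""
    # Some sites reject requests without a browser-like User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_html(html, path):
    """Write the HTML text to a file on disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


class TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


# Matches numbers such as 1000, 7.89, or 1,234.56 (signs are ignored).
NUMBER = re.compile(r"\d[\d,]*\.\d+|\d[\d,]*")


def extract_numbers(html):
    """Return every number found in the visible text of the page, as floats."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    numbers = []
    for chunk in collector.chunks:
        for match in NUMBER.findall(chunk):
            numbers.append(float(match.replace(",", "")))
    return numbers


# Example usage (needs an internet connection, so it is left commented out):
# html = fetch_html(URL)
# save_html(html, "goog.html")
# print(extract_numbers(html))
```

On a real page this will return a long list of numbers, most of them irrelevant, which is exactly why the parsing step is the hard part.  A next step would be to look at the saved file and find the tag or attribute that surrounds the price, then collect text only from inside that tag.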
