Python Pandas or BeautifulSoup, anyone?

I’m not a technical person (I struggle with WordPress, and only recently opened my first Twitter account – add me). I came across Python when I was in college – back when it wasn’t “in”. It has grown rapidly over the years, and is becoming (if it hasn’t already) a prerequisite for any kind of data-related job.

And most jobs, and most of everything else, are now data-driven. That, however, is not my motivation.

I am not seeking job marketability.

I’m just intellectually curious, and I think it’d be a good hobby when I’m actually FI :).

I picked up VBA in Excel this year, and I’m amazed at the power that has given me. I’ve enjoyed it greatly – and it has given me the confidence that maybe I can program.

I am, after all, and just like you, a logical person — more logical than some of my programming buddies.

Within Python, Pandas and BeautifulSoup are the two libraries I’m keen to explore: Pandas for data analysis, and BeautifulSoup for web scraping. A recent Planet Money podcast featured a guy who tracked textbook prices on Amazon. I thought that was just incredible. The most popular Python libraries, which I’m sure you’d like to see, are here.

I installed Python, but couldn’t get it running. There are lots of YouTube tutorials, but my issues seemed to be unique, so none of them quite covered what went wrong for me. You can follow these steps to get it running – it’s quick, it’s free, and the scope is beyond anything I am aware of at the moment.

  1. Install Python. I got the previous release rather than the latest one (Windows x86 MSI installer).
  2. Once installed, open the Command Prompt (click on Start / Windows, and type cmd).
  3. Find out where you’ve housed Python and navigate to it (enter “dir” to list what is inside the current directory, “cd + directory name” to move into a directory, and “cd..” to move back up one level).
  4. Install Pandas (“pip install pandas”). I also installed numpy, xlwings (for Excel), and jupyter (a browser-based notebook for writing and running code).
  5. If the installations fail, try upgrading pip and setuptools first (“pip install --upgrade setuptools” and/or “pip install --upgrade pip”) and then try installing pandas again. I was able to get pandas after upgrading pip.
  6. Open Jupyter by running “jupyter notebook” from Python’s Scripts folder: “C:\Python34\Scripts>jupyter notebook” (or “C:\appdata\Python34\Scripts>jupyter notebook”)
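
Once the steps above are done, a quick way to see whether everything actually landed is to ask Python itself. What follows is only a minimal sketch, not an official check: the function name check_packages is something I made up for illustration, and I am assuming the import names are “pandas”, “numpy”, “xlwings”, and “notebook” (the last one being the package behind “jupyter notebook”). It uses only the standard library, so it should run even if some of the installs failed.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each top-level package name to True if Python can find it.

    Hypothetical helper for illustration; it only looks the package up
    and does not actually import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Show which Python interpreter version is answering.
    print("Python version:", sys.version.split()[0])

    # Assumed import names for the packages installed in step 4.
    wanted = ["pandas", "numpy", "xlwings", "notebook"]
    for name, found in check_packages(wanted).items():
        print("{}: {}".format(name, "installed" if found else "missing"))
```

Save it as a .py file (or paste it into a Jupyter cell) and run it. Anything reported as “missing” is probably worth another “pip install” attempt from step 4 or 5.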