Archive for the ‘Python’ Category

TurboGears Development: Environment

Saturday, April 19th, 2008

I thought I'd run through a list of tools that I used to make New Metal Army. New Metal Army is my first major website and I learned a lot along the way. Here are the tools in no particular order:

Subversion

First and foremost, Subversion is a source control tool but it's also a safety net. I can't see how you can call yourself a developer if you don't have source control. From day one I got myself an account on Webfaction, set up Subversion and started versioning everything. They allow HTTPS connections so everything is safe, and because I don't live in their data center it's off site as well :)

Trac

Again on Webfaction, I installed Trac. Trac is a project management web tool. It hooks into svn so you can see repository changes as a timeline and see checkins as colored diffs. It also has a ticket system and a wiki. I found the wiki to be invaluable. It allowed me to track my thoughts in a 'low barrier to entry' fashion. It's so easy to add a wiki page, there really isn't any excuse for not doing it! The ticket/task tracking is basic but fine for small teams. There is a simple notion of a milestone and a release but the metrics for measuring progress just aren't there. For me though, this was fine. Web projects are relatively simple and one-man projects don't really require metrics.

TextMate

TextMate is a text editor primarily aimed at programmers but useful for anyone who needs to edit structured documents where the content is more important than the layout. I first ran into TextMate while I was evaluating Ruby. The Ruby community seems to be dominated by MacBook Pro users sitting in Starbucks, bashing code into TextMate and buzzing on coffee. Although I decided not to use Ruby I did stick with TextMate. It really is the premier text editor for OS X. It's the first editor that has replaced emacs for me. I didn't use any extensions for TextMate but I did use PyChecker and JSLint with it to check my Python and JavaScript code on save.

CSSEdit

CSSEdit is an editor devoted to CSS. The cool thing is that it has live editing: as you edit the CSS file for a site, the preview of the site changes with it. This is quite frankly brilliant and very cheap compared to the time it saved me.
Combined with YUI's CSSReset I can proudly say I 'ported' New Metal Army from Firefox to Safari, IE and Opera in a few days with minimal difficulty. CSSReset should be the basis for ALL website development and Yahoo should get a fucking medal for creating it. Cheers guys :)

Pixelmator

Pixelmator is a nice image editor that supports PSDs and layers. I got it as part of a MacHeist bundle and just started using it rather than paying hundreds of quid for Photoshop. Because it supports PSDs I can still work with artists who quite rightly want to use Photoshop.

VMWare Fusion

Working on a Mac makes it hard to get Internet Explorer working. I don't want or need a PC so VMWare comes to the rescue. VMWare is an OS virtualisation package that basically allows me to run various versions of Windows, Linux and BSD in a window. So I can run IE 5.5, 6.0 and 7 in separate virtual instances and check compatibility across IE 5.5+, Safari, Firefox 2+ and Opera 9+. Since I'm a one-man operation, time-consuming and boring tasks like this need to be made as painless as possible... or they won't get done.

Firebug

Firebug is an extension for Firefox that allows you to examine pages and page elements to see how their layout has been calculated, and it also allows you to debug JavaScript inside the browser. It far exceeds IE's rather forced script debugger compatibility or .NET integration. This tool saved me months of debugging and, more importantly, flattened the learning curve for web development.

Wingware IDE

There were times when I needed to debug fiddly Python code and the Python command line debugger just wasn't cutting it. I turned to Wingware's IDE for debugging and it served me very well. I only used it a few times though.


NeoOffice

NeoOffice is a Mac port of OpenOffice. It's a good replacement for Claris or MS Office. I only needed to write a few letters to lawyers and such but it's good to not have to pay £500 for the privilege. I did use it a lot for opening up 50MB CSV files and it coped quite well with them.

Python

Well this is pretty obvious: TurboGears uses Python so I am forced to work with it. Luckily Python is one of the best languages to do anything in. With a vast array of modules it's really quick to create scripts to process data for your website. As an affiliate for many music companies (iTunes, HMV, Play, Amazon, ...) I get access to huge XML and CSV files with their latest prices, stock and shipping information. Because of Python's flexibility it is easy to merge this data into New Metal Army.
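To give a flavour of what merging an affiliate feed looks like, here is a minimal sketch in modern Python using only the standard library. It is not the actual New Metal Army code: the column names (`artist`, `title`, `price`, `in_stock`), the sample rows, the retailer label and the "keep the cheapest price" rule are all made up for illustration, and real feeds each have their own layout.

```python
import csv
import io

# Hypothetical feed layout; real affiliate feeds use their own column names.
SAMPLE_FEED = """artist,title,price,in_stock
Iron Maiden,Powerslave,7.99,yes
Slayer,Reign in Blood,6.49,no
"""


def load_prices(feed_file):
    """Return {(artist, title): price} for every in-stock row of a CSV feed."""
    prices = {}
    for row in csv.DictReader(feed_file):
        if row["in_stock"].strip().lower() != "yes":
            continue  # skip items the retailer cannot ship
        key = (row["artist"].strip(), row["title"].strip())
        prices[key] = float(row["price"])
    return prices


def merge_cheapest(catalogue, prices, retailer):
    """Keep the cheapest (price, retailer) pair seen so far for each album."""
    for key, price in prices.items():
        current = catalogue.get(key)
        if current is None or price < current[0]:
            catalogue[key] = (price, retailer)
    return catalogue


catalogue = {}
merge_cheapest(catalogue, load_prices(io.StringIO(SAMPLE_FEED)), "example-shop")
print(catalogue)
```

In practice `load_prices` would be handed an open file rather than a string, and a 50MB feed is no problem because `csv.DictReader` reads one row at a time.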

TurboGears

TurboGears is a no-brainer here. It's the web framework New Metal Army sits upon. For me though TG was more than a framework. TG comes with a great community bundled for free. I learned a lot just reading the mailing list. I also learned a lot by reading the code for TurboGears itself. It's well written and cleanly constructed. There is some advanced Python in there but I don't think anything is overly complicated. Python is a great language to read and I think more programmers need to learn that they CAN dive into most code and quickly work out what is going on.


PostgreSQL

Postgres has always been my database of choice. Unfortunately I have no real reason for this other than that it has always covered my needs and it is standards compliant, so I expect it to cover my needs for the foreseeable future. TurboGears uses an ORM (SQLAlchemy) between it and the database so I could swap Postgres for MSSQL, MySQL, Firebird, Oracle or a number of others. Postgres serves New Metal Army VERY well.

New Metal Army: Overview

Thursday, April 3rd, 2008

For the last year I've been working on a project in TurboGears. Well, now I can basically say version 1.0 of New Metal Army is done. It's been a long time (nearly a year) and the site is far from finished, but what is there is basically feature complete for 1.0 and it makes a nice site.

Here is a quick summary of New Metal Army:

  • Pulls together the news from the top metal and rock websites from around the world. All news is tagged and associated with the appropriate bands
  • Has full gig listings for the UK rock/metal scene with full venue details and links to buy tickets
  • Brings together band details from Wikipedia, Flickr, YouTube, MusicBrainz and Amazon

As it currently stands there are 2,500+ bands, 400+ gigs, 100+ venues, 500,000+ band pictures and 1,000,000+ band videos and it's growing all the time.

I've learned a lot about the internet doing this project. Most of my time was spent researching different ways to scrape and understand websites. Here is a summary of what I learned:

  • The internet is a mess. There are a lot of sites with HTML that isn't just slightly wrong but VERY wrong.
  • BeautifulSoup goes a long way towards parsing the bad markup on the internet.
  • Structural markup (bold, italic, div etc) was the least of my worries. The semantic meaning of the data is very hard to discern. Now this is obvious but when I started I didn't really think about it. I assumed that MicroFormats would come to my rescue... but no one uses them (well, very few people use them). I must confess that I have a task in my Trac to add them but it's not a priority at the moment. So even I don't use them!
  • Even with MicroFormats the data I needed is tainted by human input (like most of the internet). I deal a lot with band names, and band names are often misspelt, with articles like 'a' and 'the' added and removed.
  • Following on from spellings: people in the world still insist on using foreign languages that have funny accents and, despite what you read, Unicode, while simple to code in Python, is not simple to think in. When will everyone learn to speak local? :)
  • A human algorithm for tagging things is... to tag things with a scatter gun approach. The code base looks in Flickr for pictures of bands. It does this by looking at the tags associated with a picture. The trouble is that when someone takes some photos at a concert they mass upload them and tag them all with the names of all the bands at the concert, and sometimes with bands that are like the bands at the concert... not too useful.
  • Python does scale pretty well to large projects... but it's easy to get 'leaks' which make your heap grow and grow. They aren't really leaks; they are normally lists of things that you are forgetting to tidy up and Python is dutifully holding them for you. This is something I really didn't think about until I deployed the site and associated tools and noticed that my 'Job Runner' (a threaded application that ran various jobs) just grew and grew.
  • Before you start writing anything in Python look for a module on the internet that does it. If you can't find a module to do it then write your own. However, write your own module knowing this: as soon as you have finished, somehow you will find a module on the internet that is more complete than yours and you will kick yourself for not finding it before!
  • Zombies rock... everyone likes zombies and Simon (my Zombie) is no exception to this :)
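On the band name points above (articles coming and going, funny accents), a rough sketch of the kind of normalisation that helps is below. It is written in modern Python with only the standard library and is an illustration rather than the code the site uses: `normalise_band_name` and the `LEADING_ARTICLES` list are names invented for this example. It only builds a comparison key, so it does nothing for genuine misspellings, and characters with no Unicode decomposition (like 'ø') pass through unchanged.

```python
import unicodedata

# Hypothetical list: leading articles that people add and drop at random.
LEADING_ARTICLES = ("the ", "a ", "an ")


def normalise_band_name(name):
    """Build a comparison key: no accents, no case, no leading article."""
    # Split accented characters into base letter + combining mark,
    # then throw the combining marks away.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lower-case and collapse runs of whitespace.
    key = " ".join(stripped.lower().split())
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            key = key[len(article):]
            break
    return key


print(normalise_band_name("Motörhead"))
print(normalise_band_name("The  Haunted"))
```

The key is only for matching; the name shown on the site should still be the properly spelt, properly accented one.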

I intend to add some articles on how I did various bits and bobs over the next few weeks.

FreeBSD 6.3 and TurboGears

Saturday, January 19th, 2008

I upgraded a test server to FreeBSD 6.3 (released a few days ago) and all was working well apart from my TurboGears app. I run a TurboGears instance behind mod_wsgi and it wouldn't start. Here is the error I got in http_errors.log:

[Sat Jan 19 11:32:42 2008] [error] [client 207.155.93.149] mod_wsgi (pid=1292): Exception occurred within WSGI script '/home/m/release1.0/apache/turbogears.wsgi'.
[Sat Jan 19 11:32:42 2008] [error] [client 207.155.93.149] Traceback (most recent call last):
[Sat Jan 19 11:32:42 2008] [error] [client 207.155.93.149]   File "/home/m/release1.0/apache/turbogears.wsgi", line 67, in <module>
[Sat Jan 19 11:32:42 2008] [error] [client 207.155.93.149]     import turbogears
[Sat Jan 19 11:32:42 2008] [error] [client 207.155.93.149] ImportError: No module named turbogears

That's odd. I've not uninstalled TurboGears and my background processes that import turbogears still work. In fact, if I go to the Python command line and type import turbogears it all works... bugger.

To complicate matters (in this case) I use a workingenv to contain a very specific version of TurboGears and all of its dependencies. In order for the wsgi script to access the sandbox environment I use an excellent script which tweaks the runtime environment to include the paths in a workingenv. My first thought was that something here had gone wrong. So I turned to prints and some basic error capture.

# Load all distributions into the working set.
from pkg_resources import working_set, Environment
 
env = Environment(root)
env.scan()
 
distributions, errors = working_set.find_plugins(env)
for dist in distributions:
    working_set.add(dist)

Printing out errors revealed:

errors:
{Amara 1.2.0.2 (/usr/home/m/tgenv1_0_32/lib/python2.5/Amara-1.2.0.2-py2.5.egg):
   DistributionNotFound(Requirement.parse('4Suite-XML>=1.0.2'),),
TGCaptcha 0.11 (/usr/home/m/tgenv1_0_32/lib/python2.5/TGCaptcha-0.11-py2.5.egg):
   DistributionNotFound(Requirement.parse('pycrypto>=2.0.1'),)}

Well I hadn't uninstalled those packages and I'm pretty sure that freebsd-update hadn't uninstalled them, so where the hell have they gone? Looking in the workingenv sandbox package directory:

ls -la /usr/home/m/tgenv1_0_32/lib/python2.5
4Suite_XML-1.0.2-py2.5-freebsd-6.2-RELEASE-i386.egg
Amara-1.2.0.2-py2.5.egg
BeautifulSoup-3.0.5-py2.5.egg
Cheetah-2.0.1-py2.5-freebsd-6.2-RELEASE-i386.egg
Cheetah-2.0rc8-py2.5-freebsd-6.2-RELEASE-i386.egg
CherryPy-2.2.1-py2.5.egg
...
PasteScript-1.3.6-py2.5.egg
PyProtocols-1.0a0dev_r2302-py2.5-freebsd-6.2-RELEASE-i386.egg
Routes-1.7.1-py2.5.egg
RuleDispatch-0.5a0.dev_r2306-py2.5-freebsd-6.2-RELEASE-i386.egg
SQLAlchemy-0.3.10-py2.5.egg
...
moved_aside_site.py
psycopg2-2.0.6-py2.5-freebsd-6.2-RELEASE-i386.egg
pycrypto-2.0.1-py2.5-freebsd-6.2-RELEASE-i386.egg
python_dateutil-1.3-py2.5.egg
...
setuptools.pth
simplejson-1.7.3-py2.5-freebsd-6.2-RELEASE-i386.egg
...
 

BUGGER, there are packages in there with the old OS version number (6.2) in their names that need to be updated:

easy_install -U amara
easy_install -U pycrypto
easy_install -U psycopg2
...

Running these fixed all the problems and finally the site is up again :) So I've fixed the problem but I don't know why my other processes and the Python command line worked. If anyone knows, I'd love to know too. Cheers.
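Rather than eyeballing the directory listing, the stale eggs could be picked out with a few lines of Python. This is a sketch in modern Python, not something I ran at the time. It assumes platform-specific eggs are named `...-py<version>-<platform>.egg` as in the listing above, and the target platform string `freebsd-6.3-RELEASE-i386` is my guess by analogy with the 6.2 names. `stale_eggs` is a made-up helper name.

```python
import re

# Platform-specific eggs end in "-py<ver>-<platform>.egg";
# pure-Python eggs stop at "-py<ver>.egg" and never match this pattern.
EGG_PLATFORM = re.compile(r"-py\d+\.\d+-(?P<platform>.+)\.egg$")


def stale_eggs(filenames, current_platform):
    """Return the eggs that were built for a different platform."""
    stale = []
    for name in filenames:
        match = EGG_PLATFORM.search(name)
        if match and match.group("platform") != current_platform:
            stale.append(name)
    return stale


# A few names copied from the listing above.
listing = [
    "4Suite_XML-1.0.2-py2.5-freebsd-6.2-RELEASE-i386.egg",
    "Amara-1.2.0.2-py2.5.egg",
    "pycrypto-2.0.1-py2.5-freebsd-6.2-RELEASE-i386.egg",
    "setuptools.pth",
]

# Assumed platform tag for the upgraded box.
for egg in stale_eggs(listing, "freebsd-6.3-RELEASE-i386"):
    print(egg)
```

Each name it prints is a candidate for an easy_install -U.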