Archive for April, 2008

Email HTML standards

Friday, April 11th, 2008

I've been avoiding sending email from New Metal Army because it is SO hard to make it look good (without resorting to HTML hacks and tables). I wish the Email Standards Project good luck. Come on Google... don't be evil :)


Email Standards Project - Gmail Grimaces from Mathew Patterson on Vimeo.

Microcontroller Demo Scene

Thursday, April 10th, 2008

There is something really fucking cool about making demos, and there is something even cooler about making the hardware they run on as well.

I should point out that I came across this video when reading Make magazine's RSS feed.

New Metal Army: Overview

Thursday, April 3rd, 2008

For the last year I've been working on a project in TurboGears. Well, now I can basically say version 1.0 of New Metal Army is done. It's been a long time (nearly a year) and it's far from finished, but what is there is basically feature complete and makes a nice site.

Here is a quick summary of New Metal Army:

  • Pulls together the news from the top metal and rock websites from around the world. All news is tagged and associated with the appropriate bands
  • Has full gig listings for the UK rock/metal scene with full venue details and links to buy tickets
  • Brings together band details from Wikipedia, Flickr, YouTube, MusicBrainz and Amazon

As it currently stands there are 2,500+ bands, 400+ gigs, 100+ venues, 500,000+ band pictures and 1,000,000+ band videos, and it's growing all the time.

I've learned a lot about the internet doing this project. Most of my time was spent researching different ways to scrape and understand websites. Here is a summary of what I learned:

  • The internet is a mess. There are a lot of sites with HTML that isn't just slightly wrong but VERY wrong.
  • BeautifulSoup goes a long way towards parsing the bad markup on the internet.
  • Structural markup (bold, italic, div etc.) was the least of my worries. The semantic meaning of the data is very hard to discern. Now this is obvious, but when I started I didn't really think about it. I assumed that MicroFormats would come to my rescue... but no one uses them (well, very few people use them). I must confess that I have a task in my Trac to add them, but it's not a priority at the moment. So even I don't use them!
  • Even with MicroFormats the data I needed is tainted by human input (like most of the internet). I deal a lot with band names, and band names are often misspelt, with articles like 'a' and 'the' added and removed.
  • Following on from spellings: people in the world still insist on using foreign languages that have funny accents and, despite what you read, Unicode, while simple to code in Python, is not simple to think in. When will everyone learn to speak local :)
  • A human algorithm for tagging things is... to tag things with a scattergun approach. The code base looks on Flickr for pictures of bands. It does this by looking at the tags associated with a picture. The trouble is that when someone takes some photos at a concert, they mass upload them and tag them all with the names of all the bands at the concert, and sometimes with bands that are like the bands at the concert... not too useful.
  • Python does scale pretty well to large projects... but it's easy to get 'leaks' which make your heap grow and grow. They aren't really leaks; they are normally lists of things that you are forgetting to tidy up, and Python is dutifully holding them for you. This is something I really didn't think about until I deployed the site and associated tools and noticed that my 'Job Runner' (a threaded application that ran various jobs) just grew and grew.
  • Before you start writing anything in Python, look for a module on the internet that does it. If you can't find a module to do it, then write your own. However, write your own module knowing this: as soon as you have finished, somehow you will find a module on the internet that is more complete than yours, and you will kick yourself for not finding it before!
  • Zombies rock... everyone likes zombies and Simon (my Zombie) is no exception to this :)
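The band-name points above (dropped or added articles, accented characters) can be illustrated with a small sketch. This is not the code New Metal Army uses; the function name `normalise_band_name` and the list of articles are assumptions made up for the example, and it is written for Python 3 using only the standard library. The idea is to reduce each name to a comparison key so that "The Haunted" and "haunted", or "Motörhead" and "Motorhead", end up matching.

```python
import unicodedata

# Hypothetical rule set for this sketch: leading articles to ignore when matching.
LEADING_ARTICLES = ("the ", "a ", "an ")


def normalise_band_name(name):
    """Return a comparison key for a band name (illustrative helper, not from the site).

    Accents are stripped, case is folded, runs of whitespace are collapsed
    and one leading article is dropped.
    """
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case and collapse any repeated or surrounding whitespace.
    key = " ".join(stripped.casefold().split())
    # Remove at most one leading article.
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            key = key[len(article):]
            break
    return key
```

With this, `normalise_band_name("The Haunted")` and `normalise_band_name("haunted")` give the same key. It only helps with the mechanical differences; real misspellings would still need some kind of fuzzy matching on top.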
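The 'leak' described above, where a long-running process keeps appending to a list that nobody tidies up, can be sketched as follows. The `JobRunner` class here is a hypothetical stand-in, not the real Job Runner, and bounding the history with `collections.deque(maxlen=...)` is just one possible fix, assumed for the example.

```python
from collections import deque


class JobRunner:
    """Hypothetical stand-in for a long-running job runner."""

    def __init__(self, max_history=100):
        # A plain list here would grow for as long as the process lives.
        # deque(maxlen=...) silently discards the oldest entry once it is full.
        self.history = deque(maxlen=max_history)

    def run(self, job):
        """Run a zero-argument callable and remember its result."""
        result = job()
        self.history.append(result)
        return result


runner = JobRunner(max_history=3)
for i in range(10):
    # i=i binds the current loop value into each lambda.
    runner.run(lambda i=i: i * i)
```

After the loop, `runner.history` holds only the last three results, so memory use stays flat no matter how many jobs have run.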

I intend to add some articles on how I did various bits and bobs over the next few weeks.