because i’m a nerd, that’s why

I’ve been dutifully putting song ratings into iTunes for years now, rating each song individually according to its merit. iTunes actually died a while ago and forced me to start the entire rating process over again, but I still hope that one day I will have a fully rated music library.

While I can set up smart playlists within iTunes to get a good mix of music, it’s more interesting to have data that I can visualise. Naturally, I wrote a program to gather and interpret that data for me. Here are the (somewhat voluminous) results:

[edit 20080820: updated the results now that a more significant portion of the library has been rated]

Parsing XML... 7282 track items parsed
Building model of track/album/artist relationships... Done!
	7282 total tracks
	163 genres
	1597 artists
	2242 albums
	74 orphan tracks

Pruning library with a threshold of 5...
	Unrated tracks eliminated...
	albums with too few ratings eliminated...
	artists with too few ratings eliminated...
	genres with too few ratings eliminated...

Final cleanup of pruned library... Done!
	1372 pruned tracks
	46 genres
	93 artists
	75 albums

Average tracks per artist: 13.3669380088
Artists with the most tracks:
Red Hot Chili Peppers: 186
KMFDM: 115
Star Ocean The Second Story OST: 86
The Doobie Brothers: 80
Spoon: 74
311: 74
Insane Clown Posse: 64
Fatboy Slim: 62
Powerman 5000: 61
Nine Inch Nails: 59
Trans-Siberian Orchestra: 59
The Kleptones: 56
Pitchshifter: 54
Rage Against the Machine: 51
Cake: 50

77% of tracks have genres noted
Average tracks per genre: 9.55828220859
Genres with the most tracks:
rock: 869
Other: 783
Pop: 344
Alternative: 323
Soundtrack: 263
Electronic: 259
Metal: 147
Sound Clip: 132
Blues: 114
Game: 109
Classic Rock: 96
Techno: 92
Punk: 89
Industrial: 83
Mix CD: 81

Average albums per artist: 1.73324984346
Artists with the most albums:
The Beatles: 27
NOFX: 22
Queen: 19
Red Hot Chili Peppers: 16
Marilyn Manson: 16
311: 14
U2: 14
Dream Theater: 13
KMFDM: 13
Aerosmith: 13
Eminem: 12
Dave Matthews Band: 12
Sublime: 11
Cake: 11
Jars of Clay: 10

22% of tracks have ratings noted
Artists with the best average rating
Imogen Heap: 100.0
Buckshot LeFonque: 100.0
Splashdown: 100.0
analoq: 100.0
JET: 100.0
Stretch & Vern Present "Maddog": 100.0
The Evolution Control Committe: 100.0
Dispatch: 100.0
Ben Folds - Ben Folds: 100.0
Elastica: 100.0
Metric: 90.0
Remy Zero: 80.0
Mylo Feat. Freeform Five: 80.0
Ленинград: 80.0
川井憲次: 80.0

Genres with the best average rating
Ambient Alternative: 80.0
Revival: 80.0
Salsa: 60.0
Art Rock: 60.0
BritPop: 60.0
General Alternative: 60.0
hip stuff: 60.0
Vocal: 60.0
Folklore: 60.0
Retro: 60.0
Humor: 60.0
Broadway: 60.0
Noise: 60.0
Film Soundtrack: 60.0
Folk/Rock: 60.0

Considering only categories with at least five samples to compare between:
Artists with the best average rating
Moby: 84.0
Gorillaz: 82.0
Roisin Murphy: 80.0
Justice: 80.0
As Fast As: 80.0
Mylo: 77.1428571429
Blockhead: 76.6666666667
Daft Punk: 76.0
Heart: 76.0
Elektel: 76.0
Poe: 74.2857142857
Sara Bareilles: 74.2857142857
Vitalic: 73.3333333333
Prodigy: 73.3333333333
The Knife: 73.3333333333

Albums with the best average rating
Cross: 80.0
Discovery: 80.0
Demon Days: 80.0
Ruby Blue: 80.0
Open Letter to the Damned: 80.0
Palookaville: 80.0
Destroy Rock & Roll: 77.7777777778
Uncle Tony's Coloring Book: 76.6666666667
Space Travel with Teddybear: 76.0
Haunted: 74.2857142857
Little Voice: 74.2857142857
V Live: 73.3333333333
Hello Mom! (iTunes Version): 73.3333333333
Silent Shout: 73.3333333333
OK Cowboy: 73.3333333333

Genres with the best average rating
Electronica/Dance: 77.7777777778
Anime: 74.2857142857
Dance: 74.0
Electronic: 72.4324324324
Folk: 72.0
Unclassifiable: 70.0
RnB: 70.0
Neo-Electro: 70.0
Mix CD: 69.4736842105
Electronica: 69.3333333333
Alternative Rock: 67.7777777778
Alternative & Punk: 67.2727272727
Techno: 66.4516129032
Acapella: 66.3636363636
General Rock: 65.7142857143
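
The gap between the first pair of rating tables and the thresholded ones looks like a small-sample effect: a group with only a track or two can hit a perfect 100, while a group with many rated tracks rarely does. Here is a minimal Python 3 sketch of the idea, not the script below. The name `best_average` and the sample data are made up for illustration, and unlike the script it averages over rated tracks only.

```python
from collections import defaultdict

def best_average(rated, threshold=1, top=15):
    """Rank groups by mean rating, skipping groups with fewer than
    `threshold` rated tracks.

    `rated` is an iterable of (group_name, rating) pairs. Ratings use the
    0-100 scale seen in the results; 0 or None is treated as unrated.
    """
    groups = defaultdict(list)
    for name, rating in rated:
        if rating:  # skip unrated tracks (None or 0)
            groups[name].append(rating)
    averages = [
        (sum(ratings) / len(ratings), name)
        for name, ratings in groups.items()
        if len(ratings) >= threshold
    ]
    averages.sort(reverse=True)  # best average first
    return [(name, avg) for avg, name in averages[:top]]

# Made-up sample data, not taken from the library above
sample = [("One-Hit Wonder", 100)]
sample += [("Steady Band", r) for r in (80, 80, 100, 60, 80)]

print(best_average(sample))               # both groups, single-track group first
print(best_average(sample, threshold=5))  # only the group with five rated tracks
```

With a threshold of 5 the single-track group drops out, which is the same kind of filtering the "at least five samples" tables apply.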

Because I am a good person and like you, here is the source:

iTunesStats.py

#!/usr/bin/env python
"""
A set of utilities for working with iTunes XML files and generating interesting statistics therefrom.

Dependencies:
PListReader (http://www.shearersoftware.com/software/developers/plist/)
XMLFilter   (http://www.shearersoftware.com/software/developers/xmlfilter/)
path        (http://www.jorendorff.com/articles/python/path)
"""

from __future__ import division

import sys
from PListReader import PListReader
from XMLFilter import XMLFilter
from path import path
from copy import copy

alphabet = set(list('abcdefghijklmnopqrstuvwxyz'))

def load(iml=None):
    if iml is None:
        #yup, i'm assuming Windows here
        iml = path('~/My Documents/My Music/iTunes/iTunes Music Library.xml').expand().abspath()
        if not iml.exists():
            #i do take into account the possibility of mac/unix users
            iml = path('~/Music/iTunes/iTunes Music Library.xml').expand().abspath()
            if not iml.exists():
                raise IOError('Could not automatically find "iTunes Music Library.xml"')
    else:
        iml = path(iml).expand().abspath()

    reader = PListReader()
    XMLFilter.parseFilePath(iml, reader, features = reader.getRecommendedFeatures())
    return reader.getResult()

class Track(object):
    class Lib(object):
        def __init__(self, track):
            self.track = track
            self.artist = None
            self.album = None
            self.genre = None

        def __str__(self):
            return u''.join([u'Track: ', unicode(self.track), u'\nArtist: ', unicode(self.artist), u'\n',
                             u'Album: ', unicode(self.album), u'\nGenre: ', unicode(self.genre)])

    def __init__(self, tdict, library = None):
        for key, val in tdict.iteritems():
            self.toAttr(key, val)
        keys = set(tdict.keys())
        if u'Name' not in keys:
            #fall back to the file name, minus extension, as the track name
            self.name = path(self.location).name
            if '.' in self.name:
                self.name = self.name.rpartition('.')[0]
            self.name = self.name.replace('%20', ' ')
            self.name = self.name.replace('_', ' ')
        if u'Artist' not in keys or self.artist == 'Various':
            self.artist = None
        if u'Album' not in keys:
            self.album = None
        if u'Genre' not in keys or self.genre == 'Unknown':
            self.genre = None
        if u'Rating' not in keys:
            self.rating = None

        #these will be initialized from the outside to point to
        #the object representations
        self.lib = Track.Lib(self)

        self.library = library
        if self.library is not None:
            self.setLibrary(self.library)

    def __cmp__(self, other):
        return cmp(self.trackID, other.trackID)

    def __str__(self):
        return unicode(self.name)

    def __repr__(self):
        return u'<Track: %s - %s>' % (unicode(self.artist), unicode(self.name))

    def toAttr(self, keyname, val):
        #turn a plist key such as 'Track ID' into an attribute name such as 'trackID'
        kn = []
        first = True
        for i in xrange(len(keyname)):
            ki = keyname[i]
            if ki.lower() in alphabet:
                if first:
                    kn.append(ki.lower())
                    first = False
                else:
                    kn.append(ki)
        setattr(self, ''.join(kn), val)

    def setLibrary(self, library):
        self.library = library

        self.library.tracks.add(self)

        if self.album is not None:
            self.library.albums.setdefault(self.album.lower(), TrackCollection(self.album)).add(self)
            self.lib.album = self.library.albums[self.album.lower()]

        if self.artist is not None:
            self.library.artists.setdefault(self.artist.lower(), TrackCollection(self.artist)).add(self)
            self.lib.artist = self.library.artists[self.artist.lower()]
        else:
            self.library.orphans.add(self)

        if self.genre is not None:
            self.library.genres.setdefault(self.genre.lower(), TrackCollection(self.genre)).add(self)
            self.lib.genre = self.library.genres[self.genre.lower()]

class TrackCollection(set):
    def __init__(self, name):
        self.name = name

    def __cmp__(self, other):
        return cmp(self.name.lower(), other.name.lower())

    def __repr__(self):
        return '<%s: %i Tracks>' % (self.name, len(self))

    def __str__(self):
        return self.name

    def average(self, key=lambda track: track):
        return self.sum((key(track) for track in self)) / len(self)

    def sum(self, iterable, key=lambda track: track):
        t = 0
        for i in iterable:
            try:
                t += i
            except TypeError:
                pass
        return t

class Library(object):
    def __init__(self, iml=None, messages=sys.stdout, suppressAutoIML=False):
        self.tracks  = set()
        self.albums  = {}
        self.artists = {}
        self.genres  = {}
        self.orphans = set()

        self.messages = messages

        if not suppressAutoIML:
            self.initFromIML(iml)

    def initFromIML(self, iml):
        """
        Initialize the library from an iTunes Media Library
        """
        self.pr("Parsing XML... ", newline=False)
        lib = load(iml)
        self.pr("%i track items parsed" % len(lib[u'Tracks']))

        self.pr("Building model of track/album/artist relationships... ", newline=False)
        for tid, track in lib[u'Tracks'].iteritems():
            Track(track, self)
        self.pr("Done!")
        self.pr("\t%i total tracks" % len(self.tracks))
        self.pr("\t%i genres" % len(self.genres))
        self.pr("\t%i artists" % len(self.artists))
        self.pr("\t%i albums" % len(self.albums))
        self.pr("\t%i orphan tracks" % len(self.orphans))

    def pr(self, msg='', newline=True):
        self.messages.write(unicode(msg).encode("utf-8"))
        if newline:
            self.messages.write('\n')

    def most(self, collection, collectionOperation=lambda col: len(col), viewTop=15, show=False):
        """
        See the most populous members of a collection.

        Collection is one of "albums", "artists", "genres"
        collectionOperation is a function which is performed on each collection. Defaults to lambda col: len(col),
            which causes this function to return the most populous members of the collection. Other examples:
            lambda col: col.average(lambda track: track.rating) causes this to return the collections with the
            best average rating.
        viewTop restricts the number displayed. If 0, displays all.
        """
        if show:
            for col, size in self.most(collection, collectionOperation, viewTop, False):
                self.pr(unicode(col) + u': ' + unicode(size))
        else:
            col = [(collectionOperation(c), c) for c in getattr(self, collection).values()]
            col.sort()
            col.reverse()
            return [(c, cl) for cl, c in col] if viewTop == 0 else [(c, cl) for cl, c in col][:viewTop]

    def prune(self, threshold=5):
        """
        Generates a copy of the library with weak members pruned out.

        All unrated tracks are pruned. Then, for each collection type, each member with
        fewer than threshold tracks are pruned.
        """

        self.pr("Pruning library with a threshold of %i..." % threshold)

        l2 = Library(messages=self.messages, suppressAutoIML=True)
        for track in self.tracks:
            if track.rating is not None and track.rating > 0:
                t2 = copy(track)
                t2.lib = Track.Lib(t2)
                t2.setLibrary(l2)
        self.pr("\tUnrated tracks eliminated...")

        for collection in ['albums', 'artists', 'genres']:
            toremove = set()
            for key, member in getattr(l2, collection).iteritems():
                if len(member) < threshold:
                    toremove.add(key)
                elif len([i for i in member if i.rating is not None and i.rating > 0]) < threshold:
                    toremove.add(key)
            coll = getattr(l2, collection)
            for key in toremove:
                del coll[key]
            setattr(l2, collection, coll)
            self.pr("\t%s with too few ratings eliminated..." % collection)

        self.pr()
        self.pr("Final cleanup of pruned library... ", False)
        newtracks = set()
        for collection in ['albums', 'artists', 'genres']:
            for member in getattr(l2, collection).values():
                for track in member:
                    newtracks.add(track)
        l2.tracks = newtracks
        self.pr("Done!")
        self.pr("\t%i pruned tracks" % len(l2.tracks))
        self.pr("\t%i genres" % len(l2.genres))
        self.pr("\t%i artists" % len(l2.artists))
        self.pr("\t%i albums" % len(l2.albums))

        return l2

def main(argv=None):
    if argv is None:
        argv = sys.argv
    iTunesLib = None
    if len(argv) > 1:
        iTunesLib = argv[1]
    lib = Library(iTunesLib)

    lib.pr()

    l2 = lib.prune()

    lib.pr()

    #now we just run through some standard stats
    lib.pr("Average tracks per artist: ", False)
    #iterate over the collections themselves: looping over the dict would
    #yield the name keys, and len() would then measure name length
    spa = [len(a) for a in lib.artists.values()]
    lib.pr(sum(spa) / len(spa))
    lib.pr("Artists with the most tracks:")
    lib.most('artists', show=True)

    lib.pr()

    lib.pr("%i%% of tracks have genres noted" % int(100*(len([i for i in lib.tracks if i.genre is not None])/len(lib.tracks))))
    lib.pr("Average tracks per genre: ", False)
    #same as above: count tracks in each genre, not characters in its name
    spg = [len(g) for g in lib.genres.values()]
    lib.pr(sum(spg) / len(spg))
    lib.pr("Genres with the most tracks:")
    lib.most('genres', show=True)

    lib.pr()

    lib.pr("Average albums per artist: ", False)
    lib.pr(sum((len(set((track.album for track in artist))) for artist in lib.artists.values())) / len(lib.artists))
    lib.pr("Artists with the most albums:")
    lib.most('artists', lambda artist: len(set((track.album for track in artist))), show=True)

    lib.pr()

    #despite the name, this is the number of tracks that do have a rating
    noratings = len([i for i in lib.tracks if i.rating is not None])
    lib.pr("%i%% of tracks have ratings noted" % int(100*(noratings/len(lib.tracks))))
    if noratings > 0:
        lib.pr("Artists with the best average rating")
        lib.most('artists', lambda col: col.average(lambda track: track.rating), show=True)

        lib.pr()

        lib.pr("Genres with the best average rating")
        lib.most('genres', lambda col: col.average(lambda track: track.rating), show=True)

    if len(l2.tracks) > 0:
        lib.pr()

        lib.pr("Considering only categories with at least five samples to compare between:")
        lib.pr("Artists with the best average rating")
        l2.most('artists', lambda col: col.average(lambda track: track.rating), show=True)

        lib.pr()

        lib.pr("Albums with the best average rating")
        l2.most('albums', lambda col: col.average(lambda track: track.rating), show=True)

        lib.pr()

        lib.pr("Genres with the best average rating")
        l2.most('genres', lambda col: col.average(lambda track: track.rating), show=True)

if __name__ == '__main__':
    sys.exit(main())
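
For anyone who would rather not install the three dependencies, the iTunes XML file is an ordinary property list, so the Python 3 standard library's `plistlib` can read it. The following is only a sketch of that alternative, not a port of the script above: `load_tracks` and `tracks_per_artist` are hypothetical names, and the tiny library it parses is made up so the example runs without a real iTunes file.

```python
import plistlib
from collections import Counter

def load_tracks(data):
    """Parse iTunes library XML (given as bytes) and return the track dicts."""
    library = plistlib.loads(data)
    return list(library["Tracks"].values())

def tracks_per_artist(tracks):
    """Count tracks per artist, case-insensitively.

    Tracks with no artist are skipped (the script above calls these orphans).
    """
    counts = Counter()
    for track in tracks:
        artist = track.get("Artist")
        if artist:
            counts[artist.lower()] += 1
    return counts

# Made-up stand-in for "iTunes Music Library.xml"
fake_library = plistlib.dumps({"Tracks": {
    "1": {"Track ID": 1, "Name": "Song A", "Artist": "Cake", "Rating": 80},
    "2": {"Track ID": 2, "Name": "Song B", "Artist": "CAKE"},
    "3": {"Track ID": 3, "Name": "Song C"},
}})

tracks = load_tracks(fake_library)
counts = tracks_per_artist(tracks)
print(counts.most_common(15))
```

For a real library, open the XML file in binary mode and pass the file object to `plistlib.load` instead of calling `plistlib.loads` on bytes.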

I encourage you to post your own results.

2 Comments »

Comment by frankg
2008-04-13 13:13:30

This is my first time with Python, but I got the following error:

Parsing XML... Traceback (most recent call last):
  File "iTunesStats.py", line 316, in <module>
    sys.exit(main())
  File "iTunesStats.py", line 253, in main
    lib = Library(iTunesLib)
  File "iTunesStats.py", line 154, in __init__
    self.initFromIML(iml)
  File "iTunesStats.py", line 161, in initFromIML
    lib = load(iml)
  File "iTunesStats.py", line 35, in load
    XMLFilter.parseFilePath(iml, reader, features = reader.getRecommendedFeatures())
AttributeError: class XMLFilter has no attribute 'parseFilePath'

I see the parseFilePath function in the library, so I’m not sure what’s up.

 
Comment by coriolinus
2008-04-13 20:44:10

It seems I have an obsolete version of the XMLFilter library, and he’s gone and changed the interface on me. Change line 15 from

from XMLFilter import XMLFilter

to

import XMLFilter

and it should work.

Sorry about that!

 