<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>the corioblog &#187; XML</title>
	<atom:link href="http://www.coriolinus.net/tag/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.coriolinus.net</link>
	<description>read, and be entertained</description>
	<lastBuildDate>Sat, 09 Jul 2011 19:53:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>keeping up</title>
		<link>http://www.coriolinus.net/2010/04/13/keeping-up/</link>
		<comments>http://www.coriolinus.net/2010/04/13/keeping-up/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 09:34:47 +0000</pubDate>
		<dc:creator>coriolinus</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[army]]></category>
		<category><![CDATA[civilian contractor]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[Cross-platform software]]></category>
		<category><![CDATA[Data management]]></category>
		<category><![CDATA[Database management systems]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[IBM software]]></category>
		<category><![CDATA[Microsoft SQL Server]]></category>
		<category><![CDATA[MSSQL query designer]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.coriolinus.net/?p=3053</guid>
		<description><![CDATA[Today I got to code to solve a problem. The Army&#8217;s bit of code which moved flight records from the maintenance database into the pilot records database broke, and I got to write a replacement. It&#8217;s trivial code, really: take a fairly complex SQL join generated by MSSQL, and write its results to an XML [...]]]></description>
			<content:encoded><![CDATA[<p>Today I got to code to solve a problem. The Army&#8217;s bit of code which moved flight records from the maintenance database into the pilot records database broke, and I got to write a replacement. It&#8217;s trivial code, really: take a fairly complex SQL join generated by MSSQL, and write its results to an XML file using a particular schema. Still, I got really, stupidly excited about this. </p>
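<p>For anyone who hasn&#8217;t done this kind of export, here is a minimal sketch of the general shape of the task. It is not the replacement code described here: it uses Python&#8217;s built-in <code>sqlite3</code> as a stand-in for the MSSQL connection, and the element names <code>FlightRecords</code> and <code>Record</code> are hypothetical placeholders, since the Army&#8217;s schema isn&#8217;t reproduced here. Each result row becomes one element, with one child element per column.</p>
<pre class="brush: python">
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag="FlightRecords", row_tag="Record"):
    """Run query and return an XML element with one child element per row.

    Each column becomes a sub-element named after the column; NULLs are omitted.
    root_tag and row_tag are hypothetical placeholders, not a real schema.
    """
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]  # column names, in SELECT order
    root = ET.Element(root_tag)
    for row in cur:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            if value is not None:
                ET.SubElement(record, name).text = str(value)
    return root


if __name__ == "__main__":
    # demo data only; a real run would point at the maintenance database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (tail TEXT, pilot TEXT, hours REAL)")
    conn.execute("INSERT INTO flights VALUES ('N123', 'Smith', 2.5)")
    root = rows_to_xml(conn, "SELECT tail, pilot, hours FROM flights")
    print(ET.tostring(root, encoding="unicode"))
</pre>
<p>One caveat of this approach: every column in the query (including computed columns in a complex join) needs an alias that is a valid XML element name.</p>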
<p>I also learned some things:</p>
<ol>
<li>I am very, very out of practice. More than four hours into the exercise, I was still debugging. The bugs were things like MSSQL&#8217;s &#8220;Optional Feature Not Implemented&#8221; errors rather than actual logic errors, but still: that&#8217;s far too long for a task this simple.</li>
<li>I have a lot of fun coding. In a nearly unprecedented move, I kept putting off leaving work until the guy whose office I was borrowing made me leave so he could lock up. Given that I&#8217;d already put in an 11-hour day by then, this is a significant development.</li>
<li>For all that I rag on MS products, the MSSQL query designer really does take a lot of work out of the process of writing complex queries.</li>
</ol>
<p>Actually, 90 minutes into the exercise, a civilian contractor came by, worked some magic, and solved the problem I had been writing code to solve in the first place. I kept working anyway, on the excuse that my version will be more featureful than the Army&#8217;s and that, because it will have the source, the Army will benefit. The real reason is much simpler: I&#8217;m having way too much fun to just give this project up. I am perpetually at the 50% mark and working rapidly towards completion; I&#8217;m not going to let this one escape me. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.coriolinus.net/2010/04/13/keeping-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Protocol Buffers</title>
		<link>http://www.coriolinus.net/2008/07/08/protocol-buffers/</link>
		<comments>http://www.coriolinus.net/2008/07/08/protocol-buffers/#comments</comments>
		<pubDate>Wed, 09 Jul 2008 01:14:23 +0000</pubDate>
		<dc:creator>coriolinus</dc:creator>
				<category><![CDATA[misc.link]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[dom]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[I/O stream]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.coriolinus.net/?p=2167</guid>
		<description><![CDATA[Subtitle: The good, the bad, and the&#8230; no, wait; this is a Google project. XML and Java have the same sort of flavor to them: they&#8217;re reasonably good and very widely used; they&#8217;re the sort of product that design committees everywhere aspire to create. Their flaws only really become visible after something better comes along. [...]]]></description>
			<content:encoded><![CDATA[<p>Subtitle: The good, the bad, and the&#8230; no, wait; this is a Google project.</p>
<p>XML and Java have the same sort of flavor to them: they&#8217;re reasonably good and very widely used; they&#8217;re the sort of product that design committees everywhere aspire to create. Their flaws only really become visible after something better comes along. In Java&#8217;s case, Python demonstrated that a whole lot of the structure and required text that gives Java code its rigidity can be stripped away, leaving a language that&#8217;s a joy to develop in. However, there hasn&#8217;t been an analogous improvement on XML.</p>
<p>Until yesterday.</p>
<p>Protocol Buffers have a non-descriptive name; I had no idea what to expect when I clicked <a href="http://google-opensource.blogspot.com/2008/07/protocol-buffers-googles-data.html">the link to the announcement</a> that Google put out. As it turns out, they&#8217;re a generic data serialization format (much like XML), except without all the human-readability business that so bloats actual XML. From the announcement:</p>
<blockquote><p>Protocol Buffers allow you to define simple data structures in a special definition language, then compile them to produce classes to represent those structures in the language of your choice. These classes come complete with heavily-optimized code to parse and serialize your message in an extremely compact format. Best of all, the classes are easy to use: each field has simple &#8220;get&#8221; and &#8220;set&#8221; methods, and once you&#8217;re ready, serializing the whole thing to – or parsing it from – a byte array or an I/O stream just takes a single method call.</p></blockquote>
<p>In case you missed that, <em>all you have to write is the schema</em>. All the encoding and decoding crap that you have to wade through in XML has already been abstracted away; they generate classes to do that for you. This is, in fact, cooler than sliced bread.</p>
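<p>To make &#8220;extremely compact&#8221; concrete, here is a hand-rolled sketch of how the protobuf wire format encodes a single integer field. It follows Google&#8217;s published encoding documentation rather than the library&#8217;s own API, and the function names are hypothetical. Integers are written as base-128 varints (seven bits per byte, with the high bit meaning &#8220;more bytes follow&#8221;), and each field is prefixed by a key that combines the field number with a wire type (0 for varints).</p>
<pre class="brush: python">
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint, low 7 bits first.

    Non-negative only: negative int32/int64 values use a longer 10-byte form.
    """
    out = bytearray()
    while True:
        low7 = n % 128
        n //= 128
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: this is the last byte
            return bytes(out)


def encode_varint_field(field_number, value):
    """Encode one varint field as key + value, where key = field_number * 8 + wire type 0."""
    key = field_number * 8 + 0
    return encode_varint(key) + encode_varint(value)


print(encode_varint_field(1, 150).hex())
</pre>
<p>Field 1 holding the value 150 comes out as the three bytes <code>08 96 01</code>, against ten bytes for something like <code>&lt;a&gt;150&lt;/a&gt;</code> in XML. The field <em>name</em> never goes over the wire, which is exactly why you need the <code>.proto</code> file to make sense of the bytes.</p>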
<p>Of course, there <a href="http://code.google.com/apis/protocolbuffers/docs/overview.html#whynotxml">do exist times</a> when XML might better serve your needs:</p>
<blockquote><p>However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the <code>.proto</code> file).</p></blockquote>
<p>In my experience, the human-readability and self-documentation inherent in XML have always been bonus features, not essential to the core mission of getting data from Point A to Point B. Meanwhile, I&#8217;ve spent countless hours wrangling DOM and SAX just to get data into and out of that intermediate form.</p>
<p>There is one wart that I noticed: you still have to create and read the Messages entirely separately from your own native class structure. The natural thing to do, if you want to use this to serialize and deserialize a class, would be to put all the members into the Message definition and put the methods into a subclass of the generated class. However, that is <a href="http://code.google.com/apis/protocolbuffers/docs/reference/python-generated.html#message">expressly forbidden</a>. All is not lost, though: at its simplest, all you really need is a pair of methods like this:</p>
<pre class="brush: python">
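class AClass(object):
    ...
    # assumes AClass defines __excludeFromSerialize, a set of attribute names to skip;
    # its mangled name contains '__', so the dir() scans below skip it automatically

    def toPBuff(self):
        """Copy every plain data attribute onto a new AClassPBuff message."""
        out = AClassPBuff()
        for member in dir(self):
            if not (callable(getattr(self, member)) or '__' in member
                    or member in self.__excludeFromSerialize):
                # scalar fields only: repeated and message fields can't be assigned directly
                setattr(out, member, getattr(self, member))
        return out

    @classmethod
    def fromPBuff(cls, pBuff):
        """Build a new instance from the fields of an AClassPBuff message."""
        out = cls()
        for member in dir(out):
            if not (callable(getattr(out, member)) or '__' in member
                    or member in cls.__excludeFromSerialize):
                setattr(out, member, getattr(pBuff, member))
        return out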
</pre>
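<p>An alternative to scanning <code>dir()</code> is to whitelist the fields explicitly, which keeps runtime-only attributes (caches, open handles) out of the message without needing an exclusion list. The sketch below is hypothetical: <code>FakeMessage</code> stands in for a protoc-generated class, since real generated classes likewise refuse assignment to names that aren&#8217;t declared fields. With a real generated class, the field names could instead be read from <code>AClassPBuff.DESCRIPTOR.fields</code>, each entry of which has a <code>.name</code>.</p>
<pre class="brush: python">
class FakeMessage(object):
    """Hypothetical stand-in for a generated message: only declared fields exist."""
    __slots__ = ("name", "count")

    def __init__(self):
        self.name = ""
        self.count = 0


class Widget(object):
    # attributes that map one-to-one onto message fields
    _FIELDS = ("name", "count")

    def __init__(self, name="", count=0):
        self.name = name
        self.count = count
        self.cache = {}  # runtime-only state, never serialized

    def to_message(self, message_cls=FakeMessage):
        """Copy the whitelisted attributes onto a fresh message."""
        msg = message_cls()
        for field in self._FIELDS:
            setattr(msg, field, getattr(self, field))
        return msg

    @classmethod
    def from_message(cls, msg):
        """Build a new instance from the whitelisted fields of msg."""
        obj = cls()
        for field in cls._FIELDS:
            setattr(obj, field, getattr(msg, field))
        return obj


w = Widget.from_message(Widget("gear", 3).to_message())
print(w.name, w.count)
</pre>
<p>The trade-off is that adding a field means touching both the <code>.proto</code> file and <code>_FIELDS</code>; in exchange, nothing gets serialized by accident.</p>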
<p>In short, even if only in terms of making efficient use of developer time, this is already an awesome project. Once you factor in that it is also faster and slimmer than the alternatives, it becomes astonishingly cool. Expect it to make appearances in my code from now on.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coriolinus.net/2008/07/08/protocol-buffers/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>because i&#8217;m a nerd, that&#8217;s why</title>
		<link>http://www.coriolinus.net/2008/04/12/because-im-a-nerd-thats-why/</link>
		<comments>http://www.coriolinus.net/2008/04/12/because-im-a-nerd-thats-why/#comments</comments>
		<pubDate>Sat, 12 Apr 2008 12:53:00 +0000</pubDate>
		<dc:creator>coriolinus</dc:creator>
				<category><![CDATA[geekspeak]]></category>
		<category><![CDATA[74.2857142857]]></category>
		<category><![CDATA[Aerosmith]]></category>
		<category><![CDATA[Album]]></category>
		<category><![CDATA[Artist]]></category>
		<category><![CDATA[Ben Folds]]></category>
		<category><![CDATA[Cake]]></category>
		<category><![CDATA[Chili Peppers]]></category>
		<category><![CDATA[cmp]]></category>
		<category><![CDATA[Dave Matthews Band]]></category>
		<category><![CDATA[Destroy Rock]]></category>
		<category><![CDATA[Documents/My Music]]></category>
		<category><![CDATA[Dream Theater]]></category>
		<category><![CDATA[Fatboy Slim]]></category>
		<category><![CDATA[Genre]]></category>
		<category><![CDATA[Gorillaz]]></category>
		<category><![CDATA[iml]]></category>
		<category><![CDATA[Imogen Heap]]></category>
		<category><![CDATA[India]]></category>
		<category><![CDATA[iTunes Media Library]]></category>
		<category><![CDATA[Jars of Clay]]></category>
		<category><![CDATA[KMFDM]]></category>
		<category><![CDATA[Marilyn Manson]]></category>
		<category><![CDATA[Moby]]></category>
		<category><![CDATA[Music Library]]></category>
		<category><![CDATA[My Documents/My Music]]></category>
		<category><![CDATA[Mylo Feat]]></category>
		<category><![CDATA[Mylo Vs]]></category>
		<category><![CDATA[None]]></category>
		<category><![CDATA[Open Letter to the Damned]]></category>
		<category><![CDATA[Palookaville]]></category>
		<category><![CDATA[Prodigy]]></category>
		<category><![CDATA[Pruning library]]></category>
		<category><![CDATA[Remy Zero]]></category>
		<category><![CDATA[Ruby Blue]]></category>
		<category><![CDATA[Sara Bareilles]]></category>
		<category><![CDATA[Space Travel with Teddybear]]></category>
		<category><![CDATA[Sublime]]></category>
		<category><![CDATA[The Beatles]]></category>
		<category><![CDATA[Tracks]]></category>
		<category><![CDATA[Trans-Siberian Orchestra]]></category>
		<category><![CDATA[Tsuneo Imahori]]></category>
		<category><![CDATA[Uncle Tony's Coloring Book]]></category>
		<category><![CDATA[unix]]></category>
		<category><![CDATA[Vitalic]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.coriolinus.net/2008/04/12/because-im-a-nerd-thats-why/</guid>
		<description><![CDATA[I&#8217;ve been dutifully putting song ratings into iTunes for years now, rating each song individually according to its merit. iTunes actually died a while ago and forced me to start the entire rating process over again, but I still hope that one day I will have a fully rated music library. While I can set [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been dutifully putting song ratings into iTunes for years now, rating each song individually according to its merit. iTunes actually died a while ago and forced me to start the entire rating process over again, but I still hope that one day I will have a fully rated music library.</p>
<p>While I can set up smart playlists within iTunes to get a good mix of music, it&#8217;s more interesting to have data that I can visualise. Naturally, I wrote a program to gather and interpret that data for me. Here are the (somewhat voluminous) results:<span id="more-2085"></span></p>
<p>[edit 20080820: updated the results now that a more significant portion of the library has been rated]</p>
<pre>Parsing XML... 7282 track items parsed
Building model of track/album/artist relationships... Done!
	7282 total tracks
	163 genres
	1597 artists
	2242 albums
	74 orphan tracks

Pruning library with a threshold of 5...
	Unrated tracks eliminated...
	albums with too few ratings eliminated...
	artists with too few ratings eliminated...
	genres with too few ratings eliminated...

Final cleanup of pruned library... Done!
	1372 pruned tracks
	46 genres
	93 artists
	75 albums

Average tracks per artist: 13.3669380088
Artists with the most tracks:
Red Hot Chili Peppers: 186
KMFDM: 115
Star Ocean The Second Story OST: 86
The Doobie Brothers: 80
Spoon: 74
311: 74
Insane Clown Posse: 64
Fatboy Slim: 62
Powerman 5000: 61
Nine Inch Nails: 59
Trans-Siberian Orchestra: 59
The Kleptones: 56
Pitchshifter: 54
Rage Against the Machine: 51
Cake: 50

77% of tracks have genres noted
Average tracks per genre: 9.55828220859
Genres with the most tracks:
rock: 869
Other: 783
Pop: 344
Alternative: 323
Soundtrack: 263
Electronic: 259
Metal: 147
Sound Clip: 132
Blues: 114
Game: 109
Classic Rock: 96
Techno: 92
Punk: 89
Industrial: 83
Mix CD: 81

Average albums per artist: 1.73324984346
Artists with the most albums:
The Beatles: 27
NOFX: 22
Queen: 19
Red Hot Chili Peppers: 16
Marilyn Manson: 16
311: 14
U2: 14
Dream Theater: 13
KMFDM: 13
Aerosmith: 13
Eminem: 12
Dave Matthews Band: 12
Sublime: 11
Cake: 11
Jars of Clay: 10

22% of tracks have ratings noted
Artists with the best average rating
Imogen Heap: 100.0
Buckshot LeFonque: 100.0
Splashdown: 100.0
analoq: 100.0
JET: 100.0
Stretch &amp; Vern Present "Maddog": 100.0
The Evolution Control Committe: 100.0
Dispatch: 100.0
Ben Folds - Ben Folds: 100.0
Elastica: 100.0
Metric: 90.0
Remy Zero: 80.0
Mylo Feat. Freeform Five: 80.0
Ленинград: 80.0
川井憲次: 80.0

Genres with the best average rating
Ambient Alternative: 80.0
Revival: 80.0
Salsa: 60.0
Art Rock: 60.0
BritPop: 60.0
General Alternative: 60.0
hip stuff: 60.0
Vocal: 60.0
Folklore: 60.0
Retro: 60.0
Humor: 60.0
Broadway: 60.0
Noise: 60.0
Film Soundtrack: 60.0
Folk/Rock: 60.0

Considering only categories with at least five samples to compare between:
Artists with the best average rating
Moby: 84.0
Gorillaz: 82.0
Roisin Murphy: 80.0
Justice: 80.0
As Fast As: 80.0
Mylo: 77.1428571429
Blockhead: 76.6666666667
Daft Punk: 76.0
Heart: 76.0
Elektel: 76.0
Poe: 74.2857142857
Sara Bareilles: 74.2857142857
Vitalic: 73.3333333333
Prodigy: 73.3333333333
The Knife: 73.3333333333

Albums with the best average rating
Cross: 80.0
Discovery: 80.0
Demon Days: 80.0
Ruby Blue: 80.0
Open Letter to the Damned: 80.0
Palookaville: 80.0
Destroy Rock &amp; Roll: 77.7777777778
Uncle Tony's Coloring Book: 76.6666666667
Space Travel with Teddybear: 76.0
Haunted: 74.2857142857
Little Voice: 74.2857142857
V Live: 73.3333333333
Hello Mom! (iTunes Version): 73.3333333333
Silent Shout: 73.3333333333
OK Cowboy: 73.3333333333

Genres with the best average rating
Electronica/Dance: 77.7777777778
Anime: 74.2857142857
Dance: 74.0
Electronic: 72.4324324324
Folk: 72.0
Unclassifiable: 70.0
RnB: 70.0
Neo-Electro: 70.0
Mix CD: 69.4736842105
Electronica: 69.3333333333
Alternative Rock: 67.7777777778
Alternative &amp; Punk: 67.2727272727
Techno: 66.4516129032
Acapella: 66.3636363636
General Rock: 65.7142857143</pre>
<p>Because I am a good person and like you, here is the source:</p>
<p>iTunesStats.py</p>
<pre class="brush: python">
#!/usr/bin/env python
"""
A set of utilities for working with iTunes XML files and generating interesting statistics therefrom.

Dependencies:
PListReader (http://www.shearersoftware.com/software/developers/plist/)
XMLFilter   (http://www.shearersoftware.com/software/developers/xmlfilter/)
path        (http://www.jorendorff.com/articles/python/path)
"""

from __future__ import division

import sys
from PListReader import PListReader
from XMLFilter import XMLFilter
from path import path
from copy import copy

alphabet = set(list('abcdefghijklmnopqrstuvwxyz'))

def load(iml=None):
    """Parse an iTunes Music Library XML file (located automatically if iml is None)."""
    if iml is None:
        #yup, i'm assuming Windows here
        iml = path('~/My Documents/My Music/iTunes/iTunes Music Library.xml').expand().abspath()
        if not iml.exists():
            #i do take into account the possibility of mac/unix users
            iml = path('~/Music/iTunes/iTunes Music Library.xml').expand().abspath()
            if not iml.exists():
                raise IOError('Could not automatically find "iTunes Music Library.xml"')
    else:
        iml = path(iml).expand().abspath()

    reader = PListReader()
    XMLFilter.parseFilePath(iml, reader, features=reader.getRecommendedFeatures())
    return reader.getResult()

class Track(object):
    class Lib(object):
        """Links from one track to the album, artist, and genre collections it belongs to."""
        def __init__(self, track):
            self.track = track
            self.artist = None
            self.album = None
            self.genre = None

        def __str__(self):
            return u''.join([u'Track: ', unicode(self.track), u'\nArtist: ', unicode(self.artist), u'\n',
                             u'Album: ', unicode(self.album), u'\nGenre: ', unicode(self.genre)])

    def __init__(self, tdict, library=None):
        #copy every iTunes key onto the object as a camelCased attribute
        for key, val in tdict.iteritems():
            self.toAttr(key, val)
        keys = set(tdict.keys())
        if u'Name' not in keys:
            #fall back on the file name, minus its extension
            self.name = path(self.location).name
            if '.' in self.name:
                self.name = self.name.rpartition('.')[0]
            self.name = self.name.replace('%20', ' ')
            self.name = self.name.replace('_', ' ')
        if u'Artist' not in keys or self.artist == 'Various':
            self.artist = None
        if u'Album' not in keys:
            self.album = None
        if u'Genre' not in keys or self.genre == 'Unknown':
            self.genre = None
        if u'Rating' not in keys:
            self.rating = None

        #these will be initialized from the outside to point to
        #the object representations
        self.lib = Track.Lib(self)

        self.library = library
        if self.library is not None:
            self.setLibrary(self.library)

    def __cmp__(self, other):
        return cmp(self.trackID, other.trackID)

    def __str__(self):
        return unicode(self.name)

    def __repr__(self):
        return u'&lt;Track: %s - %s&gt;' % (unicode(self.artist), unicode(self.name))

    def toAttr(self, keyname, val):
        """Store val as a camelCased attribute: "Track ID" becomes self.trackID."""
        kn = []
        first = True
        for ki in keyname:
            if ki.lower() in alphabet:
                if first:
                    kn.append(ki.lower())
                    first = False
                else:
                    kn.append(ki)
        setattr(self, ''.join(kn), val)

    def setLibrary(self, library):
        """Register this track with library's track set and its album/artist/genre collections."""
        self.library = library

        self.library.tracks.add(self)

        if self.album is not None:
            self.library.albums.setdefault(self.album.lower(), TrackCollection(self.album)).add(self)
            self.lib.album = self.library.albums[self.album.lower()]

        if self.artist is not None:
            self.library.artists.setdefault(self.artist.lower(), TrackCollection(self.artist)).add(self)
            self.lib.artist = self.library.artists[self.artist.lower()]
        else:
            self.library.orphans.add(self)

        if self.genre is not None:
            self.library.genres.setdefault(self.genre.lower(), TrackCollection(self.genre)).add(self)
            self.lib.genre = self.library.genres[self.genre.lower()]

class TrackCollection(set):
    """A named set of tracks: one album, artist, or genre."""
    def __init__(self, name):
        self.name = name

    def __cmp__(self, other):
        return cmp(self.name.lower(), other.name.lower())

    def __repr__(self):
        return '&lt;%s: %i Tracks&gt;' % (self.name, len(self))

    def __str__(self):
        return self.name

    def average(self, key=lambda track: track):
        #values sum() can't add (e.g. None for unrated tracks) still count toward len(self)
        return self.sum((key(track) for track in self)) / len(self)

    def sum(self, iterable, key=lambda track: track):
        #add up iterable, silently skipping values that can't be added
        t = 0
        for i in iterable:
            try:
                t += i
            except TypeError:
                pass
        return t

class Library(object):
    def __init__(self, iml=None, messages=sys.stdout, suppressAutoIML=False):
        self.tracks  = set()
        #the collection dicts are keyed by lowercased name
        self.albums  = {}
        self.artists = {}
        self.genres  = {}
        self.orphans = set()

        self.messages = messages

        if not suppressAutoIML:
            self.initFromIML(iml)

    def initFromIML(self, iml):
        """
        Initialize the library from an iTunes Media Library
        """
        self.pr("Parsing XML... ", newline=False)
        lib = load(iml)
        self.pr("%i track items parsed" % len(lib[u'Tracks']))

        self.pr("Building model of track/album/artist relationships... ", newline=False)
        for tid, track in lib[u'Tracks'].iteritems():
            Track(track, self)
        self.pr("Done!")
        self.pr("\t%i total tracks" % len(self.tracks))
        self.pr("\t%i genres" % len(self.genres))
        self.pr("\t%i artists" % len(self.artists))
        self.pr("\t%i albums" % len(self.albums))
        self.pr("\t%i orphan tracks" % len(self.orphans))

    def pr(self, msg='', newline=True):
        self.messages.write(unicode(msg).encode("utf-8"))
        if newline:
            self.messages.write('\n')

    def most(self, collection, collectionOperation=lambda col: len(col), viewTop=15, show=False):
        """
        See the most populous members of a collection.

        collection is one of "albums", "artists", "genres".
        collectionOperation is a function which is performed on each collection. Defaults to lambda col: len(col),
        which causes this function to return the most populous members of the collection. Other examples:
        lambda col: col.average(lambda track: track.rating) causes this to return the collections with the
        best average rating.
        viewTop restricts the number displayed. If 0, displays all.
        show prints the results instead of returning them as (collection, value) pairs.
        """
        if show:
            for col, size in self.most(collection, collectionOperation, viewTop, False):
                self.pr(unicode(col) + u': ' + unicode(size))
        else:
            col = [(collectionOperation(c), c) for c in getattr(self, collection).values()]
            col.sort()
            col.reverse()
            return [(c, cl) for cl, c in col] if viewTop == 0 else [(c, cl) for cl, c in col][:viewTop]

    def prune(self, threshold=5):
        """
        Generates a copy of the library with weak members pruned out.

        All unrated tracks are pruned. Then, for each collection type, each member with
        fewer than threshold tracks is pruned.
        """

        self.pr("Pruning library with a threshold of %i..." % threshold)

        l2 = Library(messages=self.messages, suppressAutoIML=True)
        for track in self.tracks:
            if track.rating is not None and track.rating &gt; 0:
                t2 = copy(track)
                t2.lib = Track.Lib(t2)
                t2.setLibrary(l2)
        self.pr("\tUnrated tracks eliminated...")

        for collection in ['albums', 'artists', 'genres']:
            toremove = set()
            for key, member in getattr(l2, collection).iteritems():
                if len(member) &lt; threshold:
                    toremove.add(key)
                elif len([i for i in member if i.rating is not None and i.rating &gt; 0]) &lt; threshold:
                    toremove.add(key)
            coll = getattr(l2, collection)
            for key in toremove:
                del coll[key]
            setattr(l2, collection, coll)
            self.pr("\t%s with too few ratings eliminated..." % collection)

        self.pr()
        self.pr("Final cleanup of pruned library... ", False)
        newtracks = set()
        for collection in ['albums', 'artists', 'genres']:
            for member in getattr(l2, collection).values():
                for track in member:
                    newtracks.add(track)
        l2.tracks = newtracks
        self.pr("Done!")
        self.pr("\t%i pruned tracks" % len(l2.tracks))
        self.pr("\t%i genres" % len(l2.genres))
        self.pr("\t%i artists" % len(l2.artists))
        self.pr("\t%i albums" % len(l2.albums))

        return l2

def main(argv=None):
    if argv is None:
        argv = sys.argv
    iTunesLib = None
    if len(argv) &gt; 1:
        iTunesLib = argv[1]
    lib = Library(iTunesLib)

    lib.pr()

    l2 = lib.prune()

    lib.pr()

    #now we just run through some standard stats
    lib.pr("Average tracks per artist: ", False)
    #iterate over the TrackCollections (dict values), not the name keys
    spa = [len(a) for a in lib.artists.values()]
    lib.pr(sum(spa) / len(spa))
    lib.pr("Artists with the most tracks:")
    lib.most('artists', show=True)

    lib.pr()

    lib.pr("%i%% of tracks have genres noted" % int(100*(len([i for i in lib.tracks if i.genre is not None])/len(lib.tracks))))
    lib.pr("Average tracks per genre: ", False)
    spg = [len(g) for g in lib.genres.values()]
    lib.pr(sum(spg) / len(spg))
    lib.pr("Genres with the most tracks:")
    lib.most('genres', show=True)

    lib.pr()

    lib.pr("Average albums per artist: ", False)
    lib.pr(sum((len(set((track.album for track in artist))) for artist in lib.artists.values())) / len(lib.artists))
    lib.pr("Artists with the most albums:")
    lib.most('artists', lambda artist: len(set((track.album for track in artist))), show=True)

    lib.pr()

    nrated = len([i for i in lib.tracks if i.rating is not None])
    lib.pr("%i%% of tracks have ratings noted" % int(100*(nrated/len(lib.tracks))))
    if nrated &gt; 0:
        lib.pr("Artists with the best average rating")
        lib.most('artists', lambda col: col.average(lambda track: track.rating), show=True)

        lib.pr()

        lib.pr("Genres with the best average rating")
        lib.most('genres', lambda col: col.average(lambda track: track.rating), show=True)

    if len(l2.tracks) &gt; 0:
        lib.pr()

        lib.pr("Considering only categories with at least five samples to compare between:")
        lib.pr("Artists with the best average rating")
        l2.most('artists', lambda col: col.average(lambda track: track.rating), show=True)

        lib.pr()

        lib.pr("Albums with the best average rating")
        l2.most('albums', lambda col: col.average(lambda track: track.rating), show=True)

        lib.pr()

        lib.pr("Genres with the best average rating")
        l2.most('genres', lambda col: col.average(lambda track: track.rating), show=True)

if __name__ == '__main__':
    sys.exit(main())
</pre>
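<p>As an alternative to the third-party PListReader/XMLFilter pair, Python 3&#8217;s standard library <code>plistlib</code> can parse the same iTunes XML file directly. Here is a minimal sketch of the &#8220;best average rating&#8221; report built on it. It is not a port of the script above: the function names are hypothetical, and unlike <code>TrackCollection.average</code> it averages only rated tracks. iTunes stores ratings on a 0 to 100 scale, 20 per star.</p>
<pre class="brush: python">
import plistlib
from collections import defaultdict


def load_tracks(fp):
    """Return the list of track dicts from an iTunes library XML file opened in binary mode."""
    library = plistlib.load(fp)
    return list(library.get("Tracks", {}).values())


def best_rated(tracks, key="Artist", threshold=5):
    """Average rating per key (e.g. "Artist", "Album", "Genre"), best first.

    Unrated tracks are skipped, as are groups with fewer than threshold rated tracks.
    """
    ratings = defaultdict(list)
    for track in tracks:
        if track.get("Rating") and track.get(key):
            ratings[track[key]].append(track["Rating"])
    averages = [(sum(r) / len(r), name) for name, r in ratings.items() if len(r) >= threshold]
    averages.sort(reverse=True)
    return [(name, avg) for avg, name in averages]
</pre>
<p>To use it, open <code>iTunes Music Library.xml</code> with <code>open(filename, "rb")</code>, pass the file to <code>load_tracks</code>, and hand the result to <code>best_rated</code> with whichever key and threshold you like.</p>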
<p>I encourage you to post your own results.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coriolinus.net/2008/04/12/because-im-a-nerd-thats-why/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>moon mine misc.</title>
		<link>http://www.coriolinus.net/2004/11/26/moon-mine-misc/</link>
		<comments>http://www.coriolinus.net/2004/11/26/moon-mine-misc/#comments</comments>
		<pubDate>Sat, 27 Nov 2004 04:29:00 +0000</pubDate>
		<dc:creator>coriolinus</dc:creator>
				<category><![CDATA[geekspeak]]></category>
		<category><![CDATA[misc.link]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.coriolinus.net/2004/11/26/629/</guid>
		<description><![CDATA[It is a favorite theme in science fiction that the moon is colonized as a mining base, to provide resources to a depleted Earth. This marks the first time I&#8217;ve seen something like that in the news. Then, though it happened too soon for there to be a cause-and-effect relationship, this came out. The second [...]]]></description>
			<content:encoded><![CDATA[<p>It is a favorite theme in science fiction that the moon is colonized as a mining base, to provide resources to a depleted Earth. <a href="http://www.dailytimes.com.pk/default.asp?page=story_27-11-2004_pg4_25">This</a> marks the first time I&#8217;ve seen something like that in the news. Then, though it happened too soon for there to be a cause-and-effect relationship, <a href="http://www.al.com/news/huntsvilletimes/index.ssf?/base/news/110146424621340.xml">this</a> came out.</p>
<p>The second link loses points for claiming to be XML while clearly failing to be so.</p>
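<p>Checking a claim like that is easy: anything served as XML should at least parse as well-formed XML. A minimal sketch using the standard library&#8217;s <code>xml.etree.ElementTree</code> (the helper name is hypothetical); note that it tests well-formedness only, not validity against any schema or DTD.</p>
<pre class="brush: python">
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
</pre>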
]]></content:encoded>
			<wfw:commentRss>http://www.coriolinus.net/2004/11/26/moon-mine-misc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

