The Python programming language has become my first choice for most tasks over the last year or so. The more I use it, the more I find to like about it. I just stumbled across generators in a way that made them make sense to me and it is so cool that I want to share it with you. A generator can make a program immensely more readable by separating the task of producing (or generating) data from the task of processing the data.
This will make more sense with an example: print an alphabetized list of all the usernames for a Linux system. On a computer running Linux, the file /etc/passwd contains information about all of the users. Here is the file for my laptop:
    root:x:0:0:root:/root:/bin/bash
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh
    bin:x:2:2:bin:/bin:/bin/sh
    sys:x:3:3:sys:/dev:/bin/sh
    sync:x:4:65534:sync:/bin:/bin/sync
    games:x:5:60:games:/usr/games:/bin/sh
    man:x:6:12:man:/var/cache/man:/bin/sh
    lp:x:7:7:lp:/var/spool/lpd:/bin/sh
    mail:x:8:8:mail:/var/mail:/bin/sh
    news:x:9:9:news:/var/spool/news:/bin/sh
    uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
    proxy:x:13:13:proxy:/bin:/bin/sh
    www-data:x:33:33:www-data:/var/www:/bin/sh
    backup:x:34:34:backup:/var/backups:/bin/sh
    list:x:38:38:Mailing List Manager:/var/list:/bin/sh
    irc:x:39:39:ircd:/var/run/ircd:/bin/sh
    gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh
    nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
    libuuid:x:100:101::/var/lib/libuuid:/bin/sh
    syslog:x:101:102::/home/syslog:/bin/false
    klog:x:102:103::/home/klog:/bin/false
    hplip:x:103:7:HPLIP system user,,,:/var/run/hplip:/bin/false
    avahi-autoipd:x:104:110:Avahi autoip daemon,,,:/var/lib/avahi-autoipd:/bin/false
    gdm:x:105:111:Gnome Display Manager:/var/lib/gdm:/bin/false
    saned:x:106:113::/home/saned:/bin/false
    pulse:x:107:114:PulseAudio daemon,,,:/var/run/pulse:/bin/false
    messagebus:x:108:117::/var/run/dbus:/bin/false
    polkituser:x:109:118:PolicyKit,,,:/var/run/PolicyKit:/bin/false
    avahi:x:110:119:Avahi mDNS daemon,,,:/var/run/avahi-daemon:/bin/false
    haldaemon:x:111:120:Hardware abstraction layer,,,:/var/run/hald:/bin/false
    art:x:1000:1000:Art Zemon,,,:/home/art:/bin/bash
    postfix:x:112:124::/var/spool/postfix:/bin/false
    candy:x:1001:1002:Candy Zemon,,,:/home/candy:/bin/bash
    sshd:x:113:65534::/var/run/sshd:/usr/sbin/nologin
    mediatomb:x:114:126:MediaTomb Server,,,:/var/lib/mediatomb:/usr/sbin/nologin
    couchdb:x:115:116:CouchDB Administrator,,,:/var/lib/couchdb:/bin/bash
    speech-dispatcher:x:116:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/sh
    kernoops:x:117:65534:Kernel Oops Tracking Daemon,,,:/:/bin/false
    usbmux:x:118:46:usbmux daemon,,,:/home/usbmux:/bin/false
    festival:x:119:29::/home/festival:/bin/false
    rtkit:x:120:128:RealtimeKit,,,:/proc:/bin/false
Since the username is the first “word” on each line, up to the first colon, most of that file is drek and can be ignored. So given that file of stuff, the program breaks down into these tasks:
- Open the file /etc/passwd.
- Read every line from the file and get the username, the first word, off of each line.
- Construct a list of all the usernames.
- Sort the list.
- Print the results.
My first attempt at such a program would have been something like this:
    namelist = []
    passwd = open('/etc/passwd')
    for line in passwd:
        # Everything before the first colon is the username.
        username, drek = line.split(':', 1)
        namelist.append(username)
    passwd.close()
    namelist.sort()
    for name in namelist:
        print(name)
This little Python program does what I just described, producing this output:
    art
    avahi
    avahi-autoipd
    backup
    bin
    candy
    couchdb
    daemon
    festival
    games
    gdm
    gnats
    haldaemon
    hplip
    irc
    kernoops
    klog
    libuuid
    list
    lp
    mail
    man
    mediatomb
    messagebus
    news
    nobody
    polkituser
    postfix
    proxy
    pulse
    root
    rtkit
    saned
    speech-dispatcher
    sshd
    sync
    sys
    syslog
    usbmux
    uucp
    www-data
The ugliness is that the for-loop does two things which are unrelated to each other: It finds the usernames within the /etc/passwd file and it constructs a list of the usernames. Why does a piece of a program which finds usernames care what happens to the usernames after they have been found? Why does a piece of a program which constructs a list of usernames need to care where the names came from? This is an artificially contrived example, so each of these pieces is very simple, but it is generally A Good Thing if each piece of a program does exactly one task. This makes everything easier: design, coding, testing, and debugging.
By using a generator, we can pry these two tasks apart and the program becomes easier to understand:
    def usernames():
        passwd = open('/etc/passwd')
        for line in passwd:
            # Everything before the first colon is the username.
            username, drek = line.split(':', 1)
            yield username
        passwd.close()

    namelist = []
    for name in usernames():
        namelist.append(name)
    namelist.sort()
    for name in namelist:
        print(name)
The generator at the top does just one thing: it produces usernames, one at a time. Python takes care of all the complexities. We can simply use the generator wherever we need to loop over the usernames. Nothing happens when usernames() is called; the /etc/passwd file is opened only when the first name is requested. Then each line is read, the username split off the beginning of the line, and the username yielded up to whatever other part of the program needs it. When the file has been completely processed, it is closed.
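To watch the one-at-a-time behavior happen, here is a small sketch. It is not part of the program above: the names sample_lines, traced_usernames, and events are made up for illustration, and the generator reads from a list of strings instead of /etc/passwd so that it runs on any system. It simply records the order in which things happen.

```python
# Hypothetical stand-in for /etc/passwd so the sketch runs on any system.
sample_lines = [
    'root:x:0:0:root:/root:/bin/bash',
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh',
    'art:x:1000:1000:Art Zemon,,,:/home/art:/bin/bash',
]

events = []  # records the order in which things happen


def traced_usernames(lines):
    """Yield the username from each passwd-style line, logging as we go."""
    events.append('start')      # runs only when the first name is requested
    for line in lines:
        username, drek = line.split(':', 1)
        events.append('yield ' + username)
        yield username
    events.append('done')       # runs only after the last line is consumed


names = traced_usernames(sample_lines)  # the generator body has not run yet
events.append('created')

first = next(names)                     # runs the body up to the first yield
events.append('got ' + first)

rest = list(names)                      # drains the remaining names

for event in events:
    print(event)
```

The log shows 'created' before 'start': calling the generator function only sets things up, and the body runs as the consumer asks for each name.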
The second part of the program has become an easy-to-read loop:
    for name in usernames():
This loop processes each name. We can understand that without being distracted by the details of processing the /etc/passwd file. Sweet.
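Because the generator knows nothing about what happens to the names, the same producer can feed quite different consumers. Here is a rough sketch of that idea, under a few assumptions of mine: the generator takes the file path as a parameter (the original hard-codes /etc/passwd), it uses a with-block so the file is closed even if the consumer stops early, and it reads a small temporary file so the results do not depend on any particular machine. The names usernames_from and sample_path are hypothetical.

```python
import os
import tempfile


def usernames_from(path):
    """Yield the username (text before the first colon) from each line of path."""
    # The with-block closes the file when the generator finishes. If the
    # consumer stops early, the file is closed when the generator itself
    # is closed or garbage collected.
    with open(path) as passwd:
        for line in passwd:
            username, drek = line.split(':', 1)
            yield username


# Write a small passwd-style file so the example does not depend on the
# contents of the real /etc/passwd.
sample = (
    'root:x:0:0:root:/root:/bin/bash\n'
    'art:x:1000:1000:Art Zemon,,,:/home/art:/bin/bash\n'
    'candy:x:1001:1002:Candy Zemon,,,:/home/candy:/bin/bash\n'
)
handle, sample_path = tempfile.mkstemp()
with os.fdopen(handle, 'w') as f:
    f.write(sample)

# Consumer 1: an alphabetized list.
alphabetized = sorted(usernames_from(sample_path))

# Consumer 2: just count the users.
user_count = sum(1 for name in usernames_from(sample_path))

# Consumer 3: the longest username.
longest = max(usernames_from(sample_path), key=len)

os.remove(sample_path)

print(alphabetized)
print(user_count)
print(longest)
```

None of the three consumers needed to know how a passwd file is laid out, and the generator did not change to accommodate any of them.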
[Update: I particularly enjoy programming because there is always something new to be learned. I have updated the following example, shortening it by one line while simultaneously making it easier to understand.]
Of course, Python offers more shortcuts and we can make the program more concise. Try this flavor:
    def usernames():
        passwd = open('/etc/passwd')
        for line in passwd:
            # Everything before the first colon is the username.
            username, drek = line.split(':', 1)
            yield username
        passwd.close()

    print('\n'.join(sorted(usernames())))
Reading from the inside to the outside:
- usernames() produces the usernames, one at a time.
- sorted(...) collects them into an alphabetized list.
- '\n'.join(...) takes the alphabetized list of names and joins them together into a string, one name per line, which is ready to be printed.
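If the nesting is hard to follow, the same three steps can be written with one intermediate variable each. This is only an illustrative sketch: sample_usernames is a hypothetical stand-in for usernames() that yields a few fixed names, so it runs without reading /etc/passwd.

```python
def sample_usernames():
    """Hypothetical stand-in for usernames(): yields a few fixed names."""
    for name in ['root', 'daemon', 'art', 'candy']:
        yield name


names = sample_usernames()   # a generator; no names have been produced yet
ordered = sorted(names)      # consumes the generator and returns a sorted list
text = '\n'.join(ordered)    # one string with a newline between the names

print(text)
```

The one-line version does exactly this, just without naming the intermediate results.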
I hope that this has not been too deep a peek into the machinations of a programmer’s mind. 🙂