Cheerful Curmudgeon

A complete lack of ideas and the power to express them.


Python Generators Neatly Untangle Loops

July 2, 2010 Art Zemon

Over the last year or so, the Python programming language has become my first choice for most tasks. The more I use it, the more I find to like about it. I just stumbled across generators in a way that finally made them make sense to me, and it is so cool that I want to share it with you. A generator can make a program much more readable by separating the task of producing (or generating) data from the task of processing that data.

This will make more sense with an example: print an alphabetized list of all the usernames for a Linux system. On a computer running Linux, the file /etc/passwd contains information about all of the users. Here is the file for my laptop:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
irc:x:39:39:ircd:/var/run/ircd:/bin/sh
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
libuuid:x:100:101::/var/lib/libuuid:/bin/sh
syslog:x:101:102::/home/syslog:/bin/false
klog:x:102:103::/home/klog:/bin/false
hplip:x:103:7:HPLIP system user,,,:/var/run/hplip:/bin/false
avahi-autoipd:x:104:110:Avahi autoip daemon,,,:/var/lib/avahi-autoipd:/bin/false
gdm:x:105:111:Gnome Display Manager:/var/lib/gdm:/bin/false
saned:x:106:113::/home/saned:/bin/false
pulse:x:107:114:PulseAudio daemon,,,:/var/run/pulse:/bin/false
messagebus:x:108:117::/var/run/dbus:/bin/false
polkituser:x:109:118:PolicyKit,,,:/var/run/PolicyKit:/bin/false
avahi:x:110:119:Avahi mDNS daemon,,,:/var/run/avahi-daemon:/bin/false
haldaemon:x:111:120:Hardware abstraction layer,,,:/var/run/hald:/bin/false
art:x:1000:1000:Art Zemon,,,:/home/art:/bin/bash
postfix:x:112:124::/var/spool/postfix:/bin/false
candy:x:1001:1002:Candy Zemon,,,:/home/candy:/bin/bash
sshd:x:113:65534::/var/run/sshd:/usr/sbin/nologin
mediatomb:x:114:126:MediaTomb Server,,,:/var/lib/mediatomb:/usr/sbin/nologin
couchdb:x:115:116:CouchDB Administrator,,,:/var/lib/couchdb:/bin/bash
speech-dispatcher:x:116:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/sh
kernoops:x:117:65534:Kernel Oops Tracking Daemon,,,:/:/bin/false
usbmux:x:118:46:usbmux daemon,,,:/home/usbmux:/bin/false
festival:x:119:29::/home/festival:/bin/false
rtkit:x:120:128:RealtimeKit,,,:/proc:/bin/false

Since the username is the first “word” on each line, up to the first colon, most of that file is drek and can be ignored. So given that file of stuff, the program breaks down into these tasks:

  1. Open the file /etc/passwd.
  2. Read every line from the file and get the username, the first word, off of each line.
  3. Construct a list of all the usernames.
  4. Sort the list.
  5. Print the results.

My first attempt at such a program would have been something like this:

namelist = []
passwd = open('/etc/passwd')
for line in passwd:
    # The username is everything before the first colon.
    username, drek = line.split(':', 1)
    namelist.append(username)
passwd.close()
namelist.sort()
for name in namelist:
    print(name)

This little Python program does what I just described, producing this output:

art
avahi
avahi-autoipd
backup
bin
candy
couchdb
daemon
festival
games
gdm
gnats
haldaemon
hplip
irc
kernoops
klog
libuuid
list
lp
mail
man
mediatomb
messagebus
news
nobody
polkituser
postfix
proxy
pulse
root
rtkit
saned
speech-dispatcher
sshd
sync
sys
syslog
usbmux
uucp
www-data

The ugliness is that the for-loop does two things which are unrelated to each other: it finds the usernames within the /etc/passwd file and it constructs a list of the usernames. Why should the piece of a program that finds usernames care what happens to them after they have been found? Why should the piece that constructs a list of usernames care where the names came from? This is an artificially contrived example, so each of these pieces is very simple, but it is generally A Good Thing if each piece of a program does exactly one task. This makes everything easier: design, coding, testing, and debugging.

By using a generator, we can pry these two tasks apart and the program becomes easier to understand:

def usernames():
    # Yield the username from each line of /etc/passwd, one at a time.
    passwd = open('/etc/passwd')
    for line in passwd:
        username, drek = line.split(':', 1)
        yield username
    passwd.close()

namelist = []
for name in usernames():
    namelist.append(name)
namelist.sort()
for name in namelist:
    print(name)

The generator at the top does just one thing: it produces usernames, one at a time. Python takes care of the bookkeeping: it pauses the function at each yield and resumes it when the next username is requested. We can simply use the generator wherever we need to loop over the usernames. Nothing happens until the first username is requested; at that point the /etc/passwd file is opened. Then each line is read, the username is split off the beginning of the line, and the username is yielded up to whatever other part of the program needs it. When the file has been completely processed, it is closed.
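To watch this happen step by step, here is a minimal sketch that swaps the real file for two made-up lines so that it runs anywhere. The names SAMPLE_LINES, usernames_from, and events are hypothetical and exist only for this illustration; the events list simply records what happens and in what order.

```python
# Hypothetical sample data standing in for /etc/passwd.
SAMPLE_LINES = [
    'root:x:0:0:root:/root:/bin/bash\n',
    'art:x:1000:1000:Art Zemon,,,:/home/art:/bin/bash\n',
]

events = []  # records the order in which things happen


def usernames_from(lines):
    events.append('started')           # runs on the first next(), not at call time
    for line in lines:
        username, drek = line.split(':', 1)
        events.append('yielding ' + username)
        yield username
    events.append('finished')          # runs once the input is exhausted


gen = usernames_from(SAMPLE_LINES)     # none of the function body has run yet
events.append('generator created')
first = next(gen)                      # runs the body up to the first yield
rest = list(gen)                       # resumes and runs the body to the end

print(first)
print(rest)
print(events)
```

In the recorded events, 'generator created' comes before 'started': calling the function only creates the generator, and the body begins running when the first value is requested.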

The second part of the program has become an easy-to-read loop: for name in usernames(). This loop processes each name, and we can understand it without being distracted by the details of processing the /etc/passwd file. Sweet.
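One payoff of this separation is that the same producer can feed quite different consumers without being changed. The sketch below is only an illustration under a few assumptions: it passes the lines in as a parameter (a hypothetical variation, here called lines_to_usernames) so that it can run on made-up sample data rather than the real /etc/passwd.

```python
# Made-up sample data; a real run would pass an open file instead.
SAMPLE_PASSWD = [
    'root:x:0:0:root:/root:/bin/bash',
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh',
    'art:x:1000:1000:Art Zemon,,,:/home/art:/bin/bash',
]


def lines_to_usernames(lines):
    """Yield the username (the text before the first colon) from each line."""
    for line in lines:
        username, drek = line.split(':', 1)
        yield username


# Consumer 1: an alphabetized list.
alphabetized = sorted(lines_to_usernames(SAMPLE_PASSWD))

# Consumer 2: a count, without ever building a list of names.
count = sum(1 for name in lines_to_usernames(SAMPLE_PASSWD))

# Consumer 3: the longest username.
longest = max(lines_to_usernames(SAMPLE_PASSWD), key=len)

print(alphabetized)
print(count)
print(longest)
```

None of the three consumers knows or cares how the usernames are found, which is exactly the point.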

[Update: I particularly enjoy programming because there is always something new to be learned. I have updated the following example, shortening it by one line while simultaneously making it easier to understand.]

Of course, Python offers more shortcuts and we can make the program more concise. Try this flavor:

def usernames():
    passwd = open('/etc/passwd')
    for line in passwd:
        username, drek = line.split(':', 1)
        yield username
    passwd.close()

print('\n'.join(sorted(usernames())))

Reading from the inside to the outside: usernames() yields the usernames one at a time. sorted(...) consumes them and returns an alphabetized list of usernames. '\n'.join(...) takes the alphabetized list and joins the names together into a single string, one name per line, which is ready to be printed.
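For comparison, the same inside-to-outside pipeline can be written with a generator expression, which is Python's inline form of a generator. This is a sketch rather than a replacement for the version above, and the sample lines (SAMPLE) are made up so that the example runs without reading /etc/passwd.

```python
# Made-up sample data standing in for the lines of /etc/passwd.
SAMPLE = [
    'root:x:0:0:root:/root:/bin/bash',
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh',
    'art:x:1000:1000:Art Zemon,,,:/home/art:/bin/bash',
]

# The generator expression yields one username per line, on demand.
names = (line.split(':', 1)[0] for line in SAMPLE)

# sorted() consumes the generator and returns a list; join() builds the text.
report = '\n'.join(sorted(names))
print(report)
```

A named generator function is still the better choice once the producing step grows beyond a single expression, as it would if the file needed to be opened and closed inside it.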

I hope that this has not been too deep a peek into the machinations of a programmer’s mind. 🙂

Software

About Art Zemon

Omni-curious geek. Husband. Father. Airplane builder & pilot. Bicyclist. Photographer. Computer engineer.
