One Handed Dictionary

Here’s some silly Python 2.x code to display a list of words that can be typed solely with one hand. Of course any word can be typed one handed if you’re a hunt and peck typist…

The list might be useful for passwords or for people who need to um… er… type one handed. Um like someone writing a lipogram

The code can be sped up on Python 2.4 by changing the list comprehensions “[ x for x in y]” to generator expressions “(x for x in y)”. Generator comprehensions are more efficient: processing 234,937 words on my 1.5GHz Powerbook takes 23.768 seconds with list comprensions and 11.157 seconds with generator comprehensions.

The longest words found in this dictionary were:

AFTERCATARACT

PHYLLOPHYLLIN

TESSERADECADE

Next time you use one of those in scrabble don’t forget to remind your opponent that the word can be typed one-handed.

#!/usr/local/bin/python

theLeft = 'QWERTASDFGZXCV'
theRight = 'YUIOPHJKLBNM'

theFile = file('/usr/share/dict/words')
### Remove line endings...
theWords = [theLine.rstrip('\r\n') for theLine in theFile.readlines()]
### Remove single letters...
theWords = [theWord for theWord in theWords if len(theWord) > 1]
### Make the words upper case...
theWords = [theWord.upper() for theWord in theWords]
### Check to find words made up of only left hand letters or right hand letters but not both...
theWords = [theWord for theWord in theWords if not (True in [X in theLeft for X in theWord]) == (True in [X in theRight for X in theWord])]
### Print out the words...

for theWord in theWords:
    print theWord
This entry was posted in Default. Bookmark the permalink.
  • Python does have regular expressions. They're not built into the language, but are in the standard Python library.

    Secondly - only the first version of my code is looping multiple times. The second version with the generator expressions (and don't forget you need Python 2.4) is really only looping once. For example:

    theWords = (theLine.rstrip('\r\n') for theLine in theFile.readlines())
    
    ### Remove single letters...
    theWords = (theWord for theWord in theWords if len(theWord) > 1)
    ### Make the words upper case...
    theWords = (theWord.upper() for theWord in theWord)
    print theWords


    No loop is occurring until the print statement is reached. The (xxx for xxx) lines are just creating iterator objects which are like stages of a loop.
  • Eric
    Thanks for the kind words! I have very little experience with python, but am curious about it because many people whose opinions I trust like python better than perl. Me, I know perl really quite well, so it was the first tool I went to when I realized grep wouldn't cut it for this test.

    I'd be curious to see how python would benchmark with a similar algorithm to the one I opted for in my perl version. I don't even know whether python has built-in support for regular expressions? (Yes, I'm that unfamiliar with python. ; )

    Even my perl version has some room for improvement, of course; there's no reason to copy the sorted array back into @words, so might as well just print it directly. This benchmarks similarly, though—I think internally perl is copying the sorted array into temp memory space before passing it to the print() call, so there's no performance gain. Just slightly terser code, is all.

    I tried briefly to adapt the python version into perl, and it's really interesting to see how python seems to encourage lots of looping in its semantics, whereas everything I might translate it to in perl would do several smaller steps in one pass over the word list (for example, instead of looping over the list taking off newlines, then looping over the list taking out short words, I'd do one loop that accomplished several steps and stored remaining altered values in an array). I wonder how much of this is just programming style vs. how much is the language we chose imposing its semantic tricks on the procedure chosen.

    Above all, I reiterate my thanks, both for this great Gedankenexperiment, and for your good will in sharing it. Quite intriguing.
  • I take that back - your perl is SUPER fast:

    real    0m0.877s
    
    user 0m0.435s
    sys 0m0.057s


    Congrats.
  • Well - I guess my timing was a little off...

    I just ran it again:

    real    0m8.366s
    
    user 0m7.315s
    sys 0m0.211s


    That's quite a difference! (This is on the same Powerbook - with the only difference major difference I can think of is that File Vault is turned off now - not that I see how file vault can make much of a difference)

    I just changed all the list comprehensions to generator expressions and tested it:

    real    0m5.095s
    
    user 0m2.683s
    sys 0m0.177s


    With output piped to /dev/null it is a touch faster

    real    0m3.057s
    
    user 0m2.584s
    sys 0m0.114s

    That's about the same ballpark as your perl script.

    I was never claiming my code was super efficient or I wasn't looping too many times... ;-)
  • Eric
    ) { push( @words, $_ ) if( m/^[abcdefgqrstvwxz]+$/ || m/^[hijklmnopuy]+$/ ) } @words = sort( { length($a) <=> length($b) } @words ); print @words;'

    ...perhaps I'm missing out on some functionality in your algorithm, but this takes less than 2 seconds on my 1.33GHz Powerbook (the python version takes about 11 seconds under similar conditions). I'm going to have to try to replicate it using something closer to your algorithm and see if maybe I'm just skipping some functionality (my python's pretty rusty, though).

    Ahh, I think I see now; it looks like the python version not only loops over the list more than once (I loop over it once, then do a bubble sort which doesn't seem present in the python version), but the python version also loops over each letter in each word for uppercasing (I skipped this step), then looping over each letter in each word, and for each letter in each word also looping over each element in the two letter lists. Lotsa looping!

    I'm thinking this is just one of those cases where regular expressions turn out to be handier, though I have to confess I was pretty shocked by the vast difference in performance between the two versions.

    Anyway, thanks for the entry! Quite a fascinating exercise.
blog comments powered by Disqus