The ipxword.py script in this repo is a crossword compiler using integer optimization (IO/IP). Crossword as in dense newspaper-style crossword. I believe the usual way crossword makers attack this problem is with "constraint satisfaction programming" (CSP) methods (for example, I'm pretty sure CSP is what Crossword Compiler, the commercial, Windows-only industry standard, uses). So I'm trying to do something different with the IP approach, although it has been discussed before.
Details on the methodology in a blog post over here.
The bones of the ipxword.py script are (1) a Grid class, which wraps convenience methods for manipulating the crossword grid, and (2) an IPXWordGenerator class, which takes a Grid as input, formulates the fill as an integer optimization problem (IO/IP), and solves it.
Assuming you have Python installed, to try it out, first clone this repository
$ git clone https://github.com/stmorse/IP-crossword.git
and make sure you have the PuLP package installed (the only dependency) by running
$ pip install pulp
Then you can create a simple 3x3 test grid by just running
$ python ipxword.py
which should print out something like
you/IP-crossword $ python ipxword.py
GRID:
# - -
- - -
- - -
Number slots: 6
Different word lengths possible: {2, 3}
Dictionary size after sampling: 500
(Using random word values.)
Building...
Puzzle status: Optimal
Total words: 6
Assignments: (index, (slot), word)
(19, (2, 'across'), 'ems')
(33, (0, 'across'), 'ma')
(207, (1, 'down'), 'mum')
(227, (2, 'down'), 'ars')
(323, (0, 'down'), 'pe')
(379, (1, 'across'), 'pur')
Like I said, the ospd.txt wordlist is pretty crummy.
You can also play around with it a little more in an IPython notebook, if you prefer. The Grid class takes a size parameter and a list of where you want the black squares to be (a list of coordinate tuples). Right now I've only had success getting the default solver to handle grids with N=3 or N=4, and only when taking a sample of numk=500 words or so.
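To make the "size plus black squares" idea concrete, here is a minimal standalone sketch of how slots could be derived from those two inputs. It is not the repo's Grid class: the function name find_slots, its min_len parameter, and the (row, col, direction, length) tuple layout are all made up for illustration. With a 3x3 grid and one black square in the top-left corner it finds the same slot count and word lengths as the sample run above.

```python
def find_slots(n, blacks, min_len=2):
    """Return (row, col, direction, length) for every run of white squares.

    Hypothetical helper for illustration; the real Grid class may differ.
    """
    blacks = set(blacks)
    slots = []
    for direction in ("across", "down"):
        for i in range(n):
            run_start, run_len = None, 0
            # Go one step past the edge so a run ending at the border is flushed.
            for j in range(n + 1):
                # Walk along row i for "across", along column i for "down".
                r, c = (i, j) if direction == "across" else (j, i)
                is_white = j < n and (r, c) not in blacks
                if is_white:
                    if run_len == 0:
                        run_start = (r, c)
                    run_len += 1
                else:
                    if run_len >= min_len:
                        slots.append((*run_start, direction, run_len))
                    run_len = 0
    return slots


# 3x3 grid with a single black square at row 0, column 0.
slots = find_slots(3, [(0, 0)])
lengths = {length for (_, _, _, length) in slots}
print("Number slots:", len(slots))
print("Different word lengths possible:", lengths)
```

Each slot found this way would then need one word of matching length, which is why the word list only has to cover the lengths that actually occur in the grid.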
This is a work in progress (but it is working). I think the following improvements may help:
- Improve the grid handling --- currently very hacky, using base Python, lots of list comprehensions and loops ... blech.
- Use something better than the default solver for PuLP.
- Related: a better solver might be able to give insight if there are pre-processing steps that speed things up.
- Improve the word list and add points per word. Most of a crossword maker's struggle (I'm told) is getting a good, well-sorted word list. Currently I'm using an ospd.txt file with a few thousand words, most of which are crummy/archaic, and assigning each word a random score.
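For reference, the current "sample some words, score them randomly" step can be sketched roughly as below. This is an illustration under assumptions, not the code in ipxword.py: sample_words and its seed argument are hypothetical, the tiny inline list stands in for ospd.txt, and only the names numk and allcosts come from this README.

```python
import random


def sample_words(wordlist, lengths, numk, seed=None):
    """Keep words whose length fits some slot, sample up to numk, score randomly.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Only words that can fit a slot are worth giving to the solver.
    usable = sorted({w.lower() for w in wordlist if len(w) in lengths})
    chosen = rng.sample(usable, min(numk, len(usable)))
    # Random value per word index, standing in for a curated score.
    allcosts = {k: rng.random() for k in range(len(chosen))}
    return chosen, allcosts


raw = ["ma", "pe", "ems", "mum", "ars", "pur", "zebra", "a"]  # stand-in for ospd.txt
words, allcosts = sample_words(raw, lengths={2, 3}, numk=4, seed=0)
print(len(words), "words sampled")
```

Swapping the random values for scores from a curated list would only change how allcosts is filled in; the rest of the pipeline could stay the same.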
Also, crossword constructors typically want to be able to specify certain word-to-slot assignments, and/or specify certain words they want to ensure make it into the grid. This code already has the bones to handle this, and it wouldn't take much to fill out the set_words() method in IPXWordGenerator to handle these kinds of constraints. The idea is:
- To specify a word-to-slot assignment, simply set the corresponding decision variable zvars[k,s] = 1, where k corresponds to the word index you want and s corresponds to the slot assignment.
- To specify a word-in-the-puzzle assignment, you could do one of two things: (1) give that word a very high associated cost by adjusting allcosts[k], or (2) specify a new constraint that lpSum(zvars[k,s] for s in slots) == 1, i.e. that word k must be assigned to exactly one of the slots in the puzzle.
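Both kinds of constraint can be tried out on a toy PuLP model. The sketch below is not the model in ipxword.py: it has no crossing constraints at all, the words, slots and scores are made up, and it assumes the objective maximizes total word value. Only the names zvars, allcosts and slots mirror this README. It pins one word to one slot and forces another word to appear somewhere, using PuLP's lpSum.

```python
import pulp

# Toy data (made up): word index -> word, slot -> required length, word values.
words = {0: "ma", 1: "pe", 2: "ems", 3: "mum", 4: "pur"}
slots = {(0, "across"): 2, (1, "across"): 3, (2, "across"): 3}
allcosts = {0: 5, 1: 1, 2: 4, 3: 3, 4: 1}

prob = pulp.LpProblem("toy_fill", pulp.LpMaximize)

# zvars[k, s] == 1 means word k is placed in slot s.
zvars = {
    (k, s): pulp.LpVariable(f"z_{k}_{s[0]}_{s[1]}", cat="Binary")
    for k in words
    for s in slots
}

# Objective (assumed): maximize the total value of the placed words.
prob += pulp.lpSum(allcosts[k] * zvars[k, s] for k in words for s in slots)

for s, length in slots.items():
    # Every slot gets exactly one word...
    prob += pulp.lpSum(zvars[k, s] for k in words) == 1
    # ...and words of the wrong length are ruled out.
    for k, w in words.items():
        if len(w) != length:
            prob += zvars[k, s] == 0

# No word is used twice.
for k in words:
    prob += pulp.lpSum(zvars[k, s] for s in slots) <= 1

# Word-to-slot assignment: pin "pe" (k=1) to slot (0, 'across').
prob += zvars[1, (0, "across")] == 1

# Word-in-the-puzzle assignment: "pur" (k=4) must land in exactly one slot.
prob += pulp.lpSum(zvars[4, s] for s in slots) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
status = pulp.LpStatus[prob.status]
assignment = {
    s: words[k] for (k, s), var in zvars.items() if (var.varValue or 0) > 0.5
}
print(status, assignment)
```

Without the two extra constraints this toy model would pick "ma", "ems" and "mum" on value alone; with them, "pe" and "pur" are forced in and the solver only gets to choose the remaining word. The hard constraint in option (2) guarantees placement, whereas raising allcosts[k] in option (1) only makes it very likely.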
Again, check out my post to get a better feel for the underlying math of the algorithm.
Enjoy!
Feel free to send me comments on Twitter.