===============
== Victor Ma ==
===============

Google Summer of Code final report

For Google Summer of Code 2025, I worked on GNOME Crosswords. GNOME Crosswords is a project that consists of two apps:

Here are links to everything that I worked on.

Merge requests

Merge requests related to the word suggestion algorithm:

  1. Improve word suggestion algorithm
  2. Add word-list-tests-utils.c
  3. Refactor clue-matches-tests.c by using a fixture
  4. Use better test assert macros
  5. Add macro to reduce boilerplate code in clue-matches-tests.c
  6. Add a macro to simplify the test_clue_matches calls
  7. Add more tests to clue-matches-tests.c
  8. Use string parameter in macro function
  9. Add performance tests to clue-matches-tests.c
  10. Make phase 3 of word_list_find_intersection() optional
  11. Improve print functions for WordArray and WordSet

Other merge requests:

  1. Fix and refactor editor puzzle import
  2. Add MIME sniffing to downloader
  3. Add support for remaining divided cell types in svg.c
  4. Fix intersect sort
  5. Fix rebus intersection
  6. Use a single suggested words list for Editor

Design documents

Work in progress.

Other documents

Development:

Word suggestion algorithm:

Competitive analysis:

Other:

Blog posts

  1. Introducing my GSoC 2025 project
  2. Coding begins
  3. A strange bug
  4. Bugs, bugs, and more bugs!
  5. My first design doc
  6. It’s alive!
  7. When is an optimization not optimal?
  8. This is a test post

Journal

I kept a daily journal of the things that I was working on.

Project summary

I improved GNOME Crossword Editor’s word suggestion algorithm, by re-implementing it as a forward-checking algorithm. Previously, our word suggestion algorithm only considered the constraints imposed by the intersection where the cursor is. This resulted in frequent dead-end word suggestions, which led to user frustration.

To fix this problem, I re-implemented our word suggestion algorithm to consider the constraints imposed by every intersection in the current slot. This significantly reduces the number of dead-end word suggestions and leads to a better user experience.

As part of this project, I also researched the field of constraint satisfaction problems and wrote a report on how we can use the AC-3 algorithm to further improve our word suggestion algorithm in the future.

I also performed a competitive analysis of other crossword editors on the market and wrote a detailed report, to help identify missing features and guide future development.

Word suggestion algorithm improvements

The goal of any crossword editor software is to make it as easy as possible to create a good crossword puzzle. To that end, all crossword editors provide a word suggestion list—a list of words that fit the current slot. This feature helps the user find words that fit the slots on their grid.

In order to generate the word suggestion list, crossword editors use a word suggestion algorithm. The simplest example of a word suggestion algorithm considers two constraints:

  • The size of the current slot.
  • The letters in the current slot.

So for example, if the current slot is C A _ S, then this basic word suggestion algorithm would return all four-letter words that start with CA and end in S—such as CATS or CABS, but not COTS.

The problem

There is a problem with this basic word suggestion algorithm, however. Consider the following grid:

+---+---+---+---+
|   |   |   | Z |
+---+---+---+---+
|   |   |   | E |
+---+---+---+---+
|   |   |   | R |
+---+---+---+---+
| W | O | R |   | < current slot
+---+---+---+---+

4-Down begins with ZER, so the only word it can be is ZERO. This constrains the bottom-right cell to the letter O.

4-Across starts with WOR. We know that the bottom-right cell must be O, so that means that 4-Across must be WORO. But WORO is not a word. So, 4-Down and 4-Across are both unfillable, because no letter fits in the bottom-right cell. This means that there are no valid word suggestions for either 4-Across or 4-Down.

Now, suppose that the current slot is 4-Across. The basic algorithm only considers the constraints imposed by the current slot, and so it returns all words that match the pattern W O R _—such as WORD and WORM. But none of these word suggestions actually fit in the slot—they all cause 4-Down to become some nonsensical word.

The problem is that the basic algorithm only looks at the current clue, 4-Across. It does not also look at other slots, like 4-Down. Because of that, the algorithm doesn’t realize that 4-Down causes 4-Across to be unfillable. And so, the algorithm generates incorrect word suggestions.

Our word suggestion algorithm

Our word suggestion algorithm was a bit more advanced than this basic algorithm. Our algorithm considered two constraints:

  • The constraints imposed by the current slot.
  • The constraints imposed by the intersecting slot where the cursor is.

This means that our algorithm could actually handle the problematic grid properly if the cursor is on the bottom-right cell. But not if the cursor is on any other cell of 4-Across:

Broken behaviour

Consequences

All this means that our word suggestion algorithm was prone to generating dead-end words—words that seem to fit a slot, but that actually lead to an unfillable grid.

In the problematic grid example I gave, this unfillability is immediately obvious. The user fills 4-Across with a word like WORM, and they instantly see that this turns 4-Down into ZERM, a nonsense word. That makes this grid not so bad.

The worst cases are the insidious ones, where the fact that a word suggestion leads to an unfillable grid is not obvious at first. This leads to a ton of wasted time and frustration for the user.

The fix

To fix this problem, I re-implemented our word suggestion algorithm to account for the constraints imposed by all the intersecting slots. Now, our word suggestion algorithm correctly handles the problematic grid example:

Fixed behaviour

Our new algorithm doesn’t eliminate dead-end words entirely. After all, it only checks the intersecting slots of the current slot—it does not also check the intersecting slots of the intersecting slots, etc.

However, the constraints imposed by a slot onto the current slot become weaker, the more intersections-removed it is. Consider: in order for a slot that’s two intersections away from the current slot to constrain the current slot, it must first constrain a mutual slot (a slot that intersects both of them) enough for that mutual slot to then constrain the current slot.

Compare that to a slot that is only one intersection away from the current slot. All it has to do is be constrained enough that it limits what letters the intersecting cell can be.

And so, although my changes do not eliminate dead-end words entirely, they do significantly reduce their prevalence, resulting in a much better user experience.

The end

This concludes my Google Summer of Code 2025 project! I give my thanks to Jonathan Blandford for his invaluable mentorship and clear communication throughout the past six months. And I thank the GNOME Foundation for its participation in GSoC and commitment to open source.