How Hard is This Puzzle Anyway?
Leveraging user analytics to measure and understand the difficulty of crossword puzzles
by Roy Leban
Puzzle solvers love to talk about puzzles.
We’re solving puzzles to challenge ourselves, so, of course, a frequent topic
is how hard puzzles are and how well we do in solving them.
In the case of the New York Times crossword, which features an escalating
level of difficulty each week, it’s fairly common to hear people say things like
“This puzzle is easy for a Monday” or
“This puzzle is hard for a Thursday.”
I wondered if it was possible to quantify those feelings,
to actually know if a puzzle was easy or hard relative to expectations.
So we set out to do just that.
As a result, we’ve added a Difficulty Index to the Puzzazz leaderboard
page for every New York Times crossword going back to July 1st, 2015.
Every future NYT crossword will have the Difficulty Index added to its leaderboard page
automatically as soon as we have collected enough data to calculate one.
In this document, we use the term solve to refer to when a person works on
a puzzle, whether or not they finish it and get the correct solution.
We use the term complete to refer to when a person solves a puzzle,
finishes it, and gets the correct solution.
Some History
Many people consider the New York Times crossword puzzle,
which started in 1942,
to be the “gold standard” of crosswords.
“Best” is always subjective, but the Gray Lady’s puzzle’s longevity
(74 years with only four editors)
as well as its consistency has made it the standard-bearer of American-style crosswords
for a long time. When the NYT Crossword leads, others follow.
One way the Times leads is with an escalating difficulty level of the puzzles
throughout the week, with Monday puzzles being the easiest and Saturday puzzles being
the hardest. Sunday puzzles are special and are about twice as large as a typical daily
puzzle and aimed at a mid-week level.
The practice dates back to the Times’ first crossword editor, Margaret Farrar.
According to Will Shortz, the current puzzle editor, Farrar
“made Saturday’s crosswords a little harder than those for the other days.
She figured, as many people didn’t have to work on Saturday,
they’d have more time to solve.
She referred to the Saturday Times crossword as a ‘two cups of coffee’ puzzle.”
Somewhere along the line, according to Shortz, the difficulty of the puzzles
began to increase throughout the week.
When Shortz became editor in 1993, he decided to steepen the slope of difficulty:
“The Monday puzzles I edit,” he says, “are probably easier than they’d ever been
before, and the Friday/Saturday puzzles tend to be harder.”
Shortz’s Monday crossword became one that “anyone in America” could solve, though
not necessarily quite as quickly as the best solvers.
Today, the vernacular of puzzle solvers includes phrases like “Monday puzzle” and
“Sunday puzzle” to describe a certain level of difficulty, and, in the case of
“Sunday puzzle,” a certain size.
The Analysis (and some interesting things we learned)
Of course, puzzle creation isn’t a hard science, and different people know different
things, so it will never be the case that every “Monday puzzle” is the same level of
difficulty for every person.
We collected a data set representing hundreds of
thousands of puzzle solves for a wide range of NYT crosswords.
We have even more data on non-NYT puzzles,
but, for now, we decided to limit our analysis to just the NYT corpus.
With that in mind, let’s look at this chart:
For each day of the week, there is an overall average (the black line).
Rather than using times, we’ve calibrated the data with the difficulty index
of an average Times “Monday puzzle” fixed at 1.0.
A puzzle with a difficulty index of 2.0 would be expected to take twice
as long to complete as one with an index of 1.0.
This allows each person to think about the overall data relative to their own
solving speed, and our analysis shows that this is pretty accurate for most people.
So, for an average solver, a typical Sunday puzzle is a little more than
four times as hard as a typical Monday puzzle.
Better solvers generally have a shallower curve, while weaker solvers have
a steeper curve. Averaging them all together, we get the curve shown.
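The calibration described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code behind the analysis: the function name `difficulty_index` is hypothetical, the times are made up, and it assumes a plain average of completion times with no per-solver weighting.

```python
from statistics import mean

def difficulty_index(puzzle_times, monday_times):
    """Return a puzzle's average completion time relative to the Monday baseline.

    puzzle_times: completion times (in seconds) for one puzzle
    monday_times: completion times (in seconds) pooled over Monday puzzles
    """
    baseline = mean(monday_times)  # an average "Monday puzzle" is fixed at 1.0
    return mean(puzzle_times) / baseline

# Made-up completion times, for illustration only
monday = [300, 420, 480]      # averages 400 seconds
sunday = [1500, 1700, 1900]   # averages 1700 seconds
print(difficulty_index(sunday, monday))  # 4.25
```

Because the index is a ratio, a solver who needs 10 minutes for a typical Monday can read 4.25 as roughly 42 minutes, while a 5-minute Monday solver can read it as roughly 21.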
We’ve also calculated ranges for “Typical” difficulty puzzles (purple),
plus “Easy” and “Hard” puzzles (pink), based on clusterings of completion times.
The gray area represents puzzles which are “Very Easy” or “Very Hard”.
Fridays are easier than Thursdays.
As expected, the difficulty does increase from Monday through Saturday.
What we didn’t expect is that the average difficulty actually goes down
slightly from Thursday to Friday. But there’s a logical explanation.
Thursday puzzles are typically the tricky ones,
which can have anything from a multi-letter rebus in the grid to backwards words.
The fastest solvers aren’t thrown very much by these tricks
(and Friday puzzles are, in fact, slightly harder for those people), but
there can be a big jump in completion time for less experienced solvers.
We can see this in the huge peak for Thursday puzzles in the chart,
which reflects a larger share of puzzles that are “Hard” or “Very Hard” for a Thursday
than we see for any other day of the week.
These puzzles push the average up.
Notice that the increased range for Thursday puzzles is only on the up side,
which is related to, but not exactly the same thing as, more solvers having difficulty
with the tricks.
Sunday is between a Wednesday and Thursday in difficulty.
Will Shortz aims for a mid-week difficulty level for a Sunday puzzle, and he’s getting
it, even though the graph makes it look like that’s not happening.
The Sunday puzzle is twice as big as a daily puzzle in terms of both letters to enter
and number of clues. When this is taken into account, the difficulty level of a Sunday
puzzle fits right between Wednesday and Thursday.
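The size adjustment is simple arithmetic, sketched here as an illustration. The helper `size_adjusted_index` is hypothetical, the size factor of 2.0 follows the "twice as big" description, and 4.2 is an assumed stand-in for "a little more than four times."

```python
def size_adjusted_index(index, size_factor):
    """Divide a difficulty index by how many times larger the puzzle is."""
    return index / size_factor

sunday_index = 4.2   # assumed value for "a little more than four"
size_factor = 2.0    # a Sunday puzzle is about twice as big as a daily
print(size_adjusted_index(sunday_index, size_factor))  # 2.1
```

An adjusted value of about 2.1 means that, letter for letter, the Sunday puzzle is roughly twice as hard as a typical Monday, which is a mid-week level.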
There aren’t many “Very Easy” puzzles.
Except for Sunday, there’s a very narrow slice of gray below the “Easy” range.
I think this is explained by consistent editing combined with a small group of
test solvers for the puzzles. The fewer the test solvers, the more likely it is
that some unforeseen knowledge gap or other issue among the general public will
cause a puzzle to be harder than expected. In contrast, it would be rare for
extra knowledge to make a puzzle easier than expected.
I think the greater number of “Very Easy” Sunday puzzles may be partially explained
by the solvers, which is discussed more below.
There are more “Very Hard” puzzles earlier in the week.
This seems counterintuitive. After all, the Monday puzzle is supposed to be
solvable by “anyone in America”. But the Tuesday isn’t, and there’s a pretty steep
curve going up to Wednesday. This is also related to who’s solving the puzzles and
is discussed below.
Sometimes, there’s a puzzle that throws the curve.
In the process of analyzing the data, we discovered one puzzle that, all by
itself, threw the curve.
It’s Patrick Berry’s Sunday masterpiece of September 6th, 2015.
Without giving anything away, it has a complex trick that took many solvers
a long time to figure out.
On average, it took solvers more than 14 times as long to complete that puzzle
as it did for them to complete a typical Monday puzzle, a data point that
would be twice as high as the top of this chart.
In comparison, the runner-up hardest Sunday puzzles (and there was a cluster of them)
took solvers just under five times as long as a Monday puzzle, so Patrick’s puzzle
was roughly three times as hard as the next closest Sunday puzzle.
Consistent with other puzzles, there were plenty of people who completed Patrick’s puzzle
with a time in their expected range. But the slower solvers always push the average up,
and, for this puzzle, they pushed it up to an extreme range.
This single puzzle skewed the stats enough that we removed it from our analysis, and
it is the only puzzle we currently rate as “Extremely Hard.”
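To see how much one extreme puzzle can move an average, here is a small sketch. It is an illustration under assumptions, not the actual analysis: `average_without` is a hypothetical helper, and apart from the 14.0 and 4.9 figures, which echo "more than 14 times" and "just under five times," every value and label is made up.

```python
from statistics import mean

def average_without(indexes, excluded=()):
    """Average difficulty indexes, skipping any puzzles listed in `excluded`."""
    kept = [value for name, value in indexes.items() if name not in excluded]
    return mean(kept)

# Illustrative Sunday indexes; labels and the two "typical" values are invented
sundays = {
    "2015-09-06": 14.0,
    "runner-up": 4.9,
    "typical-1": 4.0,
    "typical-2": 3.9,
}

with_outlier = average_without(sundays)
without_outlier = average_without(sundays, excluded={"2015-09-06"})
print(round(with_outlier, 2), round(without_outlier, 2))

# Ratio of the outlier to the next hardest Sunday
print(round(sundays["2015-09-06"] / sundays["runner-up"], 2))
```

With these numbers, a single puzzle lifts the Sunday average from about 4.27 to about 6.7, which is why leaving it in would distort the curve.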
Patrick’s puzzle and solution can be seen on XWord Info,
and the puzzle has its own Puzzazz leaderboard.
Who’s Solving?
Many people solve the Times puzzle every day, but some people solve a subset of days.
We can learn some interesting things if we delve into this.
As this chart shows, the number of
New York Times crossword solvers drops off after the Monday puzzle.
The solvers.
The top (blue) line shows the solve rate for each day of the week. These are the people
who try to solve the puzzle, whether or not they complete it. We start with a baseline
of 100% for Monday, because that’s the most solved puzzle.
Fewer people solve on Tuesday than Monday, fewer still on Wednesday.
There’s a slight pickup on Thursday and another dropoff until Sunday,
which has almost as many solvers as Monday (97%).
It’s worth noting that the chart simplifies things slightly.
Each day (except Sunday), there is a dropoff of solvers from the day before,
but there are also new solvers added each day, which makes up for some of the people lost.
The greatest churn is on Thursday, where the new solvers more
than make up for the people lost from Wednesday. That’s a surprising find, but
it’s not completely a shock. Thursday is known for being the first
“hard” day of the week (and we’ve confirmed that above),
so some more serious solvers start solving on Thursday.
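Churn between two days can be measured with set operations on solver identifiers. This is a minimal sketch with made-up solver IDs, and the function name `churn` is hypothetical. It only shows how new solvers can more than make up for lost ones.

```python
def churn(previous, current):
    """Compare two sets of solver IDs from consecutive puzzle days."""
    return {
        "lost": len(previous - current),       # solved yesterday, not today
        "new": len(current - previous),        # solved today, not yesterday
        "net": len(current) - len(previous),   # overall change in solvers
    }

# Made-up solver IDs, for illustration only
wednesday = {"a", "b", "c", "d"}
thursday = {"a", "b", "e", "f", "g"}
print(churn(wednesday, thursday))  # {'lost': 2, 'new': 3, 'net': 1}
```

A positive net value with a nonzero lost count is the Thursday pattern: some people drop out, but more arrive.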
When we refer to a “Monday puzzle”, we are referring to a puzzle that the Times
labels as a Monday puzzle, which is actually available in Puzzazz on Sunday afternoon
and could potentially be solved on any day of the week.
We only care about how the Times labels the puzzle, not when it’s solved.
One of the benefits of solving in Puzzazz is that it’s easy to solve puzzles
when it’s convenient to the solver; we see many solvers solving puzzles out of order,
such as solving a bunch of early week puzzles in succession.
Most people who solve puzzles complete them successfully.
The bottom (green) line shows the completed puzzles with the
same baseline of 100% for Monday solves.
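Both lines use the same denominator, which a short sketch makes concrete. The counts below are made up, chosen only so that Sunday lands on the 97% solve rate mentioned above, and `relative_rates` is a hypothetical helper rather than the code behind the chart.

```python
def relative_rates(solves, completions, baseline_day="Mon"):
    """Express each day's solves and completions as a percentage of baseline-day solves."""
    base = solves[baseline_day]
    return {
        day: (100 * solves[day] / base, 100 * completions[day] / base)
        for day in solves
    }

# Made-up counts, for illustration only
solves = {"Mon": 1000, "Sun": 970}
completions = {"Mon": 900, "Sun": 780}
rates = relative_rates(solves, completions)
print(rates["Sun"])  # (97.0, 78.0)
```

Dividing completions by Monday solves, rather than by each day's own solves, is what lets the two lines be read against each other on one chart.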
Looking at the two lines together with what we’ve learned above yields
some more interesting finds.
Fewer people complete harder puzzles.
This makes perfect sense. Fewer people complete the puzzle on each succeeding day of the week
(except Sunday, again), even though there’s an increase of people who solve the Thursday puzzle.
The most completed puzzles of the week are Tuesday and Wednesday,
where almost everybody who tries to solve the puzzle completes it.
While initially surprising, this also makes sense when we think about it.
Monday is the puzzle for “anyone in America” and many of the people who struggle with it
or can’t complete it drop out.
The people who are left are solid solvers and almost all of them complete the puzzles.
Relatively speaking, we do see a big dropoff in completed puzzles on Thursday,
but the dropoff is smaller than the increase in difficulty.
The result is a remarkably straight line down from Monday to Friday.
I don’t have an explanation for that.
Relative completion rates pick up on Friday.
Friday puzzles get pretty hard, and, unlike Thursday, they are consistently hard,
so more people drop out. Friday and Saturday are also typically themeless puzzles,
and it may be that some solvers don’t like them as much (we don’t have any
evidence for or against this hypothesis). However, we can tell that the solvers who
are left, like the Tuesday and Wednesday solvers, are better solvers, because of the
relative increase of completed puzzles, despite the puzzles being harder.
People really like the Sunday puzzle.
The total number of Sunday solvers almost reaches the Monday peak, but the completed puzzles are
much lower, below 80%. That’s some evidence for the thesis that people enjoy
solving puzzles even when they don’t complete them successfully.
Unlike the paper world, the digital world doesn’t have very many
Monday-only or Sunday-only solvers. It’s one thing to solve only one puzzle a week
when it comes in the newspaper you’re already getting, but it’s quite another thing
to pay for a subscription and only use one seventh of it.
The larger solving pool may also partially explain why more Sunday puzzles
are considered “Very Easy.”
Validity of results
It’s important to note that our results are relatively valid, not
absolutely valid. We are not saying how difficult any particular puzzle,
or the puzzles on any particular day of the week, are. Rather, we are comparing relative
difficulties between puzzles and groups of puzzles.
To verify that our results are valid, we analyzed different date ranges
and sliced the raw data in different ways. The results were very consistent,
as seen in this chart in which each line is a different month.
This graph uses a single baseline of the lowest average puzzle in the last 12 months, but the
graph looks pretty much the same if we use a separate baseline for each month.
The variability between the lines is attributable to the fact that there are
only 4-5 puzzles per day of the week in each month and some months naturally
have easier or harder puzzles than other months.
This analysis yields a 4% margin of error, which would not affect any conclusions.
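One simple way to check month-to-month consistency is to measure how far any month strays from the average for its day of the week. The sketch below is an assumption about how such a check could look, not necessarily how the 4% figure was derived. The name `max_relative_spread` is hypothetical and the index values are made up.

```python
from statistics import mean

def max_relative_spread(monthly_indexes):
    """Return the largest deviation of any month from its weekday's mean, as a fraction.

    monthly_indexes: {weekday: [difficulty index for each month]}
    """
    worst = 0.0
    for values in monthly_indexes.values():
        center = mean(values)
        worst = max(worst, max(abs(v - center) / center for v in values))
    return worst

# Made-up monthly indexes, for illustration only
monthly = {
    "Mon": [1.0, 1.04, 0.96],
    "Sat": [3.0, 3.1, 2.9],
}
print(f"{max_relative_spread(monthly):.1%}")
```

If the largest deviation stays small compared with the gaps between days, the ordering of the days does not depend on which months were sampled.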
The change in solvers from day to day, discussed above,
also affects our results very little,
at least in part because the dropoffs aren’t very large.
The relationship between the days, including the relationship between Thursdays and
Fridays, holds, as well as the measurement of Sunday as being between
Wednesday and Thursday in difficulty.
Tracking individual solvers separately is more complicated, harder to validate, and
harder to explain. The main thing that changes with such an analysis is that we
would have a slightly steeper Monday-to-Saturday difficulty curve.
In addition, a few more Monday and Tuesday puzzles would be considered “Very Easy”,
plus a few more Friday to Sunday puzzles would be considered “Very Hard,” increasing
slightly the gray areas on the first chart.
The conclusions are the same.
If we could factor in incomplete puzzles, we’d probably also see a
steeper Monday-to-Saturday curve, but there’s no reasonable way to incorporate that data.
When somebody doesn’t complete a puzzle, we don’t know why, and we can’t
extrapolate to get a time estimate that would match up with actual time values.
It’s better to look at that data separately, as we’ve done here.
Some people use hinting, and we don’t take that into account.
Since those people who use hinting generally use it in a consistent manner
(typically more on puzzles later in the week),
this actually has no effect on the perceived difficulty levels of the puzzles for those
solvers.
Finally, it’s worth asking if these results are equally valid for those people solving
on paper. Some people solve faster on digital devices while others solve faster on paper,
and sometimes it’s affected by the difficulty of the puzzle: for harder puzzles,
more time is spent thinking about the puzzle than in writing or entering text.
We have done some comparisons between paper solving and Puzzazz solving, and
we believe that the relative relationships would hold, producing very similar charts.
But it would be a significant project to prove this empirically.
Summary
We analyzed a year’s worth of solving data for New York Times crossword puzzles,
and drew some interesting conclusions, some surprising.
-
As intended by NYT puzzle editor Will Shortz,
the difficulty of puzzles does increase throughout the week,
and our chart quantifies this increase.
-
The degree of difficulty varies the most on Thursday,
which is known for its tricky puzzles.
-
On average, Thursday puzzles are harder than Friday puzzles.
It’s the only exception to each day being harder than the day before,
and most likely attributable to the tricks on Thursday.
-
Sunday puzzles are usually between Wednesday and Thursday in difficulty when
the increased size of the puzzle is factored out.
-
There aren’t many puzzles which are “Very Easy” for the day.
There are more puzzles that are harder than average than easier than average.
-
There are puzzles which are “Very Hard” for the day earlier in the week.
-
Fewer people complete harder puzzles, even though better solvers are solving them.
-
More people solve the Monday and Sunday puzzles than any other day of the week,
but more solvers complete the Tuesday and Wednesday puzzles.
-
The Sunday puzzle is solved by more people than any day except Monday,
but, on a percentage basis, fewer people complete it than any other day.
To view the Puzzazz leaderboard for the New York Times crossword, tap
the trophy icon from the green bar after you've completed the puzzle in the Puzzazz app,
or visit our web site to view the leaderboard and
learn how to solve the New York Times crossword in Puzzazz.
Publication date: November 14, 2016