I used the sieving algorithm for subtract-a-square to generate its P-positions (the ones you want to move to) in order to estimate how quickly this sequence of positions grows. Based on some assumptions that the sequence was basically random and on considerations involving the coupon collector problem, I was expecting its growth rate (the number of P-positions up to \( n \)) to be roughly \( O(\sqrt{n} \log n) \). But that seems to be inconsistent with the data. Instead (based on generating the first 350k or so of these numbers) it seems to be more like \( O(n^{0.69}) \), a surprisingly high growth rate.
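For concreteness, here is a minimal sketch of the kind of sieve I mean; the function name and the cutoff of 100000 are just illustrative, not the run that produced the numbers below.

    import math

    def subtract_a_square_p_positions(limit):
        """Return the P-positions of subtract-a-square that are below limit."""
        is_p = [False] * limit          # is_p[n] will mean "n is a P-position"
        p_positions = []
        for n in range(limit):
            # n is a P-position exactly when no move n -> n - k^2 reaches a P-position
            if not any(is_p[n - k * k] for k in range(1, math.isqrt(n) + 1)):
                is_p[n] = True
                p_positions.append(n)
        return p_positions

    ps = subtract_a_square_p_positions(100000)
    count, largest = len(ps), ps[-1]
    # crude exponent estimate of the kind tabulated below: count is roughly largest ** exponent
    print(count, largest, math.log(count) / math.log(largest))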

Below is a table of some P-positions (the middle column), the number of P-positions up to that value (left column), and the exponent one would infer from those two numbers (right column):

22 109 0.658881526366
39 204 0.688882847768
66 425 0.692265864743
99 827 0.684021054309
158 1625 0.684757862413
255 3209 0.686333836589
419 6414 0.688764172045
682 12812 0.689885260226
1087 25647 0.688637867852
1752 51202 0.688752703605
2821 102517 0.688593820775
4497 204817 0.687755821596
7436 409687 0.689776813171
12068 819267 0.69023196924
19659 1638632 0.690896170477
31742 3276892 0.690915578787
51489 6553775 0.691222814299
83851 13107272 0.691745659235
136858 26215012 0.692354994835
221074 52429275 0.69233584282
358248 104857942 0.692489695744

In any subtraction game like this one, the product of the growth rates of the move set and the P-position set must be at least linear: every position that is not itself a P-position must have a move leading to a P-position, so every such position up to \( n \) is the sum of a move and a P-position, and there are at most (number of moves up to \( n \)) times (number of P-positions up to \( n \)) such sums. But it occurred to me to wonder: is this product always superlinear? That is, is there some more general reason why the P-positions of subtract-a-square come out significantly denser than the squares themselves? The answer is no. There is a different subtraction game in which both the number of moves up to \( n \) and the number of P-positions up to \( n \) are proportional to \( \sqrt{n} \), with some oscillation in the constant of proportionality.

To define such a game, let the available moves be the nonzero members of the Moser–de Bruijn sequence of sums of distinct powers of 4: 0, 1, 4, 5, 16, 17, 20, 21, 64, ... Any integer \( n \) can be decomposed into its even part \( n \ \&\ \mathrm{0x55555555} \) (the bits of \( n \) in even binary positions), which is a member of the sequence, and its odd part \( n\ \&\ \mathrm{0xaaaaaaaa} \), which is twice a member of the sequence. The P-positions for the game defined by these moves are the ones whose even part is zero, and the unique winning move (when the even part is nonzero) is to subtract off the even part.
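Here is a small brute-force check of that claim (my own code, with an arbitrary cutoff of \( 2^{12} \)); it solves the game directly and compares its P-positions against the zero-even-part description.

    LIMIT = 1 << 12

    # nonzero Moser-de Bruijn numbers below LIMIT: binary 1s only in even bit positions
    moves = [m for m in range(1, LIMIT) if m & 0xAAAAAAAA == 0]

    is_p = [False] * LIMIT
    for n in range(LIMIT):
        # n is a P-position exactly when no move leads from n to a P-position
        is_p[n] = not any(is_p[n - m] for m in moves if m <= n)

    # the claim: P-positions are exactly the numbers whose even part is zero
    assert is_p == [n & 0x55555555 == 0 for n in range(LIMIT)]
    print(sum(is_p), "P-positions below", LIMIT)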

Another way of looking at this, in number-theoretic terms, is that the Moser–de Bruijn sequence is something like a Sidon sequence. A Sidon sequence is defined by the property that the sums \( x + y \) of pairs of numbers \( x \), \( y \) from the sequence are all distinct. To achieve this property, the number of elements of an infinite Sidon sequence up to \( n \) must (for infinitely many \( n \)) be smaller than \( \sqrt{n} \) by more than a constant factor. The Moser–de Bruijn sequence instead has distinct sums of the form \( x + 2y \), and because this form is a little different it can grow proportionally to \( \sqrt{n} \).
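A quick numerical check of this distinct-sums property (the helper spread, which places the binary digits of its argument into even bit positions, is my own):

    def spread(k):
        # the k-th Moser-de Bruijn number: move bit i of k to bit 2i
        out, shift = 0, 0
        while k:
            out |= (k & 1) << shift
            k >>= 1
            shift += 2
        return out

    K = 256                               # a power of two, so the first K terms are those below 4**8
    mdb = [spread(k) for k in range(K)]
    sums = {x + 2 * y for x in mdb for y in mdb}
    assert len(sums) == K * K             # the K^2 sums x + 2y are all distinct
    assert sums == set(range(K * K))      # in fact they cover 0 .. K^2 - 1 exactly once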

So, any explanation of the high growth rate of P-positions in subtract-a-square will need to depend on the actual distribution of square numbers, and not just on their frequency.

Update Nov. 18: Compare these old messages on the modular behavior of the subtract-a-square positions with Ruzsa's lower bound technique for the Furstenberg–Sárközy theorem. Then look at the table below, showing what happens when we write the subtract-a-square P-positions in base 5 and count, for each base-5 digit position, how often each digit value occurs (here, for positions up to 4 million; the rows are digit positions, starting from the units place, and the columns give the counts of the digit values 0 through 4):

position 0 : 15179     8 18520    36  2803
position 1 :  7369  7280  7277  7351  7268
position 2 : 11273  3903 10586  6161  4614
position 3 :  7230  7303  7360  7470  7160
position 4 :  7863  6726  7524  7453  6894
position 5 :  7094  7278  7346  7280  7302
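Here is a sketch of the kind of counting that produces a table like this one; it is my own reconstruction, with a much smaller cutoff than 4 million, so its numbers will not match the table exactly.

    import math
    from collections import Counter

    LIMIT = 200000                       # the table above used positions up to 4 million

    # the same sieve as before for the subtract-a-square P-positions
    is_p = [False] * LIMIT
    p_positions = []
    for n in range(LIMIT):
        if not any(is_p[n - k * k] for k in range(1, math.isqrt(n) + 1)):
            is_p[n] = True
            p_positions.append(n)

    # for the six low-order base-5 digit positions, count how often each digit value occurs
    counts = [Counter() for _ in range(6)]
    for p in p_positions:
        for pos in range(6):
            if p == 0 and pos > 0:
                break                    # stop rather than counting leading zeros
            counts[pos][p % 5] += 1
            p //= 5

    for pos, c in enumerate(counts):
        print("position", pos, ":", *(c[d] for d in range(5)))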

It looks very much to me like the greedy algorithm (and the P-position sieve is exactly the greedy construction of a set in which no two elements differ by a nonzero square) is trying to emulate Ruzsa's strategy in base 5 (that is, to use a square-difference-free set of digits in even positions and arbitrary digits in odd positions), which would already give it approximately \( n^{0.715338} \) positions. If it could bring other prime factors into the base it could do even better. Maybe it emulates Ruzsa for all bases simultaneously?
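To spell out where that exponent comes from: the nonzero squares mod 5 are 1 and 4, so a two-element digit set such as \( \{0, 2\} \) (which also matches the two digits that dominate position 0 in the table above) has no two members differing by a square mod 5. Allowing those two digits in the even positions and all five digits in the odd positions gives \( 10^k \) numbers below \( 25^k \), for an exponent of \( \log 10 / \log 25 = \tfrac{1}{2}\bigl(1 + \log 2 / \log 5\bigr) \approx 0.715338 \).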