Relevant math background: the Gaussian integers are the complex numbers of the form a+bi where a and b are good, old-fashioned integers. For example, 2+3i or -1+2i are Gaussian integers. Any integer n is a Gaussian integer since you can write it as n+0i. But, say, 3-0.5i would not be a Gaussian integer. Also notation: We write x|y to mean y is a multiple of x. We can use this notation in our regular integers (so for example 2|8 but it is not true that 3|8) or in the Gaussian integers, where we are then allowed to multiply by another Gaussian integer. For example, (2+i)|(2+i)(3-i). A good exercise if you have not seen the Gaussian integers before: Convince yourself that 1+i | 1+3i.
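The exercise above can be checked mechanically. A sketch (my own helper, not anything from the Lemma): to test whether x divides y in the Gaussian integers, multiply y by the conjugate of x, so the denominator becomes the ordinary integer a^2 + b^2, and check that both components are divisible by it. Gaussian integers are represented as pairs (a, b) for a+bi.

```python
def gaussian_divides(x, y):
    # x, y are pairs (a, b) representing the Gaussian integer a + bi.
    # y/x = y * conj(x) / (x * conj(x)), and x * conj(x) = a^2 + b^2
    # is an ordinary integer, so x | y exactly when that integer
    # divides both components of y * conj(x).
    a, b = x
    c, d = y
    denom = a*a + b*b              # x * conj(x)
    re = c*a + d*b                 # real part of y * conj(x)
    im = d*a - c*b                 # imaginary part of y * conj(x)
    return re % denom == 0 and im % denom == 0

# The exercise: 1+i divides 1+3i (indeed (1+i)(2+i) = 1+3i).
print(gaussian_divides((1, 1), (1, 3)))   # True
# Ordinary-integer cases still work: 2|8 but not 3|8.
print(gaussian_divides((2, 0), (8, 0)))   # True
print(gaussian_divides((3, 0), (8, 0)))   # False
```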
It also turns out that the Gaussian integers have an analog of unique prime factorization just as in the usual integers. The Gaussian integers also have a notion of size called the norm. For a given Gaussian integer a+bi, the norm is a^2 + b^2.
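One useful fact about the norm (well known, though not needed above): it is multiplicative, meaning the norm of a product is the product of the norms, which is part of why it is so handy in factorization arguments. A quick sketch, again representing a+bi as the pair (a, b):

```python
def norm(z):
    # Norm of the Gaussian integer a + bi.
    a, b = z
    return a*a + b*b

def gmul(x, y):
    # Product of two Gaussian integers:
    # (a+bi)(c+di) = (ac - bd) + (ad + bc)i.
    a, b = x
    c, d = y
    return (a*c - b*d, a*d + b*c)

# Multiplicativity in action: (1+i)(2+i) = 1+3i,
# and the norms multiply: 2 * 5 = 10.
z = gmul((1, 1), (2, 1))
print(z, norm(z))   # (1, 3) 10
```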
Recently I had to prove a specific Lemma where I needed to find all Gaussian integers a and b where both are Gaussian primes, b|a^2 + a + 1, and a|b+1. As a template I had a very similar Lemma in the integers, which said exactly which integers a and b satisfy b|a^2 + a + 1 and a|b+1. I worked out the proof, essentially modifying the version in the integers.

Then, I did something I've often been doing after I've completed a small Lemma, namely giving the task to ChatGPT or another system and seeing how it does. For prior iterations (GPT3, ChatGPT, GPT4, 4o) this has almost universally been a disaster. But this time I gave the task to GPT5, and gave it the integer version to start with. It tried to do the same basic task and produced a result pretty close to mine, but it had multiple small errors in the process, to the point where I'm unsure if using it would have sped things up. But at the same time, the errors were genuinely small. For example, in one subcase the system claimed that a specific number's norm needed to be at most 9, when it needed to be at most 10. These are not the sort of large jumps in reasoning that one saw with GPT4 or 4o. It might have been the case that if I had given this to GPT5 before proving it myself and then corrected its errors, I would have saved time. I generally doubt this is the case, but the fact that it is close to the point where that's now plausible is striking.
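For the integer version of the Lemma, the solution pairs can at least be explored by brute force. A sketch (restricting to positive integers, which is my assumption here, not necessarily the Lemma's full scope): since b|a^2 + a + 1 forces b ≤ a^2 + a + 1, scanning b up to that bound is exhaustive for each a.

```python
def conditions(a, b):
    # The two divisibility conditions from the integer version:
    # b | a^2 + a + 1  and  a | b + 1.
    return (a*a + a + 1) % b == 0 and (b + 1) % a == 0

# Enumerate all positive solutions with a below a small cutoff;
# for each a, b is bounded by a^2 + a + 1, so the inner scan is complete.
solutions = [(a, b) for a in range(1, 30)
             for b in range(1, a*a + a + 2)
             if conditions(a, b)]
print(solutions[:6])
```

A scan like this is no substitute for the proof, but it is a cheap sanity check that any claimed classification of the pairs (a, b) matches the small cases.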