This is an interesting framing about agents writing meaningless tests:

> Unless you are very clear and careful, AI coding assistants at the moment will look at your implementation and write tests that confirm your code does what it already does. They’re essentially saying: “This function returns X when given Y, so I’ll write a test to confirm it returns X when given Y.” This is the equivalent of a student writing their own exam after seeing the answers. It’s a tautology – true by definition, but not validating anything of value.

This matches my experience. Despite my best attempts to enforce a red-green testing methodology, where the agent writes a failing test first and the passing code after, I still catch agents adjusting the tests after a few loops.

This has obvious security implications. Given recent research showing that agents are prone to introducing regressions over the long term, there is a significant chance that any test we put in place to catch security issues will eventually be gamed.

So how we write and enforce the tests is the really important question.

I use ralph loops a lot and in my experience they are effective in ensuring quality outcomes. Perhaps a good design pattern would be to split every implementation task into two sequential tasks: one that writes failing tests from the specification alone, and one that makes those tests pass without touching them.

Both tasks would get a clean context and different incentives, reducing the likelihood of cheating.
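The two-task split could be orchestrated roughly like this. Everything here is a hypothetical sketch: `write_tests`, `implement`, and `suite_fails` stand in for separate agent invocations (each with a clean context) and a test-suite run, not real APIs:

```python
def two_phase_task(spec: str, write_tests, implement, suite_fails) -> bool:
    """Hypothetical two-task loop. Each callable models a fresh-context step:
      - write_tests(spec): sees only the spec, emits tests
      - implement(instruction): sees only the tests, writes code
      - suite_fails(): runs the suite, True if any test fails
    Returns True only on a legitimate red -> green transition."""
    write_tests(spec)
    if not suite_fails():
        # Tests pass before any code exists: they prove nothing. Reject.
        return False
    implement("make the existing tests pass without editing them")
    return not suite_fails()
```

The red check in the middle is what makes the split worthwhile: a test suite that never failed cannot be evidence that the implementation works.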