I remembered it all wrong. With the app optimisation on Collatz, there is no benefit from running 2 WUs per GPU. I think that idea was a hangover from trying it on a sprint.
A Collatz WU is now completing in 6 mins +/- a few seconds, so that's comfortably over 13 million/day on Collatz from one machine. It also has 9 threads on WCG as a bonus.
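
For anyone wanting to sanity-check the numbers, here is a rough sketch of the arithmetic. It assumes the 13 million figure is daily credit and treats it as coming from a single task stream, which probably isn't the real setup since the number of GPUs isn't stated. The 12-minute figure for running 2 WUs at once is made up purely to illustrate why doubling up gives no gain when run time doubles too. The function names are just for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def wus_per_day(minutes_per_wu, concurrent_tasks=1):
    """Work units finished per day for a given run time and task count."""
    return concurrent_tasks * MINUTES_PER_DAY / minutes_per_wu


def implied_credit_per_wu(daily_credit, minutes_per_wu, concurrent_tasks=1):
    """Credit each WU would need to award to reach the daily total."""
    return daily_credit / wus_per_day(minutes_per_wu, concurrent_tasks)


# One WU at a time, 6 minutes each (figure from the post).
single = wus_per_day(6)

# Hypothetical: two WUs at once, each taking twice as long (12 minutes).
doubled = wus_per_day(12, concurrent_tasks=2)

print(single, doubled)
# Assumption: 13 million is daily credit from one task stream.
print(round(implied_credit_per_wu(13_000_000, 6)))
```

If two concurrent WUs each take about twice as long, both cases come out to the same count per day, which matches what I saw: no benefit from doubling up.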