I can confirm that the query is returning the correct results. The order of filter() and annotate() is significant, and the result you received is the expected one.
However, I will concede that the documentation of this quirk could certainly be improved. I'll reopen this ticket, but repurpose it to clarify the documentation of this feature, since nothing at the code level requires fixing. If you (or anyone else) want to take a swing at describing this quirk better, feel free to do so.
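For anyone picking this up, here is a rough sketch of the kind of difference involved. It does not use the query from this ticket: the publisher/book tables, the sample rows, and the rating threshold are all hypothetical, and the SQL is hand-written to approximate what each ordering asks for, not copied from Django's actual output. It uses only the standard-library sqlite3 module so it can be run without a Django project.

```python
import sqlite3

# Hypothetical schema and data, for illustration only:
# publisher A has books rated 4 and 5, publisher B has books rated 1 and 4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        publisher_id INTEGER REFERENCES publisher(id),
        rating REAL
    );
    INSERT INTO publisher VALUES (1, 'A'), (2, 'B');
    INSERT INTO book VALUES (1, 1, 4.0), (2, 1, 5.0), (3, 2, 1.0), (4, 2, 4.0);
""")

# Roughly what
#   Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book'))
# asks for: the filter restricts the rows first, so only matching books are counted.
filter_then_annotate = conn.execute("""
    SELECT p.name, COUNT(b.id)
    FROM publisher p
    JOIN book b ON b.publisher_id = p.id
    WHERE b.rating > 3.0
    GROUP BY p.id
    ORDER BY p.name
""").fetchall()

# Roughly what
#   Publisher.objects.annotate(num_books=Count('book', distinct=True)).filter(book__rating__gt=3.0)
# asks for: the count is over all of a publisher's books, and the filter only
# decides which publishers appear in the result.
annotate_then_filter = conn.execute("""
    SELECT p.name, COUNT(DISTINCT b.id)
    FROM publisher p
    LEFT JOIN book b ON b.publisher_id = p.id
    JOIN book fb ON fb.publisher_id = p.id
    WHERE fb.rating > 3.0
    GROUP BY p.id
    ORDER BY p.name
""").fetchall()

print(filter_then_annotate)   # [('A', 2), ('B', 1)]
print(annotate_then_filter)   # [('A', 2), ('B', 2)]
```

Both orderings return the same publishers, but publisher B's count differs: 1 when the filter comes first, 2 when the annotation comes first. That difference is what the improved documentation would need to spell out.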