Opened 5 years ago

Closed 5 years ago

Last modified 2 years ago

#31135 closed Bug (invalid)

OuterRef generates invalid SQL with __in filter.

Reported by: Bernd Wechner Owned by: nobody
Component: Database layer (models, ORM) Version: 3.0
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is new to Django 3.0 and may relate to: https://code.djangoproject.com/ticket/31133

In migrating my project I found a SQL crash. I have traced it to a SubQuery that includes a filter using in on an OuterRef.

Here is the code for context:

        pfilter = Q(
            session__date_time=Subquery(
                (Performance.objects
                    .filter(Q(player__in=OuterRef('player')) & pfilter)
                    .values('player')
                    .annotate(max_date=Max('session__date_time'))
                    .values('max_date')[:1]
                ), output_field=models.DateField()
            )
        )

which in Django 2 produces this SQL (I remove irrelevant attributes on the first select that are identical across the two Django versions to help see the difference):

SELECT "Leaderboards_performance"."id" 
FROM "Leaderboards_performance"
INNER JOIN "Leaderboards_session" ON ("Leaderboards_performance"."session_id" = "Leaderboards_session"."id")
WHERE "Leaderboards_session"."date_time" =
        (SELECT MAX(U2."date_time") AS "max_date"
         FROM "Leaderboards_performance" U0
         INNER JOIN "Leaderboards_player" U1 ON (U0."player_id" = U1."id")
         INNER JOIN "Leaderboards_session" U2 ON (U0."session_id" = U2."id")
         WHERE (U0."player_id" IN ("Leaderboards_performance"."player_id")
                AND U2."game_id" = 29
                AND U2."date_time" <= '2019-11-02 08:30:00+00:00')
         GROUP BY U0."player_id",
                  U2."date_time",
                  U1."name_nickname"
         ORDER BY U2."date_time" DESC, U1."name_nickname" ASC
         LIMIT 1)
ORDER BY "Leaderboards_performance"."trueskill_eta_after" DESC

and in Django 3:

SELECT "Leaderboards_performance"."id"
FROM "Leaderboards_performance"
INNER JOIN "Leaderboards_session" ON ("Leaderboards_performance"."session_id" = "Leaderboards_session"."id")
WHERE "Leaderboards_session"."date_time" =
        (SELECT MAX(U2."date_time") AS "max_date"
         FROM "Leaderboards_performance" U0
         INNER JOIN "Leaderboards_player" U1 ON (U0."player_id" = U1."id")
         INNER JOIN "Leaderboards_session" U2 ON (U0."session_id" = U2."id")
         WHERE (U0."player_id" IN "Leaderboards_performance"."player_id"
                AND U2."game_id" = 29
                AND U2."date_time" <= '2019-11-02 08:30:00+00:00')
         GROUP BY U0."player_id",
                  U2."date_time",
                  U1."name_nickname"
         ORDER BY U2."date_time" DESC, U1."name_nickname" ASC
         LIMIT 1)
ORDER BY "Leaderboards_performance"."trueskill_eta_after" DESC

Essentially the clause:

Q(player__in=OuterRef('player'))

produces valid SQL syntax in 2 and invalid syntax in 3. It loses the braces around the OuterRef that the IN operator demands.

As it happens I can work around this by recasting this as:

Q(player=OuterRef('player'))

which is canonically more correct in any case given (on a code check) this Q object is only ever used inside a query that has a single player and not many and arguably was a code bug on this site. But the observation remains that Django3 is now producing invalid SQL for a query that Django2 produced valid SQL for and there is no mention or hint in the release notes as to an incompatible change in this area, so I've spent some considerable time drilling down to find the cause.

OuterRef should of course always be wrapped in braces when used with an IN ...

Change History (7)

comment:1 by Mariusz Felisiak, 5 years ago

Cc: Simon Charette added
Component: UncategorizedDatabase layer (models, ORM)
Resolution: invalid
Status: newclosed
Summary: OuterRef generates invalid SQL with __in filterOuterRef generates invalid SQL with __in filter.

Thanks for this report, behavior was changed in 3a505c70e7b228bf1212c067a8f38271ca86ce09, however as far as I'm concerned this usage is not supported (see documentation). So it revealed an issue in your code, not in Django.

comment:2 by Bernd Wechner, 5 years ago

Indeed an issue in my code. But also an issue with Django. There is no excuse to go from generating valid SQL to generating invalid SQL. If that usage is not supported, fine, but throw an exception before it bombs on a SQL syntax error. Half the raison d'etre of Django is surely to isolate a web designer from the innards of SQL queries as best possible, and not to thrust them into SQL land trying work out why their ORM is generating invalid SQL, when it could just as well through a lucid DJango exception.

To wit, I would suggest reopening it and flagging a fix if only for Django to provide more lucid feedback than a SQL syntax error.

comment:3 by Martín Gaitán, 3 years ago

Hi, I'm upgrading a project from 2.1 to 3.2 and also hit this issue.

My code does something like this

        agrupaciones_subquery = (
            AgrupacionCircuitos.objects.filter(id__in=self.agrupaciones_a_considerar())
            .filter(
                id__in=(OuterRef("carga__mesa_categoria__mesa__circuito__agrupaciones"))
            )
            .values_list("id", flat=True)
        )
        return (
            self.votos_reportados(categoria, mesas)
            .values_list("opcion__id")
            .annotate(id_agrupacion=Subquery(agrupaciones_subquery))
            .exclude(id_agrupacion__isnull=True)
            .annotate(sum_votos=Sum("votos"))
        )

I can't change id__in=(OuterRef("carga__mesa_categoria__mesa__circuito__agrupaciones")) by id=(OuterRef... ) because in my case, it may contains multiples related "agrupaciones" . Which should be the workaround?

Btw, I also agree with Bernd Wechner about the error raised should be more clear.

Last edited 3 years ago by Martín Gaitán (previous) (diff)

comment:4 by Vasili Korol, 2 years ago

I bumped into this after upgrading from 2.2 to 3.2. Not sure why was this ticket closed, the issue is still there!

Last edited 2 years ago by Vasili Korol (previous) (diff)

in reply to:  4 ; comment:5 by Mariusz Felisiak, 2 years ago

Replying to Vasili Korol:

I bumped into this after upgrading from 2.2 to 3.2. Not sure why was this ticket closed, the issue is still there!

It was closed because this usage has never been officially supported.

in reply to:  5 comment:6 by Bernd Wechner, 2 years ago

Replying to Mariusz Felisiak:

Replying to Vasili Korol:

I bumped into this after upgrading from 2.2 to 3.2. Not sure why was this ticket closed, the issue is still there!

It was closed because this usage has never been officially supported.

It remains a bug that it produces invalid SQL where a prior version did not, and not a sensible exception - causing much wasted time on the part of anyone who used the old (unsupported but functional) syntax.

comment:7 by Mariusz Felisiak, 2 years ago

It remains a bug that it produces invalid SQL where a prior version did not, ...

Yes but unintentionally and only on some databases.

and not a sensible exception - causing much wasted time on the part of anyone who used the old

Feel-free to prepare a patch with raising a Python level error when using many-to-many field in F() or OuterRef() (see also #32414). I would be happy to review.

Note: See TracTickets for help on using tickets.
Back to Top