Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#32711 closed Uncategorized (worksforme)

MySQL performance regression in forms.models.apply_limit_choices_to_to_formfield

Reported by: Thomas Achtemichuk Owned by: nobody
Component: Forms Version: 3.2
Severity: Normal Keywords:
Cc: Adam Johnson Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This commit related to #11707 in Django 3.2 introduced changes to apply_limit_choices_to_to_formfield. The query generated, for example, with this field on a model:

editor = models.ForeignKey(
    settings.AUTH_USER_MODEL,
    null=True,
    blank=True,
    limit_choices_to={
        'is_staff': True,
        'groups__name': 'Editor',
    },
    on_delete=models.SET_NULL)

results in the following query which takes ~25s to return on a db.r5.xlarge RDS instance running MySQL 5.7:

SELECT
  `auth_user`.`id`,
  `auth_user`.`password`,
  `auth_user`.`last_login`,
  `auth_user`.`is_superuser`,
  `auth_user`.`username`,
  `auth_user`.`first_name`,
  `auth_user`.`last_name`,
  `auth_user`.`email`,
  `auth_user`.`is_staff`,
  `auth_user`.`is_active`,
  `auth_user`.`date_joined`
FROM
  `auth_user`
WHERE EXISTS(
  SELECT
    (1) AS `a`
  FROM
    `auth_user` U0
    INNER JOIN
      `auth_user_groups` U1 ON (U0.`id` = U1.`user_id`)
    INNER JOIN
      `auth_group` U2 ON (U1.`group_id` = U2.`id`)
    WHERE (
      U2.`name` = 'Editor'
      AND U0.`is_staff`
      AND U0.`id` = `auth_user`.`id`
    )
  LIMIT 1
)
ORDER BY `auth_user`.`date_joined` DESC;

...
21 rows in set (26.18 sec)

Counts for the involved tables:

mysql> SELECT COUNT(*) FROM auth_user;
+----------+
| COUNT(*) |
+----------+
|  3360439 |
+----------+

mysql> SELECT COUNT(*) FROM auth_group;
+----------+
| COUNT(*) |
+----------+
|       14 |
+----------+

mysql> SELECT COUNT(*) FROM auth_user_groups;
+----------+
| COUNT(*) |
+----------+
|      460 |
+----------+

And the EXPLAIN:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: auth_user
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3201287
     filtered: 100.00
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: U2
   partitions: NULL
         type: const
possible_keys: PRIMARY,name
          key: name
      key_len: 452
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: U1
   partitions: NULL
         type: eq_ref
possible_keys: user_id,auth_user_groups_403f60f,auth_user_groups_425ae3c4
          key: user_id
      key_len: 8
          ref: bso.auth_user.id,const
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 4. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: U0
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: bso.auth_user.id
         rows: 1
     filtered: 90.00
        Extra: Using where

Change History (3)

comment:1 by Mariusz Felisiak, 3 years ago

Cc: Adam Johnson added
Resolution: worksforme
Status: newclosed

Thanks for the report, however I cannot reproduce this issue on MySQL 8.0.23. All indexes are used correctly:

>>> User.objects.filter(Exists(User.objects.filter(Q(pk=OuterRef('pk')) & Q(is_staff=True) & Q(groups__name='group_480')))).explain(analyze=True)

-> Nested loop inner join  (cost=4.20 rows=4) (actual time=0.110..0.138 rows=2 loops=1)    
    -> Nested loop inner join  (cost=2.80 rows=4) (actual time=0.081..0.109 rows=4 loops=1)        
        -> Index lookup on U1 using auth_user_groups_group_id_97559544_fk_auth_group_id (group_id='481')  (cost=1.40 rows=4) (actual time=0.058..0.065 rows=4 loops=1)        
        -> Single-row index lookup on auth_user using PRIMARY (id=U1.user_id)  (cost=0.28 rows=1) (actual time=0.009..0.010 rows=1 loops=4)    
    -> Filter: (0 <> U0.is_staff)  (cost=0.27 rows=1) (actual time=0.006..0.006 rows=0 loops=4)        
        -> Single-row index lookup on U0 using PRIMARY (id=U1.user_id)  (cost=0.27 rows=1) (actual time=0.004..0.005 rows=1 loops=4)

It looks that you don't have an index on auth_user.id, but I don't know why.

comment:2 by Simon Charette, 3 years ago

It looks that you don't have an index on auth_user.id, but I don't know why.

I doubt that's the issue, MySQL is just notably really picky with regards to subqueries optimizations

https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-exists.html

It seems we are properly following the docs though with our outer_expr=inner_expr usage via U0.id = auth_user.id unless MySQL doesn't like us swapping to inner_expr=outer_expr format?

Thomas, are you using the latest version of MySQL 5.7 and does swapping AND U0.id = auth_user.id to AND auth_user.id = U0.id in the subquery make any difference?

Last edited 3 years ago by Simon Charette (previous) (diff)

comment:3 by Thomas Achtemichuk, 3 years ago

Thanks for following up Mariusz & Simon.

I agree that this smacks of typically poor MySQL subquery optimization, which is one of the reasons we're planning on migrating to PostgreSQL in the near future. In the meantime, I've monkeypatched in the apply_limit_choices_to_to_formfield from 3.1 which is doing the trick for us. And as 5.7 is quickly approaching EOL, probably not worth any more time/attention to the issue we're seeing.

Simon, we're on MySQL 5.7.25, so a few minor versions back from the current GA release of 5.7.34 (or 5.7.33, available on RDS). Swapping the order of the condition doesn't seem to have any effect.

Note: See TracTickets for help on using tickets.
Back to Top