Opened 15 months ago

Closed 14 months ago

Last modified 10 months ago

#34925 closed Bug (fixed)

refresh_from_db() will not iterate through all of the fields listed in the 'fields' parameter.

Reported by: Andrew Cordery Owned by: Trontelj
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

This one was killing me today. If you pass the 'fields' parameter to refresh_from_db it is supposed to only remove those fields from _prefetched_objects_cache as described in the docs: https://docs.djangoproject.com/en/4.2/ref/models/instances/#django.db.models.Model.refresh_from_db.

Unfortunately, it modifies the 'fields' variable as it iterates through it:

Line 694 of django.db.models.base.py:

            for field in fields:
                if field in prefetched_objects_cache:
                    del prefetched_objects_cache[field]
                    fields.remove(field)

Modifying the list that we are iterating over causes some list items to be skipped. For example here we would expected to see 'a', 'b', and 'c', removed from dict d but 'b' is skipped:

In [8]: d = dict(a=1,b=2, c=3)

In [9]: fields = ['a','b','c']

In [10]: for f in fields:
    ...:     print(f)
    ...:     if f in d:
    ...:         del d[f]
    ...:         fields.remove(f)
    ...: 
a
c

In [11]: fields
Out[11]: ['b']
In [12]: d
Out[12]: {'b': 2}

I beieve that all that needs to be done to correct this is to create a copy of the list by wrapping fields in list(), like so:

In [13]: fields = ['a','b','c']

In [14]: d=dict(a=1,b=2, c=3)

In [15]: for f in list(fields):
    ...:     print(f)
    ...:     if f in d:
    ...:         del d[f]
    ...:         fields.remove(f)
    ...: 
a
b
c

In [16]: d
Out[16]: {}

In [17]: fields
Out[17]: []

Therefore as far as I can tell the fix to the django code would be as easy as wrapping fields in list():

Line 694 of django.db.models.base.py: modified

            for field in list(fields):
                if field in prefetched_objects_cache:
                    del prefetched_objects_cache[field]
                    fields.remove(field)

Change History (16)

comment:2 by Simon Charette, 15 months ago

Triage Stage: UnreviewedAccepted
Version: 4.25.0

Wow it's surprising that this flew under the radar for so long. The bug has been around since we introduced prefetched relationship clearing in #29625.

We should definitely not alter the provided fields and obviously not remove members while iterating over it.

comment:3 by Andrew Cordery, 15 months ago

I searched through the codebase with some admitedly crap regex just looking for similar instances of for x in y: ... y.remove(x), and I found one other:

Line 93 of django/core/management/commands/compilemessages.py:

        for dirpath, dirnames, filenames in os.walk(".", topdown=True):
            for dirname in dirnames:
                if is_ignored_path(
                    os.path.normpath(os.path.join(dirpath, dirname)), ignore_patterns
                ):
                    dirnames.remove(dirname)
                elif dirname == "locale":
                    basedirs.append(os.path.join(dirpath, dirname))

This behavior is explicitly encouraged by python https://docs.python.org/3/library/os.html#os.walk, however in my tests this will still lead to the same kind of member skipping behavior. This particular block of code would still work correctly as long as there was never a 'locale' dirname directly after an ignored dirname in the same path.

For example, imagine the following directory structure, where we wanted to extract the two 'locale' dirs (./locale and ./a/locale) and ignore the 'b' dir.

root:
  a/
      locale/
  b/      
  locale/

If you were to run the following code:

import os
locale_paths=[]
for dirpath, dirnames, filenames in os.walk(".", topdown=True):
    # os.walk returns dirnames in arbitrary order https://docs.python.org/3/library/os.html#os.listdir, 
    # so sort the dirnames to force the skip directory 'b' to come directly before the 'locale' directory
    dirnames.sort()  
    print(f'Iterating through {dirpath}/{dirnames}')
    for dirname in dirnames:
        path = os.path.join(dirpath, dirname)
        print(f'Examining {path}...', end='')
        if dirname == 'b':
            print(f'Ignoring {path}')
            dirnames.remove(dirname)
        elif dirname == 'locale':
            print(f'Found {path}')
            locale_paths.append(path)
        else:
            print('')

You would get the following output:

Iterating through ./['a', 'b', 'locale']
Examining ./a...
Examining ./b...Ignoring ./b
Iterating through ./a/['locale']
Examining ./a/locale...Found ./a/locale
Iterating through ./a/locale/[]
Iterating through ./locale/[]

In [33]: locale_paths
Out[33]: ['./a/locale']

Notice that ./locale is never 'examined' nor does it appear in the 'locale_paths' output variable. It does however get traversed as shown by the 'Iterating through ./locale/[]' message, which makes sense as it was not removed from dirnames. This does mean though that if ./locale/ had had a 'locale' subdirectory in it, that directory would have been picked up, even though the parent was missed.

Naturally, changing the line to for dirname in list(dirnames) also completely resolves this issue.

Not sure any of that matters or is useful to you guys, but it was interesting to me. Moral of the story is don't ever mutate a list we are iterating through, even if the python docs seem to imply that you can! I think their docs should be amended to specifically state that you should NOT mutate it during the initial iteration, and that instead you should iterate through a copy (ex list(dirnames)) or only mutate it once you have examined every member of the list that you are interested in, right before your code exits.

comment:4 by David Sanders, 15 months ago

Hi Andrew,

Thanks for you investigation, interested in submitting a patch & test? 🏆

comment:5 by Trontelj, 15 months ago

Hello, I'm new to this community, but I have three years of experience working with Django, and I'm eager to contribute. Can I work on this ticket?

comment:6 by David Sanders, 15 months ago

@trontelj sure 👍 Submit a PR and assign yourself. If you need assistance please head over to Discord to seek help from a community member.

comment:7 by Trontelj, 15 months ago

Owner: changed from nobody to Trontelj
Status: newassigned

PR is ready

comment:8 by Tim Graham, 15 months ago

Needs tests: set

All patches require tests.

comment:9 by Mariusz Felisiak, 15 months ago

Has patch: set

comment:10 by Trontelj, 15 months ago

Needs tests: unset

I have added tests. The failed tests are not related to the changes I made in my code.

Last edited 15 months ago by Trontelj (previous) (diff)

comment:11 by David Sanders, 15 months ago

Needs tests: set

comment:12 by Natalia Bidart, 15 months ago

Needs tests: unset
Patch needs improvement: set

comment:13 by Mariusz Felisiak, 14 months ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:14 by Mariusz Felisiak <felisiak.mariusz@…>, 14 months ago

Resolution: fixed
Status: assignedclosed

In b0ec87b8:

Fixed #34925 -- Prevented Model.refresh_from_db() from mutating list of fields.

comment:15 by GitHub <noreply@…>, 14 months ago

In 978680d:

Refs #34925 -- Avoided altering passed by reference refresh_from_db(fields).

Follow up to b0ec87b8578147be4357c90eabcd2b916c780810.

comment:16 by Graham Herceg, 10 months ago

Hi there. Can anyone confirm that there are no plans to backport this to 4.2 since it is neither a data loss or security issue? It would be nice to have, but I understand if those of us working with the 4.2 LTS release just have to work around this bug. Thanks.

comment:17 by Tim Graham, 10 months ago

Correct. If applicable, backports are done immediately after merging a patch to the main branch.

Note: See TracTickets for help on using tickets.
Back to Top