Opened 7 years ago

Last modified 5 months ago

#27833 assigned Bug

prefetch_related fails with SQLite when used with 1000 parent records

Reported by: Jason Barnabe Owned by: Yashas Donthi
Component: Database layer (models, ORM) Version: 1.10
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

This is described in ticket:16937#comment:3, but I don't see any issue filed for it.

"[prefetch_related] will not work on some backends if you have a lot of objects in your queryset. For example the SQLite backend has a limitation of 999 parameters to single SQL query, so if you have 1000 objects, prefetch_related will fail as you need to supply 1000 id values to the query."'

Batch it up like in #16426 and #17788, which dealt with the same limitation?

Change History (12)

comment:1 by Tim Graham, 7 years ago

Description: modified (diff)
Triage Stage: UnreviewedAccepted

comment:2 by Raphael Michel, 7 years ago

Owner: changed from nobody to Raphael Michel
Status: newassigned

comment:3 by Raphael Michel, 7 years ago

Has patch: set

comment:4 by François Freitag, 7 years ago

Patch needs improvement: set

comment:5 by Raphael Michel, 7 years ago

Patch needs improvement: unset

comment:6 by Tim Graham, 7 years ago

Patch needs improvement: set

comment:7 by Raphael Michel, 7 years ago

Patch needs improvement: unset

comment:8 by Tim Graham, 7 years ago

Triage Stage: AcceptedReady for checkin

comment:9 by François Freitag, 7 years ago

Triage Stage: Ready for checkinAccepted

As mentioned in the PR, I think there is one major issue: prefetch_related expects the results of the prefetch query to be unique.
Splitting the query in batches do not enforce uniqueness and can lead to incorrect results. In short, if related objects are shared between several instances, when instances are split into batches, nothing prevents a related objects from appearing twice in the results. For example:

# Suppose there are 2 instances (A, B) and 2 related objects (1, 2)
A -> 1
B -> 2
A -> 1

Prefetching in batches of 2 results in prefetch queries for: [A, B] then [C].
Which gives the following mapping of related objects: [(A, 1), (B, 2)] then [(A, 1)]. The rel_obj_cache would then be:

{
    (1,): [A, A],  # A should only be present once
    (2,): [B]
}

A more detailed example is available on the PR.

Maybe the way forward is to use a set or a dict to store the related queries' results (either for all_related_objects or for rel_obj_cache).

comment:10 by Tim Graham, 7 years ago

Patch needs improvement: set

comment:11 by Mariusz Felisiak, 14 months ago

Owner: Raphael Michel removed
Status: assignednew

comment:12 by Yashas Donthi, 5 months ago

Owner: set to Yashas Donthi
Status: newassigned
Note: See TracTickets for help on using tickets.
Back to Top