Opened 8 years ago

Closed 5 years ago

#27408 closed New feature (wontfix)

Make QuerySet.bulk_create() populate fields on related models

Reported by: Jarek Glowacki
Owned by: nobody
Component: Database layer (models, ORM)
Version: dev
Severity: Normal
Keywords: bulk_create foreign key id
Cc:
Triage Stage: Someday/Maybe
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Tim Graham)

As of Django 1.10, bulk_create returns pks for the objects it has created. Great! I want more.

Going to hop right into the use case:

from django.db import models


class A(models.Model):
    pass


class B(models.Model):
    a = models.ForeignKey(A, on_delete=models.PROTECT)


a = A()
b = B(a=a)  # doesn't set `a_id`, since `a` has no id assigned yet.
A.objects.bulk_create([a])  # sets `a`'s id.
B.objects.bulk_create([b])  # error! `b.a_id` is still None.

The first bulk_create call sets the id on a, but b.a_id remains None.
When the second bulk_create runs, even though b has a assigned, it fails to write the foreign key onto the b row, because b.a_id is still None.

So if you imagine pregenerating several related models in a big loop, followed by several consecutive bulk_create calls, we run into trouble. This is nicer than having to manually feed in ids during the loop, as we needed to in the past, but it still needs some magic.
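For illustration, a minimal sketch of that pattern with the models above (`rows` and the parsing step are hypothetical stand-ins for the real input):

# Sketch of the loop pattern described above (hypothetical input).
bulk_a, bulk_b = [], []
for row in rows:
    a = A()      # fields would be filled from `row` in practice
    b = B(a=a)   # b.a_id is still None at this point
    bulk_a.append(a)
    bulk_b.append(b)

A.objects.bulk_create(bulk_a)  # assigns ids to the A instances
B.objects.bulk_create(bulk_b)  # fails: each b.a_id is still None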

My current solution is to run a little function between the bulk_create calls, to fill in the foreign key ids:

def fill_foreignkey_ids(objects, fields):
    # Copy each related object's (now assigned) id onto the matching `<field>_id` attribute.
    for f in fields:
        for o in objects:
            setattr(o, '%s_id' % f, getattr(o, f).id)

Which makes:

a = A()
b = B(a=a)  # doesn't set `a_id`, since `a` has no id assigned yet.
bulk_a = [a]
bulk_b = [b]
A.objects.bulk_create(bulk_a)  # sets `a`'s id.
fill_foreignkey_ids(bulk_b, ['a'])
B.objects.bulk_create(bulk_b)  # now it works. Lovely.

But this is quite ugly.

This is more of a brainstorm ticket, as I'm not sure how to solve this nicely. Expecting every unsaved model instance to update its foreign key ids when a bulk_create inserts them seems silly. But maybe we could have the bulk_create method check whether b.a.id is set when b.a_id isn't?
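For illustration, a rough sketch of what such a check could look like as a pre-insert pass (a hypothetical helper, not a proposed patch):

from django.core.exceptions import ObjectDoesNotExist

def refresh_fk_ids(objs):
    # Hypothetical helper: before inserting, copy the pk of an already-saved
    # related object onto the stale `<field>_id` attribute.
    for obj in objs:
        for field in obj._meta.concrete_fields:
            if field.is_relation and getattr(obj, field.attname) is None:
                try:
                    related = getattr(obj, field.name)
                except ObjectDoesNotExist:
                    continue
                if related is not None and related.pk is not None:
                    setattr(obj, field.attname, related.pk)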

I can submit a PR if this sounds like a reasonable thing to do. It feels a bit magic, but it would greatly improve the usefulness of this new feature.

Change History (8)

comment:1 by Simon Charette, 8 years ago

I haven't tried it myself but I suspect this has little to do with bulk_create itself. Does the following fail in a similar manner?

a = A()
b = B(a=a)
a.save()
b.save()

comment:2 by Jarek Glowacki, 8 years ago

Yep, that fails too.

The difference is the use case. For single saves, it's easy to pass the id through; for bulk_creates it's a bit messier to do that.

I suggested only updating bulk_create (to automatically figure out what to do in that second call) as I wasn't sure it was feasible to make this work in the general case - but maybe putting some more magic in the foreign key field's __get__ could work?

comment:3 by Tim Graham, 8 years ago

Description: modified (diff)
Summary: Multiple consecutive `bulk_create`s with FK. → Make QuerySet.bulk_create() populate fields on related models
Triage Stage: Unreviewed → Someday/Maybe

I'm not sure if this is a good idea due to magic/complexity, but if we had a patch, we could better evaluate it. I guess your use case doesn't allow constructing B after A.objects.bulk_create()?

comment:4 by Jarek Glowacki, 8 years ago

That's correct; my use case requires looping through a large blob of data, parsing it, and creating several different models per iteration. So all the bulk_creates happen at the end.

I'll have a crack at a PR that solves this nicely (if such a thing exists) and get back to you.

comment:5 by Simon Charette, 8 years ago

From what I remember, there's already a ticket about keeping a foreign key attribute and its underlying _id attribute synchronized, which is the real issue here, as demonstrated by the failing save() example in comment:1.

comment:6 by Jarek Glowacki, 8 years ago

I believe you're referring to this: #9553

The proposed solution there was to proactively update all instances that had the just-saved instance as an FK. That's a big change, and it could hit performance pretty hard in some situations.
If we want to solve this in the general case, I'd be more inclined to just use a fallback in the field descriptor's __get__, so that we're not updating instances that might never end up being accessed.
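For illustration, a conceptual sketch of such a fallback. This is not how Django is structured today (the `*_id` value is normally a plain instance attribute, not a descriptor), and it assumes the FieldCacheMixin API (field.is_cached() / field.get_cached_value()) for reading the cached related object:

# Conceptual only: a descriptor for the `*_id` attribute that falls back to
# the pk of the cached related object when no id has been set yet.
class ForeignKeyIdWithFallback:
    def __init__(self, field):
        self.field = field  # the ForeignKey this `*_id` attribute belongs to

    def __get__(self, instance, cls=None):
        if instance is None:
            return self
        value = instance.__dict__.get(self.field.attname)
        if value is None and self.field.is_cached(instance):
            related = self.field.get_cached_value(instance)
            if related is not None and related.pk is not None:
                value = related.pk  # fall back to the related object's pk
        return value

    def __set__(self, instance, value):
        instance.__dict__[self.field.attname] = value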

It's tricky, because the general case really wouldn't benefit much from this, other than keeping things consistent. Hence I thought it might be better to focus on just making it work for bulk_create (maybe even with an explicitly set flag, to make it obvious).

I'll try both approaches over the weekend and see if I can come up with something nice.

comment:7 by David Grant, 5 years ago

My solution to this problem is the following:

from django.db import models
from django.db.models import ForeignKey


class BulkHelperMixin:
    def bulk_create(self, objs, batch_size=None, ignore_conflicts=False):
        foreign_key_fields = [field for field in objs[0]._meta.get_fields() if type(field) == ForeignKey]
        for obj in objs:
            for foreign_key_field in foreign_key_fields:
                parent_obj = getattr(obj, foreign_key_field.name)
                if parent_obj and getattr(obj, foreign_key_field.attname) is None:
                    # Re-assigning the related object refreshes the underlying
                    # `<field>_id` attribute now that the parent has a pk.
                    setattr(obj, foreign_key_field.name, parent_obj)
        return super().bulk_create(objs, batch_size=batch_size, ignore_conflicts=ignore_conflicts)


class BulkHelperManager(BulkHelperMixin, models.Manager):
    pass

Then in my model I do something like:

    objects = models.Manager()
    bulk_manager = BulkHelperManager()

and then I just do bulk_create using my bulk_manager, starting from the parent and going down to the children, as in the example below. It works beautifully!
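For example (hypothetical usage, with A and B from the description and both models declaring the extra manager):

# Parents first, then children: the mixin refreshes each child's `<fk>_id`
# from its already-saved parent before the insert.
A.bulk_manager.bulk_create(bulk_a)
B.bulk_manager.bulk_create(bulk_b)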

comment:8 by Mariusz Felisiak, 5 years ago

Resolution: wontfix
Status: new → closed

Closing in favor of #31539.
