Opened 17 months ago

Closed 17 months ago

Last modified 17 months ago

#34207 closed Bug (duplicate)

Incorrect SQL query when adding a ManyToMany related object with a "through" table prevents adding a new relationship if the new relationship is identical except for a different value for "through_defaults"

Reported by: Credentive Owned by: nobody
Component: Database layer (models, ORM) Version: 4.1
Severity: Normal Keywords: ManyToManyField through
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I cannot add a related element to an object that is related via a ManyToMany relationship with a through field. I believe I have traced the error to a bug in the SQL query generated.

Some Code extracts to set the stage:

models.py:

class Version(models.Model):
uuid = models.UUIDField(primary_key=True, unique=True, editable=False, default=uuid.uuid4, )
<...>

class Section(models.Model):
uuid = models.UUIDField(primary_key=True, unique=True, editable=False, default=uuid.uuid4, )
<...>


class Statement(models.Model):
uuid = models.UUIDField(primary_key=True, unique=True, editable=False, default=uuid.uuid4, )
<...>
section = models.ManyToManyField(
        Section, through="StatementLocation", symmetrical=False, related_name="statements_in_section"
    )


class StatementLocation(models.Model):
    section = models.ForeignKey(Section, on_delete=models.CASCADE)
    statement = models.ForeignKey(Statement, on_delete=models.CASCADE)
    sl_version = models.ForeignKey(Version, on_delete=models.CASCADE)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["section", "statement", "sl_version"], name="unique_statement_location_per_version"
            ),
            models.UniqueConstraint(fields=["statement", "sl_version"], name="statement_appears_once_per_version"),
        ]

The idea behind the model is that we are representing a document that has statements in sections. Statements can be moved around in different versions, so the relationship between a statement and a section, called "statement location", must be tied to a version.

I have a function under the Version model called "copy". The "copy" function produces a copy of the version and recreates the existing relationships.

models.Version (extract):

    def copy(self, name: str, public: bool) -> "Version":
        # Make a copy of a published version and return a reference to the copy
        # Create the new version object in the Database
        new_version = Version(
            name=name,
            state="DRAFT",
            public=public,
            obsoletes=self,
            authority=self.authority,
        )
        new_version.save()

        return new_version
        <...>

        for statement in self.version_data.statements:
            <...>
            sect = statement.section.filter(version=self).get()
            statement.section.add(sect, through_defaults={"sl_version": new_version})
            <...>

When executing the function, I noticed that the new "StatementLocation" objects were not being created. For example:

>>> Version.objects.all()
<QuerySet [<Version: Document - 2.2>]>
>>> version = Version.objects.get()
>>> StatementLocation.objects.count()
89
>>> StatementLocation.objects.filter(sl_version=version).count()
89
>>> new_verson = version.copy(name="2.3", public=True)
>>> new_verson
<Version: Document - 2.3>
>>> StatementLocation.objects.count()
89
>>> StatementLocation.objects.filter(sl_version=new_verson).count()
0
>>> 

If I try a simplified version, with SQL query logging enabled, I get the following:

# Create New Version
>>> new_version = Version(name="2.3", state="DRAFT", public=True, obsoletes=version, authority=version.authority)

# Save It
>>> new_version.save()
(0.002) SELECT "policypublisher_version"."uuid", "policypublisher_version"."name", "policypublisher_version"."state", "policypublisher_version"."public", "policypublisher_version"."published_date", "policypublisher_version"."effective_date", "policypublisher_version"."obsoletes_id", "policypublisher_version"."authority_id" FROM "policypublisher_version" WHERE "policypublisher_version"."uuid" = 'bcbb218ebdc5435b9e5caf879b55cd24' LIMIT 21; args=('bcbb218ebdc5435b9e5caf879b55cd24',); alias=default
(0.014) INSERT INTO "policypublisher_version" ("uuid", "name", "state", "public", "published_date", "effective_date", "obsoletes_id", "authority_id") VALUES ('bcbb218ebdc5435b9e5caf879b55cd24', '2.3', 'DRAFT', 1, NULL, NULL, 'be8225c85fab4a0ba8ea879a3d992abe', '833c1b9b30964f3a90b045e2a1254c1b'); args=('bcbb218ebdc5435b9e5caf879b55cd24', '2.3', 'DRAFT', True, None, None, 'be8225c85fab4a0ba8ea879a3d992abe', '833c1b9b30964f3a90b045e2a1254c1b'); alias=default

# Get a random statement
>>> statement=Statement.objects.first()
(0.003) SELECT "policypublisher_statement"."uuid", "policypublisher_statement"."text", "policypublisher_statement"."type", "policypublisher_statement"."public", "policypublisher_statement"."obsoletes_id" FROM "policypublisher_statement" ORDER BY "policypublisher_statement"."uuid" ASC LIMIT 1; args=(); alias=default

# Make sure we got one
>>> statement
<Statement: Rules for interpreting the pivFASC-N name>

# Add the new version to the statement's version list
>>> statement.version.add(new_version)
(0.000) BEGIN; args=None; alias=default
(0.006) INSERT OR IGNORE INTO "policypublisher_statement_version" ("statement_id", "version_id") VALUES ('0509afa269604028818dc1f53b8f93d2', 'bcbb218ebdc5435b9e5caf879b55cd24'); args=('0509afa269604028818dc1f53b8f93d2', 'bcbb218ebdc5435b9e5caf879b55cd24'); alias=default

# get the section for the statement under the old version
>>> sect=statement.section.filter(version=version).get()
(0.001) SELECT "policypublisher_section"."uuid", "policypublisher_section"."name", "policypublisher_section"."public" FROM "policypublisher_section" INNER JOIN "policypublisher_statementlocation" ON ("policypublisher_section"."uuid" = "policypublisher_statementlocation"."section_id") INNER JOIN "policypublisher_section_version" ON ("policypublisher_section"."uuid" = "policypublisher_section_version"."section_id") WHERE ("policypublisher_statementlocation"."statement_id" = '0509afa269604028818dc1f53b8f93d2' AND "policypublisher_section_version"."version_id" = 'be8225c85fab4a0ba8ea879a3d992abe') LIMIT 21; args=('0509afa269604028818dc1f53b8f93d2', 'be8225c85fab4a0ba8ea879a3d992abe'); alias=default

# Make Sure we got one
>>> sect
<Section: Rules for Interpreting Various Name Forms>

# Add a new section via the through field # NOTE NO INSERT!
>>> statement.section.add(sect, through_defaults={"sl_version":new_version})
(0.000) BEGIN; args=None; alias=default
(0.002) SELECT "policypublisher_statementlocation"."section_id" FROM "policypublisher_statementlocation" WHERE ("policypublisher_statementlocation"."section_id" IN ('1cc10c2e5bba4d238d5fabbfdfd72d83') AND "policypublisher_statementlocation"."statement_id" = '0509afa269604028818dc1f53b8f93d2'); args=('1cc10c2e5bba4d238d5fabbfdfd72d83', '0509afa269604028818dc1f53b8f93d2'); alias=default

# No new object was created
>>> StatementLocation.objects.filter(sl_version=new_version).count()
(0.003) SELECT COUNT(*) AS "__count" FROM "policypublisher_statementlocation" WHERE "policypublisher_statementlocation"."sl_version_id" = 'bcbb218ebdc5435b9e5caf879b55cd24'; args=('bcbb218ebdc5435b9e5caf879b55cd24',); alias=default
0

On the second to last statement, I believe Django is checking to see whether there is an existing M2M relationship, but it is not adding the version_id to the query, so it thinks the relationship already exists!

Note that I can manually create the relationship, so this is not a constraints problem:

>>> new_sl = StatementLocation(section=sect, statement=statement, sl_version=new_version)
>>> new_sl.save()
(0.027) INSERT INTO "policypublisher_statementlocation" ("section_id", "statement_id", "sl_version_id") VALUES ('1cc10c2e5bba4d238d5fabbfdfd72d83', '0509afa269604028818dc1f53b8f93d2', 'bcbb218ebdc5435b9e5caf879b55cd24'); args=['1cc10c2e5bba4d238d5fabbfdfd72d83', '0509afa269604028818dc1f53b8f93d2', 'bcbb218ebdc5435b9e5caf879b55cd24']; alias=default
>>> StatementLocation.objects.filter(sl_version=new_version).count()
(0.001) SELECT COUNT(*) AS "__count" FROM "policypublisher_statementlocation" WHERE "policypublisher_statementlocation"."sl_version_id" = 'bcbb218ebdc5435b9e5caf879b55cd24'; args=('bcbb218ebdc5435b9e5caf879b55cd24',); alias=default
1
>>> 

The FIX:

When a ManyToMany relationship is created with a "through" table, and through_defaults are provided, that should be part of the search query that django uses to determine if the relationship already exists.

e.g.
SELECT "policypublisher_statementlocation"."section_id" FROM "policypublisher_statementlocation" WHERE ("policypublisher_statementlocation"."section_id" IN ('1cc10c2e5bba4d238d5fabbfdfd72d83') AND "policypublisher_statementlocation"."statement_id" = '0509afa269604028818dc1f53b8f93d2') AND "policypublisher_statementlocation"."sl_version_id" = 'bcbb218ebdc5435b9e5caf879b55cd24';

Change History (4)

comment:1 by Simon Charette, 17 months ago

Resolution: duplicate
Status: newclosed

through_defaults are values that are used to create a missing relationship between two entities that related to each other through a many-to-many relationship.

If the relationship already exists between the origin and the destination, like it's the case when you do statement.section.add(sect, through_defaults={"sl_version":new_version}), then they are ignored as expected as ManyToManyField only supports a single related value between two entities.

This is reflected in other APIs such as statement.section.remove which doesn't specify through values.

In other words, while ManyToManyField supports relationship annotations these annotations cannot uniquely define the relationship between the related objects; ManyToManyField doesn't support multi-dimensional relationships.

Duplicate of #33209

comment:2 by Credentive, 17 months ago

Your example documentation strongly suggests otherwise (while cautioning about "remove"), see here: https://docs.djangoproject.com/en/4.1/topics/db/models/#extra-fields-on-many-to-many-relationships

>>> ringo = Person.objects.create(name="Ringo Starr")
>>> paul = Person.objects.create(name="Paul McCartney")
>>> beatles = Group.objects.create(name="The Beatles")
>>> m1 = Membership(person=ringo, group=beatles,
...     date_joined=date(1962, 8, 16),
...     invite_reason="Needed a new drummer.")
>>> m1.save()
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>]>
>>> ringo.group_set.all()
<QuerySet [<Group: The Beatles>]>
>>> m2 = Membership.objects.create(person=paul, group=beatles,
...     date_joined=date(1960, 8, 1),
...     invite_reason="Wanted to form a band.")
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>, <Person: Paul McCartney>]>
>>> Membership.objects.create(person=ringo, group=beatles,
...     date_joined=date(1968, 9, 4),
...     invite_reason="You've been gone for a month and we miss you.")
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>, <Person: Paul McCartney>, <Person: Ringo Starr>]>
>>> # This deletes both of the intermediate model instances for Ringo Starr
>>> beatles.members.remove(ringo)
>>> beatles.members.all()
<QuerySet [<Person: Paul McCartney>]>

comment:3 by Simon Charette, 17 months ago

Your example documentation strongly suggests otherwise

Could you elaborate on that? From what I can read none of the examples you linked suggest that add supports this feature?

There is effectively a section that points at the fact that remove(related) will remove all entries matching the (from, to) tuple but defined by the many-to-many but I don't see how that demonstrates that your reported use case should be supported.

In the example you've just provided you don't use members.add to add the second relationship to Ringo but rely on explicit Membership creation which is supported.

To make it clear, intermediary models that don't define a strict unique relationship on (from, to) are supported. However add is focused on enforcing that at least one entry of the (from, to) tuple exists and allows for insertion defaults to be provided if the relationship is missing via through_defaults. It doesn't make any assumption with regards to the unique constraints defined on the intermediary model.

If you want to enforce the unique existence of a tuple that is a superset of the the (from, to) relationship you should use get_or_create instead of expecting add to figure out which of your unique constraints should be enforced based on your provided through_defaults (as there could be many).

StatementLocation.objects.get_or_create(
    section=sect,
    statement=statement,
    sl_version=new_version,
)
Last edited 17 months ago by Simon Charette (previous) (diff)

comment:4 by Credentive, 17 months ago

I've re-read the section, and while it's not stated that it will work, it's also not clear to me that it won't.

Now that you have set up your ManyToManyField to use your intermediary model (Membership, in this case), you’re ready to start creating some many-to-many relationships. You do this by creating instances of the intermediate model:

>>> ringo = Person.objects.create(name="Ringo Starr")
>>> paul = Person.objects.create(name="Paul McCartney")
>>> beatles = Group.objects.create(name="The Beatles")
>>> m1 = Membership(person=ringo, group=beatles,
...     date_joined=date(1962, 8, 16),
...     invite_reason="Needed a new drummer.")
>>> m1.save()
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>]>
>>> ringo.group_set.all()
<QuerySet [<Group: The Beatles>]>
>>> m2 = Membership.objects.create(person=paul, group=beatles,
...     date_joined=date(1960, 8, 1),
...     invite_reason="Wanted to form a band.")
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>, <Person: Paul McCartney>]>


You can also use add(), create(), or set() to create relationships, as long as you specify through_defaults for any required fields:

>>> beatles.members.add(john, through_defaults={'date_joined': date(1960, 8, 1)})
>>> beatles.members.create(name="George Harrison", through_defaults={'date_joined': date(1960, 8, 1)})
>>> beatles.members.set([john, paul, ringo, george], through_defaults={'date_joined': date(1960, 8, 1)})
You may prefer to create instances of the intermediate model directly.

If the custom through table defined by the intermediate model does not enforce uniqueness on the (model1, model2) pair, allowing multiple values, the remove() call will remove all intermediate model instances:

>>> Membership.objects.create(person=ringo, group=beatles,
...     date_joined=date(1968, 9, 4),
...     invite_reason="You've been gone for a month and we miss you.")
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>, <Person: Paul McCartney>, <Person: Ringo Starr>]>
>>> # This deletes both of the intermediate model instances for Ringo Starr
>>> beatles.members.remove(ringo)
>>> beatles.members.all()
<QuerySet [<Person: Paul McCartney>]>

Looking at that text again, it is not clear that add() will not work for multiple references to the same object, it's just suggested that adding relationship objects directly is a matter of preference. It may be good to clarify in the documentation that add does not support adding multiple references to the same object. Would have saved me a few hours :)

In fact, all of the calls through the intermediate model kind of break down, because you can't filter on attributes of the intermediate model in the calls. I will be using the intermediate model for this, but actually removing the manytomany relationship, as it doesn't really work. For anyone who wants to create a self-referential relationship where the values of the intermediate objects are intended to be used to filter the relationship sets, the ManyToMany field won't support this, while the intermediate objects themselves work fine. The only thing I could think of that might make the work the way I want would be to write a custom ManyRelatedManager, but I haven't seen any documentation on how to do this, and it might not be worth the time just to get the ability to use the manytomany methods.

If, in the example in the documentation, you wanted to query for the members of the beatles as of a particular date, you would run into the same problems I have. My whole use case is built around this, so I will just use the intermediate table, and ignore all(), get(), add() etc.. The use of the date in the example kind of suggests this could be possible when in fact it isn't.

Note: See TracTickets for help on using tickets.
Back to Top