Opened 3 years ago

Closed 5 weeks ago

#17673 closed Bug (fixed)

Forbid field shadowing in model multi-inheritance

Reported by: akaariai Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords: validation
Cc: akaariai, krzysiumed@…, chrismedrela Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I was working on ticket #13803 and got a little carried away. I ended up forbidding field shadowing in model multi-inheritance cases. The attached patch also includes checks for field.attname clashing with field.name, disallowing 'pk' as a field name (what #13803 is about) and __ in field names.

Except for the multi-inheritance the validations should be easily seen as necessary ones.

Now, why forbid field shadowing for multi-inheritance? The basic problem is that in the DB you can have different values for the similarly named fields, but in Python you can have just a single value. The most trivial example is this:

class A:
    f = models.IntegerField()
class B:
    f = models.DateField()
class C(A, B)
    pass

In this case it is obvious that things will not work as expected as you simply can't save the C instance's value for f as both date and integer.

The above example has one even more serious problem: there is a clash between the id fields of A and B. This can lead to pretty confusing situations. The most obvious one is that both A and B get their id value from the sequence when saving a new model. However A can get 10 as the last value, B 11. Now, you have a C model with a value of both 10 and 11 for the id. Save it and suddenly you will duplicate, or even overwrite an object from either B or C model. If 10 is in effect, you will overwrite B with id 10, if 11 is in effect you will create a new A with value 11. In the last case the sequence will be out of sync, and next time you save A you will get a UniqueViolation.

I am not exactly sure if the above is what happens. Other possibility is that A's value will be used as B's id value on the first save already. In this case things work like: Save C -> Save A (gets 10 as id) -> Save B (sees 10 as id) -> overwrite of B instance of id value 10.

Thus, field shadowing should be forbidden. This will be backwards incompatible (there is even one model having this problem in the test suite). But there is little hope of fixing this without making the code a lot more complex. I think this case can be classified as a data corrupting bug (due to the id problem), and thus backwards incompatibility is not in effect. The backwards compatibility is solvable by changing the field names to non-clashing ones in one of the parent models. This is a problem only in multi-inheritance cases, which should not be too commonly used.

I think I have seen some tickets where the real underlying problem has been model-inheritance with field shadowing. However, I can't find them now. I don't think this is anything too critical, but this would be a nice thing to fix. Things will seem to work at first, but then later on you will get weird data overwrite problems which can be pretty hard to debug.

Attachments (5)

17673.diff (8.2 KB) - added by akaariai 3 years ago.
broken_inheritance.diff (859 bytes) - added by akaariai 3 years ago.
test demonstrating the problem with field shadowing
17673_v2.diff (7.9 KB) - added by krzysiumed 3 years ago.
akaariai's patch with my small improvements
17673_indentation.diff (29.6 KB) - added by krzysiumed 3 years ago.
proposition of improving indentation in django/core/management/validation.py
multi_inheritance_field_clash.patch (612 bytes) - added by aron45 6 weeks ago.
disallow multi inheritance (eg. grandfather) field clash

Download all attachments as: .zip

Change History (24)

Changed 3 years ago by akaariai

comment:1 Changed 3 years ago by akaariai

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Attached (and inline) is a proof that infact multi-inheritance with shadowed fields is broken:

def test_broken_inheritance(self):
    w = Worker(name="Not Wilma", age=349)
    w.save()
    sw1 = StudentWorker()
    sw1.name = "Wilma"
    sw1.age = 35
    sw1.save()
    w = Worker.objects.get(pk=w.pk)
    self.assertEquals(w.name, "Not Wilma") # this fails!

The models are from model_inheritance tests. The important thing to note is that studentworker inherits from both student and worker, and both parents have id field defined separately. Above, the "not wilma" worker has been overwritten due to save of completely unrelated object => data loss.

What happens? When saving, the student parent gets an id from sequence (1 in this case), then worker parent is saved and the save sees the id of 1 already set in the model by the student save, and thus overwrites the existing worker with id 1.

Changed 3 years ago by akaariai

test demonstrating the problem with field shadowing

comment:2 Changed 3 years ago by krzysiumed

  • Cc krzysiumed@… added
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

The patch is good, but I would like to suggest some small improvements. There are very negligible, so feel free to ignore me if you want.

For example:

clash = used_fields.get(f.name) or used_fields.get(f.attname) or None
# Why not just:
clash = f.name in used_fields or f.attname in used_fields

Next one:

                    e.add(opts, 'The field "%s" from parent model "%s" clashes with the field '
                                '"%s" from another parent model "%s"' % (
                          f.name, f.model._meta, clash.name, clash.model._meta
                    ))

I prefer following (because of PEP8):

                    e.add(opts, 'The field "%s" from parent model "%s" clashes with the field '
                                '"%s" from another parent model "%s"' % (
                                    f.name, f.model._meta, clash.name, clash.model._meta))

Another thing is names. I think Parent1, Parent2 and Child1 are not meaningful enough. We can use ChildShadowingField or something like that. I really don't know how to call other two models. Maybe FirstParent and SecondParent would be better -- they are not more meaningful, but the difference between them is more visible.

And the last one which is less negligible, I think we should not comment old code. If somebody want to know code of StudentWorker model, he or she can just use version control system. We can just leave a small note that model StudentWorker was deleted and point to this ticket.

comment:3 Changed 3 years ago by akaariai

The clash variable is the clashing field, not a boolean. It is later on used (actually in the second snippet above). The last "or None" is redundant so it could be removed. The variable should probably be named clashing_field, so there would not be confusion.

I think the PEP8 makes sense, as do the renames.

You are absolutely correct that the StudentWorker model should be removed completely. I should have done that.

If you feel like it you can upgrade the patch. If you see places which are hard to understand add comments or ask for me to do that. In my opinion that is the best way to make sure the code is understandable - the reviewer adding comments to places where he has to think "too much". Anyways, thanks for the feedback!

comment:4 Changed 3 years ago by charettes

Just a small tweak... you could use django.db.models.sql.constants.LOOKUP_SEP instead of '__' .

Changed 3 years ago by krzysiumed

akaariai's patch with my small improvements

Changed 3 years ago by krzysiumed

proposition of improving indentation in django/core/management/validation.py

comment:5 Changed 3 years ago by krzysiumed

  • Patch needs improvement unset

I upgraded akaariai's patch (file 17673_v2.diff):

  • Renamed some models. Renamed clash to clashing_field.
  • Deleted commented code. Some changes in comments.
  • django.db.models.sql.constants.LOOKUP_SEP instead of '__'.
  • Improved indentation.

I left 'or None' in line 49 from validation.py because it makes clear that if 'used_fields.get(f.name)' and 'used_fields.get(f.attname)' are both None then clashing_field==None.

About indentation: what do you prefer?

                clashing_field = used_fields.get(f.name) or used_fields.get(f.attname) or None  ### more than 80 characters

or

                clashing_field = (used_fields.get(f.name) or 
                                  used_fields.get(f.attname) or 
                                  None) 

I attached another patch (file 17673_indentation.py) which is not connected to this ticket but it changes the same file (django/core/management/validation.py). It shows how the code looks when we stick with PEP8 especially with 'max 80 character per line' rule. What do you think about it?

akaariai, I agree. This kind of working in pairs let us to make code better.

Last edited 3 years ago by krzysiumed (previous) (diff)

comment:6 Changed 3 years ago by akaariai

The re-indentations look good to me (apart of the "clash -> clashing_field" typo there).

I would like to get a second opinion on the validations part. There the only controversial thing is forbidding field shadowing. But this is just to prevent model structures which are known to be buggy. The usual example is having two parent models, each containing the automatically created id field in them. When you try to save such a model things will break. So, removing this from allowed model structures should not be considered backwards incompatible, as the structure doesn't actually work currently.

Does this need doc changes? I don't remember if we have forbidden model attribute names section, or forbidden model structures section in the docs.

comment:7 Changed 3 years ago by akaariai

krzysiumed: I forgot to add that if possible, create a pull request of at least the indentations patch. This way you will get proper credit for your work. Look in git log (or just github commit history) for ways to format the commit message. I prefer the format:

50 char summary line

Fixed/Refs #NNNNN -- A message telling about what
was changed and why, 70 chars per line. Use past
tense.

the commit message format is still not completely decided, and thus you will see other formats, too.

comment:8 Changed 2 years ago by chrismedrela

  • Cc chrismedrela added

Hi, I changed nick (the old one is krzysiumed).

I updated the last patch 17673_v2.diff and created pull request https://github.com/django/django/pull/556. I didn't update the indentation patch -- it doesn't apply clearly, there would be a lot of work to update it and it's not so important.

comment:9 Changed 21 months ago by timo

  • Patch needs improvement set

The patch has gone stale again.

comment:10 Changed 21 months ago by chrismedrela

  • Patch needs improvement unset

The patch applies clearly now. See the new pull request: https://github.com/django/django/pull/1257 .

comment:11 Changed 21 months ago by russellm

I've just reviewed @chrismedrela's patch, and I'm not entirely comfortable with the assertion that @akaariai made in comment 1 - specifically, I'm not sure that I accept that the 'test_broken_inheritance' actually shows anything that is broken.

The test assumes that the PK of the subclass will be the same as the PK of the base class -- and while this will be a reasonable assumption in many cases, I don't believe it's a documented API guarantee. If you want to retrieve the Worker that is the base of a StudentWorker instance, you should be using Worker.objects.get(sw1.worker_ptr_id), or sw1.worker_ptr.

The implication of comment 1 is that *any* multiple inheritance of concrete models is an error, and I don't believe that this is the case. Multiple inheritance is a documented feature, and we have tests that validate that it works (tests that @chrismedrela's patch removes because the models now cause a validation error).

comment:12 Changed 21 months ago by akaariai

The point is that if you have not set primary key manually for the models then both A and B will have id = AutoField as primary key. The problem now is that C needs to contain two values for id, one for A's id and one for B's id.

The test_broken_inheritance test isn't correct, that is true. A better writing of that test is using 'id' as the lookup instead of pk. The model has two different values for the id and that is a problem.

I think the multi-inheritance tests should work if you just add differently named primary keys to the parents.

The StudentWorker class doesn't work currently in any practical way. The biggest problem is this: assume a StudentWorker has Student with id = 1 and Worker with id = 2 as parents. Fetch it from DB. I am not sure if it gets Student's or Worker's id value but that doesn't really matter. It gets just one, lets say 2. Now save back to DB: when the parent student is saved its id is seen as 2 (as there is only one id value for the whole class). Now you just updated wrong student or maybe saved a new one.

The same problem is present for any field shadowing. When you fetch you will coalesce two columns into one attribute. When you save the single value is used as the value for both database columns, so just fetch + save will overwrite data.

comment:13 Changed 21 months ago by akaariai

Seems like I was wrong about the studentworker save case. The save seems to work correctly. However it seems that instantiation via kwargs does not. For example:

sw = StudentWorker(name="Fred", age=25, job="Quarry worker", school_class="1B")
# sw.age will be None here

It will likely be possible to fix all the field shadowing issues, but this will be a lot of work for little benefit. Also, as a data modeling practice field shadowing seems to be problematic. (What does it mean if a studentworker has age 5 as student, but age 25 as worker?).

comment:14 Changed 13 months ago by timo

  • Patch needs improvement set

comment:15 Changed 13 months ago by chrismedrela

  • Patch needs improvement unset

New patch written from scratch: https://github.com/django/django/pull/2242.

comment:16 Changed 13 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from new to closed

In ee9fcb1672ddf5910ed8c45c37a00f32ebbe2bb1:

Fixed #17673 -- Forbid field shadowing.

Thanks Anssi Kääriäinen for the suggestion.

comment:17 Changed 6 weeks ago by aron45

I think this does not solve multi inheritance of grand father

class GrandFather(models.Model):
    column1 = models.IntegerField()


class Father(GrandFather):
    column2 = models.IntegerField()


class Child(Father):
    column1 = models.IntegerField()  # this should fail, but it does pass.


comment:18 Changed 6 weeks ago by aron45

  • Resolution fixed deleted
  • Status changed from closed to new

Changed 6 weeks ago by aron45

disallow multi inheritance (eg. grandfather) field clash

comment:19 Changed 5 weeks ago by timgraham

  • Resolution set to fixed
  • Status changed from new to closed

As this fix has already been released, please open a new ticket. Also a test is needed in order to merge the patch.

Note: See TracTickets for help on using tickets.
Back to Top