Opened 13 years ago
Closed 10 years ago
#17673 closed Bug (fixed)
Forbid field shadowing in model multi-inheritance
Reported by: | Anssi Kääriäinen | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | validation |
Cc: | Anssi Kääriäinen, krzysiumed@…, chrismedrela | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I was working on ticket #13803 and got a little carried away. I ended up forbidding field shadowing in model multi-inheritance cases. The attached patch also includes checks for field.attname clashing with field.name, disallowing 'pk' as a field name (what #13803 is about) and __
in field names.
Except for the multi-inheritance the validations should be easily seen as necessary ones.
Now, why forbid field shadowing for multi-inheritance? The basic problem is that in the DB you can have different values for the similarly named fields, but in Python you can have just a single value. The most trivial example is this:
class A: f = models.IntegerField() class B: f = models.DateField() class C(A, B) pass
In this case it is obvious that things will not work as expected as you simply can't save the C instance's value for f as both date and integer.
The above example has one even more serious problem: there is a clash between the id fields of A and B. This can lead to pretty confusing situations. The most obvious one is that both A and B get their id value from the sequence when saving a new model. However A can get 10 as the last value, B 11. Now, you have a C model with a value of both 10 and 11 for the id. Save it and suddenly you will duplicate, or even overwrite an object from either B or C model. If 10 is in effect, you will overwrite B with id 10, if 11 is in effect you will create a new A with value 11. In the last case the sequence will be out of sync, and next time you save A you will get a UniqueViolation
.
I am not exactly sure if the above is what happens. Other possibility is that A's value will be used as B's id value on the first save already. In this case things work like: Save C -> Save A (gets 10 as id) -> Save B (sees 10 as id) -> overwrite of B instance of id value 10.
Thus, field shadowing should be forbidden. This will be backwards incompatible (there is even one model having this problem in the test suite). But there is little hope of fixing this without making the code a lot more complex. I think this case can be classified as a data corrupting bug (due to the id problem), and thus backwards incompatibility is not in effect. The backwards compatibility is solvable by changing the field names to non-clashing ones in one of the parent models. This is a problem only in multi-inheritance cases, which should not be too commonly used.
I think I have seen some tickets where the real underlying problem has been model-inheritance with field shadowing. However, I can't find them now. I don't think this is anything too critical, but this would be a nice thing to fix. Things will seem to work at first, but then later on you will get weird data overwrite problems which can be pretty hard to debug.
Attachments (5)
Change History (24)
by , 13 years ago
Attachment: | 17673.diff added |
---|
comment:1 by , 13 years ago
by , 13 years ago
Attachment: | broken_inheritance.diff added |
---|
test demonstrating the problem with field shadowing
comment:2 by , 13 years ago
Cc: | added |
---|---|
Patch needs improvement: | set |
Triage Stage: | Unreviewed → Accepted |
The patch is good, but I would like to suggest some small improvements. There are very negligible, so feel free to ignore me if you want.
For example:
clash = used_fields.get(f.name) or used_fields.get(f.attname) or None # Why not just: clash = f.name in used_fields or f.attname in used_fields
Next one:
e.add(opts, 'The field "%s" from parent model "%s" clashes with the field ' '"%s" from another parent model "%s"' % ( f.name, f.model._meta, clash.name, clash.model._meta ))
I prefer following (because of PEP8
):
e.add(opts, 'The field "%s" from parent model "%s" clashes with the field ' '"%s" from another parent model "%s"' % ( f.name, f.model._meta, clash.name, clash.model._meta))
Another thing is names. I think Parent1
, Parent2
and Child1
are not meaningful enough. We can use ChildShadowingField
or something like that. I really don't know how to call other two models. Maybe FirstParent
and SecondParent
would be better -- they are not more meaningful, but the difference between them is more visible.
And the last one which is less negligible, I think we should not comment old code. If somebody want to know code of StudentWorker
model, he or she can just use version control system. We can just leave a small note that model StudentWorker
was deleted and point to this ticket.
comment:3 by , 13 years ago
The clash variable is the clashing field, not a boolean. It is later on used (actually in the second snippet above). The last "or None" is redundant so it could be removed. The variable should probably be named clashing_field, so there would not be confusion.
I think the PEP8 makes sense, as do the renames.
You are absolutely correct that the StudentWorker
model should be removed completely. I should have done that.
If you feel like it you can upgrade the patch. If you see places which are hard to understand add comments or ask for me to do that. In my opinion that is the best way to make sure the code is understandable - the reviewer adding comments to places where he has to think "too much". Anyways, thanks for the feedback!
comment:4 by , 13 years ago
Just a small tweak... you could use django.db.models.sql.constants.LOOKUP_SEP
instead of '__'
.
by , 13 years ago
Attachment: | 17673_indentation.diff added |
---|
proposition of improving indentation in django/core/management/validation.py
comment:5 by , 13 years ago
Patch needs improvement: | unset |
---|
I upgraded akaariai's patch (file 17673_v2.diff
):
- Renamed some models. Renamed
clash
toclashing_field
. - Deleted commented code. Some changes in comments.
django.db.models.sql.constants.LOOKUP_SEP
instead of'__'
.- Improved indentation.
I left 'or None
' in line 49 from validation.py
because it makes clear that if 'used_fields.get(f.name)
' and 'used_fields.get(f.attname)
' are both None
then clashing_field==None
.
About indentation: what do you prefer?
clashing_field = used_fields.get(f.name) or used_fields.get(f.attname) or None ### more than 80 characters
or
clashing_field = (used_fields.get(f.name) or used_fields.get(f.attname) or None)
I attached another patch (file 17673_indentation.py
) which is not connected to this ticket but it changes the same file (django/core/management/validation.py
). It shows how the code looks when we stick with PEP8 especially with 'max 80 character per line' rule. What do you think about it?
akaariai, I agree. This kind of working in pairs let us to make code better.
comment:6 by , 13 years ago
The re-indentations look good to me (apart of the "clash -> clashing_field" typo there).
I would like to get a second opinion on the validations part. There the only controversial thing is forbidding field shadowing. But this is just to prevent model structures which are known to be buggy. The usual example is having two parent models, each containing the automatically created id field in them. When you try to save such a model things will break. So, removing this from allowed model structures should not be considered backwards incompatible, as the structure doesn't actually work currently.
Does this need doc changes? I don't remember if we have forbidden model attribute names section, or forbidden model structures section in the docs.
comment:7 by , 13 years ago
krzysiumed: I forgot to add that if possible, create a pull request of at least the indentations patch. This way you will get proper credit for your work. Look in git log (or just github commit history) for ways to format the commit message. I prefer the format:
50 char summary line Fixed/Refs #NNNNN -- A message telling about what was changed and why, 70 chars per line. Use past tense.
the commit message format is still not completely decided, and thus you will see other formats, too.
comment:8 by , 12 years ago
Cc: | added |
---|
Hi, I changed nick (the old one is krzysiumed
).
I updated the last patch 17673_v2.diff
and created pull request https://github.com/django/django/pull/556. I didn't update the indentation patch -- it doesn't apply clearly, there would be a lot of work to update it and it's not so important.
comment:10 by , 12 years ago
Patch needs improvement: | unset |
---|
The patch applies clearly now. See the new pull request: https://github.com/django/django/pull/1257 .
comment:11 by , 12 years ago
I've just reviewed @chrismedrela's patch, and I'm not entirely comfortable with the assertion that @akaariai made in comment 1 - specifically, I'm not sure that I accept that the 'test_broken_inheritance' actually shows anything that is broken.
The test assumes that the PK of the subclass will be the same as the PK of the base class -- and while this will be a reasonable assumption in many cases, I don't believe it's a documented API guarantee. If you want to retrieve the Worker that is the base of a StudentWorker instance, you should be using Worker.objects.get(sw1.worker_ptr_id), or sw1.worker_ptr.
The implication of comment 1 is that *any* multiple inheritance of concrete models is an error, and I don't believe that this is the case. Multiple inheritance is a documented feature, and we have tests that validate that it works (tests that @chrismedrela's patch removes because the models now cause a validation error).
comment:12 by , 12 years ago
The point is that if you have not set primary key manually for the models then both A and B will have id = AutoField as primary key. The problem now is that C needs to contain two values for id, one for A's id and one for B's id.
The test_broken_inheritance test isn't correct, that is true. A better writing of that test is using 'id' as the lookup instead of pk. The model has two different values for the id and that is a problem.
I think the multi-inheritance tests should work if you just add differently named primary keys to the parents.
The StudentWorker class doesn't work currently in any practical way. The biggest problem is this: assume a StudentWorker has Student with id = 1 and Worker with id = 2 as parents. Fetch it from DB. I am not sure if it gets Student's or Worker's id value but that doesn't really matter. It gets just one, lets say 2. Now save back to DB: when the parent student is saved its id is seen as 2 (as there is only one id value for the whole class). Now you just updated wrong student or maybe saved a new one.
The same problem is present for any field shadowing. When you fetch you will coalesce two columns into one attribute. When you save the single value is used as the value for both database columns, so just fetch + save will overwrite data.
comment:13 by , 12 years ago
Seems like I was wrong about the studentworker save case. The save seems to work correctly. However it seems that instantiation via kwargs does not. For example:
sw = StudentWorker(name="Fred", age=25, job="Quarry worker", school_class="1B") # sw.age will be None here
It will likely be possible to fix all the field shadowing issues, but this will be a lot of work for little benefit. Also, as a data modeling practice field shadowing seems to be problematic. (What does it mean if a studentworker has age 5 as student, but age 25 as worker?).
comment:14 by , 11 years ago
Patch needs improvement: | set |
---|
comment:15 by , 11 years ago
Patch needs improvement: | unset |
---|
New patch written from scratch: https://github.com/django/django/pull/2242.
comment:16 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:17 by , 10 years ago
I think this does not solve multi inheritance of grand father
class GrandFather(models.Model): column1 = models.IntegerField() class Father(GrandFather): column2 = models.IntegerField() class Child(Father): column1 = models.IntegerField() # this should fail, but it does pass.
comment:18 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
by , 10 years ago
Attachment: | multi_inheritance_field_clash.patch added |
---|
disallow multi inheritance (eg. grandfather) field clash
comment:19 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
As this fix has already been released, please open a new ticket. Also a test is needed in order to merge the patch.
Attached (and inline) is a proof that infact multi-inheritance with shadowed fields is broken:
The models are from model_inheritance tests. The important thing to note is that studentworker inherits from both student and worker, and both parents have id field defined separately. Above, the "not wilma" worker has been overwritten due to save of completely unrelated object => data loss.
What happens? When saving, the student parent gets an id from sequence (1 in this case), then worker parent is saved and the save sees the id of 1 already set in the model by the student save, and thus overwrites the existing worker with id 1.