Opened 7 months ago

Last modified 7 months ago

#28099 new Bug

Reuse of UpdateQuery breaks complex delete updates

Reported by: milosu Owned by: nobody
Component: Database layer (models, ORM) Version: 1.11
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I've discovered some obscure bug, which triggers randomly and only under certain circumstances, based on the address space layout.

Try to apply attached delete_on_update_bug.py and run delete_regress multiple times.

The test case should fail with:

======================================================================
FAIL: test_delete_with_custom_handlers (delete_regress.tests.CollectorUpdateTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\data\django\tests\delete_regress\tests.py", line 381, in test_delete_with_custom_handlers
    self.assertEqual(cost_claim_in_report.threatened_id, t_org_in_report.pk)
AssertionError: None != 2

----------------------------------------------------------------------
Ran 19 tests in 0.970s

I can reproduce this bug with latest Django when I make several tweaks, that change the address space layout of the Python process, besides others:

  • use psycopg2 backend while testing instead of the dummy default
  • try to put pdb near the random shuffle
  • add somehow additional DLLs to the Python process etc.

The issue can be fixed by applying delete_on_update_fix.diff.

The root cause is the reuse of sql.UpdateQuery across the inner for-loop, due to the way update_batch is implemented (by calling add_update_values etc.).

Caveats:
The print statement in delete_on_update_bug.diff prints the order of the items when iterating over the field_updates dictionary.

The order that breaks the test case is one of the following:

organization None set([<ThreatenedOrganization: reported threatened org>])
threatened None set([<CostClaim: cost claim>])
organization None set([<CostClaim: reported cost claim>])

i.e. update of "organization" fkey on <CostClaim: reported cost claim> must follow after the update of the "threatened" attr.

Attachments (3)

delete_on_update_bug.diff (4.8 KB) - added by milosu 7 months ago.
code to reproduce the bug
delete_on_update_fix.diff (4.6 KB) - added by milosu 7 months ago.
patch to fix the bug
delete_on_update_bug_update.diff (3.1 KB) - added by milosu 7 months ago.

Download all attachments as: .zip

Change History (11)

Changed 7 months ago by milosu

Attachment: delete_on_update_bug.diff added

code to reproduce the bug

Changed 7 months ago by milosu

Attachment: delete_on_update_fix.diff added

patch to fix the bug

comment:1 Changed 7 months ago by milosu

Summary: Global use of UpdateQuery breaks complex delete updatesReuse of UpdateQuery breaks complex delete updates

comment:2 Changed 7 months ago by Tim Graham

Patch needs improvement: set

I haven't investigated this is detail but it would be nice to understand the root cause of the issue a bit more to confirm that the fix is the best one. The patch should be reworked to apply to master (which supports Python 3 only) and the test simplified as much as possible. Existing test models should be used as much as possible. If any new models are needed they should have as few fields as possible. The test should use create() rather than get_or_create() and I don' think the DoesNotExist exception catching is necessary.

comment:3 Changed 7 months ago by Tim Graham

Triage Stage: UnreviewedAccepted

comment:4 Changed 7 months ago by milosu

The root cause is the reuse of the sql.UpdateQuery(model) instance in the inner forloop.
Repeated call to add_update_values called from update_batch is essentially adding value items to self.values array of the Query object.

This can not be tested explicitly in the test case due to the way how the delete method on the collector is structured.

The best way how to deal with this bug is to prevent the reuse of UpdateQuery.

I can polish the testcase somehow per your instructions, but keep in mind that the reproducibility of the issue is hard.

This is a critical ORM bug, which is present in Django since ages, with ability to randomly corrupt data in the database.

comment:5 Changed 7 months ago by Shai Berger

Hi,

When you say a bug in Django is triggered by process memory layout, it sounds suspicious -- Python code should generally not be affected by such low-level concerns. So, it sounds more likely that Django's use triggers a bug in the underlying components which are written in C -- Python itself, Psycopg2, or the PostgreSQL client libraries. Such bugs are, indeed, more likely on Windows, where these components are less used and tested.

It may be the case that there is such a bug, and your suggested fix makes Django work around it -- which may make the fix worth while, but the upstream projects should be notified.

Or, unlikely as it seems to me, it may be that Django is actually doing something wrong here. But to show that, in my opinion, you need to show not just that something bad can happen when using Django -- you need to show where Django is violating some contract of the APIs it is using.

comment:6 Changed 7 months ago by milosu

Hi, guys, I'm attaching new simplified patch. Fails against Django 1.11 on Windows in my test setup. Even against SQL Lite.

Clear enough now?

Changed 7 months ago by milosu

comment:7 Changed 7 months ago by Simon Charette

Milosu, are you able to reproduce without a custom on_delete handler?

comment:8 Changed 7 months ago by milosu

Depends on what you mean by custom on_delete handler.

Collector.add_field_update is called from models.CASCADE - this sets None => can not trigger this bug. Same holds true for models.SET_NULL.

Than there are handlers like models.SET() and models.SET_DEFAULT => bug could be triggered, if callable with variable output is used in these handlers.
I do not have real-world use cases using these "default" handlers, but I can produce silly looking regression test cases using specially crafted field.default or models.SET callables that will also trigger assertion failure.

The bug report use case is the real-world one, anyway.

Note: See TracTickets for help on using tickets.
Back to Top