Opened 8 years ago

Closed 4 years ago

Last modified 4 years ago

#3615 closed Bug (fixed)

Can't define forward references in fixtures using MySQL with InnoDB

Reported by: russellm Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: mysql innodb myisam reference fixture
Cc: andrew@…, tonightslastsong@…, eduardocereto@…, chris@…, maxirobaina@…, tomasz.zielinski@…, jsdalton Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The problem is highlighted by the test case introduced in [4610].

When using MySQL with the InnoDB backend, fixture data files (or any serialized data) can't contain forward references. If a forward reference is encountered, a row-level contraint error is raised. This is because MySQL checks constraints at the end of each command.

Postgres has similar behaviour to InnoDB by default, but [4610] modified the table declarations to disable constraint checking until the end of a transaction using DEFFERABLE INITIALLY DEFERRED. However, MySQL doesn't implement this feature or an equivalent.

If you have your MySQL setup to use MyISAM (the default backend), you won't see the problem, because MyISAM doesn't enforce constraints.

One way around the problem would be to use SET FOREIGN_KEY_CHECKS=0 at the start and SET FOREIGN_KEY_CHECKS=1 at the end of each transaction (or around the serialization/fixture use). However, this wouldn't validate any keys modified while the checks were disabled.

Attachments (12)

set-foreign-key-checks-on-managed-trans.diff (5.3 KB) - added by Andrew Sutherland <andrew@…> 8 years ago.
ugly implementation of workaround proposed in ticket description
t3615-r9725.diff (12.3 KB) - added by ramiro 7 years ago.
Patch updated to trunk as of r9725
defer_constraint_checks.diff (8.5 KB) - added by kmtracey 7 years ago.
allow_forward_refs_in_fixtures.diff (7.2 KB) - added by jsdalton 4 years ago.
Not there yet, but includes better tests
allow_forward_refs_in_fixtures_v2.diff (10.6 KB) - added by jsdalton 4 years ago.
Still needs a bit of work, but this is basically a solution
allow_forward_refs_in_fixtures_v3.diff (15.8 KB) - added by jsdalton 4 years ago.
allow_forward_refs_in_fixtures_v4.diff (16.6 KB) - added by jsdalton 4 years ago.
allow_forward_refs_in_fixtures_v5.diff (19.0 KB) - added by jsdalton 4 years ago.
allow_forward_refs_in_fixtures_v6.diff (20.7 KB) - added by jsdalton 4 years ago.
allow_forward_refs_in_fixtures_v7.diff (25.4 KB) - added by jsdalton 4 years ago.
allow_forward_refs_in_fixtures_v8.diff (25.7 KB) - added by jsdalton 4 years ago.
allow_forward_refs_in_fixtures_v9.diff (29.4 KB) - added by jsdalton 4 years ago.

Download all attachments as: .zip

Change History (63)

comment:1 Changed 8 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Changed 8 years ago by Andrew Sutherland <andrew@…>

ugly implementation of workaround proposed in ticket description

comment:2 Changed 8 years ago by Andrew Sutherland <andrew@…>

  • Cc andrew@… added

I had this problem and needed a quick solution; I've attached a patch but am not setting the patch flag because even if the work-around is reasonable, this patch is ugly. I'm providing it to make life easier for anyone else in the same boat. (Which be most people using InnoDB; I'm sure fancy people just go straight to Postgres.)

The patch adds _disable_broken_foreign_key_checks/_enable_broken_foreign_key_checks to all DatabaseWrapper classes and modifies enter_transaction_management/leave_transaction_management in db/transaction.py to invoke these methods. According to the existing idiom, it seems like it would be more appropriate to have the SQL logic exist inside get_* functions in the database back-end's base.py file. However, that still doesn't feel right because only MySQL/InnoDB has the issue. A better solution feels like letting the MySQL db back-end decorate the transaction process, but that still looks likes all back-ends would need to be changed in some fashion.

comment:3 Changed 7 years ago by russellm

Discussion on django-dev revealed this script which can apparently check for foreign key checks. It might be worth seeing if we can integrate this script (or something like it) into a patch that uses SET FOREIGN_KEY_CHECKS.

comment:4 follow-up: Changed 7 years ago by matthew

Ummm, I might be crazy but is this really an issue? When I ran into the error message "Cannot add or update a child row: a foreign key constraint fails" I just switched the order of the applications in my settings.py config. By putting the fixture with the data that is referenced before the fixture that is referencing it, everything just worked. Maybe others have a more complex scenario than I have...

Later,

Matthew Purdon

comment:5 in reply to: ↑ 4 Changed 7 years ago by russellm

Replying to matthew:

Ummm, I might be crazy but is this really an issue?

Yes - it is. Think of circular references: a refers to b, and b refers to a. There is no order that you can use for that fixture that will will not have a forward reference.

Even if there wasn't the circular reference problem, there is the issue that Django doesn't roll out fixtures so as to avoid forward references - i.e., even if there is a clear order that will work, Django doesn't attempt to discover or use it. Yes, you can manually modify a fixture, but the point is that you shouldn't have to.

Changed 7 years ago by ramiro

Patch updated to trunk as of r9725

comment:6 Changed 7 years ago by ramiro

I've updated the patch, following notes apply:

  • It moves the disabling/enabling of the FK checking from the transaction infrastructure to the loaddata management command. This means the scope of the fixes this could introduce is limited to fixture loading, I feel a bit nervous about disabling it in the general case. This means that tests like modeltests/mutually_referencial (where forward references are introduced through the ORM API instead of doing it through the serializers) will keep failing for MySQL/InnoDB
  • I haven't added the MySQL stored procedure pointed to by Russell, I tried to implement creating it in the database programatically but failed miserably. What would need to be done is to solve that (possibly reducing it's size/complexity by removing the feature that suggest the SQL code to fix the inconsistencies found) and try to find the list of tables corresponding to the models going to be affected by the fixture loading and hand it to the SP when invoking it after re-enabling the FK checks. Also, perhaps we should first clear any code licensing doubts about the SQL code fragment.
  • A can_defer_constraint_checks backend feature has been added to describe the relevant RDBMS ability. MySQL is the only one with the value set to False whatever the storage engine in use is, seems SET FOREIGN_KEY_CHECKS=X is a no-op when the storage engine is MyISAM and wouldn't be worth trying detecting the storage engine in use in the connection and base SET FOREIGN_KEY_CHECKS usage on it because it can also be set per-table at creation time.
  • It includes tests, hopefully modeltests/fixtures is the correct place for them.

SQL Server suffers from similar problems so it would also make use of can_defer_constraint_checks=False feature. But disabling/enabling of constraint checks is done in a per-table basis by actually executing explicit "ALTER <table_name> SET [NO]CHECK CONSTRAINT..." clauses, I've written a preliminary implementation for django-pyodbc that allows the loading of fixtures with forward and circular references.

comment:7 Changed 7 years ago by kmtracey

Thanks for the updated patch Ramiro. I'm going to attach another based on a similar approach but with these differences:

First, instead of having a boolean can_defer_constraint_checks and a pair of routines I just have a pair of routines. I don't see that having the boolean buys anything, the added routines should just be callable regardless of what the DB supports and not do any harm if they are not necessary on the actual DB in use. What to name the routines caused me some grief....

I did not like disable_forward_ref_checks and enable_forward_ref_checks because they seem slightly inaccurate in many cases. diaable_forward_ref_checks actually disables all reference checks on MySQL/InnoDB, not just forward ones. enable_forward_ref_checks doesn't actually re-enable immediate constraint checking on Postgres (and I don't think we'd want it to), but the name sort of implies forward references will be checked (and therefore cause errors) in code following a call to enable_forward_ref_checks.

The names I chose (which aren't necessarily the best, but the best I could come up with), are begin_defer_constraint_checks and end_defer_constraint_checks. They are intended to be placed around code that requires deferred constraint checking for proper operation -- thus to be read as a declaration from the programmer about the requirements of the enclosed code, not necessarily instructions as to what, exactly, the underlying DB should do to support those requirements. They could be documented as being database-dependent, likely doing nothing on DBs that by default support deferred constraint checking, and perhaps disabling all constraint checking on DBs that don't support deferring constraint checking. Whether the end routine actually implements checking of what was done in the block would also be implementation-dependent (and not done, at least initially, for MySQL/InnoDB). (I realize these names are still a bit inaccurate -- begin_deferred_constraint_checking_required and end_deferred_constraint_checking_required would be more accurate but those (I think) are way too long.)

Second, instead of a new test specifically for this, I adjusted the existing tests that are failing on InnoDB to make use of the routines. With the patch I will attach we get a clean run of the Django test suite on InnoDB. (I'd really like to get something that enables a clean run on InnoDB because with the addition of the aggregate fixtures that trip this bug, the amount of "expected" test failure output on InnoDB has become unmanageable.)

I'm not sure whether/how this approach generalizes to SQL Server. It sounds like you need to tell it the specific tables involved, and there's nothing here that does that....?

Changed 7 years ago by kmtracey

comment:8 Changed 7 years ago by russellm

Karen - I'm not convinced about this implementation. I looked at using SET FOREIGN KEY CHECKS=0 back when I opened this ticket, but decided that it wasn't a complete solution.

As I recall (and I haven't looked into this for a while, so feel free to correct me), the reason I walked away from the problem was that using SET FOREIGN_KEY_CHECKS=0 will make the failures in the Django test system go away, but at the cost of allowing users to completely screw up their database. If your try to load a badly formed fixture -- for example, a fixture that contains a foreign key reference to a non-existent object -- and you have foreign key checks turned off, the database doesn't re-validate when you turn foreign key checks back on. This means you have an InnoDB table that doesn't have referential integrity, which kind of defeats one of the reasons for using InnoDB in the first place. Given the ease with which broken fixtures can be produced, if we're going to turn this flag on and off automatically, I'd rather see a method that guaranteed actual referential integrity.

My other comment is whether the calls to defer constraint checks that are required around calls to delete() should be a manual requirement of the user (as demonstrated by the test case), or if we should include them as top and tail calls in Django's delete implementation. Ideally, it would be nice for InnoDB users to not have to care about the requirements of their backend of choice. My only hesitation to suggesting this is the potential for an edge case that requires two (or more) successive calls to delete to completely purge an object network. I can't think of an obvious example that wouldn't be cleaned up by Django's normal dependency following delete logic, but I can't help but think that there might be one.

Of course, the alternative approach to this entire problem is to openly state that InnoDB's constraint handling is broken, provide the (potentially dangerous) constraint checking calls as part of MySQL's operations backend, but make use of those calls _completely_ manual. Rather than embedding the calls into loaddata, we manually make the calls in Django's test framework as appropriate, and modify the documentation to tell InnoDB users that they may need to do the same in their code. It would be a big hairy wart on the API, but it would allow us to avoid a class of quite annoying test failures, and provides a solution for InnoDB users where no solution currently exists.

As for the aggregation test failures - regardless of the design outcome for this ticket, we may be able to resolve the aggregation test failures by simply reordering the test fixture. There's nothing in the test fixture that requires the ordering presented - if we move the list of Authors to higher in the fixture, most of the serialization problems should go away. The only potential complication is the list of friends of each author - I haven't dug into detail to see if there are any circular references, but even if there are, I don't believe that they are essential for testing aggregation - we can easily change the test case to remove the circular dependencies, making the fixture InnoDB friendly.

comment:9 follow-ups: Changed 7 years ago by kmtracey

You are right, it's not a complete solution. Turning foreign key checks off and then back on lets you introduce integrity errors in the meantime. A complete solution would incorporate something like that referenced stored procedure to check what-all had been done during the "defer check" block and raise an error if there was a constraint violation found. However, I don't have a whole lot of interest in implementing that myself. It'd be nice...but if the database itself can't be bothered to do this, I'm having a hard time working up motivation to devote time to it. However, if we do provide these routines, we do start to have a hook in place for saying "here's where the logic needs to go if you want to make this safer on InnoDB" (or any other DB that does not do deferred constraint checking).

I did not realize it was a concern that it was very easy to generate invalid fixtures, and thus not safe for loaddata to simply use this by default. I can look at changing that, maybe via a flag to loaddata? I think that was how I originally thought I would approach it but then one of the other patches just put it in loaddata by default, so I did the same.

On the deletes, I didn't know how safe it would be to put these inside the innards of the delete processing, so I just put them in the tests. Since the tests pass on other, more advanced w.r.t. constraint checking, DBs, I thought it was safe to bracket the calls in the tests, but didn't know if it would be safe in general. Since we implement cascading delete manually I suppose it should be safe...but didn't quite feel comfortable making that leap myself.

I also did first approach this by just fixing up the new fixtures to not trip this bug. That's do-able via re-ordering and removing forward references in the ManyToManys...but it's error prone, can't be done for all (e.g. circular) fixtures, has to be re-done if the fixtures are ever re-generated programmatically, or whenever new ones are introduced...as I thought about it more and more the list of downsides to that approach got so long I thought it might be worthwhile trying to do something to address the underlying problem, even if at this point it isn't a complete answer.

(I did also briefly consider adding checks for the DB in use (there's already at least one such check) to the tests and issuing the SET FOREIGN_KEY_CHECKS=0/1 there when necessary, but that is really ugly and does nothing to help for similar situations on other DBs. As it is it isn't clear to me if this approach really helps at all for MS SQL Server, since it apparently wants table names on which to turn off constraint checking, but maybe someone who knows more (Ramiro?) on that topic can comment...)

comment:10 in reply to: ↑ 9 Changed 7 years ago by russellm

Replying to kmtracey:

However, I don't have a whole lot of interest in implementing that myself. It'd be nice...but if the database itself can't be bothered to do this, I'm having a hard time working up motivation to devote time to it.

This is exactly the reason that I opened this ticket and moved on two years ago.

I did not realize it was a concern that it was very easy to generate invalid fixtures, and thus not safe for loaddata to simply use this by default. I can look at changing that, maybe via a flag to loaddata? I think that was how I originally thought I would approach it but then one of the other patches just put it in loaddata by default, so I did the same.

It's more that I'm not in favor of providing a feature that can silently lead to a corrupted database. A flag to loaddata sounds like a reasonable approach though, as long as the option itself is suitably admonishing and scary (something like --allow-corrupted-databases), with no shortcut version of the option, and a documentation string that says "make sure you really need to use this option". It doesn't need to be that name specifically; I just want to make sure that the user is going to have to make an active and obvious choice to invoke a call that could have nasty consequences. This includes implicit calls to loaddata, such as the ones that happen when test fixtures are loaded.

On the deletes, I didn't know how safe it would be to put these inside the innards of the delete processing, so I just put them in the tests. Since the tests pass on other, more advanced w.r.t. constraint checking, DBs, I thought it was safe to bracket the calls in the tests, but didn't know if it would be safe in general. Since we implement cascading delete manually I suppose it should be safe...but didn't quite feel comfortable making that leap myself.

I'm inclined to agree. Given that it will only affect InnoDB users, I see no problems requiring that users manually call enable/disable if they need to. This also follows on from the general philosophy of providing the tools to allow InnoDB to work, but not making it easy to bust things up completely.

I also did first approach this by just fixing up the new fixtures to not trip this bug. That's do-able via re-ordering and removing forward references in the ManyToManys...but it's error prone, can't be done for all (e.g. circular) fixtures, has to be re-done if the fixtures are ever re-generated programmatically, or whenever new ones are introduced...as I thought about it more and more the list of downsides to that approach got so long I thought it might be worthwhile trying to do something to address the underlying problem, even if at this point it isn't a complete answer.

(I did also briefly consider adding checks for the DB in use (there's already at least one such check) to the tests and issuing the SET FOREIGN_KEY_CHECKS=0/1 there when necessary, but that is really ugly and does nothing to help for similar situations on other DBs. As it is it isn't clear to me if this approach really helps at all for MS SQL Server, since it apparently wants table names on which to turn off constraint checking, but maybe someone who knows more (Ramiro?) on that topic can comment...)

I know absolutely nothing about MSSQL. However, it sounds like we could make the enable/disable_key_checks() calls take a list of table names. Postgres/SQlite will do nothing at all in the implementation; MySQL/InnoDB can ignore the list, but makes the SET call; MSSQL can then make per-table disable calls as required. It's not pretty, but then nothing about this whole problem is :-)

comment:11 in reply to: ↑ 9 Changed 7 years ago by ramiro

Replying to kmtracey:

As it is it isn't clear to me if this approach really helps at all for MS SQL Server, since it apparently wants table names on which to turn off constraint checking, but maybe someone who knows more (Ramiro?) on that topic can comment...)

You are right, implementing this in SQL server doesn't imply controlling things by toggling a global variable but by ALTERing TABLE so the names of the tables are needed.

So far, I've solved this by cloning the loaddata management command and modifying the copy to insert calls to the two new methods in the same way it's done in the patches we have posted. But they mostly only serve to mark the boundaries of a period of time in which insertions of forward references are possible. Real work is done in a hook that is a no-op except when running during one of such periods, it's called every time fixtures for a model are going to be loaded, in these edges the model is introspected in search of outgoing FKs and M2M, if there are any the corresponding ALTER TABLEs are performed.

http://code.google.com/p/django-pyodbc/source/browse/trunk/sql_server/extra/management/commands/ss_loaddata.py#165

(Currently all the constraint checks for a given table are deactivated, but it can be made more fine grained and to have the less overall impact possible in the DB by even introspecting the names of the constraints and only deactivating these. This is in my TODO list)

So, I think a backend for SQL Server could make use of the begin_defer_constraint_checks()/end_defer_constraint_checks() methods but when it comes to table names, being able to do thing in a way as granular as possible (by no touching tables that aren't involved) would need an additional hook (similar to the one I created) being provided in some way.

comment:12 Changed 5 years ago by russellm

#3620 is the same problem, as exposed by flush.

comment:13 Changed 5 years ago by boxm

The main concern about this patch appears to be that it introduces the possibility of loading a malformed database into MySQL/InnoDB because the referential integrity checks won't be done properly.

Put another way - if the user asks Django to load some fixtures that are malformed, then Django won't spot the problem and it will only surface at runtime.

However, if the user was using MyISAM tables, then we have already accepted that Django can't do referential integrity checks - so why is it worse for it to not do it on InnoDB, where the database can't handle loading data with integrity checks properly?

At the moment, we have the following truth table:

MyISAM: loads all possible fixtures, can't guarantee integrity
InnoDB: Can't load all fixtures, guarantees integrity

Relaxing the integrity checking to allow all fixtures to load seems to not make things much worse - if the user wants to load an incomplete set of fixtures, presumably the problem will show up quickly, whereas at present there's some data that simply *can't* be loaded.

This has just bitten me on test fixtures as I migrate from MyISAM to InnoDB. The test fixtures were actually broken (ie didn't have referential integrity) but that didn't matter since they supported the testcases needed. On migration, I now can't load the data at all, even once I fix the integrity problems.

comment:14 follow-up: Changed 5 years ago by russellm

@boxm - your MyISAM/InnoDB distinction isn't correct. The problem is that InnoDB's interpretation of referential integrity is broken -- it checks references at the end of every model change, not at the end of the transaction. Therefore, it isn't possible to insert an object that refers to an object that will be loaded later in the transaction. A completely legal fixture that can be loaded under PostgreSQL will break under InnoDB because of the way referential integrity is implemented.

The issue is that the only apparent ways to fix this is to *turn off* reference checking under InnoDB for the duration of fixture loading, and then to turn it back on again once the fixture is loaded. However, anything invalid loaded during the fixture won't be identified and flagged as an invalid reference.

comment:15 Changed 5 years ago by tiliv

  • Cc tonightslastsong@… added

This patch doesn't cleanly apply to trunk right now..

This is a little troubling, since it means I either can't run InnoDB and properly use fixtures with json (which is what I gathered was the affected scope in #3620 ), or else I must change my db backend (!) to keep the referential integrity feature. Neither of those routes is life-threatening, but neither is very convenient. My personal thought on this is that somebody ought not be doing an unverified syncdb with initial_data on a live production database, therefore I find the reasoning for temporarily disallowing the foreign key checks a little on the insulting side. A person's got to live with their own mistakes at some point. Since the documentation is ultimately the "we told you so", I'd very much like to see this problem addressed in a way that helps those of us victimized for the forward references issue, accompanied by a nice big warning, while still trying to uphold the spirit of referential integrity.

Perhaps a special flag, as discussed in the comments above, could be allowed on loaddata for MySQL backends. This forces you to know what it is you are asking Django to do and why, and to explicitly authorize the required action.

I'll be following this ticket in hopes of some consensus in the upcoming months.

comment:16 Changed 4 years ago by eduardocereto

  • Cc eduardocereto@… added

comment:17 in reply to: ↑ 14 Changed 4 years ago by acdha

  • Cc chris@… added

Replying to russellm:

The issue is that the only apparent ways to fix this is to *turn off* reference checking under InnoDB for the duration of fixture loading, and then to turn it back on again once the fixture is loaded. However, anything invalid loaded during the fixture won't be identified and flagged as an invalid reference.

I think the situation might be *slightly* less horrible than that: some limited testing during the sprints using MySQL 5.1.55 suggests that at least some erroneous data will be checked when adding a constraint which violates constraints (https://gist.github.com/873201). It seems like it would be possible to implement a huge hack for MySQL which would basically drop all django-created constraints before loading and recreate them afterward, rather than simply disabling them. I've started a branch working on this, which allows much of the test suite to run:

https://github.com/acdha/django/tree/mysql-loaddata-relaxed-constraints

This is slow but tolerable for testing and could be optimized in several obvious ways such as using the schema tables to retrieve the list of indexes for every app rather than introspecting each model in turn; it's using a sledgehammer approach because I didn't spend the time figuring out how to programmatically list all of the reverse indices for a given model and instead drops and recreates the indexes for every model in every application. Another obvious optimization would simply be switching to call loaddata normally and attempt this hack only if the fixture fails to load due to a foreign key exception or if requested by the user or test suite.

comment:18 Changed 4 years ago by maxi

  • Cc maxirobaina@… added

comment:19 Changed 4 years ago by lrekucki

  • Severity set to Normal
  • Type set to Bug

comment:20 Changed 4 years ago by TomaszZielinski

  • Cc tomasz.zielinski@… added
  • Easy pickings unset

comment:21 follow-up: Changed 4 years ago by jsdalton

  • Cc jsdalton added
  • Has patch set
  • Patch needs improvement set
  • UI/UX unset

It looks like this ticket has a patch that's almost there. The primary issue is the patch opens the possibility of inserting rows with bad referential integrity while foreign_key_checks is disabled. Presumably, if we could find a solution to this problem, the remaining issues Russel identifies here https://code.djangoproject.com/ticket/3615#comment:8 could be addressed.

I did a bit of research and there are certainly other people running into this issue (not in a Django context per se). There are also some suggested solutions out there, which mostly involve creating MySQL procedures that run a report of referential integrity issues in the DB. Here's a quick list of some of what I found:

I'm far from being a MySQL expert, but presumably someone with more expertise could adapt one of the stored procedure solutions above and run this check before committing the transaction.

This is still not entirely satisfying as a solution, since really the point is not to identify referential integrity issues in the whole of the database, but only those that were introduced in the context of the transaction that is loading the fixture data. I.e. it's only those rows that we care about. So that's really something that would have to be addressed.

Here's another thought I would like to contribute in the "so dumb it might just work" category: What if we loaded fixture data in two rounds? In the first round, we disable foreign_key_checks, so the records get added to the db without regard for their referential integrity. When all records have been loaded, we re-enable foreign_key_checks and walk through each record a second time, saving it again with the exact same data. Assuming all of the references are correct, the save should work even for forward references, since the record was added in the first round. However, if the key refers to a non-existent reference at this stage, then there is an underlying referential issue with the fixture data and a referential integrity error will be triggered. This can then be propagated up and the whole transaction rolled back.

I'm having a hard time finding something wrong with the above suggestion, though someone smarter than me can surely an issue I'm overlooking. Presumably one could finesse the implementation such that the double trip could only be taken if the foreign key checks were actually being disabled for a given DB implementation and skip it for those that handle it properly.

If anyone has any thoughts on either the stored procedure route or the "double load" idea, let me know. I'd love to assist in pushing this old ticket toward a resolution and would happily do anything in the scope of my abilities to further that goal.

comment:22 in reply to: ↑ 21 Changed 4 years ago by anonymous

Replying to jsdalton:

It looks like this ticket has a patch that's almost there. The primary issue is the patch opens the possibility of inserting rows with bad referential integrity while foreign_key_checks is disabled. Presumably, if we could find a solution to this problem, the remaining issues Russel identifies here https://code.djangoproject.com/ticket/3615#comment:8 could be addressed.

Recreating the indexes will also perform the checks - the main problem with that approach is that it's slow, which makes it complicated for testing but not so bad for most normal uses. Based on testing in my fork (above) it seems like the best answer might just be finding the right way to sequence it so you could do most of the work with foreign_key_checks off and simply recreate the indexes once when all of the fixture loading is complete.

Chris

comment:23 follow-up: Changed 4 years ago by jsdalton

Okay, worked on this a bit last night. While I don't have a solution yet, hopefully I can help advance the cause a bit further.

I'm uploading a modified, *not passing* version of Karen's patch from two years ago. Here's the basic gist of my modifications:

  • Most importantly, I added two tests dedicated to this issue to regression_tests/fixtures_regress. One test reproduces the original forward reference problem (which Karen's solution then fixes). A second test introduces a nonexistent reference which *should* raise an IntegrityError. This should allow us to test for a good solution when we have one.
  • For now, I did not include the parts of Karen's patch that fix other tests.
  • The only part of the operation that requires us to relax the foreign key check in loaddata is when we save each object to the db. Therefore, I'm only wrapping the foreign_key_check disable around that line.
  • I add a flag foreign_key_checks_disabled to the connection object, which is set to true while it's in effect. This allowed me to do some work after the fixtures were loaded to re-check the constraints.
  • I left the method names as is for now, but I'm not sure I agree that begin/end_defer_constraint_checks is the best name. There is an implication here that constraint checks have merely been deferred and will run at some point without user intervention, which is simply not true. I feel like we should just use the straightforward disable_foreign_key_checks and enable_foreign_key_checks since that's exactly what we are doing.

I attempted to re-save each object as a way to get the constraint checks to be run after disabling foreign_key_checks. Sadly this did *not* work, though it was close enough to working that I wonder if it doesn't have potential to work? I outputted the SQL statements being sent and here's what the test fixtures_regress.TestFixtures.test_loaddata_raise_when_fixture_has_forward_refs was doing:

SET foreign_key_checks=0
SELECT (1) AS `a` FROM `fixtures_regress_book` WHERE `fixtures_regress_book`.`id` = 1  LIMIT 1
INSERT INTO `fixtures_regress_book` (`id`, `name`, `author_id`) VALUES (1, 'Cryptonomicon', 3)
SELECT (1) AS `a` FROM `fixtures_regress_person` WHERE `fixtures_regress_person`.`id` = 4  LIMIT 1
INSERT INTO `fixtures_regress_person` (`id`, `name`) VALUES (4, 'Neal Stephenson')
SET foreign_key_checks=1
UPDATE `fixtures_regress_book` SET `name` = 'Cryptonomicon', `author_id` = 3 WHERE `fixtures_regress_book`.`id` = 1 
UPDATE `fixtures_regress_person` SET `name` = 'Neal Stephenson' WHERE `fixtures_regress_person`.`id` = 4

So basically it firsts inserts the records with foreign_key_checks off, then foreign_key_checks is reenabled and it updates them.

The second round of updates *should* raise an IntegrityError, since author_id = 3 does not exist -- but unfortunately, they do not. Interestingly, if you attempt to update the book record with either author_id = 3 or author_id = 4 both will work without raising an error; however, any other id will raise the error. Somehow, for reasons I don't quite understand, MySQL is not performing a foreign key constraint check here when the id value is not changed.

If anyone has any deeper MySQL insights as to why this might be, perhaps there is a solution -- or perhaps this is a dead end.

I'll try to think on it some more later today. The patch still needs a fair amount of work obviously, but hopefully the tests I've included provide an easier framework to explore a solution.

@Chris - I took a closer look at your solution yesterday. My main concern, aside from the slowness you mention, is that recreating the index impacts data outside those records being loaded by the fixture. This makes it less than ideal I think, esp. since loaddata can be used outside of testing e.g. to dump initial data into a production db when sync_db is run. IDK, it just seems like there might be a solution out there that checks the integrity of only those records we have just loaded.

Changed 4 years ago by jsdalton

Not there yet, but includes better tests

comment:24 in reply to: ↑ 23 Changed 4 years ago by acdha

Replying to jsdalton:

I attempted to re-save each object as a way to get the constraint checks to be run after disabling foreign_key_checks. Sadly this did *not* work, though it was close enough to working that I wonder if it doesn't have potential to work?

My understanding is that MySQL only applies the checks when the row's data actually changes.

@Chris - I took a closer look at your solution yesterday. My main concern, aside from the slowness you mention, is that recreating the index impacts data outside those records being loaded by the fixture. This makes it less than ideal I think, esp. since loaddata can be used outside of testing e.g. to dump initial data into a production db when sync_db is run. IDK, it just seems like there might be a solution out there that checks the integrity of only those records we have just loaded.

I'd like to see that as well but I'm not sure there's a good option unless there's a big change coming from upstream or essentially recreating the fkey checks in Python. One idea which I had was simply to create a "recreate_indexes" management command which could be used as a post-load step. It'd be less than ideal but would provide a simple way for users to know when they have invalid data, and could be used with loaddata so you could ensure that your database was correct and that any errors reported after loading a fixture were caused by the fixture.

comment:25 Changed 4 years ago by jsdalton

Okay, I'm uploading a new patch which, while a bit rough around the edges, solves the core problem of executing constraint checks on table after data has been loaded to it with foreign_key_checks off. It passes both tests that I introduced (loading fixtures with future refs work, loading fixtures with nonexistent references errors) and since it entails just using a few SELECT statements it should be highly performant.

Patch to follow, but here's a brief rundown.

The basic approach was cribbed from this answer http://stackoverflow.com/questions/2250775/force-innodb-to-recheck-foreign-keys-on-a-table-tables . Whoever wrote the answer must be a DBA who loves wrapping it all up in a neat and tidy stored procedure package, but it took me about 10 minutes or so to sit down and unpack from all the abstractions. At its core, all it does is select all the key columns for a given table, then it does a SELECT on each of those that joins them to each other when the referring column id is null. This is what that statement looks like when I use concrete values (for a test table I have):

SELECT * FROM `fixtures_regress_book` as REFERRING
LEFT JOIN `fixtures_regress_person` as REFERRED
ON (REFERRING.`author_id` = REFERRED.`id`)
WHERE REFERRING.`author_id` IS NOT NULL
AND REFERRED.`id` IS NULL

Not so bad. Basically, any rows this statement return represent bad data.

Implementing this wasn't too big of a deal. In loaddata, for each model that was loaded we now run the check on the associated table. The check is implemented in the MySQL connection class as check_foreign_key_constraints_for_table(). It first grabs all the key columns via a new method on the MySQL Introspection class, and then executes a version of the above SQL SELECT on each key column. If any bad rows are found, it raises an IntegrityError.

This patch still needs work to clean a few up and ensure there is enough test coverage. Specifically:

  • check_foreign_key_constraints_for_table() is kind of weak right now. It probably should not raise IntegrityError directly, but instead just return an empty list if all goes well and a list of bad rows if not. The IntegrityError is probably best handled in whatever code is implementing (i.e. loaddata can figure out what to report)
  • We should probably strive to output a meaningful list of bad data.
  • The mysql.DatabaseIntrospectionIntrospection.get_key_columns() might need testing. I just extracted it from get_relations() which was already running the correct statement but not returning results in a form that could be used for this.
  • We need to do whatever we need to do elsewhere in the test suite to make sure MySQL isn't breaking when forward references are loaded (since not all cases uses loaddata).

I'd really like some feedback before I plunge into any next steps to make sure this sounds like I'm on the right track. Please do let me know if you have any thoughts on this approach.

I'm really hopeful that the basic solution has been nailed though.

Changed 4 years ago by jsdalton

Still needs a bit of work, but this is basically a solution

comment:26 Changed 4 years ago by jsdalton

  • Patch needs improvement unset

Okay, an improved patch is now attached. I'm unchecking the patch needs improvement flag; if anyone has any issues or concerns please feel free to point them out and re-check the flag and I will take a look.

As mentioned in my previous comment, this patch allows for loading of data with forward references in MySQL by turning foreign key checks off temporarily. It also offers a mechanism to check the data after it was entered to ensure invalid foreign keys were not entered while checks were disabled.

The improved patch just cleans up a few rough edges from the previous version.

  • Added tests for DatabaseIntrospection.get_key_columns()
  • Create meaningful error message when invalid row found. The message includes which row, values and columns triggered the missing foreign key error, which is useful for debugging bad data in fixture.
  • Raise error on first bad row. As soon as check_for_invalid_foreign_keys() hits a bad row, it raises an error. Reason for this discussed below.
  • Tweaked the API. The API is now more explicit: disable_foreign_key_checks() and enable_foreign_key_checks(). These now return boolean values (always True) when implemented. This means you can determine if foreign key checks were disabled and then re-enable and check for invalid foreign keys only if they were.
  • Added tests disable/enable checks around necessary tests. I ran through the test suite and tried to find places where MySQL required this pattern for the test to load properly. I found one or two spots that Karen had in her original patch and included those.
  • Document new methods. Documentation was added as needed.

I didn't want to spend any more time gold plating this, but here are possible ways this could be improved. I don't think any are really necessary or in the scope of this ticket though.

  • Improve fixture error reporting. As mentioned above, I decided to trigger foreign key errors when using check_for_invalid_foreign_keys() instead of collecting all bad rows and reporting them in the loaddata error. The reason is that if we are going to make a feature for MySQL like this, it should probably be implemented in all backends. It could be useful when debugging bad fixture data, but I think it's unnecessary.
  • Scope foreign key checks to given IDs. Right now, check_for_invalid_foreign_keys() looks at the entire table and raises an IntegrityError if it finds any bad data. loaddata would then roll back the fixture load. An issue could arise where bad data is already preexisting in the table where data is being loaded, and thus new data would not be loaded because of the IntegrityError this would raise, even though the new fixture data loaded does not actually have any problems. Conceivably, you could pass a list of IDs along with the table name and limit the checks to only those, but I felt this was an edge case and not worth the complexity.
  • Implement as context manager. It'd be cool to do...
with connection.foreign_key_checks_disabled():
    # Load some data with forward references

...and then implement the checks in the __exit__ method of the context manager, but I couldn't think of a clean way to pass any of the tables touched during any load data activities back to that method for checks. Since this is really only used in loaddata and not much anywhere else it's also probably overkill IMO.

I've tested this on MySQL 5.1.4 with InnoDB tables and it looks good. As I said, if anyone spots any issues let me know.

Changed 4 years ago by jsdalton

comment:27 follow-up: Changed 4 years ago by jacob

A few specific comments on the patch beyond what I said on the mailing list:

  • I agree that the other features aren't in scope. Perhaps they could be added later, but keeping this patch as small as possible is a good thing.
  • I don't quite get why you set connection.foreign_key_checks_disabled and *also* return whether they've been disabled (in disable_foreign_key_checks()). Seems one or the other would be sufficient.
  • I think you're missing a "()" in tests/regressiontests/serializers_regress/tests.py on line 388.
  • I think this deserves a mention in the docs -- probably in the database notes document? We're essentially tricking MySQL into doing something it technically can't do, so we should document it somewhere.

Thanks again!

comment:28 in reply to: ↑ 27 Changed 4 years ago by jsdalton

  • Patch needs improvement set

Replying to jacob:

  • I don't quite get why you set connection.foreign_key_checks_disabled and *also* return whether they've been disabled (in disable_foreign_key_checks()). Seems one or the other would be sufficient.

I had previously checked that boolean value before determining whether to re-enable and check for invalid foreign keys and later opted to instead go with a return value from the original call to disable. I felt this was a stronger way to represent that the enable/check routines only run if the disable call did its thing. I was going to remove the boolean attribute, but my only concern was if these methods were ever called e.g. in separate functions where perhaps the second function didn't have access to the return value of disable_foreign_key_checks(). It just seems like if you do disable something as critical as this, we should provide a means to determine whether the current state is disabled or enabled.

  • I think you're missing a "()" in tests/regressiontests/serializers_regress/tests.py on line 388.

Thanks. Fixed.

  • I think this deserves a mention in the docs -- probably in the database notes document? We're essentially tricking MySQL into doing something it technically can't do, so we should document it somewhere.

I'll take a look at that section of the docs and take a stab at writing up an explanation. BTW I'm not sure I agree that it's really a trick :). I mean we're essentially just using a pretty straightforward SELECT statement that is telling us if we have bad data. I don't think it's a trick or a hack, it's just applying an existing capability of a SQL-based database. But I agree a short explanation would be useful.

I'll try to get an improved patch up later tonight or tomorrow. Thanks for your input Jacob.

comment:29 Changed 4 years ago by jsdalton

  • Patch needs improvement unset

Okay, uploading a patch that addresses the points Jacob made above, as well as a few minor things brought up in the mailing list discussion here: http://groups.google.com/group/django-developers/browse_thread/thread/4b0870f599021f61 .

At this point we're hoping to get some people to test out the patch on a few different MySQL implementation. It has not yet been tested on MySQL 4.x to my knowledge. If you have a chance to test and can report back success or failure, we should be in a position to move this to ready for checkin.

Changed 4 years ago by jsdalton

comment:30 follow-up: Changed 4 years ago by jacob

Well, the good news is that I've got a MySQL setup now and can verify that this patch fixes the bug for me as well. The bad news, however, is that I'm now getting test failures on everything *besides* MySQL. For example, python2.7-sqlite:

======================================================================
FAIL: test_loaddata_raise_when_fixture_has_forward_refs (regressiontests.fixtures_regress.tests.TestFixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jacob/Projects/Django/upstream/tests/regressiontests/fixtures_regress/tests.py", line 392, in test_loaddata_raise_when_fixture_has_forward_refs
    stderr.getvalue().startswith('Problem installing fixture')
AssertionError: False is not True

This fails on both SQLite and PostgreSQL 9. Is it just a test that ought to only run on MySQL?

comment:31 in reply to: ↑ 30 ; follow-up: Changed 4 years ago by jsdalton

  • Patch needs improvement set

Replying to jacob:

This fails on both SQLite and PostgreSQL 9. Is it just a test that ought to only run on MySQL?

Hrm. I can't believe I forgot to test this on SQLite before uploading the patch. Apologies.

Also that method name test_loaddata_raise_when_fixture_has_forward_refs() is way off. It should be called test_loaddata_raises_error_when_fixture_has_invalid_foreign_key() or something quite close to that. Basically it loads a fixture that I created that purposefully introduces a nonexistent foreign key.

So here's the thing: Part of why I didn't test on other backends is because I made the faulty assumption that they would not allow a bad foreign key to be loaded. I assumed they would raise an error and thus this test would pass, even though they don't implement all that other junk we added for MySQL.

I would observe that all backends *should* raise an error if you try to load an invalid foreign relation into a table, and the fact that this test is failing indicates some other problem is at work. It may (I don't know) be an artifact of the test environment? I mean, maybe it's something to do with the transaction we wrap things in to manage tests, so those constraints don't get checked? You or someone else might have a better idea what could be underlying this.

For now though I'll argue that this test is meaningful and should pass on all backends, so we should really try to figure out what the problem is rather than skipping it for other backend. Please do let me know if you or anyone else have any ideas or insights? If no one has anything I'll try to take a deeper look tonight.

I will make that change to the test method name for the sake of clarity but I'll hold off uploading a new patch until we've managed to clarify this new issue. Marking patch as needing improvement for the time being.

comment:32 in reply to: ↑ 31 Changed 4 years ago by ramiro

Replying to jsdalton:

I would observe that all backends *should* raise an error if you try to load an invalid foreign relation into a table, and the fact that this test is failing indicates some other problem is at work. It may (I don't know) be an artifact of the test environment? I mean, maybe it's something to do with the transaction we wrap things in to manage tests, so those constraints don't get checked? You or someone else might have a better idea what could be underlying this.

I think so, at least in Postgres, see #11665 -- opinions about the latest patch attached to it are welcome.

comment:33 Changed 4 years ago by jsdalton

  • Patch needs improvement unset

Okay, new patch. The fundamental implementation has remained the same, but there are a few tweaks and other changes that were made both to Jacob's comments here as well as some of the conversation on IRC with regards to #11665. This ticket and that are converging to a certain extent and some of the alterations here reflect what is happening there to an extent.

Here are some of the key points about this latest patch:

  • This patch should now pass all tests for all backends. Some structural tweaks have been made to accommodate this.
  • The API has also changed as a result. I have renamed the primary methods at work here to disable_constraint_checking() (was disable_foreign_key_checks()), enable_constraint_checking() (was enable_foreign_key_checks()), and check_constraints() (was check_for_invalid_foreign_keys()). The former API was MySQL specific; the new API is more generalized for all backends.
  • The default implementation is now for disable_constraint_checking() to return True and do nothing else. Backends can override this to implement any steps (e.g. "SET foreign_key_checks=0") necessary to accomplish this, and return True if it is necessary for check_constraints to be executed after checks have been re-enabled. If disable_constraint_checking returns False, these checks will not be run.
  • check_constraints() is now implemented on the base DatabaseWrapper class. It accepts a list of table names and raises an IntegrityError if any of the tables have missing foreign key relations.
  • This base implementation is backend-agnostic -- it will run properly (presumably) on any SQL backend. The only requirement is that the backend implement a get_key_columns() method on its Introspection class, which should return a tuple of (column_name, referenced_table_name, referenced_column_name) for each key column in the given table.
  • The base Introspection class implements get_key_columns() as simply returning an empty list. This means that, by default, the current patch will only run if either the DatabaseWrapper overrides the check_constraints() method, or the Introspection class overrides get_key_columns() as described above. check_constraints() will do nothing otherwise, since it will just iterate over an empty list.

All of the above should be covered under the tests that are present here, and with this patch SQLite is now supported (i.e. fixtures are checked for problems after the load).

Postgresql will be pretty easy via what's going on in #11665. This can just be handled by overridding check_constraints.

Hopefully someone with an Oracle implementation can whip up something fairly easy. As described above, implementing get_key_columns is all that's really required. The same holds true for any third party implementations as well.

Let me know if there are any other questions or issues here. Flipping the patch needs improvement flag back to 0 now.

Changed 4 years ago by jsdalton

comment:34 follow-up: Changed 4 years ago by ramiro

Unfortunately I don't have time at all to play with this. Dumping some notes here:

I've tried the v5 patch against sqlite3 and it creates/uncovers a bunch and errors and failures in our test suite.

Also, it makes errors like the following one to be reported when loading certain regressiontests/admin_views fixtures (admin-views-books.xml) but they don't cause the test cases to fail. The traceback isn't very useful:

Problem installing fixture 'tests/regressiontests/admin_views/fixtures/admin-views-books.xml': Traceback (most recent call last):
  File "django/core/management/commands/loaddata.py", line 177, in handle
    obj.save(using=using)
  File "django/core/serializers/base.py", line 169, in save
    models.Model.save_base(self.object, using=using, raw=True)
  File "django/db/models/base.py", line 529, in save_base
    rows = manager.using(using).filter(pk=pk_val)._update(values)
  File "django/db/models/query.py", line 491, in _update
    return query.get_compiler(self.db).execute_sql(None)
  File "django/db/models/sql/compiler.py", line 888, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "django/db/models/sql/compiler.py", line 754, in execute_sql
    cursor.execute(sql, params)
  File "django/db/backends/sqlite3/base.py", line 226, in execute
    return Database.Cursor.execute(self, query, params)
IntegrityError: column chap_id is not unique

testCustomAdminSiteView (regressiontests.admin_views.tests.CustomModelAdminTest) ... ok

To reproduce, run the admin_views tests, all the tests methods from CustomModelAdminTest suffer from this (--verbosity 2 helps to see that). The IntegrityError's appear only when check_constraints() is run.

Some tickets I've found that are related to this (some of them very tangentially): #14204, #16026, #8317, #15933.

Perhaps efforts in this ticket can be focused only on enhancing the situation of tests with MySQL (i.e. move back the new sql code checks in check_constraints() to the mysql backend) and leaving the other backend alone for now?

Also, there is this thread about some efforts that go somewhat in the opposite direction: Avoid the overhead associated with fixture (re)loading (and if work on this ticket lands maybe also foreign key constraint checking) before _every_ test method when fixtures are something that essentially don't change. The DB used in such experiments is precisely MySQL: http://groups.google.com/group/django-developers/browse_thread/thread/65a36a8a5433144c/81bc4d2d4fd8b866

Last edited 4 years ago by ramiro (previous) (diff)

comment:35 in reply to: ↑ 34 Changed 4 years ago by jsdalton

  • Patch needs improvement set

Ugh, what a mess.

Something funky is going on here and I'm not sure what. The tests in fixtures_regress are failing now using my patch with sqlite. Something that's happening with the call to get_indexes is mucking with transaction or something, because fixture data is not getting properly cleared between tests when that method is called. This is likely what is behind the problems you are seeing elsewhere, I have to imagine.

Let me take a closer look at what is happening with SQLite and see if it can't be salvaged. Certainly it's quite easy to not enable constraint checking for sqlite, but I think a solution is within reach.

I'm just annoyed that I uploaded the patch with breaking tests. I could have sworn I tested sqlite several times with it. Maybe I introduced a bug somewhere, or else I just ran some of the test methods in isolation. Hm.

Anyhow, I'll see what I can do.

comment:36 Changed 4 years ago by jsdalton

Just to ensure nobody wastes any undue time on this, I've tracked down the problem with the SQLite implementation of this.

What's happening is check_constraints() is calling get_indexes() on the SQLite Introspection class. The implementation of get_indexes() executes a PRAGMA statement (a few actually).

From what I've been able to determine by testing a few things out, PRAGMA is the problem. It appears that calling PRAGMA is somehow committing the transaction prematurely. The result is that the fixtures being loaded here are being carried over test to test.

I can't find anything online about why PRAGMA statements would do this (or even whether they do). However, I did run these tests a bunch of times and the problem *only* arose once the PRAGMA statement was executed.

In the a.m. I can re-think this patch a bit and perhaps just focus on MySQL for the time being.

comment:37 Changed 4 years ago by jsdalton

Okay, fixed the patch and it is back to working for both MySQL and SQLite. Again, my apologies for the sqlite last time around. I ran the entire test suite this time around before uploading this, no errors or unexpected failures.

Here's a brief run down on the changes in this patch:

  • The problem for SQLite, as I mentioned, was the PRAGMA statement used in get_indexes(). Someone who knows more than me can probably explain how or why that statement causes the transaction to commit immediately.
  • To get around this, I added a new method to the BaseDatabaseIntrospection class called get_primary_key_column() and implemented it for both MySQL and SQLite. This method replaces the call to get_indexes() in the default check_constraints() method. So now, if a backend wants to use check_constraints(), it has to implement both get_key_columns() and get_primary_key_column in its Introspection class. If one or both of these is not implemented, then check_constraints() will do nothing for all intents and purposes.
  • The new implementation of get_primary_key_column() uses an alternative way of getting the primary key, avoiding PRAGMA. No more problems with fixtures and transactions.

I'm leaving this as needs improvement for the time being because a few additional decisions need to be made with regards to the API and how we work with other backends not covered here:

  • This patch should work with any other backend as is; however, there will be a test failure in fixtures_regress.TestFixtures.test_loaddata_raises_error_when_fixture_has_invalid_foreign_key() if the backend does not either override check_constraints() or implement both get_key_columns() and get_primary_key_column.
  • This brings me to a second issue/question...how does this relate to the database feature can_defer_constraint_checks (which I only recently discovered)? The way I've written the API implies that both MySQL and SQLite now support this. In a way they do, if they are used properly (i.e. explicitly disable, then enable, then check). I don't know enough about how this feature is used throughout the rest of Django to understand how my changes might work with that.

In summary though, I believe all of the elements are here in place to defer constraint checks on all backend and to then check them immediately after for invalid data (with the exception of Oracle actually). It should be doable on Oracle, SQL Server and any other SQL-based backend pretty easily though. We might need someone with a deeper knowledge of django.db than I to come in and figure out how to orchestrate the last mile here, but I do think we're close.

ramiro or jacob, maybe one of you guys can take a closer look?

Changed 4 years ago by jsdalton

comment:38 follow-up: Changed 4 years ago by graham_king

With this patch, I'm getting errors in the multiple_database tests. (./runtests.py multiple_database). The error is always an IntegrityError 'Cannot add or update a child row: a foreign key constraint fails', for various keys. I'll try and narrow it down further. My settings.DATABASES has two entires, a 'default' and an 'other', otherwise these tests can't run.

All the other tests pass, which is a huge improvement on behavior without this patch, so congrats and thanks on getting this far.

I'm using MySQL 5.1.54 / InnoDB on Ubuntu.

comment:39 in reply to: ↑ 38 Changed 4 years ago by jsdalton

Replying to graham_king:

With this patch, I'm getting errors in the multiple_database tests. (./runtests.py multiple_database). The error is always an IntegrityError 'Cannot add or update a child row: a foreign key constraint fails', for various keys. I'll try and narrow it down further. My settings.DATABASES has two entires, a 'default' and an 'other', otherwise these tests can't run.

All the other tests pass, which is a huge improvement on behavior without this patch, so congrats and thanks on getting this far.

I'm getting the same errors in the multiple_database tests using InnoDB with or without my patch. Not sure it's related or not, but I'll take a closer look as well. FWIW, I'm noticing I don't get any errors when I run the tests in isolation, e.g.

$ ./runtests.py --settings=testproject.settings multiple_database.QueryTestCase.test_generic_key_reverse_operations

Whereas if I do this:

$ ./runtests.py --settings=testproject.settings multiple_database.QueryTestCase

The test_generic_key_reverse_operations() raises an IntegrityError. Something is not happening right with the database between tests, but I'm not sure what that is either.

comment:40 Changed 4 years ago by jsdalton

Okay, made some additional changes tonight:

  • Added support for PostgreSQL and Oracle. With these additions, all backends support a check_constraints() method to verify foreign keys. Note that Oracle *has not* been tested because I don't have it. Also note that simply doing SET CONSTRAINTS ALL IMMEDIATE for postrgresql caused some issues. Mainly, I think these were due to the fact that we catch errors and continue our business running tests. So I added a finally block to ensure that the constraint check gets reset back to deferred after we're done checking it.
  • I also reworked the API. Since 1.4 supports Python 2.5 and up we can safely use context managers, and this API makes a ton more sense as a context manager. In particular, it does a much better job of ensuring constraint checks are re-enabled if an exception occurs while they are turned off. I also reworked a few other areas and streamlined others so that everything works together more seamlessly across all backends.

Changed 4 years ago by jsdalton

comment:41 Changed 4 years ago by jsdalton

  • Patch needs improvement unset

Here's a last patch that addresses and fixes this issue specifically. It also ensures that all backends provide a check_constraints() implementation, which is used at the end of a fixture load to ensure that the fixtures entered do not contain invalid foreign key references. The new tests pass on all backends and the full test suite passes on Postgresql as well as SQLite. Relevant tests pass on MySQL.

Minor change from last version:

  • Made the enable/disable_constraint_checks methods public. In some additional work I'd like to do to improve the test framework these methods will need to be used outside of their class so they should not be private.
  • Added from __future__ import with statement where needed to support Python 2.5.
  • Removed try/finally block from check_constraints() in Postgresql and Oracle, since it was unneeded. The second statement, which reverts back to DEFERRED, *is* needed for the happy condition, since we need to go back to DEFERRED for tests to pass. However, if IMMEDIATE raises an IntegrityError, the transaction gets rolled back anyway and the next transaction will open with DEFERRED, so a finally clause is not needed.

There still remains the issue that our backends have fundamentally inconsistent approaches toward constraint checking during test execution. MySQL InnoDB checks constraints immediately during tests, while Oracle, Postgresql and SQLite defer constraint checks until the end. This patch preserves the status quo, since addressing it is beyond the scope of the ticket (#11665 will hopefully address it shortly). The point is, this implementation will have to change when and if the approach to constraint checking changes. So for now, the patch solves the fixture loading issue, but represents an interim solution if we go on to change the way constraint checks are handled.

Unmarking patch needs improvement because this applies clean and is ready to go. Hopefully someone else can review and mark as ready for checkin.

Changed 4 years ago by jsdalton

comment:42 Changed 4 years ago by jsdalton

In pursuing a bit of work for #11665, the solution to which requires use of the constraint check features introduced here, I need to make a few small changes to this patch. I also added more direct testing for these features, which revealed a minor bug (the wrong type of IntegrityError was being raised.)

Patch is still ready for review...

Changed 4 years ago by jsdalton

comment:43 Changed 4 years ago by ramiro

  • Triage Stage changed from Accepted to Ready for checkin

comment:44 follow-ups: Changed 4 years ago by russellm

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

@jsdalton -- first off, a *huge* thank you for all your work on this patch (and #11665 for that matter)

One quick observation with this patch -- it looks to me like it does a database-wide enable/disable of *all* constraints. While this isn't a problem for a Django-only database, there are plenty of features in Django that exist purely to support the fact that Django might not be the only resident in a database (e.g., managed tables). Won't this patch obliterate any constraint policies that have been deliberately installed, and then unilaterally restore them to DEFERRED, regardless of their original values?

comment:45 in reply to: ↑ 44 Changed 4 years ago by jsdalton

@russellm - I don't believe so. According to the Postgres docs (http://www.postgresql.org/docs/current/static/sql-set-constraints.html), the SET CONSTRAINTS statement is *only* relevant to constraint checks within the current transaction, despite what the wording implies.

SET CONSTRAINTS sets the behavior of constraint checking within the current transaction. IMMEDIATE constraints are checked at the end of each statement. DEFERRED constraints are not checked until transaction commit. Each constraint has its own IMMEDIATE or DEFERRED mode.

Furthermore:

This command only alters the behavior of constraints within the current transaction. Thus, if you execute this command outside of a transaction block (BEGIN/COMMIT pair), it will not appear to have any effect.

The alterative to SET ALL is SET a list of constraints. This just effects constraint checking for a subset of constraints within the transaction. Either way, this isn't going to impact anything outside the transaction, let alone other tables.

comment:46 in reply to: ↑ 44 Changed 4 years ago by anonymous

Replying to russellm:

@jsdalton -- first off, a *huge* thank you for all your work on this patch (and #11665 for that matter)

One quick observation with this patch -- it looks to me like it does a database-wide enable/disable of *all* constraints. While this isn't a problem for a Django-only database, there are plenty of features in Django that exist purely to support the fact that Django might not be the only resident in a database (e.g., managed tables). Won't this patch obliterate any constraint policies that have been deliberately installed, and then unilaterally restore them to DEFERRED, regardless of their original values?

Additional info, if it is of help here:
The constraint itself can be defined to be either INITIALLY DEFERRED or INITIALLY IMMEDIATE, with a default to IMMEDIATE, That is set either on CREATE or ALTER for that specific object. SET CONSTRAINTS only has effect within the current transaction, so effectively overriding the global setting for the constraint.

comment:47 Changed 4 years ago by jsdalton

@russelm - Okay, I've tried to address your concerns about the behavior of SET CONSTRAINTS in the following Python script: http://paste.pocoo.org/show/448792/ . To run, just create a Postgres database called "testdb" or change the dbname to something else in lines 5 and 6.

Basically we are trying to demonstrate the hypothesis that the SET CONSTRAINTS statements will in no way alter of affect the constraint policies on an existing table. In my script I tried running through SET CONSTRAINTS both inside and outside a transaction and at every step print the deferrability of an existing table column. No change whatsoever. I tried to throw a variety of things at it to ensure a thorough demonstration. The script should be pretty self explanatory.

This pretty much demonstrates the behavior of SET CONSTRAINTS as described in the Postgresql docs. SET CONSTRAINTS only affects constraint checking within the given transaction; it has no affect on anything outside the transaction.

If anyone remains unconvinced that SET CONSTRAINTS does not affect anything aside from constraint checks within the current transaction, let me know your objection and I'll attempt to demonstrate it. I believe this is the only remaining obstacle for this ticket to be committed.

comment:48 Changed 4 years ago by kmtracey

  • Resolution set to fixed
  • Status changed from new to closed

In [16590]:

Fixed #3615: Added support for loading fixtures with forward references on database backends (such as MySQL/InnoDB) that do not support deferred constraint checking. Many thanks to jsdalton for coming up with a clever solution to this long-standing issue, and to jacob, ramiro, graham_king, and russellm for review/testing. (Apologies if I missed anyone else who helped here.)

comment:49 follow-up: Changed 4 years ago by Kirill Panshin <kipanshi@…>

Great that this bug was finally resolved.

We faced the same problem and I managed to port this fix to Django 1.3.1, as we need this for production. There still need some testing, but is it worth to make a patch for future support releases of 1.3.X ? I think that this might be useful as quite a lot users use MySQL still.

comment:50 in reply to: ↑ 49 ; follow-up: Changed 4 years ago by ramiro

Replying to Kirill Panshin <kipanshi@…>:

Great that this bug was finally resolved.

We faced the same problem and I managed to port this fix to Django 1.3.1, as we need this for production. There still need some testing, but is it worth to make a patch for future support releases of 1.3.X ? I think that this might be useful as quite a lot users use MySQL still.

Maintenance branches like 1.3.X don't receive new features like this. But if you want to share your work with other MySQL users you always can maintain it the code sharing/public repository of your choice. Fortunately (precisely for the same reasons: Code in these branches isn't subject to heavy-weigh refactorings, it is mostly static) there is a very low maintenance burden, you've already done the harder part by porting it from trunk.

comment:51 in reply to: ↑ 50 Changed 4 years ago by Kirill Panshin

Replying to ramiro:

Maintenance branches like 1.3.X don't receive new features like this. But if you want to share your work with other MySQL users you always can maintain it the code sharing/public repository of your choice. Fortunately (precisely for the same reasons: Code in these branches isn't subject to heavy-weigh refactorings, it is mostly static) there is a very low maintenance burden, you've already done the harder part by porting it from trunk.


Thanks Ramiro. I've forked and updated my fork, so here is it: https://github.com/kipanshi/django/tree/1.3.1.1

To install via pip:

pip install -e git+git://github.com/kipanshi/django.git@1.3.1.1#egg=django-1.3.1.1
Note: See TracTickets for help on using tickets.
Back to Top