Opened 14 years ago
Closed 9 years ago
#14415 closed Bug (fixed)
Multiple aliases for one database: testing problems
Reported by: | Shai Berger | Owned by: | Boaz Shvartzman |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | multidb, multiple databases, multiple aliases |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
In a setting where the multiple-databases feature is used to create multiple aliases for one database, the testing framework can get a little confused.
For ease of reference, assume we have the following dictionary defining database parameters:
db_params = dict ( NAME = 'main', ENGINE = 'django.db.backends.whatever', USER = 'user', PASSWORD = 'password', HOST = '', PORT = '' )
The relatively benign cases
In these cases, the test framework just treats the two aliases as separate databases; which means it first creates the test database for one alias, then tries to create the test database for the second one, running into an error because (of course) it already exists. The user is asked to choose between destroying the existing database (in order to recreate it) and canceling the test. In cases where database routers constrict models to a specific alias, this may completely prevent testing (or at least some of the tests).
This happens if at least one of two conditions hold:
The dictionaries used to define the alias are distinct
That is, the databases are defined by
DATABASES = { 'default' : db_params, 'other' : db_params.copy() }
or some equivalent
Test name is explicitly specified
That is, db_params
looks like:
db_params = dict ( NAME = 'main', TEST_NAME = 'test_main', ... # same as above )
The databases may then be specified by the more natural
DATABASES = { 'default' : db_params, 'other' : db_params }
The more dangerous case
In this case, the testing framework creates spurious databases and
finalizes be destroying the main, non-test database.
This happens when none of the above applies -- that is, db_params
does not include
the TEST_NAME
entry, relying on the default test_
prefix addition, and databases
are specified by two references to the same dictionary (as in the last definition above).
Regretfully, one may expect this to be the common use case in a multiple-aliases-for-one-database
approach.
In detail: With the definitions above, the testing framework first creates a test_main
database, then a test_test_main
database, and runs its tests with both of them present; then, during tear-down, it drops the test_test_main
and *main
* databases, leaving test_main
in place.
A word for fixers
For whoever wants to work on this ticket (perhaps myself...), Russel Keith-Magee has said that
...this almost rates as an 'easy pickings' ticket. This is a situation where you'll get a pass on having a test as part of the patch; testing test setup is one of those areas that's almost impossible to validate, so whoever commits (probably me) will just take it for a very good walk before checking in.
Attachments (4)
Change History (31)
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 14 years ago
Triage Stage: | Accepted → Design decision needed |
---|
by , 14 years ago
Attachment: | 14415.1.diff added |
---|
comment:3 by , 14 years ago
Has patch: | set |
---|
comment:4 by , 14 years ago
milestone: | → 1.3 |
---|---|
Patch needs improvement: | set |
I think you've tried to solve this in the wrong place. Rather than fixing this at the level of the database backend, it seems to me that the solution lies in fixing the test runner's setup_databases call. There is already some handling for avoiding duplicate database setup (via the handling for the TEST_MIRROR setting); it seems to me that the problem described here is really just an TEST_MIRROR that is implied because the database name is the same.
So - if we refactor the handling of TEST_MIRROR so that we pre-evaluate the list of database that need to be created and the list that needs to be redirected, we should be able to avoid this problem entirely.
Of course, this is just based on first inspection; there might be some detail that I'm missing. I'll take a closer look later tonight (my time).
Also - bumping to 1.3 milestone because of the potential for unintentional data loss.
comment:5 by , 14 years ago
comment:6 by , 14 years ago
I think Russell's approach is closer to home, but there are still problems.
For starters, the patch has a "surface" problem of identifying databases by their (Engine,Name) combination. The current docs about TEST_MIRROR (http://docs.djangoproject.com/en/dev/topics/testing/#testing-master-slave-configurations) show an example where this would be invalid (same name, different hosts). To fix this, it seems, we would need a notion of equivalence of settings -- ==
between dicts seems close enough for my taste (if a user defines two MySql databases, one with PORT=''
and another with PORT='3306'
, I think they deserve the brouhaha they'll get), but I may be wrong (different OPTIONS'es, for example, for different transactional behavior, would seem legitimate).
But there is a deeper problem: solving the issue by directing the actual connections -- not just fixing the creation/destruction issue -- would cause the test process to have different transaction semantics. Consider:
def do_stuff(alias1, alias2): with transaction.commit_manually(using=alias1) # Make an invalid change in an object m = MyModel.objects.get(pk=17) m.fld = 'invalid' m.save(using=alias1) # Commit other changes, but not the invalid change transaction.commit(using=alias2) # Roll back the invalid change transaction.rollback(using=alias1) # Check the outcome m1 = MyModel.objects.get(pk=17) assert m1.fld1 != 'invalid'
This will work in production, but fail in testing under t14415.diff.
comment:7 by , 14 years ago
@shai - Good call on the HOST issue - I've made that change; new patch incoming shortly.
However, regarding the connection issue - I'm unclear whether what you consider correct behavior, and under what circumstances.
The 'connection copying' behavior is only performed for the TEST_MIRROR case, and that behavior exists historically. To my mind, this is a bug; using connection obtained from the alias that is a TEST_MIRROR shouldn't affect transactions created using the connection that is being mirrored. The new patch addresses this bug.
However, you seem to be implying that the same connection copying behavior exists for connections with a copied settings dictionary, which isn't the case. The patch I've provided causes your sample code to raise an exception at "transaction.commit(using=alias2)", because the code isn't under transaction management (for alias2) at that point.
Have I misunderstood something here?
comment:8 by , 14 years ago
@russelm -- Sorry, I had misread your original patch; you are right, the connections seem to be handled correctly.
I'll look at it again tonight (Israel time), to comment on the change in TEST_MIRROR handling.
One note, though, about adding HOST: While it is, as far as I know, good enough for Django's officially supported DBMSs, it is not enough for SqlServer or Sybase, where a host includes a set of databases, each containing a set of "schemata". Not sure if preparing support for this is easy, and if not easy, if it is worthwhile, but I thought it's worth mentioning.
comment:9 by , 14 years ago
After a second look: Yes, the new TEST_MIRROR behavior seems to be correct and even required to preserve production semantics (interleaved operations on master and slave with no commit may yield different result in production and test with current code).
One other note, though: PORT needs to be treated just like HOST, orthogonal to the SqlServer issue mentioned in my earlier comment.
If there is an easy way to provide tests for these issues, I'd like to do that.
Thanks,
Shai.
comment:10 by , 14 years ago
Fixed the patch, ran some tests with it. It seems to work (though using the same parameters for the two databases makes many tests fail, nothing seems to blow up, and nothing is generated or destroyed more than once).
comment:11 by , 14 years ago
To clarify -- when you say "makes many tests fail", is that a good thing or a bad thing? Is this an indication that the test suite is now running and revealing problems in your code, or that the test setup is broken in some way?
comment:12 by , 14 years ago
I think it's a good thing -- essentially, the Django test suite assumes that different aliases point at different databases. I didn't check every failure in detail, but as an example, there was a test in multiple_database
that failed because an object saved through one alias was accessible through the other.
comment:13 by , 14 years ago
Django's test suite has a very specific set of requirements for the test database, and yes, independence between default and other is one of those requirements. If you're running Django's test suite with a mirrored or aliased database, there will be failures.
Thanks for the help testing this, Shai.
comment:14 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:15 by , 14 years ago
(In [14697]) [1.2.X] Fixed #14415 -- Corrected the process of creating and destroying test databases to prevent accidental deletion of the source database. Also improves the mirroring process to prevent some cross-connection effects. Thanks to Shai Berger for the help diagnosing and testing.
Backport of r14696 from trunk.
comment:17 by , 9 years ago
Easy pickings: | set |
---|---|
Has patch: | unset |
Resolution: | fixed |
Severity: | → Normal |
Status: | closed → new |
Type: | → Uncategorized |
UI/UX: | unset |
Version: | 1.2 → master |
It's quite amazing to realize, more than 5 years later, that the bug was actually not properly fixed.
The problem is in django.db.backends.base.creation
:
def test_db_signature(self): """ Returns a tuple with elements of self.connection.settings_dict (a DATABASES setting value) that uniquely identify a database accordingly to the RDBMS particularities. """ settings_dict = self.connection.settings_dict return ( settings_dict['HOST'], settings_dict['PORT'], settings_dict['ENGINE'], settings_dict['NAME'] )
The decision whether a database is a mirror of another, when TEST['MIRROR']
is not set explicitly, is taken based on the production settings rather than test settings. You are not likely to notice until you try to test separation of a model into a dedicated database -- where you try to set only the test databases to be separate.
comment:18 by , 9 years ago
Could you give an example where the tests currently fail? I'm not very clear on the model separation.
comment:19 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Let's say you have two db's - "default" and "other".
In db "other" you have table x, which you don't have In db "default" (I used db router to accomplish that).
During a test, you are trying to add a row to table x, only on db "other".
Let's assume this is your DATABASES setting:
DATABASES = { 'default': { 'NAME': 'some_name', 'ENGINE': 'django.db.backends.postgresql_psycopg2'', 'USER': ..., 'PASSWORD': ... } } other = DATABASES['default'].copy() other['TEST']['NAME'] = 'other_name' DATABASES['other'] = other
Django will ignore
other['TEST']['NAME'] = 'other_name'
as long as you don't set
other['NAME']
to a value different from
DATABASES['default']['name']
comment:20 by , 9 years ago
I have changed test_db_signature to use
settings_dict['TEST']['NAME']
if set, and
settings_dict['NAME']
as a fallback.
comment:21 by , 9 years ago
Has patch: | set |
---|
comment:22 by , 9 years ago
Patch needs improvement: | unset |
---|
comment:23 by , 9 years ago
Triage Stage: | Design decision needed → Accepted |
---|
comment:26 by , 9 years ago
Needs tests: | unset |
---|
This is what I've found:
I think the maximum that Django can provide is what I've tried to implement in the patch:
DATABASES['alias']['NAME']
andDATABASES['alias']['TEST_NAME']
happens between DB aliases/connections, even when they are described by the same Python dict so in no event the productionDATABASES['alias']['NAME']
DB gets removed.What's described above is generally applicable to Postgres, MySQL and SQlite3.
But after fixing that, there are other problems (at test DB teardown time): In Postgres and MySQL the test databases are removed by using the production one as a pivot (a connection is made to the production DB and from there the test one is DROPped). If you get the setup you describe, it might happen (It happens to me) that 'default' will get is test DB ('test_main') dropped and when 'other' tries to connect to its production one to do the same (remove it 'test_test_main' DB) it won't find it.
Solving this means Django would need to detect and keep track of the fact that one alias' test DB name is the same as other alias' production DB, and DATABASES being a dict, there is no guaranteed order of creation/removal of DBs.
The conclusion I get is this is what you describe as the The more dangerous case and one may expect this to be the common use case in a multiple-aliases-for-one-database approach is enough of an edge case that can't be completely solved by Django. If you really need to create more that one DATABASES aliases to the same database and you decide to represent them by pointing them to the same dict, then maybe you will need to implement your own test runner that will need to have some inside knowledge of this and e.g. will need to setup/tear down the test DBs in the right order, etc.
The Oracle backed has its own tools so (again) the user, not Django, tries to implement a workaround to the scenarios you describe (see wiki:OracleTestSetup Understanding the database test setup section.)