Code

Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#14415 closed (fixed)

Multiple aliases for one database: testing problems

Reported by: shai Owned by: nobody
Component: Testing framework Version: 1.2
Severity: Keywords: multidb, multiple databases, multiple aliases
Cc: Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

In a setting where the multiple-databases feature is used to create multiple aliases for one database, the testing framework can get a little confused.

For ease of reference, assume we have the following dictionary defining database parameters:

db_params = dict (
    NAME = 'main',
    ENGINE = 'django.db.backends.whatever',
    USER = 'user',
    PASSWORD = 'password',
    HOST = '',
    PORT = ''
)

The relatively benign cases

In these cases, the test framework just treats the two aliases as separate databases; which means it first creates the test database for one alias, then tries to create the test database for the second one, running into an error because (of course) it already exists. The user is asked to choose between destroying the existing database (in order to recreate it) and canceling the test. In cases where database routers constrict models to a specific alias, this may completely prevent testing (or at least some of the tests).

This happens if at least one of two conditions hold:

The dictionaries used to define the alias are distinct

That is, the databases are defined by

DATABASES = {
    'default' : db_params,
    'other' :   db_params.copy()
}

or some equivalent

Test name is explicitly specified

That is, db_params looks like:

db_params = dict (
    NAME = 'main',
    TEST_NAME = 'test_main',
    ... # same as above
)

The databases may then be specified by the more natural

DATABASES = {
    'default' : db_params,
    'other' :   db_params
}

The more dangerous case

In this case, the testing framework creates spurious databases and
finalizes be destroying the main, non-test database.

This happens when none of the above applies -- that is, db_params does not include
the TEST_NAME entry, relying on the default test_ prefix addition, and databases
are specified by two references to the same dictionary (as in the last definition above).
Regretfully, one may expect this to be the common use case in a multiple-aliases-for-one-database
approach.

In detail: With the definitions above, the testing framework first creates a test_main database, then a test_test_main database, and runs its tests with both of them present; then, during tear-down, it drops the test_test_main and *main* databases, leaving test_main in place.

A word for fixers

For whoever wants to work on this ticket (perhaps myself...), Russel Keith-Magee has said that

...this almost rates as an 'easy pickings' ticket. This is a situation where you'll get a pass on having a test as part of the patch; testing test setup is one of those areas that's almost impossible to validate, so whoever commits (probably me) will just take it for a very good walk before checking in.

Attachments (4)

14415.1.diff (2.7 KB) - added by ramiro 4 years ago.
t14415.diff (3.9 KB) - added by russellm 4 years ago.
Alternative solution based around the test runner
t14415.2.diff (4.2 KB) - added by russellm 4 years ago.
Updated patch for test database issue
t14415.3.diff (4.2 KB) - added by shai 4 years ago.
just adding PORT to t14415.2.diff

Download all attachments as: .zip

Change History (20)

comment:1 Changed 4 years ago by gabrielhurley

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 4 years ago by ramiro

  • Triage Stage changed from Accepted to Design decision needed

This is what I've found:

I think the maximum that Django can provide is what I've tried to implement in the patch:

  • Make sure no cross pollution of DATABASES['alias']['NAME'] and DATABASES['alias']['TEST_NAME'] happens between DB aliases/connections, even when they are described by the same Python dict so in no event the production DATABASES['alias']['NAME'] DB gets removed.
  • Don't report an error if the test database doesn't exist when trying to drop it (or, for SQLite3, don't try to remove the DB file).

What's described above is generally applicable to Postgres, MySQL and SQlite3.

But after fixing that, there are other problems (at test DB teardown time): In Postgres and MySQL the test databases are removed by using the production one as a pivot (a connection is made to the production DB and from there the test one is DROPped). If you get the setup you describe, it might happen (It happens to me) that 'default' will get is test DB ('test_main') dropped and when 'other' tries to connect to its production one to do the same (remove it 'test_test_main' DB) it won't find it.

Solving this means Django would need to detect and keep track of the fact that one alias' test DB name is the same as other alias' production DB, and DATABASES being a dict, there is no guaranteed order of creation/removal of DBs.

The conclusion I get is this is what you describe as the The more dangerous case and one may expect this to be the common use case in a multiple-aliases-for-one-database approach is enough of an edge case that can't be completely solved by Django. If you really need to create more that one DATABASES aliases to the same database and you decide to represent them by pointing them to the same dict, then maybe you will need to implement your own test runner that will need to have some inside knowledge of this and e.g. will need to setup/tear down the test DBs in the right order, etc.

The Oracle backed has its own tools so (again) the user, not Django, tries to implement a workaround to the scenarios you describe (see wiki:OracleTestSetup Understanding the database test setup section.)

Changed 4 years ago by ramiro

comment:3 Changed 4 years ago by ramiro

  • Has patch set

comment:4 Changed 4 years ago by russellm

  • milestone set to 1.3
  • Patch needs improvement set

I think you've tried to solve this in the wrong place. Rather than fixing this at the level of the database backend, it seems to me that the solution lies in fixing the test runner's setup_databases call. There is already some handling for avoiding duplicate database setup (via the handling for the TEST_MIRROR setting); it seems to me that the problem described here is really just an TEST_MIRROR that is implied because the database name is the same.

So - if we refactor the handling of TEST_MIRROR so that we pre-evaluate the list of database that need to be created and the list that needs to be redirected, we should be able to avoid this problem entirely.

Of course, this is just based on first inspection; there might be some detail that I'm missing. I'll take a closer look later tonight (my time).

Also - bumping to 1.3 milestone because of the potential for unintentional data loss.

Changed 4 years ago by russellm

Alternative solution based around the test runner

comment:5 Changed 4 years ago by ramiro

(In [14666]) Added printing of the name of test database being created/destroyed, at verbosity level 2. Refs #14415.

comment:6 Changed 4 years ago by shai

I think Russell's approach is closer to home, but there are still problems.

For starters, the patch has a "surface" problem of identifying databases by their (Engine,Name) combination. The current docs about TEST_MIRROR (http://docs.djangoproject.com/en/dev/topics/testing/#testing-master-slave-configurations) show an example where this would be invalid (same name, different hosts). To fix this, it seems, we would need a notion of equivalence of settings -- == between dicts seems close enough for my taste (if a user defines two MySql databases, one with PORT='' and another with PORT='3306', I think they deserve the brouhaha they'll get), but I may be wrong (different OPTIONS'es, for example, for different transactional behavior, would seem legitimate).

But there is a deeper problem: solving the issue by directing the actual connections -- not just fixing the creation/destruction issue -- would cause the test process to have different transaction semantics. Consider:

def do_stuff(alias1, alias2):

    with transaction.commit_manually(using=alias1)

        # Make an invalid change in an object

        m = MyModel.objects.get(pk=17)
        m.fld = 'invalid'
        m.save(using=alias1)

        # Commit other changes, but not the invalid change
    
        transaction.commit(using=alias2) 
    
        # Roll back the invalid change
    
        transaction.rollback(using=alias1)
    
        # Check the outcome
    
        m1 = MyModel.objects.get(pk=17)
        assert m1.fld1 != 'invalid'

This will work in production, but fail in testing under t14415.diff.

Changed 4 years ago by russellm

Updated patch for test database issue

comment:7 Changed 4 years ago by russellm

@shai - Good call on the HOST issue - I've made that change; new patch incoming shortly.

However, regarding the connection issue - I'm unclear whether what you consider correct behavior, and under what circumstances.

The 'connection copying' behavior is only performed for the TEST_MIRROR case, and that behavior exists historically. To my mind, this is a bug; using connection obtained from the alias that is a TEST_MIRROR shouldn't affect transactions created using the connection that is being mirrored. The new patch addresses this bug.

However, you seem to be implying that the same connection copying behavior exists for connections with a copied settings dictionary, which isn't the case. The patch I've provided causes your sample code to raise an exception at "transaction.commit(using=alias2)", because the code isn't under transaction management (for alias2) at that point.

Have I misunderstood something here?

comment:8 Changed 4 years ago by shai

@russelm -- Sorry, I had misread your original patch; you are right, the connections seem to be handled correctly.

I'll look at it again tonight (Israel time), to comment on the change in TEST_MIRROR handling.

One note, though, about adding HOST: While it is, as far as I know, good enough for Django's officially supported DBMSs, it is not enough for SqlServer or Sybase, where a host includes a set of databases, each containing a set of "schemata". Not sure if preparing support for this is easy, and if not easy, if it is worthwhile, but I thought it's worth mentioning.

comment:9 Changed 4 years ago by shai

After a second look: Yes, the new TEST_MIRROR behavior seems to be correct and even required to preserve production semantics (interleaved operations on master and slave with no commit may yield different result in production and test with current code).

One other note, though: PORT needs to be treated just like HOST, orthogonal to the SqlServer issue mentioned in my earlier comment.

If there is an easy way to provide tests for these issues, I'd like to do that.

Thanks,

Shai.

comment:10 Changed 4 years ago by shai

Fixed the patch, ran some tests with it. It seems to work (though using the same parameters for the two databases makes many tests fail, nothing seems to blow up, and nothing is generated or destroyed more than once).

Changed 4 years ago by shai

just adding PORT to t14415.2.diff

comment:11 Changed 4 years ago by russellm

To clarify -- when you say "makes many tests fail", is that a good thing or a bad thing? Is this an indication that the test suite is now running and revealing problems in your code, or that the test setup is broken in some way?

comment:12 Changed 4 years ago by shai

I think it's a good thing -- essentially, the Django test suite assumes that different aliases point at different databases. I didn't check every failure in detail, but as an example, there was a test in multiple_database that failed because an object saved through one alias was accessible through the other.

comment:13 Changed 4 years ago by russellm

Django's test suite has a very specific set of requirements for the test database, and yes, independence between default and other is one of those requirements. If you're running Django's test suite with a mirrored or aliased database, there will be failures.

Thanks for the help testing this, Shai.

comment:14 Changed 4 years ago by russellm

  • Resolution set to fixed
  • Status changed from new to closed

(In [14696]) Fixed #14415 -- Corrected the process of creating and destroying test databases to prevent accidental deletion of the source database. Also improves the mirroring process to prevent some cross-connection effects. Thanks to Shai Berger for the help diagnosing and testing.

comment:15 Changed 4 years ago by russellm

(In [14697]) [1.2.X] Fixed #14415 -- Corrected the process of creating and destroying test databases to prevent accidental deletion of the source database. Also improves the mirroring process to prevent some cross-connection effects. Thanks to Shai Berger for the help diagnosing and testing.

Backport of r14696 from trunk.

comment:16 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.