Opened 5 weeks ago

Last modified 3 weeks ago

#35967 assigned Bug

TransactionTestCase.serialized_rollback reads from real database rather than test when using read replica for a model instance created in a migration with a ManyToManyField

Reported by: Jake Howard Owned by: Simon Charette
Component: Testing framework Version: dev
Severity: Normal Keywords:
Cc: Ryan Cheley, Jacob Walls, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

(Yes, this is a rather specific bug...)

When:

  1. Using a database router to create a read-replica database, configured as a MIRROR in tests, and
  2. Using TransactionTestCase.serialized_rollback, and
  3. Having a model instance created in a migration which has a ManyToMany field

The serializer for serialized_rollback tries to read from the non-test database. If that database doesn't exist yet (for example, in CI), this throws an error:

django.db.utils.OperationalError: no such table: auth_user

If migrations are run (manage.py migrate), thus creating the tables for the non-test database, tests pass correctly. Prooving it's reading from the wrong connection.

I've created a minimal reproduction of this issue, and confirmed it happens on SQLite, PostgreSQL and Django 4.2, 5.0, 5.1 and main

Change History (11)

comment:1 by Ryan Cheley, 5 weeks ago

Cc: Ryan Cheley added

comment:2 by Sarah Boyce, 5 weeks ago

Cc: Jacob Walls added

comment:3 by Jacob Walls, 5 weeks ago

Component: Database layer (models, ORM)Core (Serialization)

Thanks for the crystal clear reproduction. I haven't tested to be sure, but it certainly looks a lot like #23979.

This hack passes your test case, but I'm not going to opine on whether it's correct; I just put this together as a learning exercise and to verify my hunch that this was more to do with the serializer and less to do with the testing framework. If someone more knowledgable can opine on correctness and backwards compatibility, I'm happy to push this forward.

  • django/core/serializers/base.py

    diff --git a/django/core/serializers/base.py b/django/core/serializers/base.py
    index 1fbca9244b..f2f0d4d8d6 100644
    a b Module for abstract serializer/unserializer base classes.  
    55from io import StringIO
    66
    77from django.core.exceptions import ObjectDoesNotExist
    8 from django.db import models
     8from django.db import models, DEFAULT_DB_ALIAS
    99
    1010DEFER_FIELD = object()
    1111
    class Serializer:  
    9191        use_natural_primary_keys=False,
    9292        progress_output=None,
    9393        object_count=0,
     94        using=DEFAULT_DB_ALIAS,
    9495        **options,
    9596    ):
    9697        """
    class Serializer:  
    141142                        self.selected_fields is None
    142143                        or field.attname in self.selected_fields
    143144                    ):
    144                         self.handle_m2m_field(obj, field)
     145                        self.handle_m2m_field(obj, field, using=using)
    145146            self.end_object(obj)
    146147            progress_bar.update(count)
    147148            self.first = self.first and False
    class Serializer:  
    192193            "subclasses of Serializer must provide a handle_fk_field() method"
    193194        )
    194195
    195     def handle_m2m_field(self, obj, field):
     196    def handle_m2m_field(self, obj, field, *, using=DEFAULT_DB_ALIAS):
    196197        """
    197198        Called to handle a ManyToManyField.
    198199        """
  • django/core/serializers/python.py

    diff --git a/django/core/serializers/python.py b/django/core/serializers/python.py
    index 57edebbb70..93898d3801 100644
    a b class Serializer(base.Serializer):  
    6464            value = self._value_from_field(obj, field)
    6565        self._current[field.name] = value
    6666
    67     def handle_m2m_field(self, obj, field):
     67    def handle_m2m_field(self, obj, field, *, using=DEFAULT_DB_ALIAS):
    6868        if field.remote_field.through._meta.auto_created:
    6969            if self.use_natural_foreign_keys and hasattr(
    7070                field.remote_field.model, "natural_key"
    class Serializer(base.Serializer):  
    8686                        getattr(obj, field.name)
    8787                        .select_related(None)
    8888                        .only("pk")
     89                        .using(using)
    8990                        .iterator()
    9091                    )
    9192
  • django/db/backends/base/creation.py

    diff --git a/django/db/backends/base/creation.py b/django/db/backends/base/creation.py
    index 6856fdb596..8adfa0d7ca 100644
    a b class BaseDatabaseCreation:  
    142142
    143143        # Serialize to a string
    144144        out = StringIO()
    145         serializers.serialize("json", get_objects(), indent=None, stream=out)
     145        serializers.serialize(
     146            "json", get_objects(), indent=None, stream=out, using=self.connection.alias
     147        )
    146148        return out.getvalue()
    147149
    148150    def deserialize_db_from_string(self, data):

comment:4 by Jacob Walls, 5 weeks ago

Owner: set to Jacob Walls
Status: newassigned
Triage Stage: UnreviewedAccepted

I'm looking into something a little more backwards compatible.

comment:5 by Simon Charette, 5 weeks ago

If someone more knowledgable can opine on correctness and backwards compatibility, I'm happy to push this forward.

Hey Jacob I admittedly haven't looked into the issue in depth but I think that forcing the usage of the current alias like you did here is a certainly a key towards the solution.

I have a hunch that the actual problems lives in djang.test.utils.setup_databases though and particularly how BaseDatabaseCreation.create_test_db is implemented. setup_databases currently follows this sequence of operations for each DATABASES entries

  1. Create the test db replacement
  2. Repoint settings.DATABASES[alias][name] to the test db replacement
  3. Peform migrations
  4. Serialize content

I think that instead what it should do is 1, 2, 3 for each DATABASES entries (making sure that all the test databases are setup) and then do 4 for each of them. Something like

  • django/db/backends/base/creation.py

    diff --git a/django/db/backends/base/creation.py b/django/db/backends/base/creation.py
    index 6856fdb596..7a0e2a0622 100644
    a b def create_test_db(  
    9191        # who are testing on databases without transactions or who are using
    9292        # a TransactionTestCase still get a clean database on every test run.
    9393        if serialize:
     94            # XXX: Emit a deprecation warnings when `serialize` is provided.
    9495            self.connection._test_serialized_contents = self.serialize_db_to_string()
    9596
    9697        call_command("createcachetable", database=self.connection.alias)
  • django/test/utils.py

    diff --git a/django/test/utils.py b/django/test/utils.py
    index ddb85127dc..a2fe8b14cc 100644
    a b def setup_databases(  
    189189    test_databases, mirrored_aliases = get_unique_databases_and_mirrors(aliases)
    190190
    191191    old_names = []
     192    serialize_connections = []
    192193
    193194    for db_name, aliases in test_databases.values():
    194195        first_alias = None
    def setup_databases(  
    200201            if first_alias is None:
    201202                first_alias = alias
    202203                with time_keeper.timed("  Creating '%s'" % alias):
    203                     serialize_alias = (
    204                         serialized_aliases is None or alias in serialized_aliases
    205                     )
    206204                    connection.creation.create_test_db(
    207205                        verbosity=verbosity,
    208206                        autoclobber=not interactive,
    209207                        keepdb=keepdb,
    210                         serialize=serialize_alias,
    211208                    )
     209                    if serialized_aliases is None or alias in serialized_aliases:
     210                        serialize_connections.append(connection)
    212211                if parallel > 1:
    213212                    for index in range(parallel):
    214213                        with time_keeper.timed("  Cloning '%s'" % alias):
    def setup_databases(  
    223222                    connections[first_alias].settings_dict
    224223                )
    225224
     225    # Serialize content of test databases only once all of them are setup
     226    # to account for database routing during serialization.
     227    for serialize_connection in serialize_connections:
     228        serialize_connection._test_serialized_contents = (
     229            serialize_connection.serialize_db_to_string()
     230        )
     231
    226232    # Configure the test mirrors.
    227233    for alias, mirror_alias in mirrored_aliases.items():
    228234        connections[alias].creation.set_as_test_mirror(

comment:6 by Simon Charette, 5 weeks ago

Cc: Simon Charette added

comment:7 by Jacob Walls, 4 weeks ago

Thanks for the idea, I had a look. Turns out minimally adjusting comment:5 to run (by adding .creation in one of a couple places) doesn't fix OP's reproduction. It also doesn't pass the unit test I had written to go with comment:3 below, but that is to be expected since the test has nothing to do with setup_databases(). (It does pass with comment:3.)

  • tests/backends/base/test_creation.py

    diff --git a/tests/backends/base/test_creation.py b/tests/backends/base/test_creation.py
    index 7e760e8884..84eb3f4a5f 100644
    a b class TestDbCreationTests(SimpleTestCase):  
    172172
    173173class TestDeserializeDbFromString(TransactionTestCase):
    174174    available_apps = ["backends"]
     175    databases = {"default", "other"}
    175176
    176177    def test_circular_reference(self):
    177178        # deserialize_db_from_string() handles circular references.
    class TestDeserializeDbFromString(TransactionTestCase):  
    267268        self.assertIn('"model": "backends.schoolclass"', data)
    268269        self.assertIn(f'"schoolclasses": [{sclass.pk}]', data)
    269270
     271    def test_serialize_db_to_string_with_m2m_field_and_router(self):
     272        class OtherRouter:
     273            def db_for_read(self, model, **hints):
     274                return "other"
     275
     276        with override_settings(DATABASE_ROUTERS=[OtherRouter()]):
     277            obj1 = Object.objects.create()
     278            obj2 = Object.objects.create()
     279            obj2.related_objects.set([obj1])
     280            with mock.patch("django.db.migrations.loader.MigrationLoader") as loader:
     281                # serialize_db_to_string() serializes only migrated apps, so mark
     282                # the backends app as migrated.
     283                loader_instance = loader.return_value
     284                loader_instance.migrated_apps = {"backends"}
     285                data = connection.creation.serialize_db_to_string()
     286        self.assertIn(f'"related_objects": [{obj1.pk}]', data)
     287        # Test serialize() directly, in all four cases (json/xml, natural key/without)
     288
    270289
    271290class SkipTestClass:
    272291    def skip_function(self):

My plan was to repeat these tests with plain calls to serialize() in tests/serializers and test all four cases (json vs. xml, natural key vs. without). If this test is fair, wouldn't this be an issue with the serializers? Maybe I'm missing a reason this test isn't realistic.

comment:8 by Simon Charette, 4 weeks ago

Sorry for sending on a wild goose chase Jacob, the patch above was meant to be a draft that needs tweaking and not a final solution.

Once adjusted like the following it addresses the problem reported by Jake

  • django/test/utils.py

    diff --git a/django/test/utils.py b/django/test/utils.py
    index ddb85127dc..7d66140efa 100644
    a b def setup_databases(  
    189189    test_databases, mirrored_aliases = get_unique_databases_and_mirrors(aliases)
    190190
    191191    old_names = []
     192    serialize_connections = []
    192193
    193194    for db_name, aliases in test_databases.values():
    194195        first_alias = None
    def setup_databases(  
    200201            if first_alias is None:
    201202                first_alias = alias
    202203                with time_keeper.timed("  Creating '%s'" % alias):
    203                     serialize_alias = (
    204                         serialized_aliases is None or alias in serialized_aliases
    205                     )
    206204                    connection.creation.create_test_db(
    207205                        verbosity=verbosity,
    208206                        autoclobber=not interactive,
    209207                        keepdb=keepdb,
    210                         serialize=serialize_alias,
     208                        serialize=False,
    211209                    )
     210                    if serialized_aliases is None or alias in serialized_aliases:
     211                        serialize_connections.append(connection)
    212212                if parallel > 1:
    213213                    for index in range(parallel):
    214214                        with time_keeper.timed("  Cloning '%s'" % alias):
    def setup_databases(  
    229229            connections[mirror_alias].settings_dict
    230230        )
    231231
     232    # Serialize content of test databases only once all of them are setup
     233    # to account for database mirroring and routing during serialization.
     234    for serialize_connection in serialize_connections:
     235        serialize_connection._test_serialized_contents = (
     236            serialize_connection.creation.serialize_db_to_string()
     237        )
     238
    232239    if debug_sql:
    233240        for alias in connections:
    234241            connections[alias].force_debug_cursor = True

The above include three tweaks that were not included in comment:5

  1. It explicitly pass serialize=False to create_test_db as we want to defer this operation to a time when all connections are repointed to test databases
  2. Use .creation.serialize_db_to_string as you've noticed
  3. Make sure to perform serialization only once mirrors have been appropriately setup

I want to re-iterate that what appears to be the problem here, at least to me, is less about the routing of queries during serialization and more that we attempt to perform any form of reads against non-test databases before DATABASES entries are all re-pointed to their test equivalent.

If this test is fair, wouldn't this be an issue with the serializers? Maybe I'm missing a reason this test isn't realistic.

This is something that is effectively hard to test as it relates to the nature of the test suite bootstraping sequence but we do have a few examples in tests.test_runner where the reported scenario could be reproduced.

Last edited 4 weeks ago by Simon Charette (previous) (diff)

comment:9 by Jacob Walls, 4 weeks ago

Component: Core (Serialization)Testing framework

Perfect, thanks for clarifying.

comment:10 by Simon Charette, 3 weeks ago

Has patch: set

Jacob & Jake, I submitted this PR for review and I'd like to have your thoughts on it. I didn't manage to create an exact integration test as it would require creating a nested test database and ensure that no database queries can be performed against the testing database but it seems the chosen approach in there so far has been to rely heavily on mocking.

comment:11 by Jacob Walls, 3 weeks ago

Owner: changed from Jacob Walls to Simon Charette
Note: See TracTickets for help on using tickets.
Back to Top