Opened 5 weeks ago
Last modified 3 weeks ago
#35967 assigned Bug
TransactionTestCase.serialized_rollback reads from real database rather than test when using read replica for a model instance created in a migration with a ManyToManyField
Reported by: | Jake Howard | Owned by: | Simon Charette |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Ryan Cheley, Jacob Walls, Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
(Yes, this is a rather specific bug...)
When:
- Using a database router to create a read-replica database, configured as a
MIRROR
in tests, and - Using
TransactionTestCase.serialized_rollback
, and - Having a model instance created in a migration which has a
ManyToMany
field
The serializer for serialized_rollback
tries to read from the non-test database. If that database doesn't exist yet (for example, in CI), this throws an error:
django.db.utils.OperationalError: no such table: auth_user
If migrations are run (manage.py migrate
), thus creating the tables for the non-test database, tests pass correctly. Prooving it's reading from the wrong connection.
I've created a minimal reproduction of this issue, and confirmed it happens on SQLite, PostgreSQL and Django 4.2, 5.0, 5.1 and main
Change History (11)
comment:1 by , 5 weeks ago
Cc: | added |
---|
comment:2 by , 5 weeks ago
Cc: | added |
---|
comment:3 by , 5 weeks ago
Component: | Database layer (models, ORM) → Core (Serialization) |
---|
comment:4 by , 5 weeks ago
Owner: | set to |
---|---|
Status: | new → assigned |
Triage Stage: | Unreviewed → Accepted |
I'm looking into something a little more backwards compatible.
comment:5 by , 5 weeks ago
If someone more knowledgable can opine on correctness and backwards compatibility, I'm happy to push this forward.
Hey Jacob I admittedly haven't looked into the issue in depth but I think that forcing the usage of the current alias like you did here is a certainly a key towards the solution.
I have a hunch that the actual problems lives in djang.test.utils.setup_databases
though and particularly how BaseDatabaseCreation.create_test_db
is implemented. setup_databases
currently follows this sequence of operations for each DATABASES
entries
- Create the test db replacement
- Repoint
settings.DATABASES[alias][name]
to the test db replacement - Peform migrations
- Serialize content
I think that instead what it should do is 1, 2, 3 for each DATABASES
entries (making sure that all the test databases are setup) and then do 4 for each of them. Something like
-
django/db/backends/base/creation.py
diff --git a/django/db/backends/base/creation.py b/django/db/backends/base/creation.py index 6856fdb596..7a0e2a0622 100644
a b def create_test_db( 91 91 # who are testing on databases without transactions or who are using 92 92 # a TransactionTestCase still get a clean database on every test run. 93 93 if serialize: 94 # XXX: Emit a deprecation warnings when `serialize` is provided. 94 95 self.connection._test_serialized_contents = self.serialize_db_to_string() 95 96 96 97 call_command("createcachetable", database=self.connection.alias) -
django/test/utils.py
diff --git a/django/test/utils.py b/django/test/utils.py index ddb85127dc..a2fe8b14cc 100644
a b def setup_databases( 189 189 test_databases, mirrored_aliases = get_unique_databases_and_mirrors(aliases) 190 190 191 191 old_names = [] 192 serialize_connections = [] 192 193 193 194 for db_name, aliases in test_databases.values(): 194 195 first_alias = None … … def setup_databases( 200 201 if first_alias is None: 201 202 first_alias = alias 202 203 with time_keeper.timed(" Creating '%s'" % alias): 203 serialize_alias = (204 serialized_aliases is None or alias in serialized_aliases205 )206 204 connection.creation.create_test_db( 207 205 verbosity=verbosity, 208 206 autoclobber=not interactive, 209 207 keepdb=keepdb, 210 serialize=serialize_alias,211 208 ) 209 if serialized_aliases is None or alias in serialized_aliases: 210 serialize_connections.append(connection) 212 211 if parallel > 1: 213 212 for index in range(parallel): 214 213 with time_keeper.timed(" Cloning '%s'" % alias): … … def setup_databases( 223 222 connections[first_alias].settings_dict 224 223 ) 225 224 225 # Serialize content of test databases only once all of them are setup 226 # to account for database routing during serialization. 227 for serialize_connection in serialize_connections: 228 serialize_connection._test_serialized_contents = ( 229 serialize_connection.serialize_db_to_string() 230 ) 231 226 232 # Configure the test mirrors. 227 233 for alias, mirror_alias in mirrored_aliases.items(): 228 234 connections[alias].creation.set_as_test_mirror(
comment:6 by , 5 weeks ago
Cc: | added |
---|
comment:7 by , 4 weeks ago
Thanks for the idea, I had a look. Turns out minimally adjusting comment:5 to run (by adding .creation
in one of a couple places) doesn't fix OP's reproduction. It also doesn't pass the unit test I had written to go with comment:3 below, but that is to be expected since the test has nothing to do with setup_databases()
. (It does pass with comment:3.)
-
tests/backends/base/test_creation.py
diff --git a/tests/backends/base/test_creation.py b/tests/backends/base/test_creation.py index 7e760e8884..84eb3f4a5f 100644
a b class TestDbCreationTests(SimpleTestCase): 172 172 173 173 class TestDeserializeDbFromString(TransactionTestCase): 174 174 available_apps = ["backends"] 175 databases = {"default", "other"} 175 176 176 177 def test_circular_reference(self): 177 178 # deserialize_db_from_string() handles circular references. … … class TestDeserializeDbFromString(TransactionTestCase): 267 268 self.assertIn('"model": "backends.schoolclass"', data) 268 269 self.assertIn(f'"schoolclasses": [{sclass.pk}]', data) 269 270 271 def test_serialize_db_to_string_with_m2m_field_and_router(self): 272 class OtherRouter: 273 def db_for_read(self, model, **hints): 274 return "other" 275 276 with override_settings(DATABASE_ROUTERS=[OtherRouter()]): 277 obj1 = Object.objects.create() 278 obj2 = Object.objects.create() 279 obj2.related_objects.set([obj1]) 280 with mock.patch("django.db.migrations.loader.MigrationLoader") as loader: 281 # serialize_db_to_string() serializes only migrated apps, so mark 282 # the backends app as migrated. 283 loader_instance = loader.return_value 284 loader_instance.migrated_apps = {"backends"} 285 data = connection.creation.serialize_db_to_string() 286 self.assertIn(f'"related_objects": [{obj1.pk}]', data) 287 # Test serialize() directly, in all four cases (json/xml, natural key/without) 288 270 289 271 290 class SkipTestClass: 272 291 def skip_function(self):
My plan was to repeat these tests with plain calls to serialize()
in tests/serializers and test all four cases (json vs. xml, natural key vs. without). If this test is fair, wouldn't this be an issue with the serializers? Maybe I'm missing a reason this test isn't realistic.
comment:8 by , 4 weeks ago
Sorry for sending on a wild goose chase Jacob, the patch above was meant to be a draft that needs tweaking and not a final solution.
Once adjusted like the following it addresses the problem reported by Jake
-
django/test/utils.py
diff --git a/django/test/utils.py b/django/test/utils.py index ddb85127dc..7d66140efa 100644
a b def setup_databases( 189 189 test_databases, mirrored_aliases = get_unique_databases_and_mirrors(aliases) 190 190 191 191 old_names = [] 192 serialize_connections = [] 192 193 193 194 for db_name, aliases in test_databases.values(): 194 195 first_alias = None … … def setup_databases( 200 201 if first_alias is None: 201 202 first_alias = alias 202 203 with time_keeper.timed(" Creating '%s'" % alias): 203 serialize_alias = (204 serialized_aliases is None or alias in serialized_aliases205 )206 204 connection.creation.create_test_db( 207 205 verbosity=verbosity, 208 206 autoclobber=not interactive, 209 207 keepdb=keepdb, 210 serialize= serialize_alias,208 serialize=False, 211 209 ) 210 if serialized_aliases is None or alias in serialized_aliases: 211 serialize_connections.append(connection) 212 212 if parallel > 1: 213 213 for index in range(parallel): 214 214 with time_keeper.timed(" Cloning '%s'" % alias): … … def setup_databases( 229 229 connections[mirror_alias].settings_dict 230 230 ) 231 231 232 # Serialize content of test databases only once all of them are setup 233 # to account for database mirroring and routing during serialization. 234 for serialize_connection in serialize_connections: 235 serialize_connection._test_serialized_contents = ( 236 serialize_connection.creation.serialize_db_to_string() 237 ) 238 232 239 if debug_sql: 233 240 for alias in connections: 234 241 connections[alias].force_debug_cursor = True
The above include three tweaks that were not included in comment:5
- It explicitly pass
serialize=False
tocreate_test_db
as we want to defer this operation to a time when all connections are repointed to test databases - Use
.creation.serialize_db_to_string
as you've noticed - Make sure to perform serialization only once mirrors have been appropriately setup
I want to re-iterate that what appears to be the problem here, at least to me, is less about the routing of queries during serialization and more that we attempt to perform any form of reads against non-test databases before DATABASES
entries are all re-pointed to their test equivalent.
If this test is fair, wouldn't this be an issue with the serializers? Maybe I'm missing a reason this test isn't realistic.
This is something that is effectively hard to test as it relates to the nature of the test suite bootstraping sequence but we do have a few examples in tests.test_runner
where the reported scenario could be reproduced.
comment:9 by , 4 weeks ago
Component: | Core (Serialization) → Testing framework |
---|
Perfect, thanks for clarifying.
comment:10 by , 3 weeks ago
Has patch: | set |
---|
Jacob & Jake, I submitted this PR for review and I'd like to have your thoughts on it. I didn't manage to create an exact integration test as it would require creating a nested test database and ensure that no database queries can be performed against the testing database but it seems the chosen approach in there so far has been to rely heavily on mocking.
comment:11 by , 3 weeks ago
Owner: | changed from | to
---|
Thanks for the crystal clear reproduction. I haven't tested to be sure, but it certainly looks a lot like #23979.
This hack passes your test case, but I'm not going to opine on whether it's correct; I just put this together as a learning exercise and to verify my hunch that this was more to do with the serializer and less to do with the testing framework. If someone more knowledgable can opine on correctness and backwards compatibility, I'm happy to push this forward.
django/core/serializers/base.py
)):django/core/serializers/python.py
):django/db/backends/base/creation.py