#34009 closed Uncategorized (invalid)
migrations.RunPython runs queries against wrong database
| Reported by: | Marcel Moreaux | Owned by: | nobody | 
|---|---|---|---|
| Component: | Uncategorized | Version: | 4.1 | 
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description
In an application with multiple databases, RunPython in migrations appears to always run queries on the default database, instead of the currently selected database, resulting in database errors. This can be observed with manage.py migrate --database, but the problem is also triggered by manage.py test:
(venv) [marcelm@carbon foo .]% ./manage.py test
Found 1 test(s).
Creating test database for alias 'default'...
Creating test database for alias 'extra'...
Traceback (most recent call last):
  File "/home/marcelm/meh/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/home/marcelm/meh/venv/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 357, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: table bar_bar has no column named name
The relevant migration:
from django.db import migrations
# BUG: This always uses the default database, even when the migrations are
# run to set up the extra database.
def add_bar(apps, schema_editor):
    print('RunPython starts here')
    Bar = apps.get_model('bar', 'Bar')
    Bar.objects.create(name='hello')
    print('RunPython ends here')
class Migration(migrations.Migration):
    dependencies = [ ('bar', '0001_initial'), ]
    operations = [
        migrations.RunPython(add_bar, reverse_code=migrations.RunPython.noop),
    ]
(migration 0001 adds the name field)
More debug info
Adding debug code to print the DB queries and the database on which they're run confirms this. Selected output:
% QDEBUG=1 ./manage.py test -v3
Creating test database for alias 'default' ('file:memorydb_default?mode=memory&cache=shared')...
Found 1 test(s).
CONNECTING TO {'database': 'file:memorydb_default?mode=memory&cache=shared', 'detect_types': 3, 'check_same_thread': False, 'uri': True}
QUERY AGAINST file:memorydb_default?mode=memory&cache=shared: 
            SELECT name, type FROM sqlite_master
            WHERE type in ('table', 'view') AND NOT name='sqlite_sequence'
            ORDER BY name
[...]
Creating test database for alias 'extra' ('file:memorydb_extra?mode=memory&cache=shared')...
CONNECTING TO {'database': 'file:memorydb_extra?mode=memory&cache=shared', 'detect_types': 3, 'check_same_thread': False, 'uri': True}
QUERY AGAINST file:memorydb_extra?mode=memory&cache=shared: 
            SELECT name, type FROM sqlite_master
            WHERE type in ('table', 'view') AND NOT name='sqlite_sequence'
            ORDER BY name
[...]
  Applying bar.0002_add_bar...QUERY AGAINST file:memorydb_extra?mode=memory&cache=shared: PRAGMA foreign_keys = OFF
QUERY AGAINST file:memorydb_extra?mode=memory&cache=shared: PRAGMA foreign_keys
QUERY AGAINST file:memorydb_extra?mode=memory&cache=shared: BEGIN
RunPython starts here
QUERY AGAINST file:memorydb_default?mode=memory&cache=shared: INSERT INTO "bar_bar" ("name") VALUES (%s)
QUERY AGAINST file:memorydb_extra?mode=memory&cache=shared: PRAGMA foreign_key_check
QUERY AGAINST file:memorydb_extra?mode=memory&cache=shared: PRAGMA foreign_keys = ON
Traceback (most recent call last):
  File "/home/marcelm/meh/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/home/marcelm/meh/foo/foo/settings.py", line 70, in wrapped
    return func(self, query, params)
  File "/home/marcelm/meh/venv/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 357, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: table bar_bar has no column named name
(note QUERY AGAINST file:memorydb_default just after RunPython starts here)
Similar problems can be observed by running manage.py migrate --database extra.
Reproducing
I attached the full output of both QDEBUG=1 manage.py test -v3 and QDEBUG=1 manage.py migrate --database extra. Also attached is a tar archive of a minimal project that reproduces the bug. I tested on Django 3.2.15, 4.1.1, and the main branch on github.
To reproduce, extract the minimal reproducible example somewhere appropriate and follow these steps:
tar xvzf foo.tgz virtualenv venv . venv/bin/activate pip install Django cd foo QDEBUG=1 ./manage.py test QDEBUG=1 ./manage.py migrate --database=extra
QDEBUG triggers some monkey patching in settings.py to print database queries. Queries on default are printed in magenta, queries on extra are printed in cyan, and all other queries are printed in yellow. This makes the problematic queries easy to spot.
Attachments (3)
Change History (8)
by , 3 years ago
| Attachment: | test-full-output.txt added | 
|---|
by , 3 years ago
| Attachment: | migrate-full-output.txt added | 
|---|
by , 3 years ago
comment:1 by , 3 years ago
comment:2 by , 3 years ago
| Resolution: | → invalid | 
|---|---|
| Status: | new → closed | 
comment:3 by , 3 years ago
| Resolution: | invalid | 
|---|---|
| Status: | closed → new | 
Hi Alex, thanks for you feedback!
That's not how I'm reading the documentation though. What it says there is that if you want your migrations to only run on a specific database, you need to perform that check.
That's not what's happening here; all migrations are correctly run against the extra database, except the RunPython migration, which is run against default. The documentation you link does not imply that RunPython migrations must select the database themselves, only that they may check the selected database.
In fact, the documentation for database routers (https://docs.djangoproject.com/en/4.1/topics/db/multi-db/#automatic-database-routing) explicitly mentions that the Django's default database router ensures 'stickyness' to objects' original database, which seems violated here.
And finally, how would the migrations even set this? When running the migrations in the context of the default database, the database should be default, and when the migrations are run to set up the extra database, the database should be extra. Hardcoding either database in the migration won't work.
I maintain this is a bug, so I'm reopening this ticket.
comment:4 by , 3 years ago
| Resolution: | → invalid | 
|---|---|
| Status: | new → closed | 
I maintain this is a bug, so I'm reopening this ticket.
It's not a bug but a documented behavior:
"RunPython does not magically alter the connection of the models for you; any model methods you call will go to the default database unless you give them the current database alias (available from schema_editor.connection.alias, where schema_editor is the second argument to your function)."
comment:5 by , 3 years ago
Ah, I see, I missed that.
That's still very unfortunate behaviour though; it means RunPython methods should be littered with explicit .using() clauses to ensure they actually use the DB specified in schema_editor.connection.alias. Miss one, and it may inadvertently wreak havoc on the default database... (as was happening in our project)
A quick search turned up this bit of the docs: https://docs.djangoproject.com/en/4.1/howto/writing-migrations/#data-migrations-and-multiple-databases
So, IIUC, you're supposed to select the database yourself in a RunPython operation. Or maybe I've missed something entirely, in which case, I apologise :)