Opened 9 years ago
Last modified 4 years ago
#27734 new Cleanup/optimization
Add a helpful error message when a parallel test worker is assigned an unexpected index
| Reported by: | Adam Wróbel | Owned by: | nobody |
|---|---|---|---|
| Component: | Testing framework | Version: | 1.10 |
| Severity: | Normal | Keywords: | parallel |
| Cc: | Chris Jerdonek | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
I've had this problem since upgrading to Python 3.6 on macOS Sierra: when I run my tests in parallel, many of them blow up trying to use a database whose index is higher than the number of parallel test processes. On my MacBook the default process count is 4 and tests fail trying to connect to test_my_db_5; on my iMac the default is 8 and tests fail connecting to test_my_db_9. The same thing happens if I set the number of processes explicitly, e.g. --parallel=2. A higher number of processes yields fewer failures, and if I explicitly set the process count to a number greater than the number of my test files, the tests succeed.
Traceback (most recent call last):
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 216, in __call__
self._post_teardown()
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 908, in _post_teardown
self._fixture_teardown()
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 943, in _fixture_teardown
inhibit_post_migrate=inhibit_post_migrate)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/__init__.py", line 130, in call_command
return command.execute(*args, **defaults)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/base.py", line 345, in execute
output = self.handle(*args, **options)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/commands/flush.py", line 54, in handle
allow_cascade=allow_cascade)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/sql.py", line 15, in sql_flush
tables = connection.introspection.django_table_names(only_existing=True, include_views=False)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/introspection.py", line 88, in django_table_names
existing_tables = self.table_names(include_views=include_views)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/introspection.py", line 55, in table_names
with self.connection.cursor() as cursor:
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 233, in cursor
cursor = self.make_cursor(self._cursor())
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 204, in _cursor
self.ensure_connection()
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
self.connect()
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
self.connect()
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 171, in connect
self.connection = self.get_new_connection(conn_params)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
connection = Database.connect(**conn_params)
File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/psycopg2/__init__.py", line 164, in connect
conn = _connect(dsn, connection_factory=connection_factory, async=async)
django.db.utils.OperationalError: FATAL: database "test_my_db_5" does not exist
My requirements.txt file:
boto3==1.4.3
daemonize==2.4.7
Django==1.10.5
django-appconf==1.0.2
django-compressor==2.1
django-debug-toolbar==1.6
django-dotenv==1.4.1
django-extensions==1.7.5
django-ipware==1.1.6
djangorestframework==3.5.3
filemagic==1.6
inflection==0.3.1
jmespath==0.9.0
psycopg2==2.6.2
python-dateutil==2.6.0
rcssmin==1.0.6
rjsmin==1.0.12
s3transfer==0.1.10
six==1.10.0
sqlparse==0.2.2
uWSGI==2.0.14
Change History (12)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
I don't know if it's Django's fault, but I do know that it's Django that creates the test databases and configures the connection parameters. If you take a look at the backtrace, you'll see that the only other Python package on it is psycopg2.
The parallel nature of this problem makes it hard to investigate with my debugger. The exception occurs in the child process, but I don't know where the incorrect information is generated. I was hoping for a hint about where to start the search.
comment:3 by , 9 years ago
The parallel test runner is in django/test/runner.py. Can you provide a small sample project (without any third-party dependencies) that reproduces the issue?
comment:4 by , 9 years ago
| Resolution: | → needsinfo |
|---|---|
| Status: | new → closed |
comment:5 by , 8 years ago
I'm having this problem. It's not Django's fault, although there could be better error reporting.
--parallel creates N databases for the N workers to use. Each worker claims the database it will use by incrementing a shared counter. When the multiprocessing.Pool is initialized with N processes, the workers are assigned databases 1 through N.
The problem happens when a worker exits. Normally, the workers in a pool will stay alive for the whole life of the pool. In my case, workers are exiting because of seg faults. When this happens, the pool repopulates with new workers. When the first new worker is initialized, it increments the counter to N + 1 and tries to use that database, and this error results.
Only some of my tests cause the seg fault, so if I run --parallel with other tests, it works fine.
It would have saved some time diagnosing the issue if Django raised an error when a worker incremented the counter over N, which seems like a reasonable thing for the initializer to do.
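The counter mechanism and the proposed check can be sketched as follows. This is an illustrative sketch, not Django's actual code: the names claim_worker_id and worker_count are hypothetical, and Django's real initializer lives in django/test/runner.py.

```python
# Sketch of the fix suggested above: when a pool worker claims its index
# from a shared counter, fail fast if the index exceeds the number of
# cloned test databases (illustrative names, not Django's identifiers).
import multiprocessing


def claim_worker_id(counter, worker_count):
    """Atomically claim the next worker index; raise if it is out of range.

    An out-of-range index means an earlier worker died (e.g. a segfault)
    and the pool spawned a replacement, which would otherwise try to
    connect to a test database that was never created (test_my_db_<N+1>).
    """
    with counter.get_lock():
        counter.value += 1
        worker_id = counter.value
    if worker_id > worker_count:
        raise RuntimeError(
            "Worker assigned index %d, but only %d test databases exist; "
            "an earlier worker process probably crashed and was restarted."
            % (worker_id, worker_count)
        )
    return worker_id


counter = multiprocessing.Value("i", 0)
# The first N calls (one per initial pool worker) claim indices 1..N.
print([claim_worker_id(counter, 4) for _ in range(4)])  # [1, 2, 3, 4]
```

A fifth call against worker_count=4 (a replacement worker after a crash) would raise the RuntimeError instead of failing later with an opaque "database does not exist" error.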
comment:6 by , 8 years ago
| Resolution: | needsinfo |
|---|---|
| Status: | closed → new |
| Triage Stage: | Unreviewed → Accepted |
I agree it would be nice to have a better error message in this case.
comment:7 by , 8 years ago
| Summary: | Parallel tests try to use database with too high indices → Add a helpful error message when a parallel test worker is assigned an unexpected index |
|---|---|
| Type: | Bug → Cleanup/optimization |
comment:8 by , 8 years ago
I don't think my issue was with segfaulting processes, but it's certain that something caused a process to be restarted. A better error message might be nice to have, but it won't be helpful unless it helps determine what caused the process to segfault/restart.
Maybe there would be a way to reuse the connection of a dead process?
comment:9 by , 6 years ago
To anyone else who's still having this problem: it seems to be specific to macOS, or at least this cause and fix are.
Putting this somewhere in your code before the test runner is initialized (settings.py worked well for me) fixes the issue:
import os
import platform
import sys

if 'test' in sys.argv and platform.system().lower() == 'darwin':
    os.environ.setdefault('OBJC_DISABLE_INITIALIZE_FORK_SAFETY', 'YES')
comment:11 by , 4 years ago
| Cc: | added |
|---|
I haven't seen that issue in my Python 3.6 testing. Could you investigate and tell us why Django is at fault?