Opened 7 years ago

Last modified 3 years ago

#27734 new Cleanup/optimization

Add a helpful error message when a parallel test worker is assigned an unexpected index

Reported by: Adam Wróbel Owned by: nobody
Component: Testing framework Version: 1.10
Severity: Normal Keywords: parallel
Cc: Chris Jerdonek Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I've had this problem since upgrading to Python 3.6 on macOS Sierra – when I run my tests in parallel many of them blow up trying to use database with index that is higher than the number of parallel test processes. On my MacBook the default process number is 4 and tests fail trying to connect to test_my_db_5. On my iMac the default process number is 8 and test fail connecting to test_my_db_9. The same thing happens if I set the number of processes explicitly, e.g. --parallel=2. Higher number of processes yields less failures. If I explicitly set the number of processes to a number that is greater than the number of my test files then the tests succeed.

Traceback (most recent call last):
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 216, in __call__
    self._post_teardown()
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 908, in _post_teardown
    self._fixture_teardown()
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 943, in _fixture_teardown
    inhibit_post_migrate=inhibit_post_migrate)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/__init__.py", line 130, in call_command
    return command.execute(*args, **defaults)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/base.py", line 345, in execute
    output = self.handle(*args, **options)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/commands/flush.py", line 54, in handle
    allow_cascade=allow_cascade)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/sql.py", line 15, in sql_flush
    tables = connection.introspection.django_table_names(only_existing=True, include_views=False)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/introspection.py", line 88, in django_table_names
    existing_tables = self.table_names(include_views=include_views)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/introspection.py", line 55, in table_names
    with self.connection.cursor() as cursor:
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 233, in cursor
    cursor = self.make_cursor(self._cursor())
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 204, in _cursor
    self.ensure_connection()
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
    self.connect()
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
    self.connect()
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 171, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
django.db.utils.OperationalError: FATAL:  database "test_my_db_5" does not exist

My requirements.txt file:

boto3==1.4.3
daemonize==2.4.7
Django==1.10.5
django-appconf==1.0.2
django-compressor==2.1
django-debug-toolbar==1.6
django-dotenv==1.4.1
django-extensions==1.7.5
django-ipware==1.1.6
djangorestframework==3.5.3
filemagic==1.6
inflection==0.3.1
jmespath==0.9.0
psycopg2==2.6.2
python-dateutil==2.6.0
rcssmin==1.0.6
rjsmin==1.0.12
s3transfer==0.1.10
six==1.10.0
sqlparse==0.2.2
uWSGI==2.0.14

Change History (12)

comment:1 by Tim Graham, 7 years ago

I haven't see that issue in my Python 3.6 testing. Could you investigate and tell us why Django is at fault?

comment:2 by Adam Wróbel, 7 years ago

I don't know if it's Django's fault, but I do know that it's Django that is creating test databases and configuring connection parameters. If you take a look on the backtrace you'll see that the only other Python package on it is psycopg2.

Parallel nature of this problem makes it hard to investigate with my debugger. The exception occurs in the child process, but I don't know where the incorrect information is generated. I was hoping for a hint for where to start the search.

comment:3 by Tim Graham, 7 years ago

The parallel test runner is in django/test/runner.py. Can you provide a small sample project (without any third-party dependencies) that reproduces the issue?

comment:4 by Tim Graham, 7 years ago

Resolution: needsinfo
Status: newclosed

comment:5 by David Breeden, 6 years ago

I'm having this problem. It's not Django's fault, although there could be better error reporting.

--parallel creates N databases for the N workers to use. Each worker claims the database it will use by incrementing a shared counter. When the multiprocessing.Pool is initialized with N processes, the workers are assigned databases 1 through N.

The problem happens when a worker exits. Normally, the workers in a pool will stay alive for the whole life of the pool. In my case, workers are exiting because of seg faults. When this happens, the pool repopulates with new workers. When the first new worker is initialized, it increments the counter to N + 1 and tries to use that database, and this error results.

Only some of my tests cause the seg fault, so if I run --parallel with other tests, it works fine.

It would have saved some time diagnosing the issue if Django raised an error when a worker incremented the counter over N, which seems like a reasonable thing for the initializer to do.

comment:6 by Aymeric Augustin, 6 years ago

Resolution: needsinfo
Status: closednew
Triage Stage: UnreviewedAccepted

I agree it would be nice to have a better error message in this case.

comment:7 by Tim Graham, 6 years ago

Summary: Parallel tests try to use database with too high indicesAdd a helpful error message when a parallel test worker is assigned an unexpected index
Type: BugCleanup/optimization

comment:8 by Adam Wróbel, 6 years ago

I don't think my issue was with segfaulting processes, but it's certain that something caused the process to be restarted. Better error message might be nice to have, but won't be helpful unless it helps determining what caused the process segfault/restart.

Maybe there would be a way to reuse a connection of a dead process?

comment:9 by Hameed Gifford, 4 years ago

To anyone else who's still having this problem, it seems to be specific to OSX. Or at least this reason and fix is.

https://stackoverflow.com/questions/50168647/multiprocessing-causes-python-to-crash-and-gives-an-error-may-have-been-in-progr

Putting this somewhere in your code before the test runner is initialized (settings worked good for me) fixes the issue:

import os
import platform
import sys

if 'test' in sys.argv and platform.system().lower() == 'darwin':
    os.environ.setdefault('OBJC_DISABLE_INITIALIZE_FORK_SAFETY', 'YES')

comment:10 by Mariusz Felisiak, 3 years ago

#32769 was closed as a duplicate, see comment.

comment:11 by Chris Jerdonek, 3 years ago

Cc: Chris Jerdonek added

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In cb6c197:

Refs #27734 -- Prevented creation of more parallel workers than TestCases.

The parallel test runner uses multiple workers to distribute the
workload. These workers are assigned a worker ID using a globally
incremented variable, which determines what test database to connect
to. When the worker ID surpasses the test database IDs Django will
crash.

This reduce likelihood of crashing parallel tests because
ParallelTestSuite will no longer create more workers than TestCases.

It won't eliminate the problem completely though because there are
other circumstances in which new workers can be created which can then
be assigned an "illegal" worker ID.

Note: See TracTickets for help on using tickets.
Back to Top