Opened 9 months ago
Closed 9 months ago
#35186 closed Bug (duplicate)
Script run as manage.py command gets frequent django.db.utils.InterfaceError: connection already closed errors
Reported by: | utkarshpandey12 | Owned by: | nobody |
---|---|---|---|
Component: | Core (Management commands) | Version: | 5.0 |
Severity: | Normal | Keywords: | Connection already closed error command manage.py |
Cc: | utkarshpandey12 | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I am using Django rest framework for serving APIs for my microservices based application. These microservices run inside kubernetes with pods and there are different environments for dev/stg/prod.
Prod and stg can scale the number of pods as per traffic.
We have written a Kafka consumer script that is run as a manage.py command; the handle method of the command is shown below. The script calls methods of the AuthTasks() class to handle events published on the subscribed topics. These event handlers run queries such as User.objects.get(pk=1), and at random points in time such a query fails with the error mentioned above, i.e. "connection already closed".
This has been reported by a fellow developer in a forum post, but no solution was reached there (tough luck).
As discussed there and pointed out by a forum member, these manage.py command scripts run for weeks, sometimes even months, but never longer than a few months, because this error eventually knocks them out.
The error occurs at random times, so reproducing it may be very hard, but I am attaching the traceback that we keep receiving from this script from time to time.
Tracebacks
exception error : connection already closed traceback =
  File "/app/core/management/commands/run_kafka_consumer.py", line 38, in handle
    self.auth_consumer_tasks_manager.handle_efg(task_payload = msg.value)
  File "/app/core/tasks/consumer_tasks/consumer_tasks_manager.py", line 57, in handle_efg
    user_obj = auth_dao.get_user_by_user_id(user_id = int(task_payload['user_id']))
  File "/app/core/dao/auth.py", line 16, in get_user_by_user_id
    user_obj = User.objects.get(pk=user_id)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 492, in get
    num = len(clone)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 302, in __len__
    self._fetch_all()
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1507, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(
  File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1359, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 284, in cursor
    return self._cursor()
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 262, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 262, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/usr/local/lib/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 256, in create_cursor
    cursor = self.connection.cursor()
Code of script run as manage.py command
import json
import logging
import traceback

from django.core.management.base import BaseCommand
from kafka import KafkaConsumer

# AuthTasks and KAFKA_BROKER_URL come from the project itself; their imports
# are not shown in this report.

logger = logging.getLogger(__name__)


class Command(BaseCommand):
    help = 'Starts the Kafka consumer in background for Taskqueues'
    auth_consumer_tasks_manager = AuthTasks()

    def handle(self, *args, **options):
        topics = [
            "list_of_topics"
        ]
        config = {
            'bootstrap_servers': KAFKA_BROKER_URL,
            'value_deserializer': lambda m: json.loads(m.decode('utf-8')),
            'auto_offset_reset': 'earliest',
            'group_id': 'test_group',
            "api_version": (2, 8, 1),
            "enable_auto_commit": False,
        }
        consumer = KafkaConsumer(**config)
        consumer.subscribe(topics)
        for msg in consumer:
            try:
                # Dispatch each message to the handler for its topic.
                if msg.topic == 'abc':
                    self.auth_consumer_tasks_manager.handle_abc(task_payload=msg.value)
                elif msg.topic == 'efg':
                    self.auth_consumer_tasks_manager.handle_efg(task_payload=msg.value)
                consumer.commit()
            except Exception as e:
                # Means exception happened in background task thread. Should do appropriate handling here
                # For now just passing so that consumer stays active.
                logger.error(
                    f"Exception happened while trying to consume task- {msg.topic} with payload {msg.value},exception error : {str(e)} traceback = {''.join(traceback.format_tb(e.__traceback__))}",
                    extra={"className": self.__class__.__name__}
                )
settings.py
DATABASES = {"default": env.db("DATABASE_URL")}
DATABASES["default"]["ATOMIC_REQUESTS"] = True

DJANGO_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.sites",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # "django.contrib.humanize", # Handy template tags
    "django.contrib.admin",
]
THIRD_PARTY_APPS = [
    "rest_framework",
    "rest_framework.authtoken",
    "corsheaders",
    "drf_spectacular",
    "cid.apps.CidAppConfig",
]
LOCAL_APPS = [
    "core_app"
    # Your stuff: custom apps go here
]
INSTALLED_APPS = DJANGO_APPS + THIRD_PARTY_APPS + LOCAL_APPS
INSTALLED_APPS = ["whitenoise.runserver_nostatic"] + DJANGO_APPS + THIRD_PARTY_APPS + LOCAL_APPS
INSTALLED_APPS += ["debug_toolbar"]
INSTALLED_APPS += ["django_extensions"]

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "corsheaders.middleware.CorsMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.locale.LocaleMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.common.BrokenLinkEmailsMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
    "core_app.middleware.CustomExceptionMiddleware",
]
MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"]
Stack details
gunicorn version 20.1.0 (1 worker and a timeout of 60-120 sec in dev/stg; 1 worker and a timeout of 30 sec in prod)
postgresql version 15.5
django version 4.0.8
Using wsgi
requirements.txt
pytz==2022.7
python-slugify==7.0.0
Pillow==9.4.0
argon2-cffi==21.3.0
whitenoise==5.3.0
redis==4.4.1
hiredis==2.1.1
twilio==7.16.3
django-cid==2.3
watchtower==3.0.1
# Django
# ------------------------------------------------------------------------------
django==4.0.8 # pyup: < 4.1
django-environ==0.9.0
django-model-utils==4.3.1
django-allauth==0.52.0
django-crispy-forms==1.14.0
crispy-bootstrap5==0.7
django-redis==5.2.0
# Django REST Framework
djangorestframework==3.14.0
django-cors-headers==3.13.0
# DRF-spectacular for api documentation
drf-spectacular==0.25.1
django-storages[boto3]==1.13.2
kafka-python==2.0.2
gunicorn==20.1.0
psycopg2==2.9.5
firebase-admin==6.4.0
How we execute things
We execute a shell script with the following commands.
start.sh
#!/bin/bash
set -o errexit
set -o pipefail
set -o nounset
# Migrate
python /app/manage.py migrate
# Collect static files
python /app_root_dir/manage.py collectstatic --noinput
# Start the Kafka consumer in the background
python /app_root_dir/manage.py run_kafka_consumer &  # script where the issue occurs
# Start the Gunicorn server
exec /usr/local/bin/gunicorn config.wsgi --bind 0.0.0.0:8000 --chdir=/app_root_dir --timeout 120
Any help is highly appreciated. Thanks for looking into this and following up on the issue. I have tried to provide every detail possible without revealing too much about the core project. If any further information is required, I will be happy to provide it.
This is a duplicate of #14845 (just like #32589), check out Simon's comment.
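For readers landing here, a workaround often suggested for long-running commands is to call django.db.close_old_connections() around each unit of work, which is roughly what Django's request handling does for web requests. The following is only a minimal sketch under that assumption, not the resolution recorded in #14845. The helper name consume_with_fresh_connections and its refresh parameter are hypothetical; the parameter exists only so that the sketch can be tried without a configured Django project.

```python
from typing import Any, Callable, Iterable, Optional


def consume_with_fresh_connections(
    messages: Iterable[Any],
    handle: Callable[[Any], None],
    refresh: Optional[Callable[[], None]] = None,
) -> int:
    """Handle each message, discarding stale DB connections around it.

    `refresh` is a hypothetical hook; by default it is Django's
    close_old_connections, which closes connections that are unusable
    or older than CONN_MAX_AGE so the next query opens a new one.
    """
    if refresh is None:
        # Imported lazily so the sketch can be exercised without Django.
        from django.db import close_old_connections
        refresh = close_old_connections

    handled = 0
    for msg in messages:
        refresh()  # before the work: drop connections that went stale while idle
        try:
            handle(msg)
            handled += 1
        finally:
            refresh()  # after the work: drop a connection that just errored
    return handled
```

In the command above, messages would be the KafkaConsumer instance and handle would wrap the existing per-topic dispatch together with its try/except, so that one failing message does not stop the loop.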