Code

Opened 2 years ago

Closed 17 months ago

Last modified 17 months ago

#18135 closed Bug (fixed)

Sleeping Database Connections on Startup with MySQL

Reported by: Mnewman Owned by: aaugustin
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords: MySQL
Cc: Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I am digging into this a little further, but right now, starting a new project starting it up with runserver, or gunicorn, creates a sleeping MySQL connection.

./manage.py startproject testproject
cd testproject

Change the settings for a local MySQL database.

./manage.py runserver

On another command prompt

./manage.py dbshell
SHOW PROCESSLIST;

Observe the 2 sleeping connections (on for the shell and one for runserver).

Removing all the default apps seems to stop the connection. Adding contenttypes creates the connection. Adding staticfiles does not. Due to this, I believe that it is some code in contenttypes that is causing this connection.

Attachments (1)

18135-sleeping-mysql-connection-1.4.X.diff (1.2 KB) - added by ramiro 2 years ago.
Patch for this issue in 1.4.X branch

Download all attachments as: .zip

Change History (32)

comment:1 Changed 2 years ago by ramiro

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I see this also with 1.3.X.

Also, the contenttypes app isn't the only one that causes this, activating any of the DB-using apps (e.g. auth) makes the connection appear.

What is the practical effect of this issue? Is it an issue at all?

comment:2 follow-up: Changed 2 years ago by Mnewman

Practical effect is that a database only has so many connections available. Each instance fires up a connection, so if a database has 50 available connections and you have 10 servers with 5 instances each, you all of a sudden are out of database connections. Sleeping connections are a really bad thing.

comment:3 follow-up: Changed 2 years ago by akaariai

  • Triage Stage changed from Unreviewed to Accepted

I did a little digging and the reason for the connection is that model validation needs to know server version, and checking server version opens a connection which is then not closed. So, this code in mysql/validation.py:

        try:
            db_version = self.connection.get_server_version()
            text_version = '.'.join([str(n) for n in db_version[:3]])
        except OperationalError:
            db_version = None
            text_version = ''

seems to need some way to close the connection. I am not sure if it is safe to just have a "finally: close connection" in there. Maybe it should be checked that if there is no existing connection when entering this function, but there is one when leaving it, then close the created connection.

Version 0, edited 2 years ago by akaariai (next)

comment:4 in reply to: ↑ 3 Changed 2 years ago by ramiro

Replying to akaariai:

EDIT: reading further it seems the only usage for the version in this part of the code is checking for pre-5.0.3 versions, and as support for those versions is going to be dropped, it seems it might be possible that 1.5 does not need any other fix here than removing the get_server_version() call altogether.

See #18116.

comment:5 Changed 2 years ago by akaariai

So, I guess now the question is should this be back-patched? I am not the right person to answer this as I am not too experienced with Django's back-patching rules.

Another question which arises is how to create a regression test for this? This is the kind of bug which could easily resurface later on.

comment:6 Changed 2 years ago by Mnewman

In a self serving way, I do think that this should be back-ported. This is kind of a shocking problem that people use ORMs to avoid in their own code. I was lucky, I have had sleeping connections in the past that have brought down my DB, so I monitor for sleeping connections to the database. People on smaller machines with databases that only allow a limited number of connections will reach their DB's limit very quickly as their sites scale.

Also, fixing this problem is difficult. The only way to open those connections back up is to go into the database and kill those processes. This takes quite a bit of knowledge of the DB, which is another reason that people use ORMs.

comment:7 Changed 2 years ago by aaugustin

The backport policy is clear:

Patches applied to the trunk will also be applied to the last minor release, to be released as the next micro release, when they fix critical problems:

  • Security issues.
  • Data-loss bugs.
  • Crashing bugs.
  • Major functionality bugs in newly-introduced features.

This bug doesn't fall into any of these categories.

If you want to backport the patch, you can. But you don't have to.

comment:8 follow-up: Changed 2 years ago by Mnewman

I would argue that the bug leads to crashing, quite reliably. When a database runs out of connections, Django doesn't run. Thanks for reminding me of those policies, my first note should have mentioned that.

I will backport it myself. I am more worried about new people coming to Django. This took me a long time to figure out and I knew what I was looking for.

comment:9 in reply to: ↑ 8 Changed 2 years ago by ramiro

  • Resolution set to fixed
  • Status changed from new to closed

Replying to Mnewman:

I would argue that the bug leads to crashing, quite reliably. When a database runs out of connections, Django doesn't run. Thanks for reminding me of those policies, my first note should have mentioned that.

I will backport it myself. I am more worried about new people coming to Django. This took me a long time to figure out and I knew what I was looking for.

Fixed in [17921].

Given this reasoning I now think it could be right to fix this in previous releases. Will try to port fixes to 1.4.X and 1.3.X and attach patches here for review and testing.

Last edited 2 years ago by ramiro (previous) (diff)

Changed 2 years ago by ramiro

Patch for this issue in 1.4.X branch

comment:10 Changed 2 years ago by ramiro

I've attached a fix for this issue in the 1.4.X and 1.3.X branches (in the latter it applies with fuzz 1 (offset -73 lines)).

Testing report and feedback welcome.

comment:11 Changed 2 years ago by akaariai

My quick test indicates that the extra connection is gone both in 1.4.X and 1.3.X. So, looks good to me. Only thing I wonder is if there should be try-finally to ensure the new connection is closed. But I guess if the finally branch is ever hit the open connection is not the biggest of your worries...

comment:12 Changed 2 years ago by Mnewman

Did this ticket get closed by accident? I don't see it patched in the repo.

I can confirm that this patch closes the lingering connection.

I opened pull requests with this patch for the master, 1.4.x and 1.3.x banches on GitHub:

comment:13 Changed 2 years ago by akaariai

  • Has patch set
  • Resolution fixed deleted
  • Status changed from closed to reopened
  • Triage Stage changed from Accepted to Ready for checkin

Yes, it looks this was only partly fixed in [17921] - that is the server version checking still can leave open connections, it is just not called in the same place as before. It would probably be a good idea to apply all of the above pull requests.

comment:14 Changed 2 years ago by akaariai

Verified here also that the issue is present both in 1.3 and 1.4 and the patch fixes the issues.

Mnewman: If you could add [1.4.x] Fixed... and [1.3.x] Fixed... prefixes to the commit messages I think we could directly merge the pull requests to Django for all versions. Maybe you should add a paragraph of "Made sure server version checking does not leave open connection behind. Backport of <COMMIT_HASH of the master commit>".

comment:15 Changed 2 years ago by Mnewman

Quick crash course in git rebase (which is awesome) and I think I have done it. Please double check that I did it right, I am still figuring out Git. Thanks

comment:16 Changed 2 years ago by akaariai

I think the problem is gone in master, but this still needs to be fixed in 1.4 and 1.3. Master should get this fix, too, even if the problem isn't present in model validation.

comment:17 Changed 2 years ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from reopened to closed

In [4423757c0c50afbe2470434778c8d5e5b4a70925]:

Fixed #18135 -- Close connection used for db version checking

On MySQL when checking the server version, a new connection could be
created but never closed. This could result in open connections on
server startup.

comment:18 Changed 2 years ago by Anssi Kääriäinen <akaariai@…>

In [0f69a16785b88d03dc246a652cf348e0f2704e8e]:

[1.4.x] Fixed #18135 -- Close connection used for db version checking

On MySQL when checking the server version, a new connection could be
created but never closed. This could result in open connections on
server startup.

Backport of 4423757c0c50afbe2470434778c8d5e5b4a70925.

comment:19 Changed 2 years ago by Anssi Kääriäinen <akaariai@…>

In [a15d3b58d8c4cbb6137f0458544ec03f3394bb08]:

[1.3.x] Fixed #18135 -- Close connection used for db version checking

On MySQL when checking the server version, a new connection could be
created but never closed. This could result in open connections on
server startup.

Backport of 4423757c0c50afbe2470434778c8d5e5b4a70925.

comment:20 Changed 2 years ago by russellm

Why has this patch been applied to the 1.3.X branch? The *only* patches that should be applied to 1.3 are those that will result in a security release.

comment:21 Changed 2 years ago by akaariai

I missed the security-only status of 1.3.x. I will revert the patch later on today from 1.3.

comment:22 Changed 2 years ago by Anssi Kääriäinen <akaariai@…>

In [7ca10b1dac758924f9bbd219880cc17a537c5e47]:

Reverted "[1.3.x] Fixed #18135 -- Close connection used for db version checking"

This reverts commit a15d3b58d8c4cbb6137f0458544ec03f3394bb08. Django
1.3.x is in security fixes only state, and this wasn't a security
issue.

comment:23 in reply to: ↑ 2 Changed 17 months ago by aaugustin

Replying to Mnewman:

Practical effect is that a database only has so many connections available. Each instance fires up a connection, so if a database has 50 available connections and you have 10 servers with 5 instances each, you all of a sudden are out of database connections. Sleeping connections are a really bad thing.


I stumbled upon this ticket while working on persistent database connections and I don't understand this argument.

If you have 50 workers, and they're all serving requests simultaneously, then you're going to have 50 database connections in parallel anyway.

If your database is only allows, say, 20 connections at a time, then you can't have more than 20 workers active, and there's no point in having 50 of them!

I fail to see how sleeping connections are a bad thing.


Currently, the connection created at startup will be closed after the first query. In the near future, it will stay open. (See the discussion on django-developers.)

I'm going to revert this change.

comment:24 Changed 17 months ago by aaugustin

  • Resolution fixed deleted
  • Status changed from closed to new

comment:25 Changed 17 months ago by aaugustin

  • Component changed from Uncategorized to Database layer (models, ORM)
  • Owner changed from nobody to aaugustin
  • Status changed from new to assigned
  • Triage Stage changed from Ready for checkin to Design decision needed

comment:26 follow-up: Changed 17 months ago by Mnewman

This is not scalable. It makes sense for small applications to have a database query open for every thread that you have running the Django app, but what happens as your application scales? You can leverage agressive caching techniques or NoSQL to make the load on the databases less and less. Some applications that I run only need connection to the database when a write operation is performed.

So what happens on sites, like I have, that run millions of concurrents? I need hundreds of thousands threads to properly scale. I cannot afford a relational database that requires that many open "sleeping" connections.

Persistant connections are a fine thing for small sites that don't autoscale along with swaths of users that are looking to improve performance by the couple of hundredths of second that it takes to connect to a local database, but when it comes to a large site, the overhead of a connection on each Django instance is really detrimental.

This needs to at least be an option. The difference in price between a database that has 50 available connections to one that requires 500 is extreme. Applications can cache queries to the database, they can't magically increase the number of available connections to the database.

I will weigh in on the discussion on django-dev, can you point me in the right direction? I have not seen it pass through my inbox.

comment:27 in reply to: ↑ 26 Changed 17 months ago by aaugustin

Replying to Mnewman:

So what happens on sites, like I have, that run millions of concurrents? I need hundreds of thousands threads to properly scale. I cannot afford a relational database that requires that many open "sleeping" connections.

Exaggeration doesn't help your credibility. No one's running hundred of thousands of Django workers connected to a MySQL database without a pooler. And the topic of this ticket isn't persistent connections. At the moment, they're just a proposal, open for discussion on the mailing list.


Back to the topic.

I missed a use case in my previous comment: it's possible to have a MySQL database that's only occasionally hit by workers, especially if it isn't the primary database. Such a database could support fewer simultaneous connections than the total amount of workers.

In the current codebase, the only place that uses the MySQL server version is sequence_reset_by_name_sql, and this function is only used by the tests. It appears that mysql_version was called at import time in previous versions of Django, but that isn't true any longer.

One could argue for keeping this code in case future versions of Django introduce MySQL-version-dependent code that's evaluated at compile time, but in general, unnecessary code hurts maintainability.

Last edited 17 months ago by aaugustin (previous) (diff)

comment:28 Changed 17 months ago by Mnewman

I think we are on the same page.

comment:29 Changed 17 months ago by aaugustin

Today, I learned that some people indeed run hundreds of thousands of Django threads... My comment above is wrong!

comment:30 Changed 17 months ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In aea98e8c5357b15da214af311e8cb74b1503f958:

Simplified MySQL version checking.

Django used to check the version of MySQL before handling the first
request, which required:

  • opening a connection
  • closing it, to avoid holding it idle until the first request.

This code isn't necessary any longer since Django dropped support for
some versions of MySQL, and other database backends don't implement a
similar dance. For consistency and maintenability, remove it.

Reverts 4423757c0c50afbe2470434778c8d5e5b4a70925.

Closes #18135.

comment:31 Changed 17 months ago by Aymeric Augustin <aymeric.augustin@…>

In ebabd772911f732ef54e014f130f6f5530198e14:

Ensured a connection is established when checking the database version.

Fixed a test broken by 21765c0a. Refs #18135.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.