Opened 6 months ago

Last modified 5 months ago

#31276 assigned Cleanup/optimization

Add 'flush' option to only clear non-empty tables

Reported by: Adam (Chainz) Johnson Owned by: jonatron
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This was suggested as an extra feature for Django-MySQL by @jonatron in ​https://github.com/adamchainz/django-mysql/pull/611/files , but it would make more sense to make it in Django core.

The sql_flush database operation, used by the 'flush' management command and, thus through TransactionTestCase, creates one TRUNCATE TABLE statement for every table - regardless of whether they are empty already. This means unnecessary database operations are run, which will be most noticeable in test runs.

Instead of flushing *every* table, we can flush only the non-empty ones.

A minimal, hopefully cross-database way of quickly finding out which tables in a list are non-empty is to make existence queries on them combined with union:

chainz@localhost [14]> (select 't' as has_rows from t limit 1) union (select 't2' as has_rows from t2 limit 1);
+----------+
| has_rows |
+----------+
| t2       |
+----------+
1 row in set (0.001 sec)

We could use logic like this to batch the tables into such union queries in django.core.management.sql.sql_flush. We could then reduce the list of tables we wish to flush down to the non-empty ones for the sql_flush operation. Perhaps it would be best to put this behind a flag that is set only in tests, for backwards compatibility.

Change History (9)

comment:1 Changed 6 months ago by Claude Paroz

Triage Stage: UnreviewedAccepted

+1. What sort of backwards compatibility issues would you imagine here?

comment:2 Changed 6 months ago by Adam (Chainz) Johnson

Not sure actually, just being cautious by suggesting the flag in case someone else can come up with a case. As far as I can tell, as long as we still reset sequences, there’s no need to truncate/delete from empty tables.

comment:3 Changed 6 months ago by felixxm

I'm not sure, TBH. Checking no. of rows for each table will have a performance impact, and creating a huge union for all tables (as proposed) may not be better:

(select 't' as has_rows from t limit 1) union (select 't2' as has_rows from t2 limit 1);

We need to take into account a few restrictions e.g. the database's limit on the number of query parameters and 3rd-party database backends may not support UNION.

comment:4 Changed 6 months ago by jonatron

Owner: changed from nobody to jonatron
Status: newassigned

comment:5 Changed 6 months ago by jonatron

I've been benchmarking with 1000 tables and 5 tests. You'll notice sqlite has a problem with too many tables. Also the query for sqlite needs to be:
select * from ( SELECT '{}' AS has_rows FROM {} LIMIT 1 )
instead of
( SELECT '{}' AS has_rows FROM {} LIMIT 1 )

postgres

current behaviour
Ran 5 tests in 1.538s
Ran 5 tests in 1.545s
Ran 5 tests in 1.573s

with patch
Ran 5 tests in 0.807s
Ran 5 tests in 0.880s
Ran 5 tests in 0.812s

mysql
current behaviour
Ran 5 tests in 17.101s
Ran 5 tests in 15.396s
Ran 5 tests in 15.487s

with patch
Ran 5 tests in 3.891s
Ran 5 tests in 3.545s
Ran 5 tests in 3.522s

sqlite - 400 tables
current behaviour
Ran 5 tests in 2.006s
Ran 5 tests in 2.039s
Ran 5 tests in 2.055s

with patch
Ran 5 tests in 0.106s
Ran 5 tests in 0.104s
Ran 5 tests in 0.107s

sqlite - 1000 tables
django.db.utils.OperationalError: too many terms in compound SELECT

comment:6 Changed 6 months ago by felixxm

About SQLite, I mentioned this in my comment "... the database's limit on the number of query parameters" (999 on SQLite).

comment:7 Changed 6 months ago by jonatron

The sqlite problem can be fixed using the same chunking as is in _quote_params_for_last_executed_query.
I can't explain sqlite's performance.

sqlite
1000 tables with patch
Ran 5 tests in 0.237s
Ran 5 tests in 0.345s
Ran 5 tests in 0.288s
Ran 5 tests in 0.257s
Ran 5 tests in 0.240s

1000 tables without patch

Ran 5 tests in 55.617s
Ran 5 tests in 57.577s
Ran 5 tests in 55.242s

900 tables without patch
Ran 5 tests in 45.796s

800 tables without patch
Ran 5 tests in 35.795s

600 tables without patch
Ran 5 tests in 20.521s

550 tables without patch
Ran 5 tests in 19.025s

500 tables without patch
Ran 5 tests in 3.049s

400 tables without patch
Ran 5 tests in 2.068s

200 tables without patch
Ran 5 tests in 0.612s

comment:8 Changed 6 months ago by jonatron

Has patch: set
Version 0, edited 6 months ago by jonatron (next)

comment:9 Changed 5 months ago by felixxm

Needs documentation: set
Note: See TracTickets for help on using tickets.
Back to Top