﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
33015	QuerySet.extra Use Case: filtering on large lists of tuples	kris-swann	nobody	"Hello,

While reading through the documentation for .extra(..) it looked like use cases should be logged as a ticket here -- so here is that.

In an effort to speed up some slow queries that we've been encountering I've been looking into taking advantage of db indexes.

Basically we have run into the same issue as this SO question: https://stackoverflow.com/questions/23991090/django-filter-multiple-columns-with-a-list-of-tuples

We are using Postgres as our backing db.

We have a model that looks like this

{{{
class ExampleModel(models.Model):
    val1 = models.TextField()
    val2 = models.TextField()
    # etc...

    class Meta:
        indexes = [
            models.Index(fields=['val1', 'val2'])
        ]
}}}


For demonstration purposes, we'll fill it with sample data

{{{
sample_data = [
    ExampleModel(val1=str(v1), val2=str(v2))
    for v1 in range(0,1000)
    for v2 in range(0,1000)
]
ExampleModel.objects.bulk_create(sample_data)
}}}

Using a or-chained Q object is significantly slower

Note: In practice this can be a list of any arbitrary combinations, but for ease of generation, we'll use an easy to generate list of tuples
{{{
items_to_fetch = [
    (i, j)
    for i in range(0,100)
    for j in range(0,100)
]
}}}


Using an or-chained Q object:
{{{
import time
query = Q()
for v1, v2 in items_to_fetch:
    query |= Q(val1=v1, val2=v2)
start = time.perf_counter()
ExampleModel.objects.filter(query)
end = time.perf_counter()
print(f""{end - start} seconds"")

>>> OUTPUT:
>>> 43.73997112699999 seconds
}}}

VS.

Using .extra(...)
{{{
import time
start = time.perf_counter()
ExampleModel.objects.extra(
    where=[""(val1, val2) in %s""],
    params=[tuple(items_to_fetch)]
)
end = time.perf_counter()
print(f""{end - start} seconds"")

>>> OUTPUT:
>>> 0.0004218950000449695 seconds
}}}



If there are any alternatives worth trying out, that would be really helpful info. I wasn't able to find anything in the docs (although I might have just been looking in the wrong places too)."	Cleanup/optimization	closed	Database layer (models, ORM)	2.2	Normal	fixed	QuerySet.extra, Documentation	Keryn Knight Simon Charette	Accepted	0	0	0	0	0	0
