Opened 4 years ago
Last modified 4 years ago
#32241 closed New feature
Array Many to Many field — at Version 1
Reported by: | Omar Altayyan | Owned by: | |
---|---|---|---|
Component: | contrib.postgres | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
let's use an example:
class Item related = model.ManyToManyField(Tag) class Tag pass
in relational databases, you sometimes have the use case where u want to filter Item
by getting rows that are related to a certain set of Tag
, so for example get me the items that are related to tag1 and tag2 at the same time, if we wanted to get the items that have either tag1 or tag2, the query would be as simple as an in
condition, but here we want the items that have both tag1 and tag2 and possibly more.
using regular django or SQL you'd solve this by adding a group by and a count, note that this query has to search in the Tag entity, get the records, group them, then check the items where the count of Tag related rows equals the desired number of Tags we are filtering by.
however, using a simple ArrayField in postgres, that has all the IDs of Tag related to Item in an array field in the Item entity, along with a good index like GinIndex, you can simply use instersect on that tags field in Item to do that filtering, and even do a regular OR filtering that is many times more efficient.
we tested this approach vs the group by/count in the M2M field, the performance improvement was almost 2 orders of magnitude, going from 100s of ms to like 4-5 ms
however the only issue with anyone trying this approach in Django is that the arrayField can store the IDs of Tag but there's no ORM backing behind it, there's no prefetch related, nor any other type of help, so you'd have to manually fetch entities which is not only bad for code but also for performance.
there's already a package https://github.com/primal100/django_postgres_extensions that has this applied, but it has been archived and it uses a lot of hacks that either mess up django (which is what happened to us in https://code.djangoproject.com/ticket/32239#comment:4) or make it break when doing a django update
so since the improvement is huge in performance, it's worth considering by the Django core team, you could just fetch the ArrayManyToMany field from this package and apply it's patches in a concrete way and make this field a part of Django, since it's benefits are too good to ignore.
in order to help a bit, i made a pastebin entry:
https://pastebin.com/yPJGi3vu
these are patched versions of the methods in Django 3.1.4
note that the additions to django's methods are enclosed in OUR CUSTOM CODE
comments, the ones after else
blocks mean the else isn't found in django.
in order to make these methods work we would patch the methods in django in an App ready()
function:
from django.db.models import query from django.db.models.sql import datastructures from django.db.models.signals import pre_delete from django_postgres_extensions.signals import delete_reverse_related from core.lib.django_postgres_extensions.array_m2m_patches import prefetch_one_level from core.lib.django_postgres_extensions.array_m2m_patches import as_sql from core.lib.django_postgres_extensions.array_m2m_patches import _update from core.lib.django_postgres_extensions.array_m2m_patches import update query.QuerySet.update = update query.QuerySet._update = _update datastructures.Join.as_sql = as_sql query.prefetch_one_level = prefetch_one_level pre_delete.connect(delete_reverse_related)
i hope this has any value.