#32411 closed Cleanup/optimization (fixed)
Case-insensitive lookups on JSONField doesn't work on MySQL.
| Reported by: | elonzh | Owned by: | Hasan Ramezani |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 3.1 |
| Severity: | Normal | Keywords: | JSONField mysql |
| Cc: | Mariusz Felisiak, Sage Abdullah | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
given model
class Paper(models.Model):
...
authors = models.JSONField(default=list)
...
In [53]: p = Paper.objects.get(id=1)
In [54]: p.__dict__
Out[54]:
{'_state': <django.db.models.base.ModelState at 0x7f2aa74b3790>,
'id': 1,
...
'authors': [{'name': 'Rehmann Kim-Thomas',
'workplace': 'SAP SE, Walldorf, Germany'},
{'name': 'Folkerts Enno', 'workplace': 'SAP SE, Walldorf, Germany'}],
...
}
In [55]: print(Paper.objects.filter(authors__icontains="rehmann").count())
0
In [56]: print(Paper.objects.filter(authors__icontains="Rehmann").count())
1
In [57]: print(Paper.objects.filter(authors__icontains="Rehmann").query)
SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`, `scholardata_paper`.`modified`, `scholardata_paper`.`doi`, `scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`, `scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`, `scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`, `scholardata_paper`.`mag_id`, `scholardata_paper`.`language`, `scholardata_paper`.`url`, `scholardata_paper`.`abstract`, `scholardata_paper`.`title`, `scholardata_paper`.`authors`, `scholardata_paper`.`keywords`, `scholardata_paper`.`issued`, `scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`, `scholardata_paper`.`volume`, `scholardata_paper`.`issue`, `scholardata_paper`.`page`, `scholardata_paper`.`references_count`, `scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`, `scholardata_paper`.`references_refreshed`, `scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE JSON_UNQUOTE(`scholardata_paper`.`authors`) LIKE %Rehmann% ORDER BY `scholardata_paper`.`id` ASC
In [58]: print(Paper.objects.filter(authors__iregex="rehmann").count())
1
In [59]: print(Paper.objects.filter(authors__iregex="Rehmann").count())
1
In [60]: print(Paper.objects.filter(authors__iregex="Rehmann").query)
SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`, `scholardata_paper`.`modified`, `scholardata_paper`.`doi`, `scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`, `scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`, `scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`, `scholardata_paper`.`mag_id`, `scholardata_paper`.`language`, `scholardata_paper`.`url`, `scholardata_paper`.`abstract`, `scholardata_paper`.`title`, `scholardata_paper`.`authors`, `scholardata_paper`.`keywords`, `scholardata_paper`.`issued`, `scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`, `scholardata_paper`.`volume`, `scholardata_paper`.`issue`, `scholardata_paper`.`page`, `scholardata_paper`.`references_count`, `scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`, `scholardata_paper`.`references_refreshed`, `scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE REGEXP_LIKE(JSON_UNQUOTE(`scholardata_paper`.`authors`), Rehmann, 'i') ORDER BY `scholardata_paper`.`id` ASC
In [63]: print(Paper.objects.filter(authors__0__name__icontains="rehmann").count())
1
In [64]: print(Paper.objects.filter(authors__0__name__icontains="rehmann").query)
SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`, `scholardata_paper`.`modified`, `scholardata_paper`.`doi`, `scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`, `scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`, `scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`, `scholardata_paper`.`mag_id`, `scholardata_paper`.`language`, `scholardata_paper`.`url`, `scholardata_paper`.`abstract`, `scholardata_paper`.`title`, `scholardata_paper`.`authors`, `scholardata_paper`.`keywords`, `scholardata_paper`.`issued`, `scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`, `scholardata_paper`.`volume`, `scholardata_paper`.`issue`, `scholardata_paper`.`page`, `scholardata_paper`.`references_count`, `scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`, `scholardata_paper`.`references_refreshed`, `scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE LOWER(JSON_UNQUOTE(JSON_EXTRACT(`scholardata_paper`.`authors`, $[0]."name"))) LIKE LOWER(%rehmann%) ORDER BY `scholardata_paper`.`id` ASC
Change History (9)
comment:1 by , 5 years ago
| Description: | modified (diff) |
|---|
follow-up: 7 comment:2 by , 5 years ago
| Cc: | added |
|---|---|
| Resolution: | → invalid |
| Status: | new → closed |
comment:3 by , 5 years ago
| Cc: | added |
|---|---|
| Keywords: | mysql added |
| Resolution: | invalid |
| Status: | closed → new |
| Summary: | JSONField filter icontains is case sensitive in mysql → Case-insensitive lookups on JSONField doesn't work on MySQL. |
| Triage Stage: | Unreviewed → Accepted |
| Type: | Bug → Cleanup/optimization |
This is not a documented usage, however it doesn't work only on MySQL, so IMO we can fix this for consistency. MySQL handles strings used in JSON context using the utf8mb4_bin binary collation, that's why comparison of JSON values is case-sensitive and we need to use CaseInsensitiveMixin. We can fix this be registering a custom case-insensitive lookup for JSONField, e.g.
class JSONFieldIContains(CaseInsensitiveMixin, lookups.IContains):
pass
JSONField.register_lookup(JSONFieldIContains)
I don't think it makes sense to do the same for __istartswith, iendswith, and __iexact.
comment:5 by , 5 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
comment:7 by , 5 years ago
Replying to Carlton Gibson:
Hi. Thanks for the report.
Your usage is wrong for what you're trying to do. See the Querying JSONField topic docs.
For JSONField
containstakes key pairs, rather than a string:
>>> Dog.objects.filter(data__contains={'owner': 'Bob'})
icontainsis not explicitly implemented on the field itself, so it's falling back to the base implementation (wrongly — the JSON_UNQUOTE gives us a string but we didn't generate theLOWER) but it's not clear that makes much sense. (Mariusz: I'm closing this as invalid, but do we want to accept in order to raise an error here? 🤔)
As you've discovered the correct usage is to query by field (as per here):
>>> Dog.objects.filter(data__owner__name__icontains='bob')For your example
Paper.objects.filter(authors__name__icontains="rehmann")should give you what you want.
I hope that helps.
This seems related to the documentation issue #26511.
Thanks for your information, I know how to query field in a json object, but the problem is the authors fields is a list and Django does not support filter it.
Paper.objects.filter(authors__name__icontains="rehmann") won't give me what I want.
>>> print(Paper.objects.filter(authors__name__icontains="rehmann").query) SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`, `scholardata_paper`.`modified`, `scholardata_paper`.`doi`, `scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`, `scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`, `scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`, `scholardata_paper`.`mag_id`, `scholardata_paper`.`language`, `scholardata_paper`.`url`, `scholardata_paper`.`abstract`, `scholardata_paper`.`title`, `scholardata_paper`.`authors`, `scholardata_paper`.`keywords`, `scholardata_paper`.`issued`, `scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`, `scholardata_paper`.`volume`, `scholardata_paper`.`issue`, `scholardata_paper`.`page`, `scholardata_paper`.`references_count`, `scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`, `scholardata_paper`.`references_refreshed`, `scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE LOWER(JSON_UNQUOTE(JSON_EXTRACT(`scholardata_paper`.`authors`, $."name"))) LIKE LOWER(%rehmann%) ORDER BY `scholardata_paper`.`id` ASC
follow-up: 9 comment:8 by , 5 years ago
the problem is the authors fields is a list and Django does not support filter it
That has little to do with this ticket though. The ORM has no way of knowing authors is a JSON Array because JSONField are schema less. If you want to filter out papers that have at least one author with a matching name you'll have to use the MySQL equivalent of PostgreSQL json_array_elements and a subquery through a Func expression but that's something you should ask about on support channels.
comment:9 by , 5 years ago
Replying to Simon Charette:
the problem is the authors fields is a list and Django does not support filter it
That little to do with this ticket though. The ORM has no way of knowing
authorsis a JSON Array becauseJSONFieldare schema less. If you want to filter out papers that have at least one author with a matching name you'll have to use the MySQL equivalent of PostgreSQLjson_array_elementsand a subquery through aFuncexpression but that's something you should ask about on support channels.
Yes, that little to do with this ticket. I just want to tell Carlton Gibson that he has given the wrong solution.
Hi. Thanks for the report.
Your usage is wrong for what you're trying to do. See the Querying JSONField topic docs.
For JSONField
containstakes key pairs, rather than a string:>>> Dog.objects.filter(data__contains={'owner': 'Bob'})icontainsis not explicitly implemented on the field itself, so it's falling back to the base implementation (wrongly — the JSON_UNQUOTE gives us a string but we didn't generate theLOWER) but it's not clear that makes much sense. (Mariusz: I'm closing this as invalid, but do we want to accept in order to raise an error here? 🤔)As you've discovered the correct usage is to query by field (as per here):
For your example
Paper.objects.filter(authors__name__icontains="rehmann")should give you what you want.I hope that helps.
This seems related to the documentation issue #26511.