Opened 18 months ago

Last modified 14 months ago

#26511 new Cleanup/optimization

Document how to do a substring search in JSONField

Reported by: Eoin Murray Owned by: nobody
Component: Documentation Version: master
Severity: Normal Keywords: postgres, jsonb
Cc: eoin@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I have a model called File with a JSONField. I create a new file and inside the JSONField is a key value pair

{"title": "the cow jumped over the moon"}

I want to be able to search for objects where the title contains moon...

I can do it will the following raw SQL statement in Django

File.objects.raw('SELECT * FROM web_file WHERE (metadata ->> %s)::text LIKE %s;', ['title', '%moon%'])[0]

This works, but doesn't return a queryset, and doesn't play nice with the rest of my code. I was hoping I could do this query with the Django ORM. From the docs, I thought the following code would work:

File.objects.filter(metadata__title__contains='moon')

But gives the an error (traceback here https://gist.github.com/eoinmurray/04843b77e8d71fad8cfa4a990c00740a)

DataError: invalid input syntax for type json
LINE 1: ...b_file" WHERE "web_file"."metadata" -> 'title' @> 'moon' ORD...

DETAIL:  Token "moon" is invalid.
CONTEXT:  JSON data, line 1: moon

Perhaps this is something that cannot be accomplished with the ORM, and therefore is not a bug - but I'm not really sure from the documentation...

Here is all the code needed to recreate the error, Im using Django 1.9, Postgres 9.4 and psycopg2 2.6.1.

models.py

from django.db import models
from django.contrib.postgres.fields import JSONField

class File(models.Model):
    owner = models.ForeignKey('auth.User', related_name='file_user')
    created_on = models.DateTimeField(auto_now_add=True, blank=True)
    name = models.CharField(max_length=100)
    metadata = JSONField(null=True, blank=True, default=dict())

query.py

from web.models import File
from django.contrib.auth.models import User

user = User.objects.get(pk=1)

File.objects.create(name="testfile.txt", owner=user, metadata={'title': 'the cow jumped over the moon'})

# raw query that works
# File.objects.raw('SELECT * FROM web_file WHERE (metadata ->> %s)::text LIKE %s;', ['title', '%moon%'])[0]

File.objects.filter(metadata__title__contains='moon')

>>> outputs traceback https://gist.github.com/eoinmurray/04843b77e8d71fad8cfa4a990c00740a
>>> DataError: invalid input syntax for type json
>>> LINE 1: ...b_file" WHERE "web_file"."metadata" -> 'title' @> 'moon' ORD...
                                                             ^
>>> DETAIL:  Token "moon" is invalid.
>>> CONTEXT:  JSON data, line 1: moon

Change History (12)

comment:1 Changed 18 months ago by Claude Paroz

Resolution: worksforme
Status: newclosed
Last edited 18 months ago by Claude Paroz (previous) (diff)

comment:2 Changed 18 months ago by Eoin Murray

That will only return objects where the metadata contains the key 'title' with the exact value 'moon', it wont match with objects where 'title' is 'the cow jumped over the moon'...

I have just tested it there and using your suggested syntax doesnt work for me

File.objects.create(name="testfile.txt", owner=some_user, metadata={'title': 'the cow jumped over the moon'})

print File.objects.filter(metadata__contains={'title': 'moon'})

>>> []

The filter returns no items

I just checked and it seems to me that

print File.objects.filter(metadata__title= 'moon')

and

print File.objects.filter(metadata__contains={'title': 'moon'})

are equivalent, and they do exact searches for title=moon, and not a substring containment search which is what Im looking for

Last edited 18 months ago by Eoin Murray (previous) (diff)

comment:3 Changed 18 months ago by Simon Charette

As Claude pointed out the textfield__contains and jsonfield__contains lookups do not behave the same.

In order to use the textfield__contains operator on metadata -> 'title' you'll have to cast your JSON field to text (just like you do in your raw() query) before hand.

The following should work against the master branch (which introduced the Cast function):

from django.db import models
from django.db.models.functions import Cast

File.objects.annotate(
    title=Cast('metadata__title', models.TextField()),
).filter(title__contains='moon')
Last edited 18 months ago by Simon Charette (previous) (diff)

comment:4 Changed 18 months ago by Claude Paroz

Component: contrib.postgresDocumentation
Owner: set to nobody
Resolution: worksforme
Status: closednew
Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization
Version: 1.9master

Thanks Simon for the details. Then it might be worth adding that hint somewhere in the documentation.

comment:5 Changed 18 months ago by Eoin Murray

Is there any way to do this in Django <=1.9.5?

comment:6 Changed 18 months ago by Tim Graham

You can probably copy the Cast function from Django master (03b6947728466e4e907487f30dd4dfec94a8eb2f) into your project.

comment:7 Changed 16 months ago by Christian Karrié

I've done something like:

like_arg = "%%%s%%" % query_string
qs.extra(where=["json_values ->> %s LIKE %s"], params=[lookup_key, like_arg])

json_values is my JSONField, lookup_key is "title" in your case, like_arg is "%moon%" in your case.

Thus returns a queryset

comment:8 Changed 15 months ago by Jens Neuhaus

Has patch: set

Hi. I submitted a patch to include it in the docs: https://github.com/django/django/pull/6965

comment:9 Changed 15 months ago by Tim Graham

Patch needs improvement: set
Summary: Django 1.9 and Postgres 9.4 jsonb string containment inside JSONFieldDocument how to do a substring search in JSONField

Marking as "Patch needs improvement" per Simon's comment, "Searching a whole JSON blob seems like a bad practice to me." The technique described in comment:3 doesn't work, so it's possible some code changes might also be needed to allow the desired behavior.

comment:10 Changed 15 months ago by Simon Charette

It looks like what we really need here is support for the ->> operator as (lhs -> key)::text is the JSON representation of the string (('"foo"'::jsonb)::text yields '"foo"').

I believe this should be done by introducing a new expression for this purpose and documenting it.

In the meantime we could document using Func('data', template="%(expressions)s ->> 'hobby'", output_field=models.TextField()).

comment:11 Changed 15 months ago by Simon Charette

Here's a POC of what I had in mind.

comment:12 Changed 14 months ago by Tim Graham

See #27205 for the same issue as it applies to ArrayField.

Note: See TracTickets for help on using tickets.
Back to Top