Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#31991 closed Bug (wontfix)

QuerySet.raw() method returns string instead of dict for JSONField().

Reported by: horpto Owned by: nobody
Component: Database layer (models, ORM) Version: 3.1
Severity: Normal Keywords: JSONfield
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Imagine 2 tables:

class AtsPhoneCall(models.Model):
    created = models.DateTimeField(auto_now_add=True, db_index=True)

    class Meta:
        db_table = 'atscall'

class PhoneCall(models.Model):
    ats_call = models.OneToOneField(
        'phones.AtsPhoneCall',
        on_delete=models.PROTECT,
        related_name='call',
    )
    ivr_data = models.JSONField(null=True)

And raw query with join will give us wrong value of ivr_data. Old django.contrib.postgres.fields.JSONField returned dict value as new django.db.models.JSONField returns str instead of dict. Example from django shell:

>>> sql = '''
    join phones_phonecall as p on atscall.id = p.ats_call_id
    where ivr_data is not null
    limit 10
'''
...     select atscall.id, p.ivr_data
...     from atscall
...     join phones_phonecall as p on atscall.id = p.ats_call_id
...     where ivr_data is not null
...     limit 10
... '''
>>>
>>> AtsPhoneCall.objects.raw(sql)[0].ivr_data
DEBUG;(0.002)
    select atscall.id, p.ivr_data
    from atscall
    join phones_phonecall as p on atscall.id = p.ats_call_id
    where ivr_data is not null
    limit 10
; args=()
'{"v1": []}'
>>>
>>> for i in AtsPhoneCall.objects.raw(sql):
...     print(i.id, i.ivr_data, type(i.ivr_data))
...     break
...
DEBUG;(0.002)
    select atscall.id, p.ivr_data
    from atscall
    join phones_phonecall as p on atscall.id = p.ats_call_id
    where ivr_data is not null
    limit 10
; args=()
180621245012669 {"v1": []} <class 'str'>
>>> next(AtsPhoneCall.objects.raw(sql).iterator()).ivr_data
DEBUG;(0.008)
    select atscall.id, p.ivr_data
    from atscall
    join phones_phonecall as p on atscall.id = p.ats_call_id
    where ivr_data is not null
    limit 10
; args=()
'{"v1": []}'

Change History (4)

comment:1 by Mariusz Felisiak, 4 years ago

Resolution: wontfix
Status: newclosed
Summary: .raw method returns string instead of dictQuerySet.raw() method returns string instead of dict for JSONField().

Thanks for the report. This behavior was changed in 0be51d2226fce030ac9ca840535a524f41e9832c as a part of fix for custom decoders. Unfortunately we have to skip the default psycopg2's behavior for all JSONField() (with psycopg2.extras.register_default_jsonb()) we cannot change this per a specific field. Moreover the new behavior is consistent for all database backends. You can avoid using raw() with Subquery():

AtsPhoneCall.objects.annotate(
    ivr_data=Subquery(PhoneCall.objects.filter(ats_call__pk=OuterRef('pk')).values('ivr_data')[:1]),
)

or call json.loads() on fetched data.

I will add a note to Django 3.1.1 release notes about this behavior change.

comment:2 by Петр Eлагин, 4 years ago

Resolution: wontfix
Status: closednew

i not use raw() with Subquery() ! i use direct query in database, pleasy make in settings use or not use register_default_jsonb\

comment:3 by Mariusz Felisiak, 4 years ago

Resolution: wontfix
Status: newclosed

Петр, please don't shout and don't reopen closed tickets. If you don't use raw(), you can provide more detail in #32474.

comment:4 by Ruben Nielsen, 3 years ago

Hello,

Chiming in here, as this is something that breaks a lot of functionality in our app when loading. I'm seeing this particular piece of the code as my problem:

        # Register dummy loads() to avoid a round trip from psycopg2's decode
        # to json.dumps() to json.loads(), when using a custom decoder in
        # JSONField.
        psycopg2.extras.register_default_jsonb(conn_or_curs=connection, loads=lambda x: x)

(django.db.backends.postgresql.base.py::DatabaseWrapper.get_new_connection)

I sort-of understand the reasoning behind this change, but I'm not sure how to get around it, if I just want the original functionality for my project, which only uses Postgres anyway. The comment says "when using a custom decoder", so shouldn't you check for this? If I use one, then fine, but if I don't then stop breaking psycopg2 for me... Or could this be controlled from settings.py? I know there are tonnes of settings already, but it would certainly help.

Or another path. How do I set a custom decoder for all my JSONFields? Earlier, everything was simply json.loads by default, so that was all good. I know how to make a custom field with a custom decoder, sure, but how would I then tie that into queryset.raw()?

--- Edit:

Commenting out that line that I am unhappy about will solve my raw() problems, but will cause problems elsewhere. My current desperate solution, which seems to solve my issues is to monkey patch the things I didn't want:

  1. Don't disable the psycopg2 defaults
  2. In the SQLCompiler, when doing conversions, if the conversion is for JSON, assume that psycopg2 has already handled it and skip it

It is very hamfisted, but it works. Any pointers for more sensible solution would be much appreciated :)

@async_unsafe
def monkey_get_new_connection(self, conn_params):
    connection = Database.connect(**conn_params)

    # self.isolation_level must be set:
    # - after connecting to the database in order to obtain the database's
    #   default when no value is explicitly specified in options.
    # - before calling _set_autocommit() because if autocommit is on, that
    #   will set connection.isolation_level to ISOLATION_LEVEL_AUTOCOMMIT.
    options = self.settings_dict['OPTIONS']
    try:
        self.isolation_level = options['isolation_level']
    except KeyError:
        self.isolation_level = connection.isolation_level
    else:
        # Set the isolation level to the value from OPTIONS.
        if self.isolation_level != connection.isolation_level:
            connection.set_session(isolation_level=self.isolation_level)

    # --- Monkey: This line was introduces by Django team, which changes the
    # --- defaults of psycopg2. We need to not do that.
    # --- (https://code.djangoproject.com/ticket/31991)
    # Register dummy loads() to avoid a round trip from psycopg2's decode
    # to json.dumps() to json.loads(), when using a custom decoder in
    # JSONField.
    #psycopg2.extras.register_default_jsonb(conn_or_curs=connection, loads=lambda x: x)
    # ---
    return connection
DatabaseWrapper.get_new_connection = monkey_get_new_connection


def monkey_apply_converters(self, rows, converters):
    connection = self.connection
    converters = list(converters.items())
    for row in map(list, rows):
        for pos, (convs, expression) in converters:
            value = row[pos]
            for converter in convs:
                # --- Monkey: `monkey_get_new_connection` now lets psycopg2 deal
                # --- with JSON conversion for us, so when we arrive at this point
                # --- it'll already be a dict. So if the converter is for JSON,
                # --- We can safely skip it
                module_name = str(inspect.getmodule(converter))
                is_json = 'django/db/models/fields/json.py' in module_name
                is_array = 'django/contrib/postgres/fields/array.py' in module_name
                if is_json or is_array:
                    continue
                # ---
                value = converter(value, expression, connection)
            row[pos] = value
        yield row
SQLCompiler.apply_converters = monkey_apply_converters
Last edited 3 years ago by Ruben Nielsen (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top