Code

Opened 6 years ago

Last modified 10 months ago

#9619 new Bug

to_python not called when fetching data with .values(...)

Reported by: Valera Grishin Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: values,custom field, datetime
Cc: amlau@…, olau@…, chrischambers, marfire Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consider following models:

from django.db import models

class BinaryField(models.Field):
    __metaclass__ = models.SubfieldBase

    def encode(self):
        # some encoding here

    def decode(self):
        # some decoding here

    def db_type(self):
        return 'text'

    def get_db_prep_value(self, value):
        return super(BinaryField, self).get_db_prep_value(BinaryField.__encode(value))

    def to_python(self, value):
        return super(BinaryField, self).to_python(BinaryField.__decode(value))

class Logo(models.Model):
    channel = models.CharField(max_length=100)
    logo = BinaryField(editable=False)

Now assume you want to fetch values of "logo" field:

print Logo.objects.values('logo')[1]['logo']

This will print encoded value as it is get stored in the database. That is to_python not called.

Attachments (0)

Change History (17)

comment:1 Changed 5 years ago by amlau

  • Cc amlau@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 5 years ago by amlau

  • Cc amlau@… added; amlau@… removed

comment:3 Changed 5 years ago by jacob

  • milestone set to 1.1
  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 5 years ago by mtredinnick

  • Triage Stage changed from Accepted to Design decision needed

This is a bit of a line-ball, since it's called values, not Python objects, so returning the data representation isn't a completely bad idea -- leaving it up to the user to decide what they want to do with it. For example, complete consistency here would then be converting foreign key values to their corresponding Python objects (kind of defeating the purpose of calling values() instead of a normal queryset). I've just wontfixed #8144 about that particular issue (largely because of the huge impact it would have on existing code).

I'm kind of in favour of staying away from django.db.models.fields.Field entirely when working with values() and values_list(), keeping it as the fast path for raw data.

I'm going to risk the wrath of Jacob and DDN this for a little bit. I think we need a discussion when there's a quiet moment.

comment:5 Changed 5 years ago by mtredinnick

Also occurs to me that a new parameter to values() and values_list() might not be out of the realms of possibility here.

Something like do_the_extra_work_required_to_convert_to_python_even_though_I_could_use_a_normal_queryset_with_defer = True, perhaps. :-)

comment:6 Changed 5 years ago by Alex

Malcolm, you mentions defer. Got something you want to share with the class :)

comment:7 Changed 5 years ago by mtredinnick

Keep the ticket comments on-topic, please.

comment:8 Changed 5 years ago by mtredinnick

  • milestone 1.1 deleted

Having thought about this a lot more, it isn't really a bug in existing functionality since values() has always behaved this way and the name is pretty descriptive. Worthwhile having a discussion about adding some feature to this in the future, but I can't see that it's a bug in the 1.1-release sense.

comment:9 Changed 5 years ago by benwilber@…

  • Keywords values,custom field, datetime added
  • Version changed from 1.0 to SVN

This seems like a bug to me. Please see my post for a better case:
http://groups.google.com/group/django-users/browse_thread/thread/7f6f75334ef1653c/0174e8ce8e1e514b?hl=en%C2%AEe8ce8e1e514b

It seems to me that 'values()' should return data as it's _meant_ to be consumed, not as it's stored. If you want raw data, get at something like a 'raw_values()' method or similar. As it stands now, I'm not sure how to use date_hierarchy and maybe others with a custom datetime field when 'to_python()' is not called without digging into the internals. Are there other standard apps that access API'd data 'through the back door' as well?

comment:10 Changed 4 years ago by olau

Hey, while we're waiting for a design decision, could we please get a quick fix on this that updates the documentation? It never has occurred to me that values() wouldn't deserialize, I've always thought it would just be an easy way to get plain dicts/tuples instead of the heavier objects, and the documentation doesn't mention raw values at all:

"Returns a ValuesQuerySet -- a QuerySet that returns dictionaries when used as an iterable, rather than model-instance objects.

Each of those dictionaries represents an object, with the keys corresponding to the attribute names of model objects."

I have been using values() for a couple of years without noticing the difference before today, where it broke on me because I've defined a custom field.

Here's a suggestion for a wording: "Note that the values returned are directly from the database, no field-specific conversions are performed, e.g. for custom fields with to_python()."

Otherwise, +1 for getting this fixed, with raw=True/False or whatever. :)

comment:11 Changed 4 years ago by olau

  • Cc olau@… added

comment:12 Changed 4 years ago by chrischambers

  • Cc chrischambers added

comment:13 Changed 3 years ago by lukeplant

  • Severity set to Normal
  • Type set to Bug

comment:14 Changed 2 years ago by sam@…

  • Easy pickings unset
  • UI/UX unset

IMHO, if this isn't a bug it's definitely a gotcha (and the documentation should be updated). I spent the last several hours trying to debug why values_list was not returning the python objects defined by my custom model field's to_python method.

comment:15 Changed 2 years ago by aaugustin

I probably haven't been here long enough, but I'm finding this behavior is extremely unexpected.

I'll leave this ticket in DDN because I'm not sure I grasp all the consequences.

But my gut feeling is that we should swallow the backwards incompatibility and fix this at some point — or at the very least make it clear in the documentation that QuerySet.values() returns raw database values, as spit by the database adapter.

comment:16 Changed 16 months ago by akaariai

  • Triage Stage changed from Design decision needed to Accepted

I vote for raw kwarg, if True no conversion happens. One day the default should be False, but likely with backwards compatibility period where it default still to True.

It might be worth it to inspect if there is a way to skip those method calls that do not actually do anything, maybe just if 'col_field.to_python == Field.to_python'. The standard Field.to_python just returns the value as is. There might be need to ask the field explicitly if the given connection's values needs conversion, most of the to_python() implementations seem non-necessary (IntegerField does int(value) for example, but I don't believe this is necessary on any DB's values).

The overhead of calling no-ops in trivial test case is 0.7s vs 0.07s. The test is:

def noop(value):
    return value

myvals = [
    (0, 1, 2, 3, 4),
] * 100
def values():
    for row in myvals:
        yield row  # alternate: yield map(noop, row)

from datetime import datetime
start = datetime.now()
for _ in range(0, 10000):
    for val in values():
        pass
print datetime.now() - start

Still, the test passes in under a second even with the map call, so it might be that the overhead doesn't matter in practice.

Anyways, marking as accepted, it would be good to end up in a situation where you can opt out from conversion, but the default is to use to_python() for the values.

comment:17 Changed 10 months ago by marfire

  • Cc marfire added

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from nobody to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.