Opened 14 years ago

Closed 12 years ago

Last modified 11 years ago

#13844 closed Bug (fixed)

Errors when using character fields for aggregation

Reported by: zegrep@… Owned by: Greg Wogan-Browne
Component: Database layer (models, ORM) Version: 1.2
Severity: Normal Keywords:
Cc: philipe.rp@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Julien Phalip)

My intension is joining tables with two different formated columns,
by using django's double underscore magic to resolve the querysets.

The values in table B are preceded by leading zeros.

table A --> id = "10"

table B --> id = "000010"

select distinct b.id from a, b where where a.id = TRIM(leading '0' FROM b.id);

The resulting code should look like this

    qs_a=A.objects.all().values('id')
    qs_b=B.objects.annotate(id_trim=Trim('id', position='leading', char='0')).filter(id_trim__in=qs_a)

I use the following code to implement the extra functionality.

try:
    import psycopg2
except ImportError, e:
    import psycopg
else:
    psycopg = psycopg2._psycopg

class Trim(django.db.models.Aggregate):
    name = 'Trim_pg'
django.db.models.Trim=Trim

class Trim_pg(django.db.models.sql.aggregates.Aggregate):
    '''
        position = [leading, trailing, both]
        char = <character to remove>
    '''

    sql_function = 'TRIM'
    sql_template = '''%(function)s(%(position)s %(char)s FROM %(field)s)'''

    def __init__(self, col, distinct=False, **extra):
        assert extra.has_key('position'), u'no position'
        assert extra['position'] in ('leading', 'trailing', 'both'), 'position no in [leading, trailing, both]'
        assert extra.has_key('char'), u'no char'
        assert len(extra['char']) == 1, 'only one character'
        extra['char']=str(psycopg2._psycopg.QuotedString(extra['char'])) #Quoting
        super(Trim_pg, self).__init__(col, distinct=distinct, **extra)
django.db.models.sql.aggregates.Trim_pg=Trim_pg

The problem is, that "convert_values" makes for a "CharField"
a cast to float. My solution is to return the value for CharFields
without the cast.

Index: db/backends/__init__.py
===================================================================
--- db/backends/__init__.py (Revision 12595)
+++ db/backends/__init__.py (Arbeitskopie)
@@ -438,6 +438,8 @@
             return int(value)
         elif internal_type in ('DateField', 'DateTimeField', 'TimeField'):
             return value
+        elif internal_type in  ('CharField'):
+            return value
         # No field, or the field isn't known to be a decimal or integer
         # Default to a float
         return float(value)

Change History (14)

comment:1 by Chris Beaven, 14 years ago

Component: UncategorizedDatabase layer (models, ORM)
Summary: ValueError: by adding custom aggregate function TRIMErrors when using character fields for aggregation
Triage Stage: UnreviewedAccepted
Version: 1.11.2

For the life of me I can't understand the logic of the original convert_values method, but it definitely seems wrong.

Someone in IRC encountered a problem because of this using the built in Max aggregation class with a charfield, so it's not just custom aggregate classes which are effected here.

comment:2 by Felipe 'chronos' Prenholato, 14 years ago

Cc: philipe.rp@… added
Has patch: set

I'm the guy at IRC. The proposed fix already works for me. I having troubles in get Max from a charfield.

Given model:

class Link_A(models.Model):
    link = models.CharField(max_length=500)
    short_link = models.CharField(max_length=50)

I get following error when try to get Max of short_link:

In [5]: Link_A.objects.all().aggregate(Max('short_link'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/felipe/projects/testproject/<ipython console> in <module>()

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/models/query.pyc in aggregate(self, *args, **kwargs)
    311                 is_summary=True)
    312 
--> 313         return query.get_aggregation(using=self.db)
    314 
    315     def count(self):

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/models/sql/query.pyc in get_aggregation(self, using)
    371             (alias, self.resolve_aggregate(val, aggregate, connection=connections[using]))
    372             for (alias, aggregate), val
--> 373             in zip(query.aggregate_select.items(), result)
    374         ])
    375 

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/models/sql/query.pyc in resolve_aggregate(self, value, aggregate, connection)
    325         else:
    326             # Return value depends on the type of the field being processed.

--> 327             return self.convert_values(value, aggregate.field, connection)
    328 
    329     def get_aggregation(self, using):

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/models/sql/query.pyc in convert_values(self, value, field, connection)
    303         it can be overridden by Query classes for specific backends.
    304         """
--> 305         return connection.ops.convert_values(value, field)
    306 
    307     def resolve_aggregate(self, value, aggregate, connection):

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/backends/__init__.pyc in convert_values(self, value, field)
    443         # No field, or the field isn't known to be a decimal or integer

    444         # Default to a float

--> 445         return float(value)
    446 
    447     def check_aggregate_support(self, aggregate_func):

ValueError: invalid literal for float(): c

and checking queries, I can see that querie are made:

In [6]: connection.queries
Out[6]: 
[{'sql': 'SELECT MAX("webui_link_a"."short_link") AS "short_link__max" FROM "webui_link_a"',
  'time': '0.001'}]

That's is a small fix, but I think that need to check how act with others internal types, or default to float only if float(value) don't raise nothing and just return value if raise.

comment:3 by Matthias Kestenholz, 14 years ago

This is a related issue: #12889 Using annotation unexpectedly returns DecimalFields as floats

comment:4 by Greg Wogan-Browne, 13 years ago

Owner: changed from nobody to Greg Wogan-Browne
Status: newassigned

comment:5 by Graham King, 13 years ago

Severity: Normal
Type: Bug

comment:6 by Julien Phalip, 13 years ago

Description: modified (diff)
Needs tests: set
Patch needs improvement: set

Patch needs improvement as per chronos' comment. Also needs tests.

comment:7 by Aymeric Augustin, 12 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:8 by Aymeric Augustin, 12 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:9 by valhallasw, 12 years ago

The proposed solution does not solve the underlying design issue: this conversion should be done *in the fields*, not in the database backend! I.e.

    def convert_values(self, value, field):
        return field.convert_value(value)

and I expect (but am not 100% certain) that field.to_python() already does exactly what should happen. The differences I can see at the moment are a) the default float() and b) CommaSeparatedIntegerField (which shouldn't be cast to int in the first place).

a) the default to_python implementation is just
{{{ def to_python(self, value):

return value

}}}

b) the CommaSeparatedIntegerField is actually just a charfield with a specific validation.

comment:10 by valhallasw, 12 years ago

I have submitted a pull request with a regression test & a fix for the issue.

An even simpler option, which uses

    def convert_values(self, value, field):
        return value

fails AggregationTests.test_more;

The pull-requested-version works correctly and passes all aggregation and aggregation_regress tests under both sqlite and postgresql.

comment:11 by valhallasw, 12 years ago

Last, but not least: a work-around for postgresql_psycopg2:

from django.db.backends.postgresql_psycopg2 import base

class DatabaseOperations(base.DatabaseOperations):
    def convert_values(self, value, field):
        return field.to_python(value)

class DatabaseWrapper(base.DatabaseWrapper):
    def __init__(self, *args, **kwargs):
        super(DatabaseWrapper, self).__init__(*args, **kwargs)
        self.ops = DatabaseOperations(self)

comment:12 by Anssi Kääriäinen, 12 years ago

Needs tests: unset
Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

There is a comment in Oracle's convert_values() which says that because to_python() is meant for validation, it should not be used for converting values. At least one possible problem is that the to_python() method raises ValidationError on problematic input, and that error doesn't fit here.

I am proposing (and I am going to likely commit) a patch which just removes the non-necessary float(value) from convert_values. This fixes PostgreSQL and MySQL. Oracle and SQLite already override the convert_values() method, and return the bare value in the "unknown field type" case.

In addition, the suggestion made in the comments of this ticket that the conversion should be done by the field, not by the backend gets a +1 from me. It might be as straightforward as defining to_python() to do the conversion, but IMO it needs separate ticket and some more investigation.

comment:13 by Anssi Kääriäinen <akaariai@…>, 12 years ago

Resolution: fixed
Status: assignedclosed

In [59a655988ee95e4b4e4646cebea88b620d8fcdd2]:

Fixed #13844 -- Avoid converting unknown db values to float

This patch removes an unconditional float(value) conversion from db
backend default convert_values() method. This can cause problems when
aggregating over character fields for example. In addition, Oracle
and SQLite already return the bare value from their convert_values().

In the long term the converting should be done by fields, and the
fields should then call database backend specific converters when
needed. The current setup is inflexible for 3rd party fields.

Thanks to Merlijn van Deen for the original patch.

comment:14 by Florian Apolloner, 11 years ago

#19448 is a duplicate.

Note: See TracTickets for help on using tickets.
Back to Top