Opened 5 years ago

Closed 3 years ago

Last modified 3 years ago

#13844 closed Bug (fixed)

Errors when using character fields for aggregation

Reported by: zegrep@… Owned by: wogan
Component: Database layer (models, ORM) Version: 1.2
Severity: Normal Keywords:
Cc: philipe.rp@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by julien)

My intension is joining tables with two different formated columns,
by using django's double underscore magic to resolve the querysets.

The values in table B are preceded by leading zeros.

table A --> id = "10"

table B --> id = "000010"

select distinct b.id from a, b where where a.id = TRIM(leading '0' FROM b.id);

The resulting code should look like this

    qs_a=A.objects.all().values('id')
    qs_b=B.objects.annotate(id_trim=Trim('id', position='leading', char='0')).filter(id_trim__in=qs_a)

I use the following code to implement the extra functionality.

try:
    import psycopg2
except ImportError, e:
    import psycopg
else:
    psycopg = psycopg2._psycopg

class Trim(django.db.models.Aggregate):
    name = 'Trim_pg'
django.db.models.Trim=Trim

class Trim_pg(django.db.models.sql.aggregates.Aggregate):
    '''
        position = [leading, trailing, both]
        char = <character to remove>
    '''

    sql_function = 'TRIM'
    sql_template = '''%(function)s(%(position)s %(char)s FROM %(field)s)'''

    def __init__(self, col, distinct=False, **extra):
        assert extra.has_key('position'), u'no position'
        assert extra['position'] in ('leading', 'trailing', 'both'), 'position no in [leading, trailing, both]'
        assert extra.has_key('char'), u'no char'
        assert len(extra['char']) == 1, 'only one character'
        extra['char']=str(psycopg2._psycopg.QuotedString(extra['char'])) #Quoting
        super(Trim_pg, self).__init__(col, distinct=distinct, **extra)
django.db.models.sql.aggregates.Trim_pg=Trim_pg

The problem is, that "convert_values" makes for a "CharField"
a cast to float. My solution is to return the value for CharFields
without the cast.

Index: db/backends/__init__.py
===================================================================
--- db/backends/__init__.py (Revision 12595)
+++ db/backends/__init__.py (Arbeitskopie)
@@ -438,6 +438,8 @@
             return int(value)
         elif internal_type in ('DateField', 'DateTimeField', 'TimeField'):
             return value
+        elif internal_type in  ('CharField'):
+            return value
         # No field, or the field isn't known to be a decimal or integer
         # Default to a float
         return float(value)

Change History (14)

comment:1 Changed 5 years ago by SmileyChris

  • Component changed from Uncategorized to Database layer (models, ORM)
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Summary changed from ValueError: by adding custom aggregate function TRIM to Errors when using character fields for aggregation
  • Triage Stage changed from Unreviewed to Accepted
  • Version changed from 1.1 to 1.2

For the life of me I can't understand the logic of the original convert_values method, but it definitely seems wrong.

Someone in IRC encountered a problem because of this using the built in Max aggregation class with a charfield, so it's not just custom aggregate classes which are effected here.

comment:2 Changed 5 years ago by chronos

  • Cc philipe.rp@… added
  • Has patch set

I'm the guy at IRC. The proposed fix already works for me. I having troubles in get Max from a charfield.

Given model:

class Link_A(models.Model):
    link = models.CharField(max_length=500)
    short_link = models.CharField(max_length=50)

I get following error when try to get Max of short_link:

In [5]: Link_A.objects.all().aggregate(Max('short_link'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/home/felipe/projects/testproject/<ipython console> in <module>()

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/models/query.pyc in aggregate(self, *args, **kwargs)
    311                 is_summary=True)
    312 
--> 313         return query.get_aggregation(using=self.db)
    314 
    315     def count(self):

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/models/sql/query.pyc in get_aggregation(self, using)
    371             (alias, self.resolve_aggregate(val, aggregate, connection=connections[using]))
    372             for (alias, aggregate), val
--> 373             in zip(query.aggregate_select.items(), result)
    374         ])
    375 

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/models/sql/query.pyc in resolve_aggregate(self, value, aggregate, connection)
    325         else:
    326             # Return value depends on the type of the field being processed.

--> 327             return self.convert_values(value, aggregate.field, connection)
    328 
    329     def get_aggregation(self, using):

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/models/sql/query.pyc in convert_values(self, value, field, connection)
    303         it can be overridden by Query classes for specific backends.
    304         """
--> 305         return connection.ops.convert_values(value, field)
    306 
    307     def resolve_aggregate(self, value, aggregate, connection):

/home/felipe/.py/2.6/lib/python2.6/site-packages/django/db/backends/__init__.pyc in convert_values(self, value, field)
    443         # No field, or the field isn't known to be a decimal or integer

    444         # Default to a float

--> 445         return float(value)
    446 
    447     def check_aggregate_support(self, aggregate_func):

ValueError: invalid literal for float(): c

and checking queries, I can see that querie are made:

In [6]: connection.queries
Out[6]: 
[{'sql': 'SELECT MAX("webui_link_a"."short_link") AS "short_link__max" FROM "webui_link_a"',
  'time': '0.001'}]

That's is a small fix, but I think that need to check how act with others internal types, or default to float only if float(value) don't raise nothing and just return value if raise.

comment:3 Changed 5 years ago by mk

This is a related issue: #12889 Using annotation unexpectedly returns DecimalFields as floats

comment:4 Changed 4 years ago by wogan

  • Owner changed from nobody to wogan
  • Status changed from new to assigned

comment:5 Changed 4 years ago by graham_king

  • Severity set to Normal
  • Type set to Bug

comment:6 Changed 4 years ago by julien

  • Description modified (diff)
  • Needs tests set
  • Patch needs improvement set

Patch needs improvement as per chronos' comment. Also needs tests.

comment:7 Changed 3 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:8 Changed 3 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:9 Changed 3 years ago by valhallasw

The proposed solution does not solve the underlying design issue: this conversion should be done *in the fields*, not in the database backend! I.e.

    def convert_values(self, value, field):
        return field.convert_value(value)

and I expect (but am not 100% certain) that field.to_python() already does exactly what should happen. The differences I can see at the moment are a) the default float() and b) CommaSeparatedIntegerField (which shouldn't be cast to int in the first place).

a) the default to_python implementation is just
{{{ def to_python(self, value):

return value

}}}

b) the CommaSeparatedIntegerField is actually just a charfield with a specific validation.

comment:10 Changed 3 years ago by valhallasw

I have submitted a pull request with a regression test & a fix for the issue.

An even simpler option, which uses

    def convert_values(self, value, field):
        return value

fails AggregationTests.test_more;

The pull-requested-version works correctly and passes all aggregation and aggregation_regress tests under both sqlite and postgresql.

comment:11 Changed 3 years ago by valhallasw

Last, but not least: a work-around for postgresql_psycopg2:

from django.db.backends.postgresql_psycopg2 import base

class DatabaseOperations(base.DatabaseOperations):
    def convert_values(self, value, field):
        return field.to_python(value)

class DatabaseWrapper(base.DatabaseWrapper):
    def __init__(self, *args, **kwargs):
        super(DatabaseWrapper, self).__init__(*args, **kwargs)
        self.ops = DatabaseOperations(self)

comment:12 Changed 3 years ago by akaariai

  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

There is a comment in Oracle's convert_values() which says that because to_python() is meant for validation, it should not be used for converting values. At least one possible problem is that the to_python() method raises ValidationError on problematic input, and that error doesn't fit here.

I am proposing (and I am going to likely commit) a patch which just removes the non-necessary float(value) from convert_values. This fixes PostgreSQL and MySQL. Oracle and SQLite already override the convert_values() method, and return the bare value in the "unknown field type" case.

In addition, the suggestion made in the comments of this ticket that the conversion should be done by the field, not by the backend gets a +1 from me. It might be as straightforward as defining to_python() to do the conversion, but IMO it needs separate ticket and some more investigation.

comment:13 Changed 3 years ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In [59a655988ee95e4b4e4646cebea88b620d8fcdd2]:

Fixed #13844 -- Avoid converting unknown db values to float

This patch removes an unconditional float(value) conversion from db
backend default convert_values() method. This can cause problems when
aggregating over character fields for example. In addition, Oracle
and SQLite already return the bare value from their convert_values().

In the long term the converting should be done by fields, and the
fields should then call database backend specific converters when
needed. The current setup is inflexible for 3rd party fields.

Thanks to Merlijn van Deen for the original patch.

comment:14 Changed 3 years ago by apollo13

#19448 is a duplicate.

Note: See TracTickets for help on using tickets.
Back to Top