Opened 12 years ago

Closed 10 years ago

Last modified 10 years ago

#18757 closed Cleanup/optimization (fixed)

Examine ways to get rid of DB backend convert_values()

Reported by: Anssi Kääriäinen
Owned by: Marc Tamlyn
Component: Database layer (models, ORM)
Version: dev
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings: no
UI/UX: no

Description

The convert_values() method is responsible for converting database values to Python values. The conversion is used at least for aggregates, GIS and Oracle's query compiler.

The problem with the current implementation is twofold:

  1. The model fields have no way of specifying new converters. This is problematic for third-party fields, which would need some kind of dynamic patch of the backend's convert_values() method to get converter support.
  2. The different backends have slightly differing implementations for the same cases.

In #13844 it was suggested that the Field.to_python() method could be reused. It seems to do mostly the correct thing, but it is meant for validation, not for converting DB values to Python, so going that route would need at least some more investigation.
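As a concrete illustration of the mismatch: to_python() must already accept form input, serialized strings, and fully converted Python objects, so piling raw DB values on top of that conflates validation with loading. A hypothetical field (the type and storage format are invented here):

    from django.core.exceptions import ValidationError
    from django.db import models

    class PointField(models.Field):
        """Hypothetical field storing an (x, y) pair as "x,y" text."""

        def to_python(self, value):
            # Already has to cope with None, strings from forms or
            # serializers, and tuples that are converted already - reusing
            # it for DB loading adds yet another input shape.
            if value is None or isinstance(value, tuple):
                return value
            try:
                x, y = value.split(',')
                return (int(x), int(y))
            except (ValueError, AttributeError):
                raise ValidationError("Invalid point value: %r" % value)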

One possible solution is something like this: a field has a convert_values() method, which can (and likely will) call a backend-specific converter method. So, a custom field could do the following:

  1. Add converter methods to the backends it supports:
    # Converter for the psycopg2 backend.
    def psycopg2_converter(self, value):
        return value.lower()
    try:
        from django.db.backends.postgresql_psycopg2.operations import DatabaseOperations
        # Assigning a function to the class installs it as a method.
        DatabaseOperations.convert_myfield_value = psycopg2_converter
    except ImportError:
        pass

    # Or, if there is need for just a generic converter:
    from django.db.backends import BaseDatabaseOperations
    BaseDatabaseOperations.convert_myfield_value = my_generic_converter
    
  2. The convert_values() method of the field itself would be:
    def convert_values(self, connection, value):
        # Delegate to the converter installed on the backend's operations.
        return connection.ops.convert_myfield_value(value)
    

A little complicated, but it is general. I think it would be fast enough.

Most fields could just do the conversion directly in field.convert_values() without touching DatabaseOperations at all. Core fields would not need any dynamic installation, as we could of course just add the converters to the backends' source code.
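For example, a minimal sketch of that simpler case, using the convert_values(connection, value) signature proposed above (the field and the lowercasing conversion are invented for illustration):

    from django.db import models

    class LowercaseCharField(models.CharField):
        def convert_values(self, connection, value):
            # No backend-specific behaviour needed, so convert directly
            # and ignore the connection.
            if value is None:
                return value
            return value.lower()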

The above proposal is just one way to achieve what is needed. I do think there is some need for improvement here, but I am not at all sure the suggestion in this post is the way forward.

Change History (7)

comment:1 by Anssi Kääriäinen, 12 years ago

Triage Stage: Unreviewed → Accepted

I am going to mark this accepted. Not necessarily the exact approach I suggested in the description, but some way of getting rid of convert_values seems like a good idea. This is one area where non-core fields and core fields are not equal.

comment:2 by Marc Tamlyn, 10 years ago

Owner: changed from nobody to Marc Tamlyn
Status: new → assigned

convert_values is currently run under the following conditions:

  • When aggregates are run. This means sqlite, oracle, mssql, and gis all have special cases, and the default backend has handling for floats and IntegerFields (which is not always called via super() for sqlite and oracle...)
  • When the compiler defines resolve_columns and it calls convert_values. This applies to oracle and all of gis.

It is worth noting that the work done in resolve_columns on mysql should probably be in convert_values (converting boolean fields into boolean values).


Like with many things, we need to support both third-party backends and third-party fields appropriately here. So we need to make sure mssql can still use convert_values to handle SQL Server's weird datetime formatting, and also support a third-party (or built-in) field which has its own storage format. There are two possible steps - the first is normalisation of the returned data so all backends operate the same, and the second is further conversion from the "backend" version to the full model version (this would solve #14462). In my opinion both steps should be possible at both levels - so the field-level method should still receive the connection object.

On top of this, all data conversion at the moment is done using get_internal_type, so for backwards compatibility we need to make sure we respect any field which has that set to a real internal type. This means we cannot move all of the conversion currently done to field level (at least outside of gis, I think).


A possible plan of action:

  • Make SQLCompiler.results_iter always call SQLCompiler.resolve_columns, and make sure the subclasses which are still needed call super() as needed.
  • For each field and row, call field.from_db_value(value, connection). By default this would look like:
    def from_db_value(self, value, connection):
        internal_type = self.get_internal_type()
        converter_name = 'convert_%s_value' % internal_type.lower()
        # Use the backend-specific converter if the backend defines one.
        func = getattr(connection.ops, converter_name, None)
        if func:
            value = func(value)
        return value
    

An individual field can override from_db_value if it wants to do custom conversions, and backends add convert_myfield_value methods rather than one convert_values with a stupid number of if statements in it. Individual convert_myfield_value methods should be coded defensively in case the conversion has already been done - there are cases where you get a different value depending on whether it comes from an aggregate or a direct fetch.
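For instance, a defensive converter might look like the sketch below. The duration conversion and the convert_durationfield_value name are illustrative assumptions, not an existing backend method; it would be installed on a backend's DatabaseOperations as described earlier.

    import datetime

    def convert_durationfield_value(self, value):
        # Guard against values that have already been converted: aggregates
        # can hand back a different raw type than a direct column fetch.
        if value is None or isinstance(value, datetime.timedelta):
            return value
        return datetime.timedelta(microseconds=int(value))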

I think this will allow me to remove some complexity from gis and mysql at least, and remove convert_values altogether. It would be backwards incompatible for third-party backends using convert_values, although this could easily be handled with a deprecation warning in the default from_db_value if needed (if hasattr....). Given how mssql's convert_values is written, I think this will be an improvement anyway.

POC coming up soon hopefully!

comment:3 by Anssi Kääriäinen, 10 years ago

I have tested a bit in this area, and we really can't call field.from_db_value for every field - it is just plain too expensive. We are talking about a 2-3x performance slowdown for models with ten fields: a single method call costs somewhere around 5% of model initialization speed, and the above field.from_db_value() definition causes 3 calls per field, so for 10 fields that is 5% * 3 * 10 = 150% slowdown.

Instead we need to do the following:

1) Collect converters for each field before we start iterating the result set.
2) Not all fields define a converter - in fact almost all fields in core do not need a converter at all.
3) At the same time we also collect converters from the database backend.
4) When we have collected the converters, start iterating through the result set. Pass the values through the found converters.

This is needed so that a field which doesn't define any converter incurs no overhead at all. For the same reason I would also avoid doing backend-specific conversion in the field's from_db_value method - otherwise, if any backend needs conversion support, every backend pays the overhead of calling the method for no benefit at all.

I don't think the new way needs to be backwards compatible with the old convert_values way. We just need to ensure the old way keeps working for the backwards compatibility period.

So, alternate proposal (a sketch follows the list):

1) Add a field.get_db_value_converters(connection) method. It returns a list of converters, or None if no converters are needed.
2) Add a backend.get_value_converter(field) method. By default this returns [backend.convert_values] if that is defined for the backend, otherwise None.
3) Collect all converters before iterating the results.
4) When iterating through the results, call the converters with just the value as argument (we could likely also add connection here).
5) Deprecate SubfieldBase and deprecate backend.convert_values.
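A minimal sketch of this proposal, assuming the method names from the list above (neither get_db_value_converters nor get_value_converter is an existing API):

    def collect_converters(fields, connection):
        # Done once per query: map column index -> list of converters, so
        # fields without converters add no per-row cost at all.
        converters = {}
        for i, field in enumerate(fields):
            convs = list(connection.ops.get_value_converter(field) or [])
            convs += list(field.get_db_value_converters(connection) or [])
            if convs:
                converters[i] = convs
        return converters

    def iter_converted_rows(rows, converters):
        # Pass each value through its column's converters while iterating.
        for row in rows:
            row = list(row)
            for i, convs in converters.items():
                for conv in convs:
                    row[i] = conv(row[i])
            yield tuple(row)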

As for how to do the converter collection, see https://github.com/akaariai/django/tree/custom_lookups for one implementation. Unfortunately that implementation is mixed in with unrelated changes.

comment:4 by Marc Tamlyn, 10 years ago

Has patch: set
Patch needs improvement: set

POC at https://github.com/django/django/pull/3047

I agree with your comments akaariai - I will try to update the patch to use that approach. Most of the code is currently in removing convert_values rather than in the details of what happens afterwards. Once I get the POC all green, the performance work *should* be easier.

For my own notes, I believe #21565 should get fixed by this as well, and so I should write a failing test for it. At least if I can get a working spatialite setup anyway...

comment:5 by Marc Tamlyn <marc.tamlyn@…>, 10 years ago

Resolution: fixed
Status: assignedclosed

In e9103402c0fa873aea58a6a11dba510cd308cb84:

Fixed #18757, #14462, #21565 -- Reworked database-python type conversions

Complete rework of translating data values from database

Deprecation of SubfieldBase, removal of resolve_columns and
convert_values in favour of a more general converter based approach and
public API Field.from_db_value(). Now works seamlessly with aggregation,
.values() and raw queries.

Thanks to akaariai in particular for extensive advice and inspiration,
also to shaib, manfre and timograham for their reviews.
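For reference, the public Field.from_db_value() API that this commit introduced looks like this in the Django 1.8 custom-field documentation (parse_hand is the docs' hypothetical string-to-object parser; the context argument was dropped in later Django releases):

    from django.db import models

    class HandField(models.Field):
        def from_db_value(self, value, expression, connection, context):
            if value is None:
                return value
            return parse_hand(value)  # hypothetical: rebuild the Hand object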

comment:6 by Tim Graham <timograham@…>, 10 years ago

In 185ab9ffefcf81378d7da02dddca0a59487b9613:

Fixed Oracle GIS failures introduced by e9103402c0; refs #18757.

Thanks Marc Tamlyn for the patch.

comment:7 by Marc Tamlyn <marc.tamlyn@…>, 10 years ago

In c6fd1e904cb15da1a627843c79b89b19beabe2a1:

Fixed Oracle GIS gml() test failure introduced by e910340; refs #18757.
