Opened 6 years ago

Closed 4 years ago

#12165 closed New feature (invalid)

Ability to use SQL functions in queries

Reported by: premalshah Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Normal Keywords: mysql functions
Cc: Triage Stage: Design decision needed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

MySQL has text data type fields like TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT and binary data type fields like TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. Text fields have a character set, while binary fields dont. We store a lot of data in text fields. Recently, during a database audit, it was recommended by MySQL experts to compress the data we store in the text fields and move it into the binary fields since text fields cannot store compressed binary data. There are 2 ways of doing this.

  1. Create a CompressedTextField custom model field. This code works fine. There are other side effects though.
    • MySQL query log now has binary characters. So, if you tailiing or catting it, its not a good experience.
    • You are compressing/uncompressing on the web server. Most people have powerful database servers and less powerful web servers. So, it can hurt performance.

The code is as under.

import zlib

class CompressedField(Field):
    __metaclass__ = SubfieldBase

    def to_python(self, value):
        try:            
            value = zlib.decompress(value[4:])
        except (TypeError, zlib.error):
            return value

    def get_db_prep_value(self, value):
        if not value: return None
        c = zlib.compress(value)
        return buffer(struct.pack('I', len(value)) + c)

    def get_internal_type(self):
        return 'TextField'

    def db_type(self):
        return 'longblob'
  1. MySQL has compress and uncompress functions. http://dev.mysql.com/doc/refman/5.1/en/encryption-functions.html
    COMPRESS(string_to_compress)
    ex: INSERT INTO <table_name> (col1) values (COMPRESS(<string_to_compress>));
    UNCOMPRESS(string_to_uncompress)
    ex: SELECT UNCOMPRESS(<string_to_uncompress>) FROM <table_name>;
    

We can create a custom model field as under.

class BlobField(Field):
    __metaclass__ = SubfieldBase

    def get_internal_type(self):
        return 'TextField'

    def db_type(self):
        return 'longblob'

Then a field can be defined as

class Test(models.Model):
    blog = models.BlobField(input_function='compress', output_function='uncompress')

compress and uncompress are the mysql functions are wrapped around the text or the column name when django constructs the query. This is just one example of database functions that could be used by developers via the Django ORM. Someone might find use for Encrypt and Decrypt or Encode and Decode.

Any thoughts?

Change History (6)

comment:1 Changed 6 years ago by tobias

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

Not sure if it is possible, but have you tried writing a Field that calls compress/uncompress on the database side? It may require diving deeper into the Django code. When doing this you should also think about how it will work for other database backends. Do postgres, sqlite, and oracle have similar methods?

As an alternative, you could probably accomplish what you need with the Model.objects.raw() being developed in #11863.

comment:2 Changed 6 years ago by premalshah

We are using MySQL which does not support user defined data types unlike postgres. In postgres, you could define a user defined field and tell it to use postgress functions when data is saved/retrieved from the database. This is why the responsibility of compress/decompress falls on the query or the application. There are other applications too. The mysql datetime field uses 8 bytes while the timestamp field (which stored the time as a unix timestamp) uses 4 bytes. If you have a few million rows, there is definitely some savings. mysql has functions like FROM_TIMESTAMP and UNIX_TIMESTAMP which convert the timestamps back and forth. It would be powerful to use these functions in the queries while still operating on datetime objects in the python application. Any thoughts?

comment:3 Changed 5 years ago by russellm

  • milestone 1.2 deleted

This is a feature request, so the window for inclusion in 1.2 has closed.

comment:4 Changed 5 years ago by intgr

For the record, PostgreSQL does transparent compression by default for TOAST values (values larger than a few kB)

comment:5 Changed 4 years ago by mattmcc

  • Severity set to Normal
  • Type set to New feature

comment:6 Changed 4 years ago by aaugustin

  • Easy pickings unset
  • Resolution set to invalid
  • Status changed from new to closed
  • UI/UX unset

I'm not convinced by this argument:

Most people have powerful database servers and less powerful web servers.

It's trivial to scale web servers: just put a load balancer and N web servers behind. It isn't as easy for database servers.

On the other hand, compressing and decompressing directly the database is useful if you're doing anything else than storing and retrieving data: lookups, sorting, etc. This is a good reason to look into the second option.

At first sight, it would require extensive changes to the ORM. Per the contributing guide, such feature requests should be sent to django-developers, preferably with a patch demonstrating the concept. For now, given the lack of interest for this ticket, I'm going to close it.

Note: See TracTickets for help on using tickets.
Back to Top