Opened 16 years ago

Closed 12 years ago

Last modified 11 years ago

#5725 closed Bug (fixed)

Inspectdb makes too long CharFields

Reported by: anonymous Owned by: nobody
Component: Core (Management commands) Version: dev
Severity: Normal Keywords: introspection mysql inspectdb
Cc: rokclimb15@…, django@… Triage Stage: Ready for checkin
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using mysql5.0 and python2.4, the maxlength of a CharField is three times as big as the varchar column's definition says in the table.

Attachments (1)

mysql_varchar_length.diff (2.6 KB ) - added by ferdonline 12 years ago.
Patch fixing the incorrectly detected varchar field length

Download all attachments as: .zip

Change History (16)

comment:1 by Simon G <dev@…>, 16 years ago

Triage Stage: UnreviewedAccepted

Huh. Confirmed on Python 2.4, Mysql 5.0.45, @6851

Models.py says this:

from django.db import models

# Create your models here.
class Fudge(models.Model):
    snork = models.CharField(max_length=10, blank=True)

The table is created in MySQL like so:

mysql> describe t5725_Fudge;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment | 
| snork | varchar(10) | NO   |     |         |                | 
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.03 sec)

& inspectdb gives this -

class T5725Fudge(models.Model):
    id = models.IntegerField(primary_key=True)
    snork = models.CharField(max_length=30)
    class Meta:
        db_table = u't5725_fudge'

comment:2 by brockweaver@…, 16 years ago

Triage Stage: AcceptedDesign decision needed

Somewhere near line 174 of django/db/backends/mysql/base.py, the charset is hardcoded to 'utf8':

    def _cursor(self, settings):
        if not self._valid_connection():
            kwargs = {
                'conv': django_conversions,
                'charset': 'utf8',  # this bad boy, right here
                'use_unicode': True,
            }

If your collation is not set to this in MySql, it will report the wrong size. In my case, my table is configured to be 'latin1'. Changing the charset to 'latin1' in base.py caused inspectdb to report the correct length. However, that's obviously not a general solution. It would be best to make this caller-configurable (or better yet detected and altered when pulling the description off of the cursor object).

Honestly though, this hardcoded default is a very safe idea. Look at the problems this guy had when converting from a latin1 table to a utf8 table:

http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html

comment:3 by brockweaver@…, 16 years ago

Sorry, I meant charset above where I said collation. My bad.

comment:4 by Malcolm Tredinnick, 16 years ago

Triage Stage: Design decision neededAccepted

I think this is a case of "we take patches". If somebody wants to work out how to extract the server side's encoding for each table automatically (and remember that they could be different for each table) and factor that in, go ahead and we'll how it looks. I think we should include a pretty stern warning in the comments of the generated model or something, though, if the encoding isn't a safe one like UTF-8 or UTF-16. Things will go wrong in interesting and difficult to diagnose ways if/when Django passes through Unicode data that cannot be squeezed back into ASCII or Latin-1 or whatever. So tell the inspectdb user of the excitement they're in for in this case and they can make the judgement call.

This should only be done for inspectdb, though. Normal Django code assumes you can store the data you're submitting in the database and it's up to you to ensure that. If your database isn't in UTF-8, that's not our fault.

comment:5 by rokclimb15, 14 years ago

There is definitely something more to this bug.

I am using Python2.6 and MySQL server 5.1.37 with MySQLdb 1.2.2. My database is utf8 and so are my tables. When I introspect, I get the symptoms described above with charfields being 3x too long. If I change the connection charset to latin1, it works properly. I am wondering if there is an additional option that should be passed to the connection to indicate collation, or perhaps just a bug in MySQLdb with cursor.description.

Can anyone else confirm this behavior?

in reply to:  5 comment:6 by Karen Tracey, 14 years ago

Replying to rokclimb15:

Can anyone else confirm this behavior?

Yes. This thread: http://groups.google.com/group/django-users/browse_thread/thread/d1f88e2d3f725e87/ describes what I found when I looked into this.

comment:7 by rokclimb15, 14 years ago

Cc: rokclimb15@… added

comment:8 by Gabriel Hurley, 13 years ago

Component: django-admin.py inspectdbCore (Management commands)

comment:9 by Gabriel Hurley, 13 years ago

Severity: Normal
Type: Bug

comment:10 by Ramiro Morales, 13 years ago

Easy pickings: unset
Keywords: inspectdb added
UI/UX: unset

by ferdonline, 12 years ago

Attachment: mysql_varchar_length.diff added

Patch fixing the incorrectly detected varchar field length

comment:11 by Andrew Godwin, 12 years ago

Triage Stage: AcceptedReady for checkin

comment:12 by jedie, 12 years ago

Cc: django@… added

comment:13 by Claude Paroz <claude@…>, 12 years ago

Resolution: fixed
Status: newclosed

In [879b245baa727090e242c482a8f12ee52f169642]:

Fixed #5725 -- Fixed varchar column size introspection for MySQL

Thanks ferdonline for the initial patch and Karen Tracey for the
related post on django-users.

comment:14 by Mark Jones, 11 years ago

Looks like this one is back in 1.4.2

A similar problem happens with mysql 5.5 where the size is 3x greater than that specified, given the SQL:

CREATE TABLE primaryfields (

fpost int(11) NOT NULL,
index1911 char(36) DEFAULT NULL,
ssn char(11) DEFAULT NULL,
middleinitial char(1) DEFAULT NULL,
firstname char(30) DEFAULT NULL,
lastname char(30) DEFAULT NULL,
employeeid char(10) DEFAULT NULL,
PRIMARY KEY (fpost),
CONSTRAINT primaryfields_ibfk_1 FOREIGN KEY (fpost) REFERENCES filenames (fpost)

) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

You get the Python:

class Primaryfields(models.Model):

fpost = models.ForeignKey(Filenames, primary_key=True, db_column='fpost')
index1911 = models.CharField(max_length=108, blank=True)
ssn = models.CharField(max_length=33, blank=True)
middleinitial = models.CharField(max_length=3, blank=True)
firstname = models.CharField(max_length=90, blank=True)
lastname = models.CharField(max_length=90, blank=True)
employeeid = models.CharField(max_length=30, blank=True)
class Meta:

db_table = u'primaryfields'

comment:15 by Claude Paroz, 11 years ago

This was not backported to 1.4, so please test with current code (1.5alpha or master).

Note: See TracTickets for help on using tickets.
Back to Top