Opened 9 years ago

Closed 2 years ago

Last modified 3 weeks ago

#9893 closed Bug (fixed)

Filename + path length greater than 100 truncated on database insertion in Core.Storage

Reported by: Refefer Owned by: Pavel Shpilev
Component: File uploads/storage Version: master
Severity: Normal Keywords: storage, filename
Cc: drmeers@…, jeroen@…, walter+django@…, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no


In, the storage object doesn't check the length of filename + upload_to to ensure that it's less than 100 characters. If the path length is greater than 100, it is truncated to 100 characters when inserted into the database. With filename collision mitigation appending an '_' to the end of the filename, popular filenames can easily reach lengths that exceed the size of the.

To reproduce the issue, try uploading a file/image with a length over 100 characters.

Here's some quick hackery that attempts truncate the filename. Note that it assumes the upload_to length to be less than 30 characters long. Also note that this should probably be divided up into a couple of different methods and this is more to get the ball rolling than anything.

    def get_available_name(self, name):
        Returns a filename that's free on the target storage system, and
        available for new content to be written to.
        # If the filename already exists, append an incrementing integer
        # to the file until the filename doesn't exist.
        dir_name, file_name = os.path.split(name)
        flength = len(file_name)
	if flength > 70: # If filenameis longer than 70, truncate filename
		offset = flength - (flength % 40 + 20) # modulus of file name + 20 to prevent file type truncation
		file_name = file_name[offset:]
		name = os.path.join(dir_name, file_name)

        if self.exists(name): # filename exists, get dot index
                dot_index = file_name.rindex('.')
            except ValueError: # filename has no dot
                dot_index = -1

            inc = 0 # Set incrementer to zero
            while self.exists(name):
                inc += 1
                if dot_index == -1: # If no dot, append to end
                    tname = file_name + str(inc)
                    tname = file_name[:dot_index] + str(inc) + file_name[dot_index:]
                name = os.path.join(dir_name, tname)

        return name

Attachments (1)

filefield_length_validation.diff (1.7 KB) - added by Carson Gee 7 years ago.

Download all attachments as: .zip

Change History (48)

comment:1 Changed 8 years ago by wwinham

Wouldn't it be easy to change the length of the field to > 100 characters? It's not like increasing the space to 255 would hurt anybody and it would surely make the problem go away for a huge percentage of file names. I just ran in to this myself.

comment:2 Changed 8 years ago by (none)

milestone: post-1.0

Milestone post-1.0 deleted

comment:3 Changed 8 years ago by Jacob

milestone: 1.1
Triage Stage: UnreviewedAccepted

comment:4 Changed 8 years ago by Antti Kaihola

I ran into this issue when using the django-attachments re-usable app, but in my case the file name wasn't truncated – an exception was thrown instead. I'm storing files in subdirectories named after the user who uploads (max 30 characters), and apparently one of my users was uploading a file with a really long name so len("attachments/<username>/<filename.ext>") exceeded 100 characters.

I wonder why an exception happened instead of truncation. I'm using PostgreSQL, maybe that's significant?

Here's the relevant part of the traceback:

  File "attachments/attachments/", line 137, in save
    super(Attachment, self).save(force_insert, force_update)

  File "django/db/models/", line 408, in save
    self.save_base(force_insert=force_insert, force_update=force_update)

  File "django/db/models/", line 484, in save_base
    result = manager._insert(values, return_id=update_pk)

  File "django/db/models/", line 177, in _insert
    return insert_query(self.model, values, **kwargs)

  File "django/db/models/", line 1035, in insert_query
    return query.execute_sql(return_id)

  File "django/db/models/sql/", line 320, in execute_sql
    cursor = super(InsertQuery, self).execute_sql(None)

  File "django/db/models/sql/", line 2290, in execute_sql
    cursor.execute(sql, params)

DataError: value too long for type character varying(100)

comment:5 Changed 8 years ago by Antti Kaihola

Issues to consider:

Smart truncation can't be done in the storage object
since it doesn't know about the max_length= of the FileField.
The maximum length can be set to a different value than 100 in the model.

One possibility is to extend the signature of
Storage.get_valid_name() and .get_available_name() to accept
a maximum file name length argument, but I assume that's sub-optimal.

Truncation should avoid collisions with existing file names.
Simply shortening the file name isn't sufficient,
but something similar to what VFAT did with its FILENA~1.EXT
8.3-character filename representations should be used.
Storage.exists() can be called to check for collisions.

The upload_to= kwarg of FileField can be either

  • a string specifying the target directory, or
  • a callable which replaces FileField.generate_filename and returns the path to the file to be saved.

In case of a callable, do we leave length checking as responsibility of the callable,
or should we prepare to truncate the generated path?

Truncation could be done in FileField.generate_filename(),
but how to ensure the generated filename is still valid for the storage?
This is an issue if a numbering scheme must be used to avoid file name collisions.

What if the directory is already too long without the actual file name?
This shouldn't be an issue when upload_to= is a string,
but a callable might generate a long path if not coded carefully.

comment:6 Changed 8 years ago by wwinham

What are the drawbacks to just making a FileField use Text instead of varchar in the database? It seems like a pretty simple solution that would completely resolve the issue.

comment:7 in reply to:  6 Changed 8 years ago by Antti Kaihola

Replying to wwinham:

What are the drawbacks to just making a FileField use Text instead of varchar in the database? It seems like a pretty simple solution that would completely resolve the issue.

I started a thread with your question on the django-developers list.

comment:8 Changed 8 years ago by Jacob

milestone: 1.11.2

I think ultimately akaihola's idea of modifying the get_valid_name and get_available_name to accept a max length is the right way to go. Using a text field is a bad idea for a bunch of reason -- speed, storage size, compatibility with Oracle -- but any hardcoded length could possibly be too short.

At this point past feature-freeze, though, changing that signature could cause too much heartache. I'm going to punt this to 1.2; for now, it's a simple workaround: use FileField(max_length=SOMETHING_BIGGER).

comment:9 Changed 8 years ago by Chris Beaven

Has patch: unset

comment:10 Changed 8 years ago by Simon Meers

Cc: drmeers@… added

comment:11 Changed 8 years ago by j.c.sackett

Owner: changed from nobody to j.c.sackett
Status: newassigned

comment:12 Changed 8 years ago by j.c.sackett

Owner: changed from j.c.sackett to nobody
Status: assignednew

comment:13 Changed 7 years ago by streetcleaner

Owner: changed from nobody to streetcleaner
Status: newassigned

comment:14 Changed 7 years ago by Antti Kaihola


To elaborate the changes needed:

  • the following methods need a max_length=100 kwarg:
    • Storage.get_valid_name
    • Storage.get_available_name
    • FileSystemStorage._save
  • FileField.get_filename needs to call Storage.get_valid_name with the field's max_length
  • needs to call with the field's max_length
  • needs to pass that value to self._save
  • FileSystemStorage._save needs to pass it on to self.get_available_name

In addition, shouldn't a smart filename truncation / collision prevention mechanism be put in place anyway? As pointed out in the ticket description, appending underscores will cause collisions if truncation happens.

I propose appending an underscore and a counter integer to the first part of a dot-separated filename. The first part should be truncated before appending to ensure that no truncation will happen at the end. Examples (with a max_length of 36):

  • /long/path/long-file-name.ext1.ext2 (35 characters)
  • /long/path/long-file-nam_1.ext1.ext2 (36 characters; first part truncated)
  • /long/path/long-file-nam_2.ext1.ext2 (36 chars) and so on until...
  • /long/path/long-file-na_10.ext1.ext2 (36 chars; first part further truncated)

This solution assumes that storage backends which don't allow underscores or digits in file names will implement their own get_available_name (just like the same is currently assumed for underscores only).

It doesn't solve the case when the length of the directory path approaches max_length and causes even truncated file names to overflow.

streetcleaner, do you intend to write a patch? If not, I volunteer to do that.

comment:15 Changed 7 years ago by Adam Nelson

#10149 is a related ticket.

comment:16 Changed 7 years ago by Karen Tracey

milestone: 1.2

With no concrete patch at this point, I don't see this having a chance to make it into 1.2.

comment:17 in reply to:  6 Changed 7 years ago by Antti Kaihola

Replying to wwinham:

What are the drawbacks to just making a FileField use Text instead of varchar in the database? It seems like a pretty simple solution that would completely resolve the issue.

Another point from Shai Berger in the googlegroups thread mentioned earlier:

In many engines, text fields are kept out-of-table; then, what's kept in the
table, is (effectively) the name of a file where the text is kept. Even if
this doesn't seriously affect performance, using such a field to keep a
filename must raise a few eyebrows.

Changed 7 years ago by Carson Gee

comment:18 Changed 7 years ago by Carson Gee

I just ran into this problem last week and decided to try and write a patch. The problem was reported before model validation was incorporated, so it seemed like an easy fix to just add a validate method to the FileField model class now that those are validated. The patch just runs generate_filename to get the full length of the path that will be stored in the database and ensure it is less than max_length. Be kind, it's my first attempt at a patch.

comment:19 Changed 7 years ago by Carson Gee

Has patch: set

comment:20 Changed 6 years ago by Chris Beaven

Severity: Normal
Type: Bug

comment:21 Changed 6 years ago by Chris Beaven

Needs tests: set

I'd be nice if there was also some way of passing the maximum length allowed to the storage engine too so when it generates a unique name, it can truncate it down.

But even without that, this is a worthy addition.

comment:22 Changed 5 years ago by Aymeric Augustin

UI/UX: unset

Change UI/UX from NULL to False.

comment:23 Changed 5 years ago by Aymeric Augustin

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:24 Changed 5 years ago by Jeroen Dekkers

Cc: jeroen@… added

comment:25 Changed 5 years ago by Aymeric Augustin

Owner: changed from streetcleaner to Aymeric Augustin
Status: assignednew

comment:26 Changed 5 years ago by Aymeric Augustin <aymeric.augustin@…>

Resolution: fixed
Status: newclosed

In [dcd4383107d96c18bcb53312ca4de070374b334c]:

Fixed #9893 -- Validated the length of file names

after the full file name is generated by the storage class.

Thanks Refefer for the report, carsongee for the patch, and
everyone else involved in the discussion.

comment:27 Changed 5 years ago by Claude Paroz <claude@…>

In [05d333ba3bb16af024c11966d2072de38fe9f82f]:

Fixed #18515 -- Conditionally regenerated filename in FileField validation

When a FileField value has been saved, a new validation should not
regenerate a new filename when checking the length. Refs #9893.

comment:28 Changed 4 years ago by Aymeric Augustin

Resolution: fixed
Status: closednew

I'm going to revert that commit because of the regression reported in #19525.

Here are the relevant bits of that ticket:

As of dcd43831 (fixing #9893), a FileField will call generate_filename() as part of the validation step for a FileField on a form. This was then updated in 05d333ba to address #18515.

Unfortunately, this means that the filename function provided as an argument to upload_to can no longer reference any field with a pre-save behavior.

The common use case for this is to organize files on disk according to upload date. For example:

def user_filename(instance, filename):
    return os.path.join('user_files', instance.uploaded_timestamp.strftime('%Y-%m-%d'), filename)

class UserFile(models.Model):
    uploaded_timestamp = models.DateTimeField(auto_now_add=True)
    data = models.FileField(upload_to=user_filename)

Under Django 1.5, attempting to call is_valid() on a Modelform for this model will raise a "'NoneType' object has no attribute 'strftime'" exception, because instance.uploaded_timestamp hasn't been instantiated yet. This is despite the fact that the uploaded data has been provided, the generated filename would be valid, and the upload timestamp can be computed.

In Django 1.4 and earlier, this works because no validation was performed for FileFields filenames; the uploaded_timestamp was evaluated as part of the model pre-save, and the persistence of the file to disk occurred after the model was saved.

To my reading, the documentation is ambiguous on whether this is expected behavior or not. It says that the model may not be saved to the database yet, but points at AutoField as the cause for problems. However, it also explicitly talks about using strftime as part of file paths. A file datetimes of 'now' would seem to be an obvious usage of this feature.

For the record, I discovered this by upgrading a commercial project to Django 1.5, so there is at least one project in the wild that will be affected by this change. Although I've discovered it with a auto_now_add FileField, it's not hard to see that this change also affects any field with a pre_save behaviour.

It also has the potential to lead to incorrect validation. Consider the case of a field with a pre_save behavior that updates the field (auto_now is one example, but any denormalization/summary field would be an example). The call to validate occurs *before* the call to pre_save is made, which means that you're going to get the pre_save value used as part of your validation. If you then save the model, the pre_save() will be called, and the actual filename that is used for saving the file will be different to the one used for validation.

Some initial thoughts about possible solutions:

  • Document that you can't use a field with pre-save behaviour. Not ideal IMHO, since it rules out an obvious use case for upload_to.
  • Roll back the fix. Also less than ideal; #9893 is an edge case bug, but it's something that has been seen in the wild, and isn't *too* hard to generate.
  • Invoke pre_save on all model fields prior to validation. Given that most validation doesn't need this, this approach seems a little excessive.

I've just checked with my production code, and yes, works for the auto_now_add case. However, it won't address auto_now, or the pre_save conflict problem.

comment:29 Changed 4 years ago by Aymeric Augustin <aymeric.augustin@…>

In db278c3bf9177043c42a9ed8b529a5c117938460:

Fixed #19525 -- Reverted dcd4383107 and 05d333ba3b.

Refs #9893, #18515.

Thanks Russell for the report.

comment:30 Changed 4 years ago by Aymeric Augustin <aymeric.augustin@…>

In 3cb87ec605fb9e7785aa0580bd8220797b622e0c:

[1.5.x] Fixed #19525 -- Reverted dcd4383107 and 05d333ba3b.

Refs #9893, #18515.

Thanks Russell for the report.

Backport of db278c3 from master.

comment:31 Changed 4 years ago by Preston Holmes <preston@…>

In 0e431e5dd7ed19aa2119ceba9ebed050c2988844:

[1.5.x] Fixed #19525 -- Reverted dcd4383107 and 05d333ba3b.

Refs #9893, #18515.

Thanks Russell for the report.

Backport of db278c3 from master.

comment:32 Changed 4 years ago by Aymeric Augustin

Status: newassigned

comment:33 Changed 4 years ago by Aymeric Augustin

Owner: Aymeric Augustin deleted
Status: assignednew

I don't know how to solve this properly.

comment:34 Changed 4 years ago by Aymeric Augustin

Component: Core (Other)File uploads/storage

comment:35 Changed 4 years ago by Walter Doekes

Cc: walter+django@… added

comment:36 Changed 4 years ago by Walter Doekes

So, I ran into bug #13314. These are all similar but not quite the same.

#9893 -- Filename + path length greater than 100 (when making filenames unique)
#10410 -- FileField saves one filename on disk and another on DB
#13314 -- validation does not account for "upload_to" when counting characters

This ticket #9893 is about the changed filenames when making them unique.

Ticket #13314 is just about the upload_to parameter. For that problem the fix could be done like this:

--- django/db/models/fields/	2013-04-01 17:03:59.752332630 +0200
+++ django/db/models/fields/	2013-04-01 17:08:15.833870239 +0200
@@ -290,6 +290,8 @@
         if 'initial' in kwargs:
             defaults['required'] = False
+        # And deduct the upload_to length from the max_length.
+        defaults['max_length'] -= len(self.upload_to)
         return super(FileField, self).formfield(**defaults)
 class ImageFileDescriptor(FileDescriptor):

With this environment:

  • ImageField with max_length default 100
  • An upload_to value of "profile/" (8 characters)

Without the fix, I get a DatabaseError for a 99-byte length filename.
With the fix, I get a nice ValidationError until I reduce the filename to 92 bytes.

That should be at least be one less problem, right?

This was tested with Django 1.3.7.

comment:37 Changed 3 years ago by Pavel Shpilev

Owner: set to Pavel Shpilev
Status: newassigned

comment:38 Changed 3 years ago by Pavel Shpilev

Hey guys
Long time no activity on the ticket...
I have recently run into this bug on production myself. We needed a solution ASAP, so I ended up with an overwrite for get_available_name(), providing max_filename_length as an argument. Pretty much what @akaihola first suggested in Seem to be working fine for us. I can try implement this as a general solution now, if you think this would be a proper fix.

comment:39 Changed 3 years ago by Simon Charette

Cc: Simon Charette added
Needs documentation: set
Patch needs improvement: set

pavel_shpilev put together a PR based on ticket:9893#comment:1 that still requires some adjustments.

comment:40 Changed 3 years ago by Collin Anderson

Patch needs improvement: unset

Pavel Shpilev wants another review.

comment:41 Changed 3 years ago by Berker Peksag

Needs documentation: unset
Needs tests: unset
Triage Stage: AcceptedReady for checkin

PR #3369 LGTM.

comment:42 Changed 3 years ago by Tim Graham

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

Comment for improvement is on the PR.

comment:43 Changed 2 years ago by Pavel Shpilev

Triage Stage: AcceptedReady for checkin

comment:44 Changed 2 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In a7c256cb5491bf2a77abdff01638239db5bfd9d5:

Fixed #9893 -- Allowed using a field's max_length in the Storage.

comment:45 Changed 21 months ago by Tim Graham <timograham@…>

In 1bb6ecf:

Refs #9893 -- Removed shims for lack of max_length support in file storage per deprecation timeline.

comment:46 Changed 3 weeks ago by Jan Geboers


could it be that this bug still exists?
Both on Django 1.8.18 and 1.11.2 I see this behavior isn't corrected yet.
The validation of max_length takes only file_name into account and ignores upload_to, causing truncation on the database level, exactly as described in

When I check out the relevant code I strongly suspect this bug was never fixed:

comment:47 Changed 3 weeks ago by Tim Graham

#13314 is reopened per the last comment.

Note: See TracTickets for help on using tickets.
Back to Top