Opened 2 months ago

Last modified 7 weeks ago

#35607 assigned Cleanup/optimization

Improve Storage base backend API flexibility to allow filename validation to be overridden safely

Reported by: Natalia Bidart Owned by: Natalia Bidart
Component: File uploads/storage Version:
Severity: Normal Keywords: storages
Cc: Josh Schneier, Shai Berger Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently, Django's Storage base backend API provides limited flexibility with regards to customization, particularly concerning filename validation. This rigidity has historically led to challenges and security concerns. See for example CVE-2024-39330, CVE-2021-45452, and CVE-2021-31542.

To address this, I'm proposing revisiting and enhancing the public API of storage backends to support customizable validation methods. This was also discussed internally in the Django Security mailing list. This public validation method would provide a default implementation very similar to the current validations, and should be used to replace the ad-hoc validations being done in save, generate_filename, and get_available_name.

Change History (6)

comment:1 by Natalia Bidart, 2 months ago

To add some extra context and details, below some extracts of conversations held while discussing the security reports around Storage API:

From Shai:

[...] I think the validation itself should be overridable, but I feel weary about adding an API in a security release; can we add it as "internal" in the security release, and document it (making it public) later?

From Josh Schneier:

Untangling the various ways that validation was peppered through the code base in a couple of competing ways led me to drafting a local change that unified them all. I’m happy to hear you’re taking a stab at that, shows my head was in the right place.
I think you are already on top of get_available_name and generate_filename (which uses validate_file_name) but want to also call your attention to get_valid_name which is also part of the publicly overridable API. Currently it calls into get_valid_filename which has yet another implementation of some of the protection logic, it also appears to be its only used place in the code base.

From me:

At a high level, I believe the core challenges to resolve are:

  1. The entanglement and tight coupling of file name and file path validations in the code.
  2. The scattering of validations across multiple methods and files.

To fix 1, I think we should take path validation out of validate_file_name and put it in a dedicated validate_file_path. This makes the allow_relative_path switch disappear. Backward compatibility is an issue with this plan though.
There is one more change that I considered but postponed, which is potentially removing the call to validate_file_name in FileField.generate_filename[1] since this method calls it's storage's generate_filename which would *also* call validate_file_name (at least in its default implementation). My main concern here is how often/likely is to have custom storages overriding generate_filename without validating the given name
[1] https://github.com/django/django/blob/main/django/db/models/fields/files.py#L336

Outcome up to this point: we did evaluate adding a public and overridable method to do validation in the recent security release but we decided not to, to keep the patch small and self contained, with the counterpart of having to duplicate file name validation in various places.

I'll be proposing a PR (no logic change) to land in main to extend test coverage in preparation for future refactoring.

comment:2 by Sarah Boyce, 2 months ago

Cc: Josh Schneier Shai Berger added
Component: Core (Other)File uploads/storage
Summary: Improve Storage base backend API FlexibilityImprove Storage base backend API flexibility to allow filename validation to be overridden safely
Triage Stage: UnreviewedAccepted

comment:3 by Mohammad Salehi, 8 weeks ago

Owner: set to Mohammad Salehi
Status: newassigned

comment:4 by Mohammad Salehi, 7 weeks ago

Last edited 7 weeks ago by Mohammad Salehi (previous) (diff)

comment:5 by Mohammad Salehi, 7 weeks ago

Replying to Natalia Bidart:

I want to write these tests for the functions get_available_name, generate_filename, get_valid_name, and validate_file_name:

1- Where there is double dots in filename-> "../file.ext"
2_ If there is an unusual character in the filename, include one of the following characters: @ # $ % & * ( ) + = { } | \ / : ; ' " < > ? for exapmle ->
"file@…"
3_ Where there is long filename -> "a_very_very_very_long_filename_that_exceeds_the_usual_limits_of_a_filesystem.ext"
4_ Where there is filename without extension -> "filename"
5_ Where there is filename with hidden extension -> "filename."

Is this ok? Natalia Bidart[]

Version 0, edited 7 weeks ago by Mohammad Salehi (next)

comment:6 by Natalia Bidart, 7 weeks ago

Owner: changed from Mohammad Salehi to Natalia Bidart

Hello Mohammad Salehi! Thank you for your interest in helping with this ticket!

I have been working with and thinking about the Storages API for a while, and I have a few branches already with half way of a refactor. My advice would be to pause a bit on this work until I can clean up my work and push it. I will assign this ticket to myself to better signal the fact that I have started this work some time ago. Thanks again!

Note: See TracTickets for help on using tickets.
Back to Top