Opened 8 years ago

Closed 6 years ago

Last modified 6 years ago

#3553 closed (wontfix)

urlpatterns uses urllib.quote/unquote should use urllib.quote/unquote_plus

Reported by: justin-django@… Owned by: graham_king
Component: Core (Other) Version: 0.95
Severity: Keywords: sprintsept14
Cc: Triage Stage: Design decision needed
Has patch: no Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: UI/UX:

Description

When using the + character to represent spaces in urls it is impossible to differentiate between a literal + and an escaped %2B in the url.

Example

urlpatterns = (r'/search/(?<query>.*)$,app.search)
def search(request, query):
  print "query: "+query

Observed

Visiting http://example.com/search/foo%2B+bar causes django to print foo++bar

Expected

Visitin http://example.com/search/foo%2B+bar should cause django to print foo+ bar

Solution

I assume that django is using urllib.quote/unqoute Changing it to use urllib.quote_plus and urllib.unqoute_plus would produce the expected behavior and not have any side affects

Attachments (1)

3553.diff (6.4 KB) - added by graham_king 6 years ago.
Uses urllib.unquote_plus for dev server, mod_python and fastcgi, and all the tests pass

Download all attachments as: .zip

Change History (13)

comment:1 Changed 8 years ago by SmileyChris

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

Well it would have a side-effect of breaking anyone's existing code if their code relied on the current behaviour...

comment:2 Changed 8 years ago by justin-django@…

probably the best method is to have a init param default to urllib.un/quote and be able to specify any func... (excuse the misterminology python newb)

comment:3 Changed 8 years ago by ubernostrum

  • Component changed from Uncategorized to Core framework
  • Keywords sprintsept14 added

I believe all of this changed with the Unicode merge (since various helper functions had to be written to handle non-ASCII data); can someone confirm whether this is still an issue?

comment:4 Changed 7 years ago by jacob

  • Triage Stage changed from Design decision needed to Accepted

comment:5 Changed 7 years ago by loppear

Verified against trunk r7264, using:

def search(request, query):
    response = HttpResponse()
    response.write("query: %s" % query)
    return response

urlpatterns = patterns('', (r'q/(?P<query>.*)$', search),)

does print "query: foo++bar" for /q/foo%2B+bar

comment:6 Changed 7 years ago by mattmcc

  • milestone set to 1.0
  • Needs tests set

Verified against [8776].

comment:7 Changed 7 years ago by jacob

  • milestone changed from 1.0 to post-1.0

comment:8 Changed 6 years ago by anonymous

  • Owner changed from nobody to anonymous
  • Status changed from new to assigned

comment:9 Changed 6 years ago by graham_king

  • Owner changed from anonymous to graham_king
  • Status changed from assigned to new

comment:10 Changed 6 years ago by graham_king

  • Status changed from new to assigned
  • Triage Stage changed from Accepted to Design decision needed

I've uploaded a patch which solves this, but read on.
On the dev server it's a one liner.

On mod_python and fastcgi, for a path of '/q/foo%2B+bar' the environment hands you '/q/foo++bar' in PATH_INFO. I've had to use REQUEST_URI (req.unparsed_uri in mod_python) instead of PATH_INFO.
If those smart people at mod_python and fastcgi/flup are converting the %2B and + this way, I don't feel comfortable doing it differently. I'm setting this back to 'Design Decision Needed' for advice.

For what it's worth I've tested the mod_python version of this patch on a medium to large work project, and it doesn't seem to cause any problems.
I've just noticed this patch seems to break a whole bunch of unit tests. Oops. Investigating.

Changed 6 years ago by graham_king

Uses urllib.unquote_plus for dev server, mod_python and fastcgi, and all the tests pass

comment:11 Changed 6 years ago by graham_king

  • Resolution set to wontfix
  • Status changed from assigned to closed

Attached a patch that will allow you to do this, but I don't recommend you do. It will probably break mod_rewrite. The answer is don't put the query in the path. Change your URL structure to be /search?q=foo%2B+bar.

+ encodes a space in a _query_ (coming from a form), not in a _path_. In a path %20 should be used. Create a document called 'my test.html' on a local Apache install and try browsing to http://localhost/my+test.html -> 404, replace the + with %20 and it works.

When Apache hands Django the PATH_INFO, the %2B has been correctly converted to a +, and the other plus is left as it because, in the path, it doesn't mean anything special. So foo%2B+bar correctly becomes foo++bar

Look at last.fm, where there is both a band called '44' and one called '+44':
http://www.last.fm/music/44 --> 44
http://www.last.fm/music/+44 --> 44
http://www.last.fm/music/%2B44 --> 44
http://www.last.fm/music/%252B44 --> +44

They had to double encode it, replacing the % with %25. That's what you'll have to do to use + as a character and a space in the path. Otherwise shift the query to the query component or the URL.

comment:12 Changed 6 years ago by anonymous

  • milestone post-1.0 deleted

Milestone post-1.0 deleted

Note: See TracTickets for help on using tickets.
Back to Top