Context Navigation

← Previous Ticket
Next Ticket →

#3553 closed (wontfix)

urlpatterns uses urllib.quote/unquote should use urllib.quote/unquote_plus

Reported by:	justin-django@…	Owned by:	Graham King
Component:	Core (Other)	Version:	0.95
Severity:		Keywords:	sprintsept14
Cc:		Triage Stage:	Design decision needed
Has patch:	no	Needs documentation:	no
Needs tests:	yes	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no
Pull Requests:	How to create a pull request

Description ¶

When using the + character to represent spaces in urls it is impossible to differentiate between a literal + and an escaped %2B in the url.

Example

urlpatterns = (r'/search/(?<query>.*)$,app.search)

def search(request, query):
  print "query: "+query

Observed

Visiting http://example.com/search/foo%2B+bar causes django to print foo++bar

Expected

Visitin http://example.com/search/foo%2B+bar should cause django to print foo+ bar

Solution

I assume that django is using urllib.quote/unqoute Changing it to use urllib.quote_plus and urllib.unqoute_plus would produce the expected behavior and not have any side affects

Attachments (1)

3553.diff (6.4 KB ) - added by Graham King 16 years ago.: Uses urllib.unquote_plus for dev server, mod_python and fastcgi, and all the tests pass

Download all attachments as: .zip

Change History (13)

comment:1 by Chris Beaven, 18 years ago

Triage Stage:	Unreviewed → Design decision needed

Well it would have a side-effect of breaking anyone's existing code if their code relied on the current behaviour...

comment:2 by justin-django@…, 18 years ago

probably the best method is to have a init param default to urllib.un/quote and be able to specify any func... (excuse the misterminology python newb)

comment:3 by James Bennett, 18 years ago

Component:	Uncategorized → Core framework
Keywords:	sprintsept14 added

I believe all of this changed with the Unicode merge (since various helper functions had to be written to handle non-ASCII data); can someone confirm whether this is still an issue?

comment:4 by Jacob, 17 years ago

Triage Stage:	Design decision needed → Accepted

comment:5 by loppear, 17 years ago

Verified against trunk r7264, using:

def search(request, query):
    response = HttpResponse()
    response.write("query: %s" % query)
    return response

urlpatterns = patterns('', (r'q/(?P<query>.*)$', search),)

does print "query: foo++bar" for /q/foo%2B+bar

comment:6 by Matt McClanahan, 17 years ago

milestone:	→ 1.0
Needs tests:	set

Verified against [8776].

comment:7 by Jacob, 17 years ago

milestone:	1.0 → post-1.0

comment:8 by anonymous, 16 years ago

Owner:	changed from nobody to anonymous
Status:	new → assigned

comment:9 by Graham King, 16 years ago

Owner:	changed from anonymous to Graham King
Status:	assigned → new

comment:10 by Graham King, 16 years ago

Status:	new → assigned
Triage Stage:	Accepted → Design decision needed

I've uploaded a patch which solves this, but read on.
On the dev server it's a one liner.

On mod_python and fastcgi, for a path of '/q/foo%2B+bar' the environment hands you '/q/foo++bar' in PATH_INFO. I've had to use REQUEST_URI (req.unparsed_uri in mod_python) instead of PATH_INFO.
If those smart people at mod_python and fastcgi/flup are converting the %2B and + this way, I don't feel comfortable doing it differently. I'm setting this back to 'Design Decision Needed' for advice.

For what it's worth I've tested the mod_python version of this patch on a medium to large work project, and it doesn't seem to cause any problems.
I've just noticed this patch seems to break a whole bunch of unit tests. Oops. Investigating.

by Graham King, 16 years ago

Attachment:	3553.diff added

Uses urllib.unquote_plus for dev server, mod_python and fastcgi, and all the tests pass

comment:11 by Graham King, 16 years ago

Resolution:	→ wontfix
Status:	assigned → closed

Attached a patch that will allow you to do this, but I don't recommend you do. It will probably break mod_rewrite. The answer is don't put the query in the path. Change your URL structure to be /search?q=foo%2B+bar.

+ encodes a space in a _query_ (coming from a form), not in a _path_. In a path %20 should be used. Create a document called 'my test.html' on a local Apache install and try browsing to http://localhost/my+test.html -> 404, replace the + with %20 and it works.

When Apache hands Django the PATH_INFO, the %2B has been correctly converted to a +, and the other plus is left as it because, in the path, it doesn't mean anything special. So foo%2B+bar correctly becomes foo++bar

Look at last.fm, where there is both a band called '44' and one called '+44':
http://www.last.fm/music/44 --> 44
http://www.last.fm/music/+44 --> 44
http://www.last.fm/music/%2B44 --> 44
http://www.last.fm/music/%252B44 --> +44

They had to double encode it, replacing the % with %25. That's what you'll have to do to use + as a character and a space in the path. Otherwise shift the query to the query component or the URL.

comment:12 by (none), 16 years ago

milestone:	post-1.0

Milestone post-1.0 deleted

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues