Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#14955 closed Uncategorized (fixed)

URLField validation should use HEAD requet instead of GET

Reported by: Claude Paroz Owned by: Jannis Leidel
Component: Core (Other) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When an URLField has verify_exists to True, it checks that the page exists with urrlib2.urlopen. Doing a GET request (urllib2 default) is not necessary whereas a simple HEAD request should be sufficient to know if the page exists.

Attachments (3)

urrlib2_head_request.diff (1013 bytes ) - added by Claude Paroz 13 years ago.
Use HEAD HTTP request instead of GET
urllib2_head_request_2.diff (2.9 KB ) - added by Claude Paroz 13 years ago.
Add fallback to GET
urllib2_head_request_3.diff (3.3 KB ) - added by Claude Paroz 13 years ago.
Put HeadRequest class at root level

Download all attachments as: .zip

Change History (15)

by Claude Paroz, 13 years ago

Attachment: urrlib2_head_request.diff added

Use HEAD HTTP request instead of GET

comment:1 by Claude Paroz, 13 years ago

Note that tests are already done in forms/tests/fields.py (e.g. test_urlfield_3).

comment:2 by rasca, 13 years ago

milestone: 1.3
Triage Stage: UnreviewedReady for checkin

I've applied it cleanly, tested it in a demo app and run the tests OK.
Marking it as RFC.

comment:3 by Graeme, 13 years ago

What about websites that don't allow HEAD request? There are some sites that only allow GET and POST.
How about HEAD first fall back to GET?

comment:4 by Claude Paroz, 13 years ago

Owner: changed from nobody to Claude Paroz
Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

Yeah, you are probably right. Even if it adds a supplementary request for each not found resource, a fallback to a GET request is probably more robust.

comment:5 by Luke Plant, 13 years ago

RFC 2616 says that all general purpose servers MUST support both GET and HEAD. However, I don't think it hurts to fall back to GET, but only if the server returns a 501 code or a 405 code.

comment:6 by Claude Paroz, 13 years ago

Owner: Claude Paroz removed
Patch needs improvement: unset

Thanks Luke for the useful insight. The patch now falls back to GET when response code is 405 or 501.

by Claude Paroz, 13 years ago

Attachment: urllib2_head_request_2.diff added

Add fallback to GET

comment:7 by Alex Gaynor, 13 years ago

Class should not be created inside the method; classes are some of the most expensive (memorywise) objects in CPython, and can be slow to gc.

by Claude Paroz, 13 years ago

Attachment: urllib2_head_request_3.diff added

Put HeadRequest class at root level

comment:8 by Claude Paroz, 13 years ago

Thanks Alex for your input. I've modified the patch to get HeadRequest class out of the method.

comment:9 by Jannis Leidel, 13 years ago

Triage Stage: AcceptedReady for checkin

comment:10 by Jannis Leidel, 13 years ago

Owner: set to Jannis Leidel
Resolution: fixed
Status: newclosed

In [15500]:

Fixed #14955 -- Made the URLValidator use a HEAD request when verifying a URL. Thanks, Claude Paroz.

comment:11 by jamstooks <ben.stookey@…>, 13 years ago

Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

After thinking about this, I wonder if it's worth falling back to GET for any failed request? I've found a variety of IIS servers running ASP.NET that are responding with 403 errors or simply timing out on HEAD requests. Any thoughts? Should I open another ticket about this?

comment:12 by Jacob, 13 years ago

milestone: 1.3

Milestone 1.3 deleted

Note: See TracTickets for help on using tickets.
Back to Top