Opened 6 years ago

Closed 6 years ago

Last modified 5 years ago

#14955 closed Uncategorized (fixed)

URLField validation should use HEAD requet instead of GET

Reported by: Claude Paroz Owned by: Jannis Leidel
Component: Core (Other) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When an URLField has verify_exists to True, it checks that the page exists with urrlib2.urlopen. Doing a GET request (urllib2 default) is not necessary whereas a simple HEAD request should be sufficient to know if the page exists.

Attachments (3)

urrlib2_head_request.diff (1013 bytes) - added by Claude Paroz 6 years ago.
Use HEAD HTTP request instead of GET
urllib2_head_request_2.diff (2.9 KB) - added by Claude Paroz 6 years ago.
Add fallback to GET
urllib2_head_request_3.diff (3.3 KB) - added by Claude Paroz 6 years ago.
Put HeadRequest class at root level

Download all attachments as: .zip

Change History (15)

Changed 6 years ago by Claude Paroz

Attachment: urrlib2_head_request.diff added

Use HEAD HTTP request instead of GET

comment:1 Changed 6 years ago by Claude Paroz

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

Note that tests are already done in forms/tests/fields.py (e.g. test_urlfield_3).

comment:2 Changed 6 years ago by rasca

milestone: 1.3
Triage Stage: UnreviewedReady for checkin

I've applied it cleanly, tested it in a demo app and run the tests OK.
Marking it as RFC.

comment:3 Changed 6 years ago by Graeme

What about websites that don't allow HEAD request? There are some sites that only allow GET and POST.
How about HEAD first fall back to GET?

comment:4 Changed 6 years ago by Claude Paroz

Owner: changed from nobody to Claude Paroz
Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

Yeah, you are probably right. Even if it adds a supplementary request for each not found resource, a fallback to a GET request is probably more robust.

comment:5 Changed 6 years ago by Luke Plant

RFC 2616 says that all general purpose servers MUST support both GET and HEAD. However, I don't think it hurts to fall back to GET, but only if the server returns a 501 code or a 405 code.

comment:6 Changed 6 years ago by Claude Paroz

Owner: Claude Paroz deleted
Patch needs improvement: unset

Thanks Luke for the useful insight. The patch now falls back to GET when response code is 405 or 501.

Changed 6 years ago by Claude Paroz

Attachment: urllib2_head_request_2.diff added

Add fallback to GET

comment:7 Changed 6 years ago by Alex Gaynor

Class should not be created inside the method; classes are some of the most expensive (memorywise) objects in CPython, and can be slow to gc.

Changed 6 years ago by Claude Paroz

Attachment: urllib2_head_request_3.diff added

Put HeadRequest class at root level

comment:8 Changed 6 years ago by Claude Paroz

Thanks Alex for your input. I've modified the patch to get HeadRequest class out of the method.

comment:9 Changed 6 years ago by Jannis Leidel

Triage Stage: AcceptedReady for checkin

comment:10 Changed 6 years ago by Jannis Leidel

Owner: set to Jannis Leidel
Resolution: fixed
Status: newclosed

In [15500]:

Fixed #14955 -- Made the URLValidator use a HEAD request when verifying a URL. Thanks, Claude Paroz.

comment:11 Changed 5 years ago by jamstooks <ben.stookey@…>

Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

After thinking about this, I wonder if it's worth falling back to GET for any failed request? I've found a variety of IIS servers running ASP.NET that are responding with 403 errors or simply timing out on HEAD requests. Any thoughts? Should I open another ticket about this?

comment:12 Changed 5 years ago by Jacob

milestone: 1.3

Milestone 1.3 deleted

Note: See TracTickets for help on using tickets.
Back to Top