Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#14955 closed Uncategorized (fixed)

URLField validation should use HEAD requet instead of GET

Reported by: claudep Owned by: jezdez
Component: Core (Other) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When an URLField has verify_exists to True, it checks that the page exists with urrlib2.urlopen. Doing a GET request (urllib2 default) is not necessary whereas a simple HEAD request should be sufficient to know if the page exists.

Attachments (3)

urrlib2_head_request.diff (1013 bytes) - added by claudep 5 years ago.
Use HEAD HTTP request instead of GET
urllib2_head_request_2.diff (2.9 KB) - added by claudep 5 years ago.
Add fallback to GET
urllib2_head_request_3.diff (3.3 KB) - added by claudep 5 years ago.
Put HeadRequest class at root level

Download all attachments as: .zip

Change History (15)

Changed 5 years ago by claudep

Use HEAD HTTP request instead of GET

comment:1 Changed 5 years ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Note that tests are already done in forms/tests/fields.py (e.g. test_urlfield_3).

comment:2 Changed 5 years ago by rasca

  • milestone set to 1.3
  • Triage Stage changed from Unreviewed to Ready for checkin

I've applied it cleanly, tested it in a demo app and run the tests OK.
Marking it as RFC.

comment:3 Changed 5 years ago by Graeme

What about websites that don't allow HEAD request? There are some sites that only allow GET and POST.
How about HEAD first fall back to GET?

comment:4 Changed 5 years ago by claudep

  • Owner changed from nobody to claudep
  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

Yeah, you are probably right. Even if it adds a supplementary request for each not found resource, a fallback to a GET request is probably more robust.

comment:5 Changed 5 years ago by lukeplant

RFC 2616 says that all general purpose servers MUST support both GET and HEAD. However, I don't think it hurts to fall back to GET, but only if the server returns a 501 code or a 405 code.

comment:6 Changed 5 years ago by claudep

  • Owner claudep deleted
  • Patch needs improvement unset

Thanks Luke for the useful insight. The patch now falls back to GET when response code is 405 or 501.

Changed 5 years ago by claudep

Add fallback to GET

comment:7 Changed 5 years ago by Alex

Class should not be created inside the method; classes are some of the most expensive (memorywise) objects in CPython, and can be slow to gc.

Changed 5 years ago by claudep

Put HeadRequest class at root level

comment:8 Changed 5 years ago by claudep

Thanks Alex for your input. I've modified the patch to get HeadRequest class out of the method.

comment:9 Changed 5 years ago by jezdez

  • Triage Stage changed from Accepted to Ready for checkin

comment:10 Changed 5 years ago by jezdez

  • Owner set to jezdez
  • Resolution set to fixed
  • Status changed from new to closed

In [15500]:

Fixed #14955 -- Made the URLValidator use a HEAD request when verifying a URL. Thanks, Claude Paroz.

comment:11 Changed 4 years ago by jamstooks <ben.stookey@…>

  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized
  • UI/UX unset

After thinking about this, I wonder if it's worth falling back to GET for any failed request? I've found a variety of IIS servers running ASP.NET that are responding with 403 errors or simply timing out on HEAD requests. Any thoughts? Should I open another ticket about this?

comment:12 Changed 4 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

Note: See TracTickets for help on using tickets.
Back to Top