#2934 closed enhancement (fixed)
[patch] validators.isExistingURL is frequently wrong
| Reported by: | | Owned by: | Adrian Holovaty |
|---|---|---|---|
| Component: | Validators | Version: | 0.95 |
| Severity: | normal | Keywords: | |
| Cc: | | Triage Stage: | Unreviewed |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
The existing isExistingURL validator uses urllib2's default user agent string, which is commonly rejected by servers.
Similarly, the validator fails if a 301 or 302 is returned, though a 401 is accepted as passing.
I think it's better to send Accept headers that claim support for all sorts of responses, allow a configurable user agent (via settings), and accept 301 and 302 as valid. As a philosophical issue, we could perhaps loop on 301/302 and call it a failure after a certain number of tries, but then you might fall into a cookied tarpit that is valid but requires a cookie store. Semi-aside: hey, httplib2 is nice.
Sorry, no patch; I'm on 0.91 and can't easily diff w/ trunk. Even so, here's my local isExistingURL:
    def isExistingURL(field_data, all_data):
        import urllib2
        try:
            headers = {
                "Accept": "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
                "Accept-Language": "en-us,en;q=0.5",
                "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
                "Connection": "close",
                "User-Agent": URL_FETCH_USER_AGENT
            }
            req = urllib2.Request(field_data, None, headers)
            u = urllib2.urlopen(req)
        except ValueError:
            raise ValidationError, _("Invalid URL: %s") % field_data
        except urllib2.HTTPError, e:
            # 401s are valid; they just mean authorization is required.
            # 301 and 302 are redirects; they just mean look somewhere else.
            if str(e.code) not in ('401', '301', '302'):
                raise ValidationError, _("The URL %s is a broken link.") % field_data
        except:  # urllib2.URLError, httplib.InvalidURL, etc.
            raise ValidationError, _("The URL %s is a broken link.") % field_data
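For illustration only, here is a rough sketch of the "loop on 301/302 and give up after a certain number of tries" idea mentioned in the description. It is not part of the proposed change and is written for Python 3 rather than the Python 2 of the snippet above. The names `url_exists`, `Fetcher` and `max_redirects` are hypothetical, and the network call is replaced by a caller-supplied `fetch` function so the logic can be exercised without a real server.

```python
from typing import Callable, Optional, Tuple

# Hypothetical helper type: a fetcher takes a URL and returns
# (status_code, value of the Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302)


def url_exists(url: str, fetch: Fetcher, max_redirects: int = 5) -> bool:
    """Return True if url reaches an accepted status within max_redirects hops."""
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        # 2xx is a normal success; 401 just means authorization is required.
        if 200 <= status < 300 or status == 401:
            return True
        if status in REDIRECT_CODES and location:
            url = location  # follow the redirect and try again
            continue
        return False  # any other status counts as a broken link
    # Too many redirects: give up instead of looping forever.
    return False
```

With a bound like this, a site that redirects endlessly until a cookie is set would be reported as broken, which is exactly the tarpit trade-off described above. The sketch also assumes absolute `Location` values; relative ones would need resolving first.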
Change History (6)
comment:1 by , 19 years ago
| Summary: | validators.isExistingURL is frequently wrong → [patch] validators.isExistingURL is frequently wrong |
|---|
comment:3 by , 19 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
comment:4 by , 19 years ago
FYI, I had to use a "real" user agent string; there are, unbelievably, sites out there which throw a 500 if they don't recognize the UA string.
comment:5 by , 19 years ago
Yeah, I know that -- my banking site, for one, which broke when FF2.0 came out -- but I just feel skeezy putting a real UA in the default Django settings.
Claiming it as a patch since applying it is straightforward.
May I suggest Firefox as the global_settings.py value of URL_FETCH_USER_AGENT?
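As a rough sketch of how a configurable `URL_FETCH_USER_AGENT` might be picked up with a fallback, assuming a settings object that may or may not define it: `DEFAULT_USER_AGENT`, `FakeSettings` and `build_request` are hypothetical names used only for this example, and the fallback string is a placeholder rather than a recommended value. The sketch uses Python 3's `urllib.request` and only builds the request; it does not send it.

```python
from urllib.request import Request

# Hypothetical fallback; the ticket does not settle on a default value.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; link-checker)"


class FakeSettings:
    """Stand-in for a Django settings object so the sketch runs without Django."""
    URL_FETCH_USER_AGENT = "ExampleBot/1.0"


def build_request(url, settings=None):
    """Build a request whose User-Agent comes from settings when defined."""
    user_agent = getattr(settings, "URL_FETCH_USER_AGENT", DEFAULT_USER_AGENT)
    return Request(url, headers={"User-Agent": user_agent, "Connection": "close"})
```

Calling `build_request("http://example.com/", FakeSettings)` yields a request carrying the configured string, while omitting the settings argument falls back to the placeholder default.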