#2934 closed enhancement (fixed)
[patch] validators.isExistingURL is frequently wrong
Reported by: | | Owned by: | Adrian Holovaty |
---|---|---|---|
Component: | Validators | Version: | 0.95 |
Severity: | normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The existing isExistingURL validator uses urllib2's default user agent string, which is commonly rejected by servers.
Similarly, the validator fails if the server returns a 301 or 302, even though it accepts a 401 as passing.
I think it's better to send headers claiming to accept all sorts of responses, allow a configurable user agent (via settings), and accept 301 and 302 as valid. As a philosophical issue, we could instead follow 301/302 redirects and call it a failure after a certain number of hops, but then you might fall into a cookie-based tarpit that is valid but requires a cookie store to get through. Semi-aside: hey, httplib2 is nice.
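The alternative floated above (follow 301/302 redirects, declaring failure after a fixed number of hops) can be sketched independently of any HTTP library. This is a minimal illustration, not code from the ticket: `follow_redirects`, `max_hops`, and the `fetch` callable are hypothetical names, and a real `fetch` would issue the request and return the status code plus the `Location` header. Note how a site that keeps redirecting until it sees a cookie simply exhausts the hop limit when no cookie store is involved, which is exactly the tarpit concern.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher signature: takes a URL, returns (status, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302)


def follow_redirects(url: str, fetch: Fetch, max_hops: int = 5) -> bool:
    """Follow 301/302 responses, giving up after max_hops redirects.

    2xx and 401 count as "the URL exists", matching the rules in the ticket;
    anything else, or running out of hops, counts as a broken link.
    """
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status in REDIRECT_CODES:
            if not location:
                return False  # a redirect with nowhere to go
            url = urljoin(url, location)  # Location may be relative
            continue
        return 200 <= status < 300 or status == 401
    return False  # still redirecting after max_hops hops: treat as broken


# Usage with a fake fetcher, so no network access is needed.
FAKE_PAGES = {
    "http://a.example/": (301, "/b"),
    "http://a.example/b": (200, None),
    # Keeps redirecting to itself, e.g. waiting for a cookie we never send.
    "http://tarpit.example/": (302, "http://tarpit.example/"),
    "http://auth.example/": (401, None),
    "http://gone.example/": (404, None),
}


def fake_fetch(url):
    return FAKE_PAGES[url]
```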
Sorry, no patch; I'm on 0.91 and can't easily diff w/ trunk. Even so, here's my local isExistingURL:
```python
def isExistingURL(field_data, all_data):
    import urllib2
    # URL_FETCH_USER_AGENT is the new setting proposed in this ticket.
    try:
        headers = {
            "Accept": "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
            "Accept-Language": "en-us,en;q=0.5",
            "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
            "Connection": "close",
            "User-Agent": URL_FETCH_USER_AGENT,
        }
        req = urllib2.Request(field_data, None, headers)
        urllib2.urlopen(req)
    except ValueError:
        raise ValidationError, _("Invalid URL: %s") % field_data
    except urllib2.HTTPError, e:
        # 401s are valid; they just mean authorization is required.
        # 301 and 302 are redirects; they just mean look somewhere else.
        if e.code not in (401, 301, 302):
            raise ValidationError, _("The URL %s is a broken link.") % field_data
    except Exception:  # urllib2.URLError, httplib.InvalidURL, etc.
        raise ValidationError, _("The URL %s is a broken link.") % field_data
```
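For comparison, here is a minimal Python 3 sketch of the same validator using `urllib.request` (the successor to `urllib2`). It is not the patch that was committed: `ValidationError` is a local stand-in for Django's class, `is_existing_url` and `build_request` are hypothetical names, and the default user agent is a placeholder for the `URL_FETCH_USER_AGENT` setting proposed above. It also wires in a cookie jar, which addresses the cookie-store concern for redirect chains.

```python
import http.cookiejar
import urllib.error
import urllib.request


class ValidationError(Exception):
    """Stand-in for Django's ValidationError so the sketch runs on its own."""


# Placeholder default; the ticket proposes reading this from settings.
URL_FETCH_USER_AGENT = "Django URL validator (hypothetical)"

# 401 means authorization is required; 301/302 mean look somewhere else.
ACCEPTED_ERROR_CODES = (301, 302, 401)


def build_request(url, user_agent=URL_FETCH_USER_AGENT):
    headers = {
        "Accept": "text/xml,application/xml,application/xhtml+xml,"
                  "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
        "Accept-Language": "en-us,en;q=0.5",
        "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
        "Connection": "close",
        "User-Agent": user_agent,
    }
    # Request() raises ValueError for strings with no URL scheme at all.
    return urllib.request.Request(url, headers=headers)


def is_existing_url(url, user_agent=URL_FETCH_USER_AGENT, timeout=10):
    try:
        req = build_request(url, user_agent)
    except ValueError:
        raise ValidationError("Invalid URL: %s" % url)
    # A cookie jar lets redirect chains that set cookies resolve.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    try:
        with opener.open(req, timeout=timeout):
            pass
    except urllib.error.HTTPError as e:
        # urllib.request follows redirects itself; a 301/302 only surfaces
        # here when its redirect handler gives up (e.g. a loop).
        if e.code not in ACCEPTED_ERROR_CODES:
            raise ValidationError("The URL %s is a broken link." % url)
    except (urllib.error.URLError, OSError, ValueError):
        raise ValidationError("The URL %s is a broken link." % url)
```

Only `build_request` and the invalid-URL path can be exercised without network access; `urllib.request.Request` normalizes header names with `str.capitalize()`, so the user agent is read back as `User-agent`.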
Change History (6)
comment:1 by , 18 years ago
Summary: | validators.isExistingURL is frequently wrong → [patch] validators.isExistingURL is frequently wrong |
---|
comment:3 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:4 by , 18 years ago
FYI, I had to use a "real" user agent string: there are, unbelievably, sites out there that return a 500 if they don't recognize the UA string.
comment:5 by , 18 years ago
Yeah, I know that; my banking site, for one, broke when FF 2.0 came out. But I just feel skeezy putting a real UA in the default Django settings.
Claiming it as a patch since applying it is straightforward.
May I suggest a Firefox user agent string as the global_settings.py value of URL_FETCH_USER_AGENT?
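The comments above disagree on whether the default should be a real browser string. One way to keep both options open is a neutral default that a project can override with a browser UA when it hits picky servers. This is a minimal sketch under assumptions: `get_url_fetch_user_agent` and both UA strings are hypothetical placeholders, and `types.SimpleNamespace` stands in for `django.conf.settings`. The `getattr` fallback only matters for code that must also work when the setting is absent.

```python
from types import SimpleNamespace

# Hypothetical neutral default, avoiding a real browser UA in the shipped settings.
DEFAULT_URL_FETCH_USER_AGENT = "Django URL validator (hypothetical)"


def get_url_fetch_user_agent(settings):
    """Return the project's URL_FETCH_USER_AGENT, or the neutral default."""
    return getattr(settings, "URL_FETCH_USER_AGENT", DEFAULT_URL_FETCH_USER_AGENT)


# Usage: SimpleNamespace objects stand in for a project's settings module.
plain_project = SimpleNamespace()
browser_project = SimpleNamespace(
    URL_FETCH_USER_AGENT="Mozilla/5.0 (placeholder browser UA)")
```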