Opened 17 years ago
Closed 10 years ago
#5418 closed New feature (wontfix)
Add assertNoBrokenLinks() to test system
Reported by: | Adrian Holovaty | Owned by: | Unai Zalakain |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | feature |
Cc: | absoludity@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
It would be convenient and useful if the test system could automatically check every <a href>
in a rendered page to make sure the links did not cause 404s.
Obviously, for pages with dozens of links, this could take a long time to run. So I suggest an only_internal
keyword argument, which would be True
by default and would cause only the *internal* links (i.e., those without an "https?:") to be checked, thereby using Django's internal URL resolving to check the links instead of going over HTTP.
In the documentation for this feature, we should note that people should ensure none of their pages have side effects, because this test will essentially cause all links to be "clicked on."
Attachments (3)
Change History (35)
comment:1 by , 17 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:4 by , 17 years ago
Status: | new → assigned |
---|
by , 17 years ago
Attachment: | assert_no_broken_links.diff added |
---|
comment:5 by , 17 years ago
I've created a patch for an initial solution, but there's a few questions (see doc string for assertNoBrokenLinks()).
The regexes are pretty ugly too. I've tried to use a little indentation to make it clearer, but guessing this goes against PEP08?
Any feedback appreciated.
comment:6 by , 17 years ago
I haven't reviewed this thoroughly, but line 287
self.fail(u'The URL %s appears to be invalid.')
misses the final % link
comment:7 by , 17 years ago
Thanks Michael - I'll finish it today (and write the docs/tests) - was just keen to know whether the way I was going about it was ok.
comment:8 by , 17 years ago
Cc: | added |
---|---|
Has patch: | set |
Resolution: | → fixed |
Status: | assigned → closed |
I've added a patch with the new assertion and it's own unit tests and docs. For some reason when you click on the attachment, it seems empty, but it's there when you click on the 'raw format' option.
Let me know if there's any improvements to be made!
-Michael
comment:9 by , 17 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
comment:10 by , 17 years ago
Any reason you cannot just use the standard HTMLParser module? It would eliminate all those RE:s, and also get rid of the <a id="">href issue you mentioned. The sample on this page should be helpful, I think:
comment:11 by , 17 years ago
Patch needs improvement: | set |
---|
Yes, this patch should be reworked to use HTMLParser
instead of regexes. Also, please remove the commented-out print
statements in the patch.
comment:12 by , 17 years ago
Keywords: | feature added |
---|
comment:13 by , 17 years ago
Thanks for the hints and the link. Will try to get the changes in today (au-time).
by , 17 years ago
Attachment: | assert_no_broken_links_with_tests_and_doc.diff added |
---|
New assertNoBrokenLinks (using HTMLParser) with regression tests and docs.
comment:14 by , 17 years ago
Patch needs improvement: | unset |
---|
Added new version of assertNoBrokenLinks patch using the HTMLParser, includes regression tests and docs.
I'm not sure what the process is - unchecking patch needs improvement?
comment:15 by , 17 years ago
Patch needs improvement: | set |
---|
Actually, hold off on reviewing... just thought of another test case (handling blank href's <a href=""> as the url template tag fails silently and just leaves a blank href.
Not sure if this should actually be an extra arg: ignore_blank_hrefs=False.
by , 17 years ago
Attachment: | assert_no_broken_links_with_tests_and_doc.2.diff added |
---|
Updated patch that also checks for blank links and internal page links (ie. href="#content")
comment:16 by , 17 years ago
Patch needs improvement: | unset |
---|
OK, I've uploaded a further patch that handles blank hrefs as well as verifying that internal page links (href="#content") are not broken. All with their own unit tests.
I didn't overwrite the previous patch in case you don't want the extra functionality.
comment:17 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
comment:18 by , 14 years ago
milestone: | → 1.3 |
---|---|
Triage Stage: | Accepted → Design decision needed |
It should be noted that I have implemented this as a third party app. It's useful in Continuous integration and as a quick sanity check of your app.
http://github.com/ericholscher/django-crawler
I'm not sure that this should be in Django itself. It seems like it works fine as an external application, and will advance and be more useful outside of core. Setting this to 1.3 so we can make a decision about it.
comment:19 by , 14 years ago
Triage Stage: | Design decision needed → Accepted |
---|
To my mind, crawling is a slightly different activity to what this test assertion is proposing.
Crawling is a process of discovering bad links; it requires that there is a tree of links from one page to another.
Link validation as a test case allows you to set up specific test conditions, and check a specific URL as an acceptance test.
Consider the case of an account activation link in an email -- there won't be a link to that URL (so a crawler shouldn't find it), but it should still be validated for invalid links. The URL for the validation link is something that should be different every time it is generated, so writing a test script to check it would be, at the very least, cumbersome.
I think there is a place for both a crawler and a link verifying test assertion.
comment:21 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → New feature |
comment:22 by , 14 years ago
Easy pickings: | unset |
---|---|
Patch needs improvement: | set |
assert_no_broken_links_with_tests_and_doc.2.diff fails to apply cleanly on to trunk
comment:24 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:25 by , 11 years ago
Pull request issued: https://github.com/django/django/pull/1740
Check for broken links in a rendered page. By default only internal links are
checked, if only_internal
is False
, external links are checked too.
This commit adds the patch, the tests and the docs where the possible side
effects of this test are noted.
comment:26 by , 11 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:27 by , 11 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
Someone other than the patch author needs to review and mark as RFC.
follow-up: 31 comment:29 by , 10 years ago
Four years later, I beg to differ with Russell. I don't believe this is sufficiently useful to be added to Django.
Django recommands to generate all URLs in templates with the {% url %}
. That guarantees that URLs are valid. (Yes, I can imagine edge cases breaking this guarantee, but it holds for non-pathological URL patterns.) Regarding the "email" example, based on my experience, 95% of problems are going to be in the domain name, not in the path, so this system wouldn't be able to catch them.
Considering the ongoing efforts on Unai's pull request, I'm not going to close the ticket wontfix, but that would have been my choice otherwise.
comment:30 by , 10 years ago
I must admit I didn't pick up this issue because I considered it particularly useful but rather because I thought it would be a funny one to work on so I beg you you don't decide whether to accept it or not on the basis of my efforts ;-)
comment:31 by , 10 years ago
Replying to aaugustin:
Four years later, I beg to differ with Russell. I don't believe this is sufficiently useful to be added to Django.
Django recommands to generate all URLs in templates with the
{% url %}
. That guarantees that URLs are valid. (Yes, I can imagine edge cases breaking this guarantee, but it holds for non-pathological URL patterns.) Regarding the "email" example, based on my experience, 95% of problems are going to be in the domain name, not in the path, so this system wouldn't be able to catch them.
That's a good point - not sure why that didn't occur to me at the time. I'm happy to step back from my "accepted" position on this ticket.
comment:32 by , 10 years ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
I'd prefer not to add this as well.
Initial thoughts for feedback.