Opened 5 months ago

Closed 5 months ago

Last modified 5 months ago

#36339 closed Bug (invalid)

BrokenLinkEmailsMiddleware fires when Referer is invalid and URL redirects

Reported by: Xeekoo4u Owned by:
Component: HTTP handling Version: 5.2
Severity: Normal Keywords:
Cc: Claude Paroz, Jake Howard, Florian Apolloner, Aymeric Augustin, HARI KRISHNA KANCHI Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi all,

I discovered some behavior in the BrokenLinkEmailsMiddleware that I don't fully understand: Our Website is scraped by bots on a regular basis and one of the bots likes to use https://my.web.site/wp-admin as the referrer. The bot requests several URLs that do not exist (e.g. /wp-content) which are part of the IGNORABLE_404_URLS and are ignored properly, but it also requests the /login URL that returns a 301 redirect to /login/. For some reason, this request results in an email being sent if the referrer is an invalid URL (even though the actually requested URL does not return a 404).

It confuses me that I cannot find the part of the code that validates whether the referrer is valid or not. I discovered issues from almost a decade ago (https://code.djangoproject.com/ticket/26059, https://code.djangoproject.com/ticket/25971) that face a similar issue, but in their case the referrer was (almost) equal to the URL.

Is there any advice how I could handle this issue? Why is the referrer even validated and why doesn't the validation respect the IGNORABLE_404_URLS? I'd be glad to remove some email spam :)

Change History (3)

comment:1 by Natalia Bidart, 5 months ago

Component: UncategorizedHTTP handling
Resolution: invalid
Status: newclosed
Type: UncategorizedBug

Hello Xeekoo4u, thank you for taking the time to create this ticket. This report is borderline a support request (which is usually better handled in the Django Forum), but before redirecting you there, I invested some time and created a test case showcasing your scenario. This does not necessarily mean the behavior is a bug, since:

  1. You can customize the behavior for BrokenLinkEmailsMiddleware by providing your own middleware and overriding is_ignorable_request.
  2. If your project use django.middleware.common.CommonMiddleware (which I believe your project should), any URL that needs the slash appended will get it appended and the redirect is returned, never hitting the BrokenLinkEmailsMiddleware.process_response method.

To illustrate what I mean, I've created three tests. The first one is the failing test for the scenario that you described: test_referer_invalid_url_redirects. The following two tests, test_referer_invalid_url_redirects_full_request and test_referer_invalid_url_redirects_incomplete_middleware, showcase what I mean regarding the CommonMiddleware:

  • tests/middleware/tests.py

    diff --git a/tests/middleware/tests.py b/tests/middleware/tests.py
    index 2e796ecfc7..a7f3e703e8 100644
    a b class BrokenLinkEmailsMiddlewareTest(SimpleTestCase):  
    485485        BrokenLinkEmailsMiddleware(self.get_response)(self.req)
    486486        self.assertEqual(len(mail.outbox), 1)
    487487
     488    @override_settings(APPEND_SLASH=True)
     489    def test_referer_invalid_url_redirects(self):
     490        self.req.path = self.req.path_info = "/login"
     491        self.req.META["HTTP_REFERER"] = "https://my.web.site/wp-admin"
     492        BrokenLinkEmailsMiddleware(self.get_response)(self.req)
     493        self.assertEqual(len(mail.outbox), 0)
     494
     495    @override_settings(APPEND_SLASH=True, ROOT_URLCONF="middleware.urls")
     496    def test_referer_invalid_url_redirects_full_request(self):
     497        referer = "https://my.web.site/wp-admin"
     498        for url, status_code in [("/slash/", 200), ("/slash", 301)]:
     499            with self.subTest(url=url, status_code=status_code):
     500                response = self.client.get(url, HTTP_REFERER=referer)
     501                self.assertEqual(len(mail.outbox), 0)
     502                self.assertEqual(response.status_code, status_code)
     503
     504    @override_settings(
     505        APPEND_SLASH=True,
     506        ROOT_URLCONF="middleware.urls",
     507        MIDDLEWARE=["django.middleware.common.BrokenLinkEmailsMiddleware"],
     508    )
     509    def test_referer_invalid_url_redirects_incomplete_middleware(self):
     510        referer = "https://my.web.site/wp-admin"
     511        response = self.client.get("/slash", HTTP_REFERER=referer)
     512        self.assertEqual(len(mail.outbox), 0)
     513        self.assertEqual(response.status_code, 301)
     514
    488515
    489516@override_settings(ROOT_URLCONF="middleware.cond_get_urls")
    490517class ConditionalGetMiddlewareTest(SimpleTestCase):

As expected, test_referer_invalid_url_redirects and test_referer_invalid_url_redirects_incomplete_middleware fail but test_referer_invalid_url_redirects_full_request passes. Given this, I will close this ticket as invalid since I'm not sure we want to support the (potentially) niche use case of using BrokenLinkEmailsMiddleware without CommonMiddleware. OTOH, if your project is using CommonMiddleware properly, and you are still affected by this issue, please reopen providing a way to reproduce (either a test case, or a minimal Django test project).

comment:2 by Natalia Bidart, 5 months ago

Cc: Claude Paroz Jake Howard Florian Apolloner Aymeric Augustin HARI KRISHNA KANCHI added

Adding some folks as CC to get their input.

comment:3 by Jake Howard, 5 months ago

I also find this issue strange. If you can reliably reproduce the issue Xeekoo4u, It'd be great if you can try and diagnose what's going on with BrokenLinkEmailsMiddleware yourself to point us in the right direction. As Natalia says, the forum might also be a good place to get help on this before circling back to this issue with some more context. Alternatively, if you can get the issue working in a minimal project, that'd be helpful too.

The part which makes me think there's something else going on here (rather than it being merely a bug with BrokenLinkEmailsMiddleware) is the if response.status_code == 404 on the first line of the middleware, which ought to avoid any issue like this.

Note: See TracTickets for help on using tickets.
Back to Top