Opened 5 years ago

Last modified 3 weeks ago

#32723 assigned Cleanup/optimization

Add a GitHub action to run the Sphinx linkcheck builder.

Reported by: Nick Pope Owned by: Sarah Abderemane
Component: Documentation Version: dev
Severity: Normal Keywords: linkcheck, github, action
Cc: Triage Stage: Someday/Maybe
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

This is a follow on step from #32720. (Migrated to this new ticket as suggested by Mariusz.)

Add a scheduled GitHub action to check for broken links, or redirects that could be simplified, on a weekly/monthly basis.

This would need to wait for sphinx-doc/sphinx#6525 to be addressed so that we can treat desired redirections as "working" links instead of "redirected", e.g. https://docs.djangoproject.com/en/stable/https://docs.djangoproject.com/en/3.2/.

The linkcheck builder generates docs/_build/linkcheck/output.{json,txt} which could be filtered and attached as an artifact from the GitHub action to make it easier to provide a report on what needs fixing.

Change History (15)

comment:1 by Nick Pope, 5 years ago

Triage Stage: UnreviewedSomeday/Maybe

comment:2 by Sarah Abderemane, 5 years ago

Owner: changed from nobody to Sarah Abderemane
Status: newassigned

comment:3 by Sarah Abderemane, 5 years ago

sphinx-doc/sphinx#6525 has been closed and merged in the last release of sphinx. Does the triage stage will be updated to accepted according to this ?

in reply to:  3 ; comment:4 by Nick Pope, 5 years ago

Replying to Sarah Abderemane:

sphinx-doc/sphinx#6525 has been closed and merged in the last release of sphinx. Does the triage stage will be updated to accepted according to this ?

I'll need to have a look as the entire proposal wasn't implemented with some work being left for a follow-up.

We'll also need to wait for any changes to be released.

in reply to:  4 comment:5 by Sarah Abderemane, 5 years ago

Nick Pope any news on this subject ?

comment:6 by Jacob Walls, 5 years ago

Hey Sarah. Took a look myself. I believe Nick was hoping for changes in Sphinx to allow regexes like these like these with backreferences, but testing against the latest version of Sphinx, they aren't supported:

WARNING: Failed to compile regex in linkcheck_allowed_redirects: '^https://\\1\\.readthedocs\\.io/[-a-z]+/(?:master|latest|stable)/$' invalid group reference 1
WARNING: Failed to compile regex in linkcheck_allowed_redirects: '^https://docs\\.djangoproject\\.com/\\1/\\d+\\.\\d+/' invalid group reference 1

However the released Sphinx changes would allow regexes without the backreferences. So if we were to get started now with implementing this in Django, we would, say, in the case of the readthedocs.io links, either need to use looser regexes (possibly allowing one readthedocs site to redirect to another readthedocs site, etc) or need to write regexes for each readthedocs site (ouch?)

I don't have a strong opinion, but I would say the former option (just remove the backreferences and allow looser matches) seems tolerable. We're perfectionists who don't mind using imperfect tools, right? :wink: We could then tighten up the regexes once someone submits a follow-up PR to Sphinx. A second opinion would be good to get before setting to work, though.

I think the triage stage is set to Maybe because there might be objections to running this on CI vs. manually, but that doesn't bear directly on whether we should get the regexes written to use the new Sphinx feature.

comment:7 by Nick Pope, 5 years ago

Thanks for looking into this Jacob.

So, yes, I was waiting for the feature to be added to Sphinx, but there were some complications over the implementation, so it is only part of what would be needed. It's on my very long list of things to look into. The main problem is that substituting the captures from the source pattern into the target pattern using back-references precludes the use of back-references for repeating captured groups within the target pattern.

At this stage I think we still need to wait until something further is implemented in Sphinx. That is why I flagged this "Someday/Maybe". I think we'd require a great many more entries because we can't make it generic enough. On top of that, one of the reasons for wanting to be able to substitute from the source to the target is to ensure that we're actually redirected to the right page. This whole thing is intended to make linkcheck useful for documentation sites that redirect for language and version, but not mask other potential issues.

comment:8 by Sarah Abderemane, 5 years ago

Thanks Nick and Jacob for your replies. I understand better what it implies to do before this ticket, I'll follow up to see how it will involves.
I agree, the triage stage is confusing, even more for me when I didn't know what it implies really. I think it will be good to have many more entries for better understanding for everyone.

comment:9 by Ralph Reid, 4 years ago

Hi Sarah, I have some spare cycles to help out. Has the closed reference ticket https://github.com/sphinx-doc/sphinx/issues/6525 moved this ticket further?
Thanks.

comment:10 by Sarah Abderemane, 4 years ago

Hi Ralph, I was waiting this issue to be resolved https://github.com/sphinx-doc/sphinx/issues/9234 and the triage stage to be changed.
Nick or Jacob, or someone else who have the knowledge, do you think it's ok to change the triage stage or there are some stuff to be checked before to change the status of this ticket ?

comment:11 by Nick Pope, 4 years ago

My previous comment above still stands - it was posted after that change to Sphinx was merged.

The change made was only part of the solution, so we cannot really progress here yet.

comment:12 by Sarah Abderemane, 4 years ago

Sorry for the inconvenience Nick, thanks for the quick response. I will check if the triage status only changes.

in reply to:  description comment:13 by piunam, 3 months ago

Can i try to solve this ticket ?

comment:14 by Jacob Walls, 3 months ago

It depends: have you identified what changed in Sphinx to unblock anything here? Or, do you have a more minimal proposal for us to consider?

comment:15 by Revant Khanna, 3 weeks ago

Another possible approach could be handling this at the workflow/tooling level instead of adding dynamic backreference substitution support directly inside Sphinx.

For example:

  1. Run sphinx-build -b linkcheck
  2. Generate output.json
  3. Post-process the results using a small Python script
  4. Filter or normalize acceptable redirect patterns before failing CI

This could provide a simpler and more maintainable solution for projects with many predictable redirects.

Example idea:

ALLOWED = [

(

r"https://(.*)\.readthedocs\.io/en/latest/",
r"https://\1.readthedocs.io/en/stable/"

)

]

The post-processing script could:

-> compare original URLs against actual redirect targets
-> dynamically reconstruct expected redirect destinations using re.sub
-> suppress known/acceptable redirects
-> still fail for genuine broken links (404, DNS failures, SSL issues, unexpected redirects, etc.)

After some additional testing, it appears that linkcheck already exposes sufficient redirect metadata externally in output.json.

Example output:

{

"status": "redirected",
"uri": "http://github.com",
"info": "https://github.com/"

}

Since both the original URL (uri) and redirect target (info) are available, external tooling can already reconstruct and validate redirect behavior dynamically without necessarily requiring substitution support directly inside Sphinx core.

Potential advantages of this approach:

-> avoids increasing complexity inside Sphinx core
-> keeps linkcheck behavior simpler and more predictable
-> allows projects to implement project-specific redirect policies externally
-> easier to iterate independently in CI workflows
-> provides more flexibility for custom CI filtering and redirect validation logic

This may be especially useful for large projects like Django where many redirects follow consistent patterns.

Note: See TracTickets for help on using tickets.
Back to Top