If you are not familiar with `robots.txt`: it is a file that site owners can place in the site root in order to control crawling and indexing by search robots. And it so happens that `deadlinks` is also a kind of search robot, so if you are a site owner it might be useful to learn about this feature and how to use it.
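For reference, here is what a minimal `robots.txt` might look like (the directory name below is purely illustrative):

```
# Applies to every crawler, including deadlinks.
User-agent: *
# Keep robots out of the drafts directory.
Disallow: /drafts/
# Everything else may be crawled.
Allow: /
```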
We do respect `robots.txt` by default; however, there are always edge cases in which `deadlinks` users may find it useful to ignore the `robots.txt` instructions (a sketch of how such a check works in general is shown after the list below). For this purpose you can use a dedicated option, for example when:
- The `deadlinks` User-Agent isn't important enough to be included in github.com's `robots.txt`, but checking the state of linked repositories or files is quite important. More than that, it was the reason why `deadlinks` was created in the first place.
- You need to implement link checks as part of a CI/CD procedure for a website that is not publicly available.
- Weird cases where sites generate `robots.txt` on the fly, but forbid direct access for humans (e.g. Amazon S3 hosting).
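The sketch below is not how `deadlinks` is implemented; it only illustrates, using Python's standard `urllib.robotparser` and a hypothetical `example.com` site, the general mechanism a well-behaved crawler follows when it respects `robots.txt`:

```python
from urllib.robotparser import RobotFileParser

# Purely illustrative; deadlinks' own logic may differ.
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()  # fetch and parse robots.txt from the site root

url = "https://example.com/drafts/post.html"
# "deadlinks" here is a hypothetical User-Agent string.
if robots.can_fetch("deadlinks", url):
    print(f"allowed to crawl {url}")
else:
    print(f"robots.txt forbids crawling {url}")
```

Ignoring `robots.txt` simply means skipping such a check before fetching a page.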