First-time poster (so please excuse me in advance).
I have a webserver running Linux, Apache, etc. I have a list of HTML pages that I want to delete because I think they are old. While I could delete them and then check for broken links, I'd like to be more proactive.
I want to write a shell script that will search all the pages in my site for links to the pages in my list.
Let's say I have a potential file to delete at
www.fake-url.com/foo/bar/index.html
I can't just grep for it, because links to the page can be written within other pages in a number of ways:
1) It could be a full URL or a root-relative link (that's easy enough to search for)
2) It could be a relative link!
I can't grep for "index.html" alone because there are multiple index pages on the site. I've written some shell scripts before, but searching for relative links like this seems overwhelming.
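Here's the rough approach I was sketching out, in case it helps frame the question: resolve every href against the page that contains it, then compare the result to the target file's canonical path. The `find_linkers` function name and the paths are just placeholders I made up, and the href extraction is naive (it assumes double-quoted hrefs and won't catch links built by JavaScript):

```shell
#!/bin/sh
# Sketch: find_linkers DOCROOT TARGET prints every HTML page under DOCROOT
# that links to TARGET, resolving root-relative and relative hrefs
# against each page's own directory.
find_linkers() {
    docroot=$1
    target=$(readlink -f "$2")   # canonical path of the page I want to delete

    find "$docroot" -name '*.html' | while read -r page; do
        dir=$(dirname "$page")
        # Naive href extraction: assumes href="..." with double quotes
        grep -o 'href="[^"]*"' "$page" 2>/dev/null |
            sed 's/^href="//; s/"$//' |
            while read -r link; do
                link=${link%%#*}     # drop any #fragment
                link=${link%%\?*}    # drop any ?query string
                [ -n "$link" ] || continue
                case "$link" in
                    http://*|https://*|mailto:*) continue ;;  # external, skip
                    /*) candidate="$docroot$link" ;;          # root-relative
                    *)  candidate="$dir/$link" ;;             # relative to page
                esac
                # Normalize ./ and ../ segments; skip links to missing files
                resolved=$(readlink -f "$candidate" 2>/dev/null) || continue
                [ "$resolved" = "$target" ] &&
                    printf '%s links via %s\n' "$page" "$link"
            done
    done
}
```

So a call like `find_linkers /var/www/html /var/www/html/foo/bar/index.html` would report both a page containing `href="/foo/bar/index.html"` and one containing `href="../foo/bar/index.html"`. I haven't handled single-quoted or unquoted hrefs, `<base href>` tags, or URLs that include the full domain name, which is partly why I'm asking whether an existing tool already does this properly.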
Hopefully my question makes sense and I'm posting it in an appropriate place. I was thinking of writing my own script, but if you know of an existing script or program that does this, it would certainly be appreciated!