08-14-2008
Finding what pages link to a specific file
First time poster (so please excuse me in advance)
I have a webserver running Linux, Apache, etc. I have a list of HTML webpages that I want to delete because I think they are old. While I could delete them and then check for broken links, I'd like to be more proactive.
I want to write a shell script that will search all the pages in my site for links to the pages in my list.
Let's say I have a potential file to delete at
www.fake-url.com/foo/bar/index.html
I can't just grep for it because the link to the page can be written in a number of ways:
1) It could be a full URL or a root-relative link (that's easy enough to search for)
2) It could be a relative link!
I can't grep for "index.html" because there are multiple index pages. I've written some shell scripts, but searching for relative links like this seems overwhelming.
Hopefully my question makes some sense and I'm posting it in an appropriate place. I was thinking of writing my own script, but if you know of an existing script or program that does this, it would certainly be appreciated!
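One possible approach, as a rough sketch rather than a finished tool: resolve every link to a root-relative path before comparing it with the page you want to delete. The function name find_links_to and its three arguments are made up for illustration. It assumes that URL paths map directly onto files under the document root, that a link ending in "/" serves index.html, that links are written as href="..." with double quotes, and that GNU realpath (with the -m option) is installed. The regex is a crude stand-in for a real HTML parser.

```shell
#!/bin/bash
# Hypothetical helper: find_links_to DOCROOT HOST TARGET
#   DOCROOT  directory on disk that the site is served from
#   HOST     the site's host name, e.g. www.fake-url.com
#   TARGET   root-relative path of the page to check, e.g. /foo/bar/index.html
# Prints the root-relative path of every *.html file that links to TARGET.
find_links_to() {
    local docroot host target file dir href path resolved
    docroot=$(realpath "$1") || return 1
    host=$2
    target=$3

    find "$docroot" -type f -name '*.html' | sort | while IFS= read -r file; do
        dir=$(dirname "$file")
        # Extract href="..." values, then drop any ?query or #fragment.
        # Single-quoted or unquoted attributes are NOT handled by this regex.
        grep -oiE 'href="[^"]*"' "$file" |
        sed -E 's/^[^"]*"//; s/"$//; s/[?#].*$//' |
        while IFS= read -r href; do
            case $href in
                '') continue ;;                          # same-page anchor
                "http://$host"/*|"https://$host"/*)      # full URL on this site
                    path=$docroot/${href#*://*/} ;;
                *://*|mailto:*|javascript:*) continue ;; # external or non-page link
                /*) path=$docroot$href ;;                # root-relative link
                *)  path=$dir/$href ;;                   # relative link
            esac
            # Assumption: a directory URL is served as its index.html.
            case $path in */) path=${path}index.html ;; esac
            # realpath -m collapses ".." and "." even for missing files.
            resolved=$(realpath -m "$path")
            if [ "$resolved" = "$docroot$target" ]; then
                printf '%s\n' "${file#"$docroot"}"
                break                                    # one hit per file is enough
            fi
        done
    done
}
```

Called as find_links_to /var/www/html www.fake-url.com /foo/bar/index.html (the document root here is only an example), it lists the pages that would end up with a broken link. Looping over the list of deletion candidates is then a matter of calling it once per path.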
HXCOPY(1) HTML-XML-utils HXCOPY(1)
NAME
hxcopy - copy an HTML file and update its relative links
SYNOPSIS
hxcopy [ -i old-URL ] [ -o new-URL ] [ file-or-URL [ file-or-URL ] ]
DESCRIPTION
The hxcopy command copies its first argument to its second argument, while updating relative links. The input is assumed to be HTML or
XHTML and may be slightly reformatted in the process.
If the second argument is omitted, hxcopy writes to standard output. In this case the option -o is required. If the first argument is also
omitted, hxcopy reads from standard input. In this case the option -i is required.
OPTIONS
The following options are supported:
-i old-URL
For the purposes of updating relative links, act as if old-URL is the location from which the input is copied. If this option is
omitted, the actual location of the first argument is used for calculating relative links.
-o new-URL
For the purposes of updating relative links, act as if new-URL is the location to which the input is copied. If this option is
omitted, the actual location of the second argument is used for calculating relative links.
ENVIRONMENT
To use a proxy to retrieve remote files, set the environment variables http_proxy and ftp_proxy. E.g., http_proxy="http://localhost:8080/"
BUGS
Unlike the last argument of cp(1), the last argument of hxcopy must be a file, not a directory.
The second argument must be a local file. Writing to a URL is not yet implemented. To work around this, replace
    hxcopy file.html http://example.org/file.html
by
    hxcopy -o http://example.org/file.html file.html tmp.html
and then upload tmp.html to the given URL with some other command, such as curl(1). The first argument, however, may be a URL. hxcopy
will download the given file. (Currently only HTTP is supported.)
EXAMPLE
Assume the HTML file foo.html contains a relative link to "../bar.html". Here are some examples of commands:
hxcopy foo.html bar/foo.html
The file foo.html is copied to bar/foo.html and the relative link to "../bar.html" becomes "../../bar.html".
hxcopy foo.html ../foo.html
The file foo.html is copied to ../foo.html and the relative link to "../bar.html" is rewritten as "bar.html".
hxcopy -i http://my.org/dir1/foo.html -o http://my.org/foo.html file1.html file2.html
The file file1.html is copied to file2.html and the relative link to "../bar.html" is rewritten as "bar.html". A command like this
may be useful to update files that are later uploaded to a server.
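The path arithmetic behind these examples can be reproduced with a short shell sketch. This is not how hxcopy is implemented; it only illustrates the calculation on plain file paths. The function name relink and the /site/... paths are invented for the illustration, and GNU realpath (options -m and --relative-to) is assumed.

```shell
#!/bin/bash
# Hypothetical helper: relink OLD_FILE NEW_FILE LINK
# Prints what the relative LINK found in OLD_FILE has to become
# when the file is copied to NEW_FILE.
relink() {
    local old_file=$1 new_file=$2 link=$3 target
    # Step 1: resolve the link against the old location to get its real target.
    target=$(realpath -m "$(dirname "$old_file")/$link")
    # Step 2: express that target relative to the directory of the new location.
    realpath -m --relative-to="$(dirname "$new_file")" "$target"
}

# Same moves as in the examples above, with /site/dir1 as the starting directory.
relink /site/dir1/foo.html /site/dir1/bar/foo.html ../bar.html
relink /site/dir1/foo.html /site/foo.html ../bar.html
```

With -m, realpath does not require the paths to exist, so the calculation can be tried out before any file is moved.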
SEE ALSO
cp(1), curl(1), hxwls(1)
6.x 9 Dec 2008 HXCOPY(1)