htdig(1) General Commands Manual htdig(1)
NAME
htpurge - remove unused documents from the database (general maintenance script)
SYNOPSIS
htpurge [-][-a][-c configfile][-u][-v]
DESCRIPTION
Htpurge removes specified URLs from the databases, as well as bad URLs, unretrieved URLs, obsolete documents, and the like. It is
recommended that htpurge be run after htdig to clean out any documents of this sort.
OPTIONS
-      Take the URL list from standard input (rather than specifying it with -u). The format of the input is one URL per line.
-a     Use alternate work files. Tells htpurge to append .work to the database files, causing a second copy of the database to be
       built. This allows the original files to be used by htsearch during the run.
-c configfile
Use the specified configfile instead of the default.
-u URL Add this URL to the list of documents to remove. Must be specified multiple times if more than one URL is to be removed. Should not
       be used together with -.
-v     Verbose mode. This increases the verbosity of the program. Using more than two -v options is probably only useful for debugging.
       The default verbose mode (a single -v) gives a progress report while digging.
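As an illustration (a sketch only: the URLs are invented, and the configuration path is the default named under FILES), the - option reads the removal list from standard input:

```shell
# Sketch: purge two hypothetical URLs by piping them to htpurge via '-'.
# Guarded so the snippet still runs on systems without htpurge installed.
urls='http://example.com/gone1.html
http://example.com/gone2.html'
count=$(printf '%s\n' "$urls" | wc -l | tr -d ' ')
if command -v htpurge >/dev/null 2>&1; then
    printf '%s\n' "$urls" | htpurge -c /etc/htdig/htdig.conf -
else
    echo "htpurge not found; would purge $count URLs"
fi
```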
FILES
/etc/htdig/htdig.conf
The default configuration file.
SEE ALSO
Please refer to the HTML pages (in the htdig-doc package) /usr/share/doc/htdig-doc/html/index.html and the manual pages htdigconfig(8),
htdig(1), and htmerge(1) for a detailed description of ht://Dig and its commands.
AUTHOR
This manual page was written by Robert Ribnitz, based on the HTML documentation of ht://Dig.
January 2004 htdig(1)
My friend has just made a website which lets you view pages with a URL like page=blah.php. I've tried to explain to him that it's bad because people could do page=/etc/passwd, but he said he uses a shadow file so it's no problem.
Is there any way this could still be exploited, or is he right? (1 Reply)
I am creating a script to pull URLs out of a Firefox .json file and create a bookmarks.html file from the URLs. I need to know how to grab the URL from each line of output and copy it into the >HERE</A> anchor.
Each line will have a different URL; I need the URL copied onto each line of output before... (1 Reply)
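A minimal sketch of that extraction, assuming the URLs sit in "uri" fields (as in Firefox's bookmark backups; the sample JSON here is invented):

```shell
# Sketch: pull "uri" values out of bookmark JSON and wrap each as an anchor.
json='{"title":"a","uri":"http://example.com/one"},{"title":"b","uri":"http://example.org/two"}'
printf '%s\n' "$json" |
    grep -o '"uri":"[^"]*"' |
    sed 's/"uri":"\(.*\)"/<A HREF="\1">HERE<\/A>/' > bookmarks.html
cat bookmarks.html
```

A real JSON parser (jq, or Perl's JSON module) is safer than grep once titles can contain escaped quotes.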
Hi all,
I have two files in my folder
1.index.jsp
2.maintenance.jsp
Once the URL is hit in IE, it calls the index.jsp file and the application works fine.
I want to do some maintenance on my application; during the application maintenance it will... (1 Reply)
Hi,
I have a list of URLs in a csv file which I'm checking for page status. It just prints the URL and the status as output. This works perfectly fine.
I'm looking to run this in multiple threads to make this process faster.
I'm pretty new to Perl and I managed to complete this. It would be... (9 Replies)
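One way to get that parallelism without threads is xargs -P (a GNU/BSD extension); the sketch below uses an echo stub in place of the real HTTP check, since the CSV and checker code are not shown:

```shell
# Sketch: fan a URL list out to up to 4 parallel workers with xargs -P.
# The sh -c echo stub stands in for a real curl/LWP status check.
printf '%s\n' http://a.example http://b.example http://c.example > urls.txt
xargs -P 4 -I{} sh -c 'echo "{} 200"' < urls.txt | sort > status.txt
cat status.txt
```

With curl available, replace the echo with a real request, e.g. curl -s -o /dev/null -w '%{http_code}' against each URL.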
I have a folder with a lot of documents (pdf, xls, doc etc.) which users have uploaded, but only 20% of them are currently linked from my html files. So my goal is to copy only the files which are linked in my html files from my Document directory into another directory.
Eg: My documents exist... (5 Replies)
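A sketch of one approach, assuming hrefs are double-quoted and the paths are relative to the working directory (the directory names here are invented):

```shell
# Sketch: copy only the documents actually referenced by href="..." links.
mkdir -p docs linked html
echo report > docs/used.pdf
echo draft  > docs/unused.pdf
echo '<a href="docs/used.pdf">report</a>' > html/page.html
grep -ho 'href="[^"]*"' html/*.html |
    sed 's/^href="//; s/"$//' |
    while read -r f; do
        if [ -f "$f" ]; then cp "$f" linked/; fi
    done
ls linked
```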
I have noticed that Oracle uses some kind of hyperlink markup to create URLs within the terminal on Enterprise Linux.
Any idea how to create a URL such as ..., which when right-clicked opens a browser window?
Is this supposed to be spam/advertisement? Got a PM from the OP; it is not supposed to be spam... (1 Reply)
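Two common mechanisms could be at work: many terminals auto-detect plain URLs in output, and newer ones (gnome-terminal, iTerm2, and others) support the OSC 8 hyperlink escape sequence, which lets arbitrary label text carry a link. A sketch of the latter:

```shell
# Print a clickable hyperlink using the OSC 8 escape sequence.
# Terminals without OSC 8 support simply show the plain label text.
url='https://www.example.com'
label='open example.com'
link=$(printf '\033]8;;%s\033\\%s\033]8;;\033\\' "$url" "$label")
printf '%s\n' "$link"
```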
Hi Everyone,
I have a problem with wget using an input file of URLs. When I execute wget -i URLs.txt, I get the login.php pages transferred but not the files listed in URLs.txt. I need to use the input file because it will have new products to download each week. I want my VA to... (3 Replies)
Hi Friends,
I have a bunch of URLs.
Each URL will open up an abstract page.
But, the source contains a link to the main PDF article.
I am looking for a script to do the following task
1. Read input file with URLs.
2. Parse the source and grab all the lines that have the word 'PDF'.... (1 Reply)
Hi,
I have a file which has many URLs delimited by spaces. Now I want to move them into separate files, each holding 10 URLs.
http://3276.e-printphoto.co.uk/guardian http://abdera.apache.org/ http://abdera.apache.org/docs/api/index.html
I have used the below code to arrange... (6 Replies)
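tr plus split covers this: convert the spaces to newlines, then cut every 10 lines (the file names below are assumptions, and the sample input is generated rather than the real file):

```shell
# Sketch: split a space-delimited URL file into chunks of 10 URLs each.
i=1; : > urls.txt
while [ "$i" -le 23 ]; do
    printf 'http://example.com/p%d ' "$i" >> urls.txt
    i=$((i + 1))
done
tr -s ' ' '\n' < urls.txt | split -l 10 - chunk_
wc -l chunk_*
```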
Need help with a Tcl script. I need to find the IP address in a URL, if one is present, in order to do some activity.
The URLs will be of the form
<domain>?a=12345&d=somestring1(Note: c not present)
<domain>?c=10.10.10.100&d=somestring1
<domain>?a=12345&b=somestring1&c=10.1.2.4&d=somestring2... (1 Reply)
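The same parsing idea, sketched here in shell rather than Tcl (Tcl's regexp command takes an equivalent pattern, along the lines of regexp {[?&c]c=([0-9.]+)} $url -> ip with the bracket expression below): the c parameter, when present, is captured after a ? or & delimiter.

```shell
# Sketch: extract the c= parameter (an IP address, when present) from a URL.
extract_ip() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]c=\([0-9.]*\).*/\1/p'
}
extract_ip 'host.example?c=10.10.10.100&d=somestring1'   # -> 10.10.10.100
extract_ip 'host.example?a=12345&b=x&c=10.1.2.4&d=y'     # -> 10.1.2.4
extract_ip 'host.example?a=12345&d=somestring1'          # prints nothing
```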
Hey guys, looking for a way to encode a string into URL and HTML in a bash script that I'm making to encode strings in various different digests etc.
Can't find anything on it anywhere else on the forums.
Any help much appreciated, still very new to bash and programming etc. (4 Replies)
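A pure-shell percent-encoder, one byte at a time (a sketch; it assumes single-byte input and follows RFC 3986's unreserved character set):

```shell
# Sketch: percent-encode a string for use in a URL.
# Unreserved characters (RFC 3986) pass through; everything else becomes %XX.
urlencode() {
    s=$1 out=''
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}                  # first character of $s
        s=${s#?}                         # rest of $s
        case $c in
            [A-Za-z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}
urlencode 'a b&c'    # -> a%20b%26c
```

HTML escaping is a separate mapping (& to &amp;, < to &lt;, and so on), so treat the two encodings independently.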
So, I am writing a script that will read output from Bulk Extractor (which gathers data based on regular expressions). My script then reads the column that has the URL found, hashes it with MD5, then outputs the URL and hash to a file.
Where I am stuck is that I want to read the bulk... (7 Replies)
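A sketch of the hashing step, assuming md5sum is available and the URL sits in a known column (column 2 here; the sample report is invented):

```shell
# Sketch: hash the URL column of a report with md5sum, emit "url hash" pairs.
printf 'ctx1 http://example.com/a\nctx2 http://example.com/b\n' > report.txt
while read -r _ url; do
    hash=$(printf '%s' "$url" | md5sum | awk '{print $1}')
    printf '%s %s\n' "$url" "$hash"
done < report.txt > hashed.txt
cat hashed.txt
```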
I want to clean an html file.
I am trying to remove the script part of the html, the rest of the tags, and the empty lines.
The code I try to use is the following:
sed '/<script/,/<\/script>/d' webpage.html | sed -e 's/<*>//g' | sed '/^\s*$/d' > output.txt
However, in this method, I can not... (10 Replies)
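The middle expression is the likely problem: s/<*>//g matches a run of literal < characters before a >, not a whole tag. A bracket expression fixes it; a sketch against an invented sample page, since the real webpage.html is not shown:

```shell
# Sketch: strip <script> blocks, then all remaining tags, then blank lines.
# s/<[^>]*>//g matches an entire tag; the original s/<*>//g does not.
cat > webpage.html <<'EOF'
<html><body>
<script>
var x = 1;
</script>
<p>kept text</p>

</body></html>
EOF
sed '/<script/,/<\/script>/d' webpage.html |
    sed -e 's/<[^>]*>//g' |
    sed '/^[[:space:]]*$/d' > output.txt
cat output.txt    # -> kept text
```

Note too that if an opening <script> and its </script> fall on the same line, the /start/,/end/ range runs on to the next closing tag or to EOF; for messy real-world HTML, a proper parser beats sed.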
Hi All,
We have an HTML source which will be processed using an Informatica workflow. In between these two we have a Unix script which transforms the file.
We have been getting an error from Informatica for the past week saying invalid format, because the file has unused html references (0-8,14-31 etc)... (2 Replies)