htpurge(1)                    General Commands Manual                   htpurge(1)

NAME
htpurge - remove unused documents from the database (general maintenance script)
SYNOPSIS
htpurge [-] [-a] [-c configfile] [-u URL] [-v]
DESCRIPTION
Htpurge removes specified URLs from the databases, along with bad URLs, unretrieved URLs, obsolete documents, and the like. It is recommended
that htpurge be run after htdig to clean out any documents of this sort.
OPTIONS
-      Take the URL list from standard input rather than from -u options. The
       format of the input is one URL per line.

-a     Use alternate work files. Tells htpurge to append .work to the database
       file names, causing a second copy of the database to be built. This
       allows the original files to be used by htsearch during the run.
-c configfile
Use the specified configfile instead of the default.
-u URL Add this URL to the list of documents to remove. Must be specified
       multiple times if more than one URL is to be removed. Should not be
       used together with -.
-v     Verbose mode. This increases the verbosity of the program; using more
       than two is probably only useful for debugging. The default verbose
       mode (a single -v) gives a nice progress report while the documents
       are purged.
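EXAMPLES
A minimal sketch of feeding a URL list to htpurge on standard input via the - option. The config path is the Debian default named under FILES; the guard keeps the sketch harmless on systems where htpurge is not installed.

```shell
# Build a list of stale URLs, one per line, and hand it to htpurge via '-'.
# URLs and config path are illustrative; adjust for your installation.
cat > stale-urls.txt <<'EOF'
http://www.example.com/old/page1.html
http://www.example.com/old/page2.html
EOF

if command -v htpurge >/dev/null 2>&1; then
    # -a works on .work copies so htsearch can keep using the live files.
    htpurge - -a -c /etc/htdig/htdig.conf < stale-urls.txt
else
    echo "htpurge not installed; skipping"
fi
```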
FILES
/etc/htdig/htdig.conf
The default configuration file.
SEE ALSO
Please refer to the HTML pages (in the htdig-doc package) /usr/share/doc/htdig-doc/html/index.html and the manual pages htdigconfig(8),
htdig(1), and htmerge(1) for a detailed description of ht://Dig and its commands.
AUTHOR
This manual page was written by Robert Ribnitz, based on the HTML documentation of ht://Dig.
January 2004                                                        htpurge(1)
htmerge(1)                    General Commands Manual                   htmerge(1)

NAME
htmerge - create document index and word database for the ht://Dig search engine
SYNOPSIS
htmerge [options]
DESCRIPTION
Htmerge is used to create a document index and word database from the files that were created by htdig. These databases are then used by
htsearch to perform the actual searches.
OPTIONS
-a     Use alternate work files. Tells htmerge to append .work to the database
       file names, causing a second copy of the database to be built. This
       allows the original files to be used by htsearch during the indexing
       run.
-c configfile
Use the specified configfile instead of the default.
-d Prevent the document index from being created.
-s Print statistics about the document and word databases after htmerge has finished.
-v     Run in verbose mode. This will provide some hints as to the progress of
       the merge. This can be useful when running htmerge interactively, since
       some parts (especially the word database creation) can take a very long
       time.
-w Prevent the word database from being created.
ENVIRONMENT
TMPDIR In addition to the command line options, the environment variable TMPDIR will be used to designate the directory where intermediate
files are stored during the sorting process.
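EXAMPLES
A sketch of pointing TMPDIR at a roomier scratch directory before a large merge. The directory path is an arbitrary choice for illustration, and the htmerge invocation is guarded so the sketch still runs where ht://Dig is absent.

```shell
# Word-database sorting can need a lot of scratch space; point TMPDIR at a
# filesystem with room to spare before running htmerge.
TMPDIR=/var/tmp/htdig-sort
mkdir -p "$TMPDIR"
export TMPDIR

if command -v htmerge >/dev/null 2>&1; then
    # -a uses the .work copies produced by htdig -a; -s prints statistics.
    htmerge -a -s -c /etc/htdig/htdig.conf
else
    echo "htmerge not installed; skipping"
fi
```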
FILES
/etc/htdig/htdig.conf
The default configuration file.
SEE ALSO
Please refer to the HTML pages (in the htdig-doc package) /usr/share/doc/htdig-doc/html/index.html and the manual pages htdig(1) and
htsearch(1) for a detailed description of ht://Dig and its commands.
AUTHOR
This manual page was written by Christian Schwarz, modified by Stijn de Bekker, based on the HTML documentation of ht://Dig.
21 July 1997 htmerge(1)
My friend has just made a website that lets you view pages with URLs like page=blah.php. I've tried to explain to him that it's bad because people could do page=/etc/passwd, but he said he uses a shadow file so it's no problem.
Is there any way this could still be exploited, or is he right? (1 Reply)
I am creating a script to pull URLs out of a Firefox .json file and create a bookmarks.html file from the URLs. I need to know how to grab the URL from each line of output and copy it into an <A HREF=...>HERE</A> entry.
Each line will have a different URL; I need each line of output to have the URL copied before... (1 Reply)
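A rough shell sketch of the extraction step, assuming the bookmark URLs appear as "uri":"..." pairs in the .json file; file names here are invented, and a real JSON parser such as jq would be more robust than grep.

```shell
# Tiny stand-in for a Firefox bookmarks export.
cat > bookmarks.json <<'EOF'
{"children":[{"uri":"http://example.com/a","title":"A"},{"uri":"http://example.com/b","title":"B"}]}
EOF

# Pull out each uri value and wrap it in an <A HREF=...>HERE</A> line.
grep -o '"uri":"[^"]*"' bookmarks.json \
  | sed 's/^"uri":"//; s/"$//' \
  | while read -r url; do
        printf '<A HREF="%s">HERE</A>\n' "$url"
    done > bookmarks.html
cat bookmarks.html
```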
Hi all,
I have two files in my folder:
1. index.jsp
2. maintenance.jsp
When the URL is hit in IE, it calls the index.jsp file and the application works fine.
I want to do some maintenance on my application; during the maintenance it will... (1 Reply)
Hi,
I have a list of URLs in a csv file which I'm checking for page status. It just prints the URL and the status as output. This works perfectly fine.
I'm looking to run this in multiple threads to make this process faster.
I'm pretty new to Perl and I managed to complete this. It would be... (9 Replies)
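The thread is about Perl threads, but the same speedup can be sketched in shell with xargs -P. The curl line is commented out so the sketch runs offline, with echo standing in for the real status check; URLs are invented.

```shell
# Check URLs with 4 parallel workers. Swap the echo for the commented
# curl line to fetch real HTTP status codes.
cat > urls.txt <<'EOF'
http://example.com/a
http://example.com/b
http://example.com/c
EOF

xargs -n 1 -P 4 sh -c '
    # status=$(curl -s -o /dev/null -w "%{http_code}" "$0")
    echo "$0 000"
' < urls.txt | sort > results.txt
cat results.txt
```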
I have a folder with a lot of documents (pdf, xls, doc, etc.) which users have uploaded, but only 20% of them are currently linked from my html files. So my goal is to copy only the files which are linked in my html files from my Document directory into another directory.
Eg: My documents exist... (5 Replies)
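One way to sketch this in shell, assuming the links appear as href="docs/..." attributes; the file and directory names below are invented for the demo.

```shell
# Set up a tiny demo tree: three uploaded files, two of them linked.
mkdir -p docs linked
printf '%s\n' '<a href="docs/report.pdf">report</a>' \
              '<a href="docs/data.xls">data</a>' > index.html
touch docs/report.pdf docs/data.xls docs/unused.doc

# Pull the linked file names out of the HTML and copy just those.
grep -ho 'href="docs/[^"]*"' index.html \
  | sed 's/^href="docs\///; s/"$//' \
  | sort -u \
  | while read -r f; do
        [ -f "docs/$f" ] && cp "docs/$f" linked/
    done
ls linked
```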
I have noted that Oracle uses some kind of hyperlinking to create URLs within the terminal on Enterprise Linux.
Any idea how to create a URL such as ..., which when right-clicked opens a browser window?
Is this supposed to be spam/advertisement? Got a PM from OP; it is not supposed to be spam... (1 Reply)
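This may be the OSC 8 terminal hyperlink escape sequence, which modern emulators (gnome-terminal, iTerm2, and others) render as clickable text; whether Oracle's tooling actually uses it is an assumption here.

```shell
# Wrap text in OSC 8 ... ST sequences; supporting terminals make it clickable.
# The URL is a placeholder.
printf '\033]8;;http://example.com\033\\Open example.com\033]8;;\033\\\n' > link.txt
cat link.txt
```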
Hi Everyone,
I have a problem with wget using an input file of URLs. When I execute wget -i URLs.txt, I get the login.php pages transferred but not the files listed in URLs.txt. I need to use the input file because it will have new products to download each week. I want my VA to... (3 Replies)
Hi Friends,
I have a bunch of URLs.
Each URL will open up an abstract page.
But, the source contains a link to the main PDF article.
I am looking for a script to do the following:
1. Read an input file with URLs.
2. Parse the source and grab all the lines that have the word 'PDF'.... (1 Reply)
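Step 2 can be sketched on a saved abstract page; the sample HTML and file names are invented, and in the real script each page would be fetched first (e.g. curl -s "$url" > page.html) before grepping.

```shell
# Stand-in for one downloaded abstract page.
cat > page.html <<'EOF'
<p>Abstract text here.</p>
<a href="/articles/1234.pdf">Download PDF</a>
EOF

# Grab every line mentioning PDF (case-insensitive).
grep -i 'PDF' page.html > pdf-lines.txt
cat pdf-lines.txt
```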
Hi,
I have a file which has many URLs delimited by spaces. Now I want to move them to separate files, each holding 10 URLs per file.
http://3276.e-printphoto.co.uk/guardian http://abdera.apache.org/ http://abdera.apache.org/docs/api/index.html
I have used the below code to arrange... (6 Replies)
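tr plus split can do this in two steps. The sketch reuses the three URLs quoted above, so it produces a single chunk file; with a longer list, split emits chunk_aa, chunk_ab, ... of 10 lines each.

```shell
# One URL per line, then cut into files of 10 lines each.
printf 'http://3276.e-printphoto.co.uk/guardian http://abdera.apache.org/ http://abdera.apache.org/docs/api/index.html\n' > all-urls.txt
tr ' ' '\n' < all-urls.txt | split -l 10 - chunk_
wc -l chunk_aa
```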
Need help with some Tcl code. I need to find out whether an IP address is present in a URL, and if so, do some activity.
The URLs will be of the form
<domain>?a=12345&d=somestring1(Note: c not present)
<domain>?c=10.10.10.100&d=somestring1
<domain>?a=12345&b=somestring1&c=10.1.2.4&d=somestring2... (1 Reply)
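The thread asks for Tcl; as a language-neutral sketch, the same parameter can be pulled out with grep, using the URL shapes quoted above. The [?&] anchor is there to avoid matching a stray c= inside another value.

```shell
url1='http://host/path?a=12345&d=somestring1'
url2='http://host/path?c=10.10.10.100&d=somestring1'

# Print the value of the c= parameter if it looks like an IPv4 address.
get_ip() {
    printf '%s\n' "$1" | grep -o '[?&]c=[0-9.]*' | head -n 1 | cut -d= -f2
}

get_ip "$url1" > noip.txt   # empty: no c= parameter present
get_ip "$url2" > ip.txt
cat ip.txt
```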
Hey guys, I'm looking for a way to URL- and HTML-encode a string in a bash script I'm making that encodes strings in various different digests, etc.
Can't find anything on it anywhere else on the forums.
Any help much appreciated, still very new to bash and programming etc. (4 Replies)
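A POSIX-shell sketch of the URL-encoding half: walk the string one character at a time, pass RFC 3986 unreserved characters through, and percent-encode everything else. HTML entity encoding would need a separate lookup table, and ASCII input is assumed (multibyte characters would need per-byte handling).

```shell
# Percent-encode a string for use in a URL (ASCII input assumed).
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}                # string minus its first character
        c=${s%"$rest"}             # the first character itself
        case $c in
            [a-zA-Z0-9._~-]) out="$out$c" ;;                # unreserved: keep
            *) out="$out$(printf '%%%02X' "'$c")" ;;        # else %XX
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

urlencode 'hello world&x=1' > encoded.txt
cat encoded.txt   # hello%20world%26x%3D1
```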
So, I am writing a script that will read output from Bulk Extractor (which gathers data based on regular expressions). My script then reads the column that has the URL found, hashes it with MD5, then outputs the URL and hash to a file.
Where I am stuck is that I want to read the bulk... (7 Replies)
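The hashing step itself can be sketched like this, assuming md5sum is available; the column extraction from the Bulk Extractor output would come first, and the input file here is invented.

```shell
# Hash each URL with MD5 and write "url hash" pairs.
cat > found-urls.txt <<'EOF'
http://example.com/a
http://example.com/b
EOF

while read -r url; do
    hash=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)
    printf '%s %s\n' "$url" "$hash"
done < found-urls.txt > hashed.txt
cat hashed.txt
```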
I want to clean a html file.
I am trying to remove the script part of the html, strip the rest of the tags, and delete the empty lines.
The code I try to use is the following:
sed '/<script/,/<\/script>/d' webpage.html | sed -e 's/<*>//g' | sed '/^\s*$/d' > output.txt
However, in this method, I can not... (10 Replies)
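The likely culprit is the middle expression: in sed, <*> means "zero or more < characters followed by >", not "anything between < and >". A sketch with [^>]* instead, on invented sample input; note the /<script/,/<\/script>/ range delete also misbehaves when <script> and </script> share a line, since the end address is only tested from the next line on.

```shell
# Sample page with the script element on its own lines.
cat > webpage.html <<'EOF'
<html><head>
<script>
var x = 1;
</script>
</head>
<body>
<p>Hello</p>

</body></html>
EOF

sed '/<script/,/<\/script>/d' webpage.html \
  | sed -e 's/<[^>]*>//g' \
  | sed '/^[[:space:]]*$/d' > output.txt
cat output.txt   # Hello
```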
Hi All,
We have an HTML source which will be processed using an Informatica workflow. In between these two we have a Unix script which transforms the file.
Since last week we have been getting an "invalid format" error from Informatica, because the file has unused html references (0-8, 14-31, etc.)... (2 Replies)
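Assuming the "unused references (0-8, 14-31)" are raw ASCII control characters, tr can strip exactly those ranges while keeping tab, newline, and carriage return; the sample bytes are injected with printf for the demo.

```shell
# Delete control characters 0-8 and 14-31 (octal 000-010 and 016-037),
# preserving tab, LF, VT, FF, and CR (octal 011-015).
printf 'good\001bad\002chars\n' > raw.txt
tr -d '\000-\010\016-\037' < raw.txt > clean.txt
cat clean.txt   # goodbadchars
```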