
htpurge(1) [debian man page]

htpurge(1)                    General Commands Manual                    htpurge(1)

NAME
       htpurge - remove unused documents from the database (general maintenance script)

SYNOPSIS
       htpurge [-] [-a] [-c configfile] [-u URL] [-v]

DESCRIPTION
       Htpurge removes specified URLs from the databases, along with bad URLs, unretrieved URLs, obsolete documents, and the like. It is recommended that htpurge be run after htdig to clean out any documents of this sort.

OPTIONS
       -      Take the URL list from standard input rather than from -u. The input format is one URL per line.

       -a     Use alternate work files. Tells htpurge to append .work to the database file names, causing a second copy of the database to be built. This allows the original files to be used by htsearch during the run.

       -c configfile
              Use the specified configfile instead of the default.

       -u URL Add this URL to the list of documents to remove. Must be given multiple times if more than one URL is to be removed. Should not be used together with -.

       -v     Verbose mode. This increases the verbosity of the program. More than two levels is probably only useful for debugging. The default verbose mode (a single -v) gives a nice progress report while the purge runs.

FILES
       /etc/htdig/htdig.conf
              The default configuration file.

SEE ALSO
       Please refer to the HTML pages (in the htdig-doc package) at /usr/share/doc/htdig-doc/html/index.html and the manual pages htdigconfig(8), htdig(1) and htmerge(1) for a detailed description of ht://Dig and its commands.

AUTHOR
       This manual page was written by Robert Ribnitz, based on the HTML documentation of ht://Dig.

January 2004                                                             htpurge(1)
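A typical invocation, sketched from the options above (the URLs and list file are illustrative):

       # remove two obsolete documents, verbosely, using the default config
       htpurge -v -u http://example.com/old1.html -u http://example.com/old2.html

       # or feed a URL list on standard input, building .work copies so
       # htsearch can keep using the live databases during the run
       htpurge -a - < purge-list.txt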


15 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Php

My friend has just made a website which lets you view pages with a URL like page=blah.php. I've tried to explain to him that it's bad because people could do page=/etc/passwd, but he said he used a shadow file so it's no problem. Is there any way this could still be exploited, or is he right? (1 Reply)
Discussion started by: ErNci
1 Reply
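For what it is worth, keeping password hashes in /etc/shadow does not stop a raw include of the page parameter from reading other world-readable files. A hypothetical probe from the shell (the URL is illustrative):

    # if this returns passwd contents, the parameter is included unvalidated
    curl 'http://example.com/index.php?page=/etc/passwd'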

2. UNIX for Dummies Questions & Answers

AWK help please

I am creating a script to pull URLs out of a Firefox .json file and create a bookmarks.html file from the URLs. I need to know how to grab the URL from each line of output and copy it into >HERE</A>. Each line will have a different URL; I need each line of output to have the URL copied before... (1 Reply)
Discussion started by: glev2005
1 Reply
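For the bookmarks question above, a minimal shell sketch, assuming the .json stores addresses in "uri" fields (file names are illustrative):

    # pull each "uri" value and wrap it in an anchor tag
    grep -o '"uri":"[^"]*"' bookmarks.json |
      sed 's/"uri":"\(.*\)"/<A HREF="\1">\1<\/A>/' > bookmarks.html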

3. Shell Programming and Scripting

Need small help

Hi all, I have two files in my folder: 1. index.jsp 2. maintenance.jsp. When I hit the URL in IE, it calls the index.jsp file and the application works fine. I want to do some maintenance on my application; during the maintenance it will... (1 Reply)
Discussion started by: lkeswar
1 Reply

4. Shell Programming and Scripting

How to remove lines before and after with awk / sed ?

Hi guys, I need to remove the pattern (ID=180), one line before and four lines after. Thanks. (5 Replies)
Discussion started by: ashimada
5 Replies
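A two-pass awk sketch for this kind of deletion, using the pattern and offsets from the question (the file name is illustrative):

    # pass 1: mark the matching line, the line before, and the four after;
    # pass 2: print every line that was not marked
    awk 'NR==FNR { if (/ID=180/) for (i = -1; i <= 4; i++) del[FNR+i] = 1; next }
         !(FNR in del)' file.txt file.txt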

5. Shell Programming and Scripting

How to make this run in multiple threads

Hi, I have a list of URLs in a csv file which I'm checking for page status. It just prints the URL and the status as output. This works perfectly fine. I'm looking to run this in multiple threads to make this process faster. I'm pretty new to Perl and I managed to complete this. It would be... (9 Replies)
Discussion started by: kzenthil
9 Replies
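The original is Perl, but the fan-out itself can be sketched from the shell with GNU xargs -P (the file name and worker count are illustrative):

    # up to 8 parallel curl workers, printing "URL status-code" per line
    xargs -P 8 -I{} curl -s -o /dev/null -w '{} %{http_code}\n' '{}' < urls.csv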

6. UNIX for Dummies Questions & Answers

Copy files into another directory

I have a folder with a lot of documents (pdf, xls, doc etc.) which users have uploaded, but only 20% of them are currently linked from my html files. So my goal is to copy only the files which are linked in my html files from my Document directory into another directory. Eg: My documents exist... (5 Replies)
Discussion started by: ankitha
5 Replies
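A hedged shell sketch of the copy-only-linked-files idea (the directory names are assumptions):

    # collect every href target from the html files, then copy just those
    # files from Document/ into Linked/
    mkdir -p Linked
    grep -hoi 'href="[^"]*"' *.html |
      sed 's/^[^"]*"//; s/"$//' |
      while IFS= read -r f; do
        b=$(basename "$f")
        [ -f "Document/$b" ] && cp "Document/$b" Linked/
      done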

7. Homework & Coursework Questions

How is it possible to include URLs within the terminal?

I have noted that Oracle uses some kind of hypermarking to create URLs within the terminal on Enterprise Linux. Any idea how to create a URL such as ..., which when right-clicked opens a browser window? Is this supposed to be spam/advertisement? Got a PM from OP; it is not supposed to be spam... (1 Reply)
Discussion started by: jon80
1 Reply
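Many modern terminal emulators implement clickable links through the OSC 8 escape sequence, which may well be what was observed; a minimal sketch (the URL and label are illustrative):

    # wrap a label in OSC 8 escapes; supporting terminals render
    # "click me" as a link to the URL
    printf '\e]8;;https://example.com\e\\click me\e]8;;\e\\\n'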

8. UNIX for Dummies Questions & Answers

Wget -i URLs.txt problem

Hi Everyone, I have a problem with wget using an input file of URLs. When I execute this -> wget -i URLs.txt I get the login.php pages transferred but not the files I have in the URLs.txt file. I need to use the input file because it will have new products to download each week. I want my VA to... (3 Replies)
Discussion started by: Keith londrie
3 Replies
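Getting login.php back instead of the files usually means the site wants a session first. A sketch with standard wget options (the URL and form field names are assumptions):

    # log in once, keeping the session cookie, then fetch the real list
    wget --save-cookies cookies.txt --keep-session-cookies \
         --post-data 'user=me&password=secret' \
         'https://example.com/login.php' -O /dev/null
    wget --load-cookies cookies.txt -i URLs.txt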

9. Shell Programming and Scripting

Parse Page Source and Extract Links

Hi Friends, I have a bunch of URLs. Each URL will open up an abstract page, but the source contains a link to the main PDF article. I am looking for a script to do the following task: 1. Read an input file with URLs. 2. Parse the source and grab all the lines that have the word 'PDF'.... (1 Reply)
Discussion started by: jacobs.smith
1 Reply
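A minimal sketch along the lines asked for (the input file name is illustrative):

    # fetch each abstract page and print any href that mentions pdf
    while IFS= read -r url; do
      curl -s "$url" | grep -oi 'href="[^"]*pdf[^"]*"'
    done < urls.txt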

10. Shell Programming and Scripting

Split file into multiple files using delimiter

Hi, I have a file which has many URLs delimited by space. Now I want to move them into separate files, each holding 10 URLs per file. http://3276.e-printphoto.co.uk/guardian http://abdera.apache.org/ http://abdera.apache.org/docs/api/index.html I have used the below code to arrange... (6 Replies)
Discussion started by: vel4ever
6 Replies
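Once the URLs are one per line, split can do the rest; a sketch (the file name and prefix are illustrative):

    # one URL per line, then 10 lines per output file: urls_aa, urls_ab, ...
    tr ' ' '\n' < urls.txt | split -l 10 - urls_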

11. Shell Programming and Scripting

Need help with TCL code to find IP address from a URL

Need help with Tcl code. I need to find out the IP address from a URL, if it is present, to do some activity. The URLs will be of the form <domain>?a=12345&d=somestring1 (Note: c not present) <domain>?c=10.10.10.100&d=somestring1 <domain>?a=12345&b=somestring1&c=10.1.2.4&d=somestring2... (1 Reply)
Discussion started by: ampak
1 Reply
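The extraction is just a regex on the c= parameter; the same idea sketched in shell (a Tcl regexp would use an equivalent pattern; the URL is illustrative):

    # print the dotted-quad value of c= when the URL carries one
    url='http://example.com?a=12345&b=somestring1&c=10.1.2.4&d=somestring2'
    printf '%s\n' "$url" | grep -oE '(^|[?&])c=([0-9]{1,3}\.){3}[0-9]{1,3}' | cut -d= -f2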

12. Shell Programming and Scripting

URL/HTML encoding

Hey guys, I'm looking for a way to encode a string into URL and HTML in a bash script that I'm making to encode strings in various digests etc. Can't find anything on it anywhere else on the forums. Any help much appreciated, still very new to bash and programming etc. (4 Replies)
Discussion started by: 3therk1ll
4 Replies
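Pure-bash URL encoding is a short loop over the characters; a minimal sketch:

    # percent-encode everything outside the RFC 3986 unreserved set
    urlencode() {
      local s=$1 c i
      for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
          [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
          *)               printf '%%%02X' "'$c" ;;
        esac
      done
      printf '\n'
    }

    urlencode 'a string & more'   # prints a%20string%20%26%20more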

13. Shell Programming and Scripting

Hashing URLs

So, I am writing a script that will read output from Bulk Extractor (which gathers data based on regular expressions). My script then reads the column that has the URL found, hashes it with MD5, then outputs the URL and hash to a file. Where I am stuck is that I want to read the bulk... (7 Replies)
Discussion started by: twjolson
7 Replies
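The core loop is short; a sketch that assumes the URL sits in column 2 of tab-separated Bulk Extractor output (the column number and file names are assumptions):

    # hash each URL with md5sum and emit "URL hash" pairs
    awk -F'\t' '{ print $2 }' bulk_output.txt |
      while IFS= read -r url; do
        printf '%s %s\n' "$url" "$(printf '%s' "$url" | md5sum | cut -d' ' -f1)"
      done > url_hashes.txt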

14. Shell Programming and Scripting

How to remove html tag which has multiple lines in SHELL?

I want to clean an html file. I am trying to remove the script part of the html, then remove the rest of the tags and any empty lines. The code I am trying to use is the following: sed '/<script/,/<\/script>/d' webpage.html | sed -e 's/<*>//g' | sed '/^\s*$/d' > output.txt However, with this method, I cannot... (10 Replies)
Discussion started by: YuhuiFeng
10 Replies
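The quoted attempt has two gaps: s/<*>//g only matches runs of literal < characters, and line-by-line substitution misses tags that span lines. A hedged rework using GNU sed (file names from the question):

    sed '/<script/,/<\/script>/d' webpage.html |   # drop script blocks
      sed -e ':a' -e '/<[^>]*$/{N' -e 'ba' -e '}' \
          -e 's/<[^>]*>//g' |                      # join lines while a tag is open, then strip tags
      sed '/^[[:space:]]*$/d' > output.txt         # drop empty lines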

15. UNIX for Beginners Questions & Answers

How to remove unused html codes from the file using UNIX?

Hi All, We have an HTML source which will be processed using an Informatica workflow. In between these two we have a Unix script which transforms the file. We have been getting an error from Informatica for the past week saying invalid format, because the file has unused html references (0-8, 14-31 etc)... (2 Replies)
Discussion started by: karthik adiga
2 Replies
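If the "unused html references (0-8, 14-31)" are literal ASCII control bytes, tr can delete them; if they arrive as numeric entities, a sed pass handles that form instead. Both are sketches resting on that assumption (file names are illustrative):

    # strip control bytes 0-8 and 14-31, keeping tab/newline/CR (9-13)
    tr -d '\000-\010\016-\037' < source.html > clean.html

    # or, for the entity form such as &#8; and &#31;
    sed -E 's/&#([0-8]|1[4-9]|2[0-9]|3[01]);//g' source.html > clean.html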