Post 302258136 by Linux Bot on Thursday 13th of November 2008 11:30:08 PM
11-14-2008
Combine 3.11 (Default branch)

Combine is an open and extensible system for crawling Internet resources, including harvesting and indexing. It can be used as both a general and a focused crawler. Integration with database systems is provided to make it possible to generate a complete vertical search engine.

License: GNU General Public License (GPL)

Changes:
This release adds the ZebraIndexing switch to combineExport, which enables updating the configured Zebra server with exported records. It fixes a bug in Zebra recordId handling, adds the 'collapseinlinks' and 'nooutlinks' switches to combineExport, improves indexing of PDF documents, and fixes a bug in the processing of pure text documents.


VILISTEXTUM(1)               General Commands Manual               VILISTEXTUM(1)

NAME
       vilistextum - html to ascii converter

SYNOPSIS
       vilistextum [OPTIONS] [inputfile | -] [outputfile | -]

DESCRIPTION
       vilistextum is a html to ascii converter specifically programmed to get
       the best out of incorrect html.

OPTIONS
       inputfile, - resp. outputfile, -
              replace inputfile with '-' for reading from standard input,
              likewise outputfile with '-' for writing to standard output.

       -a, --no-alt
              don't output anything for IMG tags even if they have an ALT
              attribute. Implies --no-image.

       -c, --convert-tags
              some tags will be converted to special characters.

       -e, --errorlevel NUMBER
              increase level of verbosity for error messages (0: No error
              messages).

       -i, --defimage STRING
              IMG tags without alt attribute are output as [STRING].

       -l, --links
              numbers the links in the document and creates footnotes of each
              link at the end of the file.

       -k, --links-inline
              print the links directly after the html tag.

       -m, --dont-convert-characters
              don't convert the entities from windows1252 (&#128;-&#159; and
              their proper entity names)

       -n, --no-image
              don't output [Image] for IMG tags that have no ALT attribute.

       -p, --palm
              output text more suitable for reading on a PDA.

       -r, --remove-empty-alt
              if there is an empty ALT attribute in a IMG tag (eg <IMG
              href="..." alt="">), don't output '[]'.

       -s, --shrink-lines [NUMBER]
              if there are more than NUMBER empty lines, output only NUMBER.
              Default: 1.

       -t, --no-title
              don't output title.

       -w, --width NUMBER
              maximum line width.

       -h, --help
              display this help and exit

       -v, --version
              output version information and exit

MULTIBYTE OPTIONS
       (Only available if compiled with multibyte support)

       -u, --output-utf-8
              instead of the character set of the html document, everything
              will be output as utf-8.

       -x, --translit
              use the //TRANSLIT feature of libiconv. Consult the iconv manual
              for details.

       -y, --charset CHARSET
              if the HTML document doesn't provide a character set in the meta
              tags, use CHARSET.

LIMITATIONS
       The rendering of tables is not very good. The handling of OL is
       incomplete. The program treats it as UL and more than 10 nested lists
       confuse it. Text is never justified.

REPORTING BUGS
       Please report bugs to <bhaak@gmx.net>.

AUTHOR
       Vilistextum was written by Patric Mueller <bhaak@gmx.net> and may be
       freely distributed under the terms of the GNU General Public License
       Version 2. There is ABSOLUTELY NO WARRANTY for this program.

SEE ALSO
       iconv(3), lynx(1), links(1), w3m(1)

22 OCT 2006                                                        VILISTEXTUM(1)
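
The synopsis and options above can be exercised from a script. What follows is only a minimal sketch in Python, not part of the manual page: the helper names build_command and html_to_text are hypothetical, and it assumes a vilistextum binary is on PATH and that its output decodes in the current locale. It uses only the -l/--links, -n/--no-image and -w/--width options documented above, and '-' for standard input and output as described under OPTIONS.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(infile: str = "-", outfile: str = "-", *,
                  width: Optional[int] = None,
                  links: bool = False,
                  no_image: bool = False) -> List[str]:
    """Assemble an argument list following the documented synopsis:
    vilistextum [OPTIONS] [inputfile | -] [outputfile | -]
    (hypothetical helper, for illustration only)."""
    cmd = ["vilistextum"]
    if links:
        cmd.append("--links")      # number links, footnotes at end of file
    if no_image:
        cmd.append("--no-image")   # no [Image] for IMG tags without ALT
    if width is not None:
        cmd += ["-w", str(width)]  # maximum line width
    cmd += [infile, outfile]       # '-' means standard input / output
    return cmd


def html_to_text(html: str, **options) -> Optional[str]:
    """Pipe HTML through vilistextum and return the converted text.
    Returns None when the binary is not installed (assumption: it is
    looked up on PATH under the name 'vilistextum')."""
    if shutil.which("vilistextum") is None:
        return None
    result = subprocess.run(
        build_command("-", "-", **options),
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Show the command that would be run for a file-to-file conversion.
    print(build_command("page.html", "page.txt", width=72, links=True))
```

Building the argument list separately from running it keeps the option handling easy to inspect, and passing a list to subprocess.run avoids any shell quoting issues with file names.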