Hi all,
I have some huge HTML files (500MB to 1GB). Each file has multiple
<html></html> blocks:
<html>
.................
....................
....................
</html>
<html>
.................
....................
....................
</html>
<html>
.................... (5 Replies)
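One possible way to approach this, as a minimal sketch: stream the file line by line so a 1GB input never has to fit in memory, and start a new output file at each opening `<html>` line. It assumes the opening and closing tags appear on lines of their own, as in the sample above. The function name `split_html_documents` and the `part_NNNN.html` output naming are made up for illustration.

```python
import os


def split_html_documents(src_path, out_dir, prefix="part"):
    """Write each <html>...</html> block of src_path to its own file.

    Assumption: a block starts on a line beginning with "<html" and ends
    on a line containing "</html>". Lines outside any block are dropped.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        # Binary mode avoids guessing the encoding of a very large file.
        with open(src_path, "rb") as src:
            for line in src:
                if out is None and line.lstrip().lower().startswith(b"<html"):
                    name = f"{prefix}_{len(written) + 1:04d}.html"
                    path = os.path.join(out_dir, name)
                    out = open(path, "wb")
                    written.append(path)
                if out is not None:
                    out.write(line)
                    if b"</html>" in line.lower():
                        out.close()
                        out = None
    finally:
        # Close a block that was never terminated by </html>.
        if out is not None:
            out.close()
    return written
```

Calling `split_html_documents("big.html", "out")` would write `out/part_0001.html`, `out/part_0002.html`, and so on; both file names here are placeholders.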
To split the files
Hi,
I have an XML file with multiple XML headers, so I want to split the file into multiple files.
Test.xml
---------
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>... (11 Replies)
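A rough sketch of one way to do this kind of split: treat every line that starts with `<?xml` as the beginning of a new document and write everything up to the next such line into its own file. This assumes each header sits at the start of a line, as in the Test.xml sample; the function name `split_on_xml_declaration` and the `doc_NNNN.xml` naming are hypothetical.

```python
import os


def split_on_xml_declaration(src_path, out_dir, prefix="doc"):
    """Start a new output file at every line that begins with "<?xml".

    Assumption: each XML header is at the start of a line. Anything
    before the first header is skipped.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(src_path, "rb") as src:
            for line in src:
                if line.lstrip().startswith(b"<?xml"):
                    # Finish the previous document before opening the next.
                    if out is not None:
                        out.close()
                    name = f"{prefix}_{len(written) + 1:04d}.xml"
                    path = os.path.join(out_dir, name)
                    out = open(path, "wb")
                    written.append(path)
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

The pieces are copied byte for byte, so a header that is not valid XML in the input stays that way in the output.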
Hi All,
I'm stuck adding multiple lines (irrespective of line number) to a file before a particular XML tag. Please help me.
<A>testing_Location</A>
<value>LA</value>
<zone>US</zone>
<B>Region</B>
<value>Russia</value>
<zone>Washington</zone>
<C>Country</C>... (0 Replies)
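For what it's worth, a small sketch of inserting lines before a tag without relying on line numbers: scan the lines and emit the new lines just before any line that contains the tag. The helper name `insert_before_tag` and the `<note>` lines in the usage below are invented for illustration; the matching is plain substring search, not real XML parsing.

```python
def insert_before_tag(lines, tag, new_lines):
    """Return a copy of lines with new_lines inserted before every
    line that contains tag (simple substring match)."""
    result = []
    for line in lines:
        if tag in line:
            result.extend(new_lines)
        result.append(line)
    return result


# Usage with the first lines of the sample above; <note> is a made-up tag.
sample = [
    "<A>testing_Location</A>\n",
    "<value>LA</value>\n",
    "<zone>US</zone>\n",
    "<B>Region</B>\n",
]
updated = insert_before_tag(sample, "<B>", ["<note>one</note>\n", "<note>two</note>\n"])
```

To apply it to a file, read it with `readlines()`, pass the list through the helper, and write the result back with `writelines()`.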
Hi All,
I have to split an XML file into multiple XML files and append them to another .xml file. For example, below is a sample XML file; using a shell script, I have to split it into three XML files and append all three to a .xml file. Can someone help, please?
eg:
<?xml version="1.0"?>... (4 Replies)
Hi All,
I have an XML file of more than half a million lines and want to split it into four files in such a way that the top 7 lines are present at the top of each file and the bottom line is present at the bottom of each file.
The actual records start from the 8th line, and each record contains 15 lines... (14 Replies)
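The layout described here (7 header lines, 15-line records, one footer line) could be split along these lines. This is only a sketch under those stated sizes; the function name `split_with_header_footer` is hypothetical, and it works on a list of lines already read into memory, which should be manageable for roughly half a million lines.

```python
def split_with_header_footer(lines, parts=4, header_len=7, record_len=15):
    """Split lines into `parts` chunks, each with the header on top and
    the footer (last line) at the bottom, never cutting a record."""
    if len(lines) < header_len + 1:
        raise ValueError("input is shorter than header plus footer")
    header = lines[:header_len]
    footer = lines[-1:]
    body = lines[header_len:-1]
    if len(body) % record_len != 0:
        raise ValueError("body is not a whole number of records")
    records = [body[i:i + record_len] for i in range(0, len(body), record_len)]
    # Spread records as evenly as possible; the first `extra` chunks get one more.
    base, extra = divmod(len(records), parts)
    chunks = []
    start = 0
    for p in range(parts):
        count = base + (1 if p < extra else 0)
        group = records[start:start + count]
        start += count
        chunks.append(header + [line for rec in group for line in rec] + footer)
    return chunks
```

Each returned chunk can then be written to its own file with `writelines()`. If there are fewer records than parts, some chunks contain only the header and footer.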
Hi Everyone,
I'm new here and I was checking this old post:
/shell-programming-and-scripting/180669-splitting-file-into-several-smaller-files-using-perl.html
(cannot paste link because of lack of points)
I need to do something like this but understand very little Perl.
I also check... (4 Replies)
Hi All,
We need to split a large XML file into multiple valid XML files with the same header (2 lines) and footer (last line) for N number of letterId.
In the example below, the first 2 lines are the header and the last line is the footer. (They need to be in each split XML file.)
Header:
<?xml version="1.0"... (5 Replies)
Hello Shell Gurus,
I have a requirement to split the source XML file into three different text files.
I need your valuable suggestions to finish this.
Here is my source XML snippet. Here I am using only one entry of <jms-system-resource>; there may be multiple entries in the source file.
... (5 Replies)
Hello Gurus,
I have a requirement to split the XML file into different XML files.
Can you please help me with that?
Here is my source XML file:
<jms-system-resource>
<name>PS6SOAJMSModule</name>
<target>soa_server1</target>
<sub-deployment>
... (3 Replies)
I'm searching for the names of a TV show in the XML file I've attached at the end of this post. What I'm trying to do now is pull out/list the data from each of the <SeriesName> tags throughout the document. Currently, I'm only able to get the data from the first instance of that XML field using the... (9 Replies)
Discussion started by: hungryd
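As an illustration of listing every `<SeriesName>` rather than only the first, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the file is well-formed XML without namespaces. The attached file is not shown here, so the sample document in the usage lines is invented, and `series_names` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def series_names(xml_text):
    """Return the text of every <SeriesName> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nested and repeated tags are all found.
    return [el.text.strip() for el in root.iter("SeriesName") if el.text]


# Usage with a made-up document; the real file's structure may differ.
sample = (
    "<Data>"
    "<Series><SeriesName>Show One</SeriesName></Series>"
    "<Series><SeriesName>Show Two</SeriesName></Series>"
    "</Data>"
)
names = series_names(sample)
```

For a file on disk, `ET.parse(path).getroot().iter("SeriesName")` does the same walk without first loading the text into a string.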
hocr
HOCR(1) User Commands HOCR(1)
NAME
hocr - Hebrew OCR utility
DESCRIPTION
Usage:
hocr [OPTION...] - Hebrew OCR utility
Help Options:
-?, --help
Show help options
--help-all
Show all help options
--help-file
Show file options
--help-image-proccesing
Show image processing options
--help-segmentation
Show segmentation options
--help-debug
Show debug options
File options
-O, --images-out-path=PATH
use PATH for output images
-u, --data-out=FILE
use FILE as output data file name
-C, --save-copy
save a copy of the original image
-b, --save-bw
save processed bw image
-B, --save-bw-exit
save processed bw image and exit
-l, --save-layout
save layout image
-L, --save-layout-exit
save layout image and exit
-f, --save-fonts
save fonts
-F, --save-fonts-exit
save fonts images and exit
Image processing options
-T, --thresholding-type=NUM
thresholding type, 0 normal, 1 none, 2 fine
-t, --threshold=NUM
use NUM as threshold value, 1..100
-a, --adaptive-threshold=NUM
use NUM as adaptive threshold value, 1..100
-s, --scale=SCALE
scale input image by SCALE 1..9, 0 auto
-S, --no-auto-scale
do not auto scale image
-q, --rotate=DEG
rotate image clockwise in deg.
-Q, --no-auto-rotate
do not auto rotate image
Segmentation options
-c, --colums setup=NUM
columns setup: 1.. #columns, 0 auto, 255 free
-x, --slicing=NUM
use NUM as font slicing threshold, 1..250
-X, --slicing-width=NUM
use NUM as font slicing width, 50..250
-w, --font-spacing=NUM
font spacing: tight ..-1, 0, 1.. spaced
Debug options
-g, --draw-grid
draw grid on output images
-d, --debug
print debugging information while running
-D, --debug-extra
print extra debugging information
-y, --font-filter=NUM
debug a font filter, use filter NUM
-Y, --font-filter-list
print a list of debug font filters
-j, --font-num-out
print font numbers in output text
Application Options:
-i, --image-in=FILE
use FILE as input image file name
-o, --text-out=FILE
use FILE as output text file name
-h, --html-out
output text in html format
-N, --no-gtk
do not use gtk for file input and output
-z, --font=NUM
use font NUM
-n, --no-nikud
do not recognize nikud
-v, --version
print version information and exit
libhocr-0.10.5-i686-pc-linux-gnu-12022008 http://hocr.berlios.de Copyright (C) 2005-2008 Yaacov Zamir <kzamir@walla.co.il>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
SEE ALSO
gocr(1), ocrad(1), unpaper(1)
hocr - Hebrew OCR utility February 2008 HOCR(1)