Best way to go about this? Post: 302706909


Post 302706909 by stevensw on Thursday 27th of September 2012 03:47:35 PM

I am processing a very large file: a plain-text CSV report exported from a database.

I would like to parse this CSV file into a number of XML files.

I am trying to decide on the most efficient way to do this.

Should I open all the XML files at the same time and, as I encounter data, write it to the appropriate descriptor? This approach would require only one pass through the CSV file, but I would be keeping many descriptors open at once. Is that efficient?
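A minimal Python sketch of this first option (one pass, one open handle per output file). It assumes the first CSV column names the destination XML file and that the header row holds valid XML element names; KEY_COLUMN, XML_HEAD, row_to_xml and split_single_pass are hypothetical names, and the <records>/<record> layout is an assumption, not something from the post. Holding a few dozen or a few hundred descriptors open is generally cheap; the practical ceiling is the per-process open-file limit (see `ulimit -n`), so this works as long as the number of output files stays comfortably below it.

```python
import csv
import io
import os
import tempfile
from xml.sax.saxutils import escape

# Assumption: column 0 names the destination XML file (e.g. "sales" -> sales.xml)
# and the CSV header names are valid XML element names.
KEY_COLUMN = 0
XML_HEAD = '<?xml version="1.0" encoding="UTF-8"?>\n<records>\n'


def row_to_xml(header, row):
    """Render one CSV row as a <record> element, escaping &, < and >."""
    fields = "".join(f"<{name}>{escape(value)}</{name}>" for name, value in zip(header, row))
    return f"  <record>{fields}</record>\n"


def split_single_pass(csv_file, out_dir):
    """Read the CSV once, keeping one open handle per destination file."""
    reader = csv.reader(csv_file)
    header = next(reader)
    handles = {}
    try:
        for row in reader:
            key = row[KEY_COLUMN]
            fh = handles.get(key)
            if fh is None:
                # First row for this key: create the file and open the root element.
                fh = open(os.path.join(out_dir, f"{key}.xml"), "w", encoding="utf-8")
                fh.write(XML_HEAD)
                handles[key] = fh
            fh.write(row_to_xml(header, row))
    finally:
        # Close every root element and descriptor, even if a row raised.
        for fh in handles.values():
            fh.write("</records>\n")
            fh.close()
    return sorted(handles)


if __name__ == "__main__":
    # For a real report, open it with open(path, newline="", encoding="utf-8").
    sample = io.StringIO("dept,name,note\nsales,Ann,a<b\nhr,Bob,ok\nsales,Cy,x&y\n")
    with tempfile.TemporaryDirectory() as out_dir:
        print(split_single_pass(sample, out_dir))  # ['hr', 'sales']
```

The key values are used directly as file names here, so a real report would need them sanitized first.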

Should I open and close a descriptor each time I need to write a piece of information to one of the XML files? This approach would also require only one pass through the CSV file, but I would be constantly opening and closing descriptors.
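For comparison, a sketch of the second option under the same assumptions (first column picks the destination file, header names are valid XML element names; all function and constant names are hypothetical). Only one output descriptor is open at a time, but every row costs an open/close pair and a buffer flush, and the root element can only be closed by reopening each file once more after the pass.

```python
import csv
import io
import os
import tempfile
from xml.sax.saxutils import escape

# Assumption: column 0 names the destination XML file.
KEY_COLUMN = 0
XML_HEAD = '<?xml version="1.0" encoding="UTF-8"?>\n<records>\n'


def row_to_xml(header, row):
    """Render one CSV row as a <record> element, escaping &, < and >."""
    fields = "".join(f"<{name}>{escape(value)}</{name}>" for name, value in zip(header, row))
    return f"  <record>{fields}</record>\n"


def split_reopen_each_write(csv_file, out_dir):
    """Read the CSV once, opening and closing the target file for every row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    seen = set()
    for row in reader:
        key = row[KEY_COLUMN]
        first_time = key not in seen
        path = os.path.join(out_dir, f"{key}.xml")
        # "w" on first sight truncates any stale file; later rows append.
        with open(path, "w" if first_time else "a", encoding="utf-8") as fh:
            if first_time:
                fh.write(XML_HEAD)
            fh.write(row_to_xml(header, row))
        seen.add(key)
    # The root element can only be closed after the last row, so reopen each file.
    for key in seen:
        with open(os.path.join(out_dir, f"{key}.xml"), "a", encoding="utf-8") as fh:
            fh.write("</records>\n")
    return sorted(seen)


if __name__ == "__main__":
    sample = io.StringIO("dept,name\nsales,Ann\nhr,Bob\nsales,Cy\n")
    with tempfile.TemporaryDirectory() as out_dir:
        print(split_reopen_each_write(sample, out_dir))  # ['hr', 'sales']
```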

Should I fill out each XML file one at a time, iterating through the whole CSV file once for each of them?
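A sketch of the third option, under the same assumptions plus one more: the distinct keys are not known up front, so an extra initial pass collects them (the function and constant names are hypothetical). Only one input and one output file are open at any moment, but a report that feeds N output files is read N + 1 times, which for a very large input usually makes this the slowest of the three.

```python
import csv
import os
import tempfile
from xml.sax.saxutils import escape

# Assumption: column 0 names the destination XML file.
KEY_COLUMN = 0
XML_HEAD = '<?xml version="1.0" encoding="UTF-8"?>\n<records>\n'


def row_to_xml(header, row):
    """Render one CSV row as a <record> element, escaping &, < and >."""
    fields = "".join(f"<{name}>{escape(value)}</{name}>" for name, value in zip(header, row))
    return f"  <record>{fields}</record>\n"


def split_one_file_at_a_time(csv_path, out_dir):
    """Write each XML file completely, rereading the whole CSV for every file."""
    # Pass 0: collect the distinct destination keys.
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)
        keys = sorted({row[KEY_COLUMN] for row in reader})
    # One full pass over the CSV per output file.
    for key in keys:
        out_path = os.path.join(out_dir, f"{key}.xml")
        with open(csv_path, newline="", encoding="utf-8") as f, \
                open(out_path, "w", encoding="utf-8") as out:
            reader = csv.reader(f)
            header = next(reader)
            out.write(XML_HEAD)
            for row in reader:
                if row[KEY_COLUMN] == key:
                    out.write(row_to_xml(header, row))
            out.write("</records>\n")
    return keys


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        src = os.path.join(work_dir, "report.csv")
        with open(src, "w", newline="", encoding="utf-8") as f:
            f.write("dept,name\nsales,Ann\nhr,Bob\nsales,Cy\n")
        print(split_one_file_at_a_time(src, work_dir))  # ['hr', 'sales']
```

When the number of output files fits comfortably under the descriptor limit, the single-pass, all-files-open variant is usually the simplest and fastest of the three.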

Any help is much appreciated.

ocr4gamera

OCR4GAMERA(1)                                                  OCR4GAMERA(1)

NAME

       ocr4gamera - OCR system using the Gamera framework

USAGE

       ocr4gamera -x <traindata> [options] <imagefile>

OPTIONS

       -v <int>, --verbosity=<int>
	      Set verbosity level to <int>.  Possible values are 0 (default): silent operation; 1: information on progress; >2: segmentation info
	      is written to PNG files with the prefix debug_.

       -h, --help
	      Display help and exit.

       -d, --deskew
	      Do a skew correction (recommended).

       -f, --filter
	      Filter out very large components (images) and very small components (noise).

       -a, --automatic-group
	      Autogroup glyphs with classifier.

       -x <file>, --xmlfile=<file>
	      Read training data from <file>.

       -o <xml>, --output=<xml>
	      Write recognized text to file <xml> (otherwise it is written to stdout).

       -c <csv>, --extra_chars_csvfile=<csv>
	      Read additional class name conversions from file <csv>.  <csv> must contain one conversion per line.

       -R <rules>, --heuristic_rules=<rules>
	      Apply heuristic rules <rules> for disambiguation of some characters.  <rules> can be roman (default) or none (for no rules).

       -D, --dictionary-correction
	      Correct words using a dictionary (requires aspell or ispell).

       -L <lang>, --dictionary-language=<lang>
	      Use <lang> as language for aspell (when option -D is set).

       -e <int>, --edit-distance=<int>
	      Correct words only when the edit distance is not more than <int>.
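A small Python sketch that assembles an ocr4gamera command line from the options above, following the USAGE line (-x <traindata> first, the image file last). The helper name ocr4gamera_command and all file names are hypothetical; only flags listed on this page are used, and -L is emitted together with -D because the page says it applies when -D is set.

```python
import shlex


def ocr4gamera_command(traindata, imagefile, output=None, deskew=True,
                       dictionary_lang=None, max_edit_distance=None, verbosity=0):
    """Build an argv list for ocr4gamera using only options from its man page."""
    argv = ["ocr4gamera", "-x", traindata]
    if verbosity:
        argv += ["-v", str(verbosity)]
    if deskew:
        argv.append("-d")  # skew correction, marked "recommended" in the man page
    if dictionary_lang is not None:
        # -L only applies when -D (dictionary correction via aspell/ispell) is set.
        argv += ["-D", "-L", dictionary_lang]
    if max_edit_distance is not None:
        argv += ["-e", str(max_edit_distance)]
    if output is not None:
        argv += ["-o", output]  # otherwise the recognized text goes to stdout
    argv.append(imagefile)
    return argv


if __name__ == "__main__":
    cmd = ocr4gamera_command("train.xml", "page.png", output="page.xml", dictionary_lang="en")
    print(shlex.join(cmd))  # ocr4gamera -x train.xml -d -D -L en -o page.xml page.png
```

On a system where ocr4gamera is installed, the resulting list can be passed to subprocess.run(cmd, check=True).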
