Post 302706909 by stevensw on Thursday 27th of September 2012 03:47:35 PM
Best way to go about this?

I am processing a very large file: a CSV text report exported from a database.

I would like to parse this CSV file into a number of XML files.

I am trying to decide on the most efficient way to do this.

Should I open all the XML files at the same time and, as I encounter each piece of data, write it to the appropriate descriptor? This approach would require only one pass through the CSV file. But I would be keeping many descriptors open at once; is that efficient?
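A minimal Python sketch of this first approach, assuming a hypothetical layout in which every CSV row has exactly two columns: the name of the target XML file and the value to write into it (names are assumed to be safe file names). The function name split_open_all and the <records>/<record> element names are illustrative only. A dict maps each target name to its open file object, so the CSV is read once and each output file is opened exactly once.

```python
import csv
import os
from xml.sax.saxutils import escape


def split_open_all(csv_path, out_dir):
    """One pass over the CSV, keeping one open descriptor per XML file.

    Hypothetical layout: column 0 = target file name, column 1 = value.
    """
    handles = {}  # target name -> open file object
    try:
        with open(csv_path, newline="") as src:
            for name, value in csv.reader(src):
                out = handles.get(name)
                if out is None:
                    # First row for this target: open the file, write the root tag.
                    out = open(os.path.join(out_dir, name + ".xml"), "w")
                    out.write("<records>\n")
                    handles[name] = out
                # escape() turns &, < and > into XML entities.
                out.write("  <record>%s</record>\n" % escape(value))
    finally:
        # Close the root element and the descriptor of every file we opened,
        # even if a malformed row raised an exception.
        for out in handles.values():
            out.write("</records>\n")
            out.close()
    return sorted(handles)
```

Holding descriptors open is cheap in itself, and each file object buffers its writes. The practical ceiling is the per-process open-file limit (see `ulimit -n`), which is commonly 1024 by default on Linux, so this only works as-is when the number of distinct output files stays below that.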

Should I open and close a descriptor each time I need to write a piece of information to one of the XML files? This approach would also require only one pass through the CSV file, but I would be constantly opening and closing descriptors.
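The open-and-close-per-write approach, sketched in Python under the same hypothetical two-column layout (target name, value); split_reopen and the element names are illustrative. Each row costs an open() and close() of its target file. Mode "w" is used the first time a target is seen (so leftovers from an earlier run are truncated) and mode "a" afterwards, and a set tracks which files already have their opening root tag. The closing tags are appended in a short loop over the targets at the end, because a file that is reopened per row cannot know which row is its last.

```python
import csv
import os
from xml.sax.saxutils import escape


def split_reopen(csv_path, out_dir):
    """One pass over the CSV; opens and closes the target file for every row.

    Hypothetical layout: column 0 = target file name, column 1 = value.
    """
    seen = set()  # targets whose opening root tag has already been written
    with open(csv_path, newline="") as src:
        for name, value in csv.reader(src):
            path = os.path.join(out_dir, name + ".xml")
            # "w" truncates on first sight of a target; "a" appends afterwards.
            mode = "a" if name in seen else "w"
            with open(path, mode) as out:
                if name not in seen:
                    out.write("<records>\n")
                    seen.add(name)
                out.write("  <record>%s</record>\n" % escape(value))
    # Loop over the targets only (not the CSV) to close each root element.
    for name in seen:
        with open(os.path.join(out_dir, name + ".xml"), "a") as out:
            out.write("</records>\n")
    return sorted(seen)
```

This is correct, but every row pays for an open/close system-call pair and each close flushes a tiny buffer, which typically makes it the slowest of the single-pass options on a very large file. Its advantage is that it never holds more than one output descriptor at a time.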

Should I fill in each XML file one at a time, iterating through the whole CSV file for each one?
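The one-file-at-a-time approach, again as a Python sketch with the same hypothetical two-column layout (target name, value); split_multi_pass and the element names are illustrative. A first pass collects the distinct target names, then the whole CSV is reread once per target and only the matching rows are written. Only one output file is open at a time, at the cost of reading the input N + 1 times for N output files.

```python
import csv
import os
from xml.sax.saxutils import escape


def split_multi_pass(csv_path, out_dir):
    """Builds one XML file at a time, rereading the whole CSV for each.

    Hypothetical layout: column 0 = target file name, column 1 = value.
    """
    # Discovery pass: find the distinct target names.
    with open(csv_path, newline="") as src:
        targets = sorted({row[0] for row in csv.reader(src)})
    # One full pass over the CSV per target file.
    for name in targets:
        with open(csv_path, newline="") as src, \
                open(os.path.join(out_dir, name + ".xml"), "w") as out:
            out.write("<records>\n")
            for row_name, value in csv.reader(src):
                if row_name == name:
                    out.write("  <record>%s</record>\n" % escape(value))
            out.write("</records>\n")
    return targets
```

For a very large input this multiplies the I/O by the number of output files, so it mainly makes sense when there are only a handful of targets or when descriptors are genuinely scarce.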

Help much appreciated.
 
OCR4GAMERA(1)															     OCR4GAMERA(1)

NAME
ocr4gamera - OCR system using the Gamera framework

USAGE
ocr4gamera -x <traindata> [options] <imagefile>

OPTIONS
-v <int>, --verbosity=<int>
    Set verbosity level to <int>. Possible values are 0 (default): silent operation; 1: information on progress; >2: segmentation info is written to PNG files with prefix debug_.
-h, --help
    Display help and exit.
-d, --deskew
    Do a skew correction (recommended).
-f, --filter
    Filter out very large (images) and very small components (noise).
-a, --automatic-group
    Autogroup glyphs with classifier.
-x <file>, --xmlfile=<file>
    Read training data from <file>.
-o <xml>, --output=<xml>
    Write recognized text to file <xml> (otherwise it is written to stdout).
-c <csv>, --extra_chars_csvfile=<csv>
    Read additional class name conversions from file <csv>. <csv> must contain one conversion per line.
-R <rules>, --heuristic_rules=<rules>
    Apply heuristic rules <rules> for disambiguation of some chars. <rules> can be roman (default) or none (for no rules).
-D, --dictionary-correction
    Correct words using a dictionary (requires aspell or ispell).
-L <lang>, --dictionary-language=<lang>
    Use <lang> as language for aspell (when option -D is set).
-e <int>, --edit-distance=<int>
    Correct words only when the edit distance is not more than <int>.
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.