sort on fixed length files Post: 302094164

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

creating a fixed length output from a variable length input

Is there a command that sets a variable's length? I have an input field of variable length, but my output for that field needs to be set to 32 characters. Is there such a command? I am on a Sun box running ksh. Thanks.

2. Shell Programming and Scripting

Awk with fixed length files

Hi Unix Champs, I want to use awk on a fixed length file. If the file were delimited, I could use -F and easily manipulate the fields. How do I do the same with a fixed length file? Thanks in advance. Regards.

3. Shell Programming and Scripting

Join two fixed length Files in Unix

Hi, can we join two fixed length files in Unix using the join command? Is there any other command to accomplish the same? Thanks, G.Harikrishnan

4. Shell Programming and Scripting

Awk - Working with fixed length files

OK, I am somewhat new to UNIX programming, so please see what you can do to help. I have a fixed length flat file containing different records based on the 1st character of each line. The 1st number at the beginning of the line is the record number, in this case record #1. I...

5. UNIX for Dummies Questions & Answers

What is the command to find out the record length of a fixed length file?

I want to find out the record length of a fixed length file. I forgot the command. Does anybody know?

6. Shell Programming and Scripting

Need a sort solution for fixed length file

I have a 1250 byte record that I need to sort on columns 10-19 and on column 301. I have tried the sort command, but it looks like it needs delimiters to work. The record can have spaces in a lot of its 1250 columns, but 10-19 and 301 are guaranteed. These columns are numeric too. A sample...

7. Shell Programming and Scripting

changing a variable length text to a fixed length

Hi, can anyone help with an effective solution? I need to change a variable length text field (between 1 and 18 characters) to a fixed length text of 18 characters, with the unused portion at the end filled with spaces. The text field is actually field 10 of a .csv file however I could cut...

8. Shell Programming and Scripting

Help with extracting words from fixed length files

I am very new to scripting and need to write a script that will extract the account number from a line that begins with HDR. For example, the file is as follows HDR2010072600300405505100726 00300405505 LBJ FREEWAY DALLAS TELEGRAPH ...

9. Shell Programming and Scripting

Unix sort for fixed length columns and records

I was trying to use the AIX 6.1 sort command to sort fixed-length data records, sorting by specific columns only. It took some time to figure out how to get it to work, so I wanted to share the solution. The sort man page wasn't much help, because it talks about field delimiters (default space...

10. Shell Programming and Scripting

Convert variable length record to fixed length

Hi Team, I have an issue splitting a file that contains special characters (German characters) using the awk command. I have records of different lengths in a file. I am separating the files based on the length using the awk command. The command works fine if the record does not have any...
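
Several of the threads above ask how to sort fixed length records on column positions when the data has no usable delimiter. One common approach, sketched here under assumptions, is to give sort a separator that never occurs in the data, so the whole record counts as field 1 and keys can be addressed as 1.START,1.END. The helper name sort_fixed, the choice of the \001 byte as separator, and the sample records are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sort fixed-length records from stdin, numerically,
# on one column range. $1 = first column, $2 = last column (1-based, inclusive).
sort_fixed() {
    # Assumption: the byte \001 never appears in the data, so every record
    # is treated as a single field and 1.N means "column N of the record".
    sep=$(printf '\001')
    # LC_ALL=C makes character positions equal byte positions.
    LC_ALL=C sort -t "$sep" -k "1.$1,1.$2n"
}

# Sample records with a numeric key in columns 9-12 and blanks elsewhere.
printf '%s\n' 'rec a b 0030' 'rec c d 0002' 'rec e f 0100' | sort_fixed 9 12
# prints:
# rec c d 0002
# rec a b 0030
# rec e f 0100
```

For the 1250 byte case in thread 6, the same idea with two keys would be sort -t "$sep" -k 1.10,1.19n -k 1.301,1.301n FILE. Without the -t option, sort splits fields on blanks, which is why embedded spaces break column addressing.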

combine_tessdata

COMBINE_TESSDATA(1)													       COMBINE_TESSDATA(1)

NAME

       combine_tessdata - combine/extract/overwrite Tesseract data

SYNOPSIS

       combine_tessdata [OPTION] FILE...

DESCRIPTION

       combine_tessdata(1) is the main program to combine/extract/overwrite tessdata components in [lang].traineddata files.

       To combine all the individual tessdata components (unicharset, DAWGs, classifier templates, ambiguities, language configs) located at, say,
       /home/$USER/temp/eng.* run:

	   combine_tessdata /home/$USER/temp/eng.

       The result will be a combined tessdata file /home/$USER/temp/eng.traineddata
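
Before combining, it can help to see which component files actually exist for a prefix. The following is a minimal sketch, not part of Tesseract: list_components is a hypothetical helper name, and the demo files are empty placeholders rather than real training output.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Tesseract): print the suffix
# of every regular file that matches a tessdata prefix such as /tmp/work/eng.
list_components() {
    prefix=$1
    for f in "$prefix"*; do
        [ -f "$f" ] || continue          # also skips the literal pattern when nothing matches
        printf '%s\n' "${f#"$prefix"}"   # strip the prefix, leaving e.g. "unicharset"
    done
}

# Demo on throwaway placeholder files; real components come from the training tools.
dir=$(mktemp -d)
touch "$dir/eng.unicharset" "$dir/eng.inttemp" "$dir/eng.config"
list_components "$dir/eng."
# prints:
# config
# inttemp
# unicharset
rm -rf "$dir"
```

Note that an existing eng.traineddata under the same prefix would be listed as well, since the helper only matches on the prefix.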

       Specify option -e if you would like to extract individual components from a combined traineddata file. For example, to extract the language
       config file and the unicharset from tessdata/eng.traineddata, run:

	   combine_tessdata -e tessdata/eng.traineddata \
	     /home/$USER/temp/eng.config /home/$USER/temp/eng.unicharset

       The desired config file and unicharset will be written to /home/$USER/temp/eng.config and /home/$USER/temp/eng.unicharset

       Specify option -o to overwrite individual components of the given [lang].traineddata file. For example, to overwrite the language config and
       unichar ambiguities files in tessdata/eng.traineddata, use:

	   combine_tessdata -o tessdata/eng.traineddata \
	     /home/$USER/temp/eng.config /home/$USER/temp/eng.unicharambigs

       As a result, tessdata/eng.traineddata will contain the new language config and unichar ambigs, plus all the original DAWGs, classifier
       templates, etc.

       Note: the file names of the files to extract to and to overwrite from should have the appropriate file suffixes (extensions) indicating
       their tessdata component type (.unicharset for the unicharset, .unicharambigs for unichar ambigs, etc). See k*FileSuffix variable in
       ccutil/tessdatamanager.h.
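
The suffix rule in the note above can be checked before calling -e or -o. This is only a sketch: has_component_suffix is a hypothetical helper, and its suffix list is copied from the COMPONENTS section of this page (Tesseract 3.02), so it is an assumption that it matches the installed version. The k*FileSuffix variables remain the authoritative source.

```shell
#!/bin/sh
# Suffixes copied from the COMPONENTS section of this page (Tesseract 3.02).
# Assumption: the installed version uses the same set.
KNOWN_SUFFIXES="config unicharset unicharambigs inttemp pffmtable normproto
punc-dawg word-dawg number-dawg freq-dawg fixed-length-dawgs cube-unicharset
cube-word-dawg shapetable bigram-dawg unambig-dawg params-training-model"

# Hypothetical helper: succeed if the file name ends in a known component suffix.
has_component_suffix() {
    case $1 in
        *.*) ;;                 # there is a period, so there is a suffix to check
        *)   return 1 ;;        # no period at all: cannot be lang.component
    esac
    suffix=${1##*.}             # text after the last period
    for s in $KNOWN_SUFFIXES; do
        [ "$s" = "$suffix" ] && return 0
    done
    return 1
}

for f in /tmp/eng.unicharset /tmp/eng.unicharambigs /tmp/eng.txt; do
    if has_component_suffix "$f"; then
        echo "ok:  $f"
    else
        echo "bad: $f"
    fi
done
# prints:
# ok:  /tmp/eng.unicharset
# ok:  /tmp/eng.unicharambigs
# bad: /tmp/eng.txt
```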

       Specify option -u to unpack all the components to the specified path:

	   combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.

       This will create /home/$USER/temp/eng.* files with individual tessdata components from tessdata/eng.traineddata.

OPTIONS

       -e .traineddata FILE...: Extracts the specified components from the .traineddata file

       -o .traineddata FILE...: Overwrites the specified components of the .traineddata file with those provided on the command line.

       -u .traineddata PATHPREFIX: Unpacks the .traineddata using the provided prefix.

CAVEATS

       Prefix refers to the full file prefix, including period (.)
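
Since the trailing period is easy to forget, a script can normalize the prefix before passing it on. A minimal sketch, where ensure_prefix is a hypothetical helper name and not something combine_tessdata provides:

```shell
#!/bin/sh
# Hypothetical helper: return the prefix with exactly one trailing period added
# when it is missing, so "/tmp/eng" and "/tmp/eng." both become "/tmp/eng.".
ensure_prefix() {
    case $1 in
        *.) printf '%s\n' "$1" ;;    # already ends with a period
        *)  printf '%s.\n' "$1" ;;   # append the missing period
    esac
}

ensure_prefix /tmp/eng     # prints /tmp/eng.
ensure_prefix /tmp/eng.    # prints /tmp/eng.
```

The result could then be used as, for example, combine_tessdata "$(ensure_prefix /home/$USER/temp/eng)".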

COMPONENTS

       The components in a Tesseract lang.traineddata file as of Tesseract 3.02 are briefly described below. For more information on many of these
       files, see http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

       lang.config
	   (Optional) Language-specific overrides to default config variables.

       lang.unicharset
	   (Required) The list of symbols that Tesseract recognizes, with properties. See unicharset(5).

       lang.unicharambigs
	   (Optional) This file contains information on pairs of recognized symbols which are often confused. For example, rn and m.

       lang.inttemp
	   (Required) Character shape templates for each unichar. Produced by mftraining(1).

       lang.pffmtable
	   (Required) The number of features expected for each unichar. Produced by mftraining(1) from .tr files.

       lang.normproto
	   (Required) Character normalization prototypes generated by cntraining(1) from .tr files.

       lang.punc-dawg
	   (Optional) A dawg made from punctuation patterns found around words. The "word" part is replaced by a single space.

       lang.word-dawg
	   (Optional) A dawg made from dictionary words from the language.

       lang.number-dawg
	   (Optional) A dawg made from tokens which originally contained digits. Each digit is replaced by a space character.

       lang.freq-dawg
	   (Optional) A dawg made from the most frequent words which would have gone into word-dawg.

       lang.fixed-length-dawgs
	   (Optional) Several dawgs of different fixed lengths -- useful for languages like Chinese.

       lang.cube-unicharset
	   (Optional) A unicharset for cube, if cube was trained on a different set of symbols.

       lang.cube-word-dawg
	   (Optional) A word dawg for cube's alternate unicharset. Not needed if Cube was trained with Tesseract's unicharset.

       lang.shapetable
	   (Optional) When present, a shapetable is an extra layer between the character classifier and the word recognizer that allows the
	   character classifier to return a collection of unichar ids and fonts instead of a single unichar-id and font.

       lang.bigram-dawg
	   (Optional) A dawg of word bigrams where the words are separated by a space and each digit is replaced by a ?.

       lang.unambig-dawg
	   (Optional) TODO: Describe.

       lang.params-training-model
	   (Optional) TODO: Describe.

HISTORY

       combine_tessdata(1) first appeared in version 3.00 of Tesseract

SEE ALSO

       tesseract(1), wordlist2dawg(1), cntraining(1), mftraining(1), unicharset(5), unicharambigs(5)

COPYING

       Copyright (C) 2009, Google Inc. Licensed under the Apache License, Version 2.0

AUTHOR

       The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).

								    02/09/2012						       COMBINE_TESSDATA(1)
