Full Discussion: sort on fixed length files
Post 302094164 by sach_in on Wednesday 25th of October 2006 05:05:08 PM
sort on fixed length files

Hi

How can I sort a fixed-length file on a given character range and display only the duplicates?

I searched the sort man page for a suitable option, something similar to cut -c 1-5,25-35, but could not find one.

I have an alternative that uses a combination of cut and awk, but it creates extra temp files.

Any suggestions that avoid creating temp files would be helpful.

Thanks
Sach.
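
One possible approach, offered as a sketch rather than something tested against your data: decorate each record with its key, sort, then strip the key again, all in one pipeline so no temp files are needed. The column ranges 1-5 and 25-35 are taken from the cut -c example in the question; the function name dup_by_cols is made up for illustration, and the records are assumed to be plain single-byte text.

```shell
# Print every record whose key occurs more than once, without temp files.
# Assumptions (not from the original post): the key is columns 1-5 plus
# columns 25-35, and the function name dup_by_cols is hypothetical.
#
# Step 1: prepend a 16-character key (5 + 11, space-padded) to each record.
# Step 2: sort bytewise so that records with equal keys become adjacent.
# Step 3: strip the key again and print only groups with more than one record.
dup_by_cols() {
  awk '{ printf "%-5s%-11s%s\n", substr($0, 1, 5), substr($0, 25, 11), $0 }' "$1" |
    LC_ALL=C sort |
    awk '{
      key = substr($0, 1, 16)
      rec = substr($0, 17)
      if (NR > 1 && key == prev_key) {
        if (!printed) print prev_rec   # first member of a duplicate group
        print rec
        printed = 1
      } else {
        printed = 0
      }
      prev_key = key
      prev_rec = rec
    }'
}
```

Called as dup_by_cols yourfile, it writes the duplicate records to standard output, grouped by key. To use other column ranges, change the two substr calls, the padding widths in the printf format, and the key length 16 in the last awk so that they stay consistent.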
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

creating a fixed length output from a variable length input

Is there a command that sets a variable's length? I have an input field of variable length, but my output for that field needs to be set to 32 characters. Is there such a command? I am on a Sun box running ksh. Thanks (2 Replies)
Discussion started by: r1500
2 Replies

2. Shell Programming and Scripting

Awk with fixed length files

Hi Unix Champs, I want to use awk on a fixed-length file. If the file were delimited, I could have used -F and easily manipulated the fields. How do I do the same with a fixed-length file? Thanks in advance. Regards. (7 Replies)
Discussion started by: c2b2
7 Replies

3. Shell Programming and Scripting

Join two fixed length Files in Unix

Hi, Can we join two fixed length files in Unix using JOIN command? Is there any other command to accomplish the same? Thanks, G.Harikrishnan (6 Replies)
Discussion started by: gharikrishnan
6 Replies

4. Shell Programming and Scripting

Awk - Working with fixed length files

OK I am somewhat new to UNIX programming please see what you can do to help. I have a flat file that is a fixed length file containing different records based on the 1st character of each line. The 1st number at the beginning of the line is the record number, in this case it's record #1. I... (3 Replies)
Discussion started by: ambroze
3 Replies

5. UNIX for Dummies Questions & Answers

What the command to find out the record length of a fixed length file?

I want to find out the record length of a fixed-length file. I forgot the command. Does anybody know? (9 Replies)
Discussion started by: tranq01
9 Replies

6. Shell Programming and Scripting

Need a sort solution for fixed length file

I have a 1250 byte record that I need to sort in column 10-19 and in column 301. I have tried the sort command, but it looks like it needs delimiters to work. The record can have spaces in a lot of its 1250 columns, but 10-19, and 301 are guaranteed. These columns are numeric too. A sample... (1 Reply)
Discussion started by: mb1201
1 Reply

7. Shell Programming and Scripting

changing a variable length text to a fixed length

Hi, can anyone help with an effective solution? I need to change a variable-length text field (between 1 and 18 characters) to a fixed-length text of 18 characters, with the unused portion at the end filled with spaces. The text field is actually field 10 of a .csv file however I could cut... (7 Replies)
Discussion started by: dc18
7 Replies

8. Shell Programming and Scripting

Help with extracting words from fixed length files

I am very new to scripting and need to write a script that will extract the account number from a line that begins with HDR. For example, the file is as follows HDR2010072600300405505100726 00300405505 LBJ FREEWAY DALLAS TELEGRAPH ... (9 Replies)
Discussion started by: bds052189
9 Replies

9. Shell Programming and Scripting

Unix sort for fixed length columns and records

I was trying to use the AIX 6.1 sort command to sort fixed-length data records, sorting by specific columns only. It took some time to figure out how to get it to work, so I wanted to share the solution. The sort man page wasn't much help, because it talks about field delimiters (default space... (1 Reply)
Discussion started by: CheeseHead1
1 Reply

10. Shell Programming and Scripting

Convert variable length record to fixed length

Hi Team, I have an issue splitting a file that contains special characters (German characters) using the awk command. The file has records of different lengths, and I am separating them by length using awk. The command is working fine if the record is not having any... (7 Replies)
Discussion started by: Anthuvan
7 Replies
COMBINE_TESSDATA(1)													       COMBINE_TESSDATA(1)

NAME
       combine_tessdata - combine/extract/overwrite Tesseract data

SYNOPSIS
       combine_tessdata [OPTION] FILE...

DESCRIPTION
       combine_tessdata(1) is the main program to combine/extract/overwrite tessdata components in [lang].traineddata files.

       To combine all the individual tessdata components (unicharset, DAWGs, classifier templates, ambiguities, language configs) located at, say, /home/$USER/temp/eng.* run:

           combine_tessdata /home/$USER/temp/eng.

       The result will be a combined tessdata file /home/$USER/temp/eng.traineddata

       Specify option -e if you would like to extract individual components from a combined traineddata file. For example, to extract language config file and the unicharset from tessdata/eng.traineddata run:

           combine_tessdata -e tessdata/eng.traineddata /home/$USER/temp/eng.config /home/$USER/temp/eng.unicharset

       The desired config file and unicharset will be written to /home/$USER/temp/eng.config /home/$USER/temp/eng.unicharset

       Specify option -o to overwrite individual components of the given [lang].traineddata file. For example, to overwrite language config and unichar ambiguities files in tessdata/eng.traineddata use:

           combine_tessdata -o tessdata/eng.traineddata /home/$USER/temp/eng.config /home/$USER/temp/eng.unicharambigs

       As a result, tessdata/eng.traineddata will contain the new language config and unichar ambigs, plus all the original DAWGs, classifier templates, etc.

       Note: the file names of the files to extract to and to overwrite from should have the appropriate file suffixes (extensions) indicating their tessdata component type (.unicharset for the unicharset, .unicharambigs for unichar ambigs, etc). See k*FileSuffix variable in ccutil/tessdatamanager.h.

       Specify option -u to unpack all the components to the specified path:

           combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.

       This will create /home/$USER/temp/eng.* files with individual tessdata components from tessdata/eng.traineddata.

OPTIONS
       -e .traineddata FILE...
           Extracts the specified components from the .traineddata file.

       -o .traineddata FILE...
           Overwrites the specified components of the .traineddata file with those provided on the command line.

       -u .traineddata PATHPREFIX
           Unpacks the .traineddata using the provided prefix.

CAVEATS
       Prefix refers to the full file prefix, including period (.)

COMPONENTS
       The components in a Tesseract lang.traineddata file as of Tesseract 3.02 are briefly described below. For more information on many of these files, see http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

       lang.config
           (Optional) Language-specific overrides to default config variables.

       lang.unicharset
           (Required) The list of symbols that Tesseract recognizes, with properties. See unicharset(5).

       lang.unicharambigs
           (Optional) This file contains information on pairs of recognized symbols which are often confused. For example, rn and m.

       lang.inttemp
           (Required) Character shape templates for each unichar. Produced by mftraining(1).

       lang.pffmtable
           (Required) The number of features expected for each unichar. Produced by mftraining(1) from .tr files.

       lang.normproto
           (Required) Character normalization prototypes generated by cntraining(1) from .tr files.

       lang.punc-dawg
           (Optional) A dawg made from punctuation patterns found around words. The "word" part is replaced by a single space.

       lang.word-dawg
           (Optional) A dawg made from dictionary words from the language.

       lang.number-dawg
           (Optional) A dawg made from tokens which originally contained digits. Each digit is replaced by a space character.

       lang.freq-dawg
           (Optional) A dawg made from the most frequent words which would have gone into word-dawg.

       lang.fixed-length-dawgs
           (Optional) Several dawgs of different fixed lengths -- useful for languages like Chinese.

       lang.cube-unicharset
           (Optional) A unicharset for cube, if cube was trained on a different set of symbols.

       lang.cube-word-dawg
           (Optional) A word dawg for cube's alternate unicharset. Not needed if Cube was trained with Tesseract's unicharset.

       lang.shapetable
           (Optional) When present, a shapetable is an extra layer between the character classifier and the word recognizer that allows the character classifier to return a collection of unichar ids and fonts instead of a single unichar-id and font.

       lang.bigram-dawg
           (Optional) A dawg of word bigrams where the words are separated by a space and each digit is replaced by a ?.

       lang.unambig-dawg
           (Optional) TODO: Describe.

       lang.params-training-model
           (Optional) TODO: Describe.

HISTORY
       combine_tessdata(1) first appeared in version 3.00 of Tesseract

SEE ALSO
       tesseract(1), wordlist2dawg(1), cntraining(1), mftraining(1), unicharset(5), unicharambigs(5)

COPYING
       Copyright (C) 2009, Google Inc. Licensed under the Apache License, Version 2.0

AUTHOR
       The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).

02/09/2012                                                             COMBINE_TESSDATA(1)