Split file based on distinct value at specific position

07-22-2013

OS: Linux 2.6x
Shell: Korn

In a single file, how can I identify all the unique values at a specific character position and length in each record, and at the same time split the records of the file by each of these values and write them to separate files?

For example:

Code:

a) I want to know the distinct values in the field that starts at character position 15 and spans the next three
   characters, for each record in the file.
b) If there are two such distinct values, how do I get the records for each distinct value into separate files?

Please help, it is urgent, and it is not a homework assignment.

Thanks
Kumarjit.

kumarjt

07-22-2013

Could you provide sample input and output files?

Scrutinizer

07-22-2013

Try

Code:

awk '{fn=substr($0, 15,3); print > fn}' file

If the number of output files gets larger and larger, you may need to close (fn) in between...
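As a minimal sketch of both parts of the question, assuming the key sits in character positions 15-17 and using made-up sample data (the file names `input.txt` and `part_*.txt` are illustrative, not from the original post), something like the following may help:

```shell
# Work in a scratch directory so the output files do not clutter anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample: a 14-digit record number, then the 3-character key
# in positions 15-17, then a payload.
printf '%014d%s %s\n' 1 AAA first 2 BBB second 3 AAA third > input.txt

# (a) List the distinct values found in character positions 15-17.
distinct=$(cut -c15-17 input.txt | sort -u)
printf '%s\n' "$distinct"

# (b) Write each record to a file named after its key.
# The "part_" prefix and ".txt" suffix are illustrative choices.
awk '{ fn = "part_" substr($0, 15, 3) ".txt"; print > fn }' input.txt
```

Building the file name in a variable first avoids any ambiguity in how awk parses the expression after `>`.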

RudiC

07-23-2013

@RudiC: Thanks, Rudi.

But I didn't understand what you meant by:

Code:

you may need to close (fn) in between...

If the number of distinct values in the field that starts at character position 15 and spans the next three characters is significantly large, how can I make sure this code performs well? I tried an awk command on 10 million records, and it ran at a snail's pace.

Please confirm whether my understanding is correct.

Thanks to all of you.

Regards
Kumarjit.

kumarjt

07-26-2013

At some point, the number of open files per process is exhausted. With a three-character key you can easily reach 1000+ distinct values, and therefore 1000+ output files, which is very close to OPEN_MAX (typically 1024 on Linux; check with ulimit -n). In that case you need to append to the files using >> and close (fn) after each write.
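A minimal sketch of that append-and-close pattern follows, on made-up sample data with illustrative file names. The second variant is an assumption of mine rather than something stated in the thread: it closes a file only when the key changes, which may reduce the open/close overhead when records with the same key tend to be adjacent.

```shell
# Work in a scratch directory; ">>" appends, so stale files would corrupt results.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample: 14-digit record number, 3-character key in positions 15-17.
printf '%014d%s %s\n' 1 AAA first 2 BBB second 3 AAA third > input.txt

# Variant 1: append and close after every record.
# At most one output file is open at any time, at the cost of one
# open/close pair per input line.
awk '{
    fn = "each_" substr($0, 15, 3) ".txt"
    print >> fn     # append, because the file is reopened many times
    close(fn)       # release the file descriptor right away
}' input.txt

# Variant 2 (assumption, not from the thread): close only when the key changes.
# Still at most one open file, but runs of identical keys share one open.
awk '{
    key = substr($0, 15, 3)
    if (NR == 1 || key != prev) {
        if (NR > 1) close(out)      # done with the previous key for now
        out = "run_" key ".txt"
        prev = key
    }
    print >> out    # append, so a key that reappears later is not truncated
}' input.txt
```

Both variants should produce identical files. If the input can be grouped by key beforehand, variant 2 opens each output file only once.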

RudiC
