Split the file based on column

01-16-2013

Hi,

I have a file sample_1.txt (300k rows) with data like the sample below. Each record is around 64k bytes.

Code:

11|1|abc|102553|125589|64k bytes of data
10|2|def|123452|123356|......
13|2|geh|144351|121123|...
25|4|fgh|165250|118890|..
14|1|abc|186149|116657|......
21|7|def|207048|114424|......
23|7|geh|227947|112191|......
26|32|fgh|248846|109958|......
27|23|abc|269745|107725|......
29|34|def|290644|105492|......
30|32|geh|311543|103259|......
33|23|fgh|332442|101026|......
35|34|abc|353341|98793|......
37|7|def|374240|96560|......
39|4|geh|395139|94327|......
41|2|fgh|416038|92094|......
44|23|abc|436937|89861|......
46|1|def|457836|87628|......
48|3|geh|478735|85395|......
50|23|fgh|499634|83162|......

I am trying to split the file based on the 2nd column, like below:
sample_1_1.txt

Code:

11|1|abc|102553|125589|......
14|1|abc|186149|116657|......
46|1|def|457836|87628|......

sample_1_2.txt

Code:

10|2|def|123452|123356|......
13|2|geh|144351|121123|......
41|2|fgh|416038|92094|......

sample_1_3.txt

Code:

48|3|geh|478735|85395|......

and so on

Could someone help me with this?

Thanks in advance

Last edited by sol_nov; 01-16-2013 at 03:45 PM..

sol_nov

01-16-2013

Code:

awk -F\| '{ filename="sample_1_"$2".txt"; print $0 > filename } ' sample_1.txt
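
If the 2nd column takes many distinct values, some awk implementations can run out of open file descriptors when every output file stays open. A minimal sketch of a variant that appends to the output file and closes it after every record is shown here. The function name split_by_col2 and the way the output prefix is derived from the input name are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical helper: split FILE on its 2nd "|"-separated field.
# Output names follow the thread's pattern: sample_1.txt -> sample_1_<field2>.txt
split_by_col2() {
    src=$1
    prefix=${src%.txt}          # strip a trailing .txt, if any
    awk -F'|' -v prefix="$prefix" '
        {
            out = prefix "_" $2 ".txt"
            print $0 >> out     # append, because the file is reopened many times
            close(out)          # keep at most one output file open at a time
        }
    ' "$src"
}
```

Call it as split_by_col2 sample_1.txt. Because it appends, output files left over from an earlier run should be removed first. Closing after every record is slow but simple, and it does not help with awk's line-length limit.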

Yoda

01-16-2013

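Please try it this way, changing the value between the pipes and the output file name for each value of the 2nd column:

Code:

grep '^[^|]*|1|' sample_1.txt > sample_1_1.txt   # repeat with 2, 3, ... in place of 1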
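
Running grep once per value can be automated. The sketch below collects the distinct values of the 2nd column with cut and then runs one anchored grep per value. The function name split_with_grep is hypothetical, and the sketch assumes the values contain no regular-expression metacharacters (plain numbers, as in the sample). It reads the input once per distinct value, and grep is still subject to the system's line-length limit.

```shell
# Hypothetical helper: one grep pass per distinct value of field 2.
split_with_grep() {
    src=$1
    prefix=${src%.txt}
    # Distinct values of the 2nd "|"-separated field
    cut -d'|' -f2 "$src" | sort -u | while IFS= read -r key; do
        # ^[^|]*| skips the 1st field, so only the 2nd field is compared
        grep "^[^|]*|${key}|" "$src" > "${prefix}_${key}.txt"
    done
}
```

Call it as split_with_grep sample_1.txt.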

mirwasim

01-16-2013

Quote:

Originally Posted by bipinajith

Code:

awk -F\| '{ filename="sample_1_"$2".txt"; print $0 > filename } ' sample_1.txt

Thanks, but each record is around 64,000 bytes, so awk fails with a "too long" error.

Can this be done using Perl?

Last edited by sol_nov; 01-16-2013 at 03:46 PM..

sol_nov

01-16-2013

How about a KSH script?

Code:

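#!/bin/ksh

# The last variable (rest) receives everything after the 2nd field,
# including the remaining "|" separators. -r keeps backslashes intact.
while IFS="|" read -r f1 f2 rest
do
        # printf writes the record unchanged; echo may interpret backslashes
        printf '%s\n' "$f1|$f2|$rest" >> "sample_1_${f2}.txt"
done < sample_1.txt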

Yoda

01-16-2013

An awk version that might circumvent the line limit gets a bit complicated, but you could try this (replace infile with your input file, e.g. sample_1.txt):

Code:

awk '
  NR==1 && getline n {
    f="sample_1_" n ".txt"
    print $1 >> f
    print RS n >> f
    next
  }  
  NF==2 && getline n {
    print RS $1 FS >> f
    close(f)
    f="sample_1_" n ".txt"
    print $2  >> f
    print RS n  >> f
    next
  }
  {
    print RS $1 >> f
  } 
  END{
    print FS >> f
  }
' RS=\| ORS= FS='\n' infile

Last edited by Scrutinizer; 01-16-2013 at 04:59 PM..

Scrutinizer

01-16-2013

Any utility specified to read text files (including awk, grep, read, and sed) may fail on any line longer than LINE_MAX bytes. You can find the value of LINE_MAX on your system by running the command: getconf LINE_MAX. The cut, paste, and fold utilities, however, are required to work with text files with unlimited line lengths. So, one way to do this is to:
1. Use cut to create a file (e.g., name_list) containing just field 2 from your input file.
2. Use cut to create a file (e.g., part001) containing the first LINE_MAX-5 bytes of each line of your input file.
3. Use cut to create further files (e.g., part002 ... partXXX) with successive sets of LINE_MAX-5 bytes from each line, so that every part of your input file ends up in a file whose lines are shorter than LINE_MAX bytes.
4. Read name_list and calculate the name of the file that will contain the reassembled input line.
5. Read a line from each of the partXXX files and write it to the appropriate output file. (Note that this may have to be done as a separate write for each partXXX file line, adding a trailing newline character only to the write of the last partXXX file line.) You could also create separate output_field2_partXXX files, and use paste to create the final output files from these intermediate files.
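
The steps above can be sketched roughly as follows, using the variant from step 5 that builds intermediate per-key files and joins them with paste. This is only an illustration under stated assumptions: the function name split_long_lines and the piece_* file names are hypothetical, the keys in field 2 are short and contain no tabs, there are fewer than 1000 slices, and the working directory is free for scratch files (name_list, partNNN, piece_*). The optional second argument overrides the slice size, which otherwise defaults to LINE_MAX-5 as in the post.

```shell
# Usage: split_long_lines INPUT [CHUNK_BYTES]
split_long_lines() {
    src=$1
    chunk=${2:-$(( $(getconf LINE_MAX) - 5 ))}
    prefix=${src%.txt}

    # Step 1: field 2 (the key) of every input line
    cut -d'|' -f2 "$src" > name_list

    # Steps 2-3: cut every line into byte ranges of at most $chunk bytes;
    # stop at the first range that is empty for every line
    n=1
    start=1
    while :; do
        part=$(printf 'part%03d' "$n")
        cut -b "$start-$((start + chunk - 1))" "$src" > "$part"
        if ! LC_ALL=C grep -q . "$part"; then
            rm -f "$part"
            break
        fi
        n=$((n + 1))
        start=$((start + chunk))
    done

    # Step 4: prefix each slice line with its key, then route it to a
    # per-key, per-slice file (all lines here are short enough for awk)
    for part in part[0-9][0-9][0-9]; do
        [ -f "$part" ] || continue
        paste name_list "$part" | LC_ALL=C awk -F'\t' -v part="$part" '
            {
                out = "piece_" $1 "_" part
                print substr($0, length($1) + 2) >> out   # drop "key<TAB>"
                close(out)
            }'
    done

    # Step 5: glue the slices of each key back together with no delimiter
    sort -u name_list | while IFS= read -r key; do
        paste -d '\0' "piece_${key}_part"[0-9][0-9][0-9] > "${prefix}_${key}.txt"
    done

    rm -f name_list part[0-9][0-9][0-9] piece_*_part[0-9][0-9][0-9]
}
```

Call it as split_long_lines sample_1.txt. Only cut and paste ever see the full-length lines; awk, grep, and read work on the short slices and on name_list.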

Don Cragun
