

awk script to split file into multiple files based on many columns


 
# 1  
Old 04-02-2013

I have a space-delimited file that I'd like to split into multiple files based on the values in several columns.
This is what my data looks like:
Code:
1bc9A02 1 10 1000 FTDLNLVQALRQFLWSFRLPGEAQKIDRMMEAFAQRYCQCNNGVFQSTDTCYVLSFAIIMLNTSLHNPNVKDKPTVERFIAMNRGINDGGDLPEELLRNLYESIKNEPFKIPELEHHHHHH
1ku1A02 1 10 1000 DFSGLRVDEAIRILLTKFRLPGESQQIERIIEAFSSAYCENQDYDPSKISDNAEDDISTVQPDADSVFILSYSIIMLNTDLHNPQVKEHMSFEDYSGNLKGCCNHKDFPFWYLDRVYCSIRDKEIVMPEEHHGNE
1b9gA00 1 10 100 GPETLCGAELVDALQFVCGDRGFYFNKPGIVDECCFRSCDLRRLEMYCAPLKPAKSA
1bqtA00 1 10 100 GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
1efeA00 1 10 100 FVNQHLCGSHLVEALYLVCGERGFFYTPKTRRYPGDVKRGIVEQCCTSICSLYQLENYCN
1eakA01 1 10 101 TDKELAVQYLNTFYGCPKESCNLFVLKDTLKKMQKFFGLPQTGDLDQNTIETMRKPRCGNPDV
1eakB01 1 10 101 TDKELAVQYLNTFYGCPKESCNLFVLKDTLKKMQKFFGLPQTGDLDQNTIETMRKPRCGNPDV

This is what I'd like the output to look like:
1.10.1000.txt
Code:
1bc9A02 1 10 1000 FTDLNLVQALRQFLWSFRLPGEAQKIDRMMEAFAQRYCQCNNGVFQSTDTCYVLSFAIIMLNTSLHNPNVKDKPTVERFIAMNRGINDGGDLPEELLRNLYESIKNEPFKIPELEHHHHHH
1ku1A02 1 10 1000 DFSGLRVDEAIRILLTKFRLPGESQQIERIIEAFSSAYCENQDYDPSKISDNAEDDISTVQPDADSVFILSYSIIMLNTDLHNPQVKEHMSFEDYSGNLKGCCNHKDFPFWYLDRVYCSIRDKEIVMPEEHHGNE

1.10.100.txt
Code:
1b9gA00 1 10 100 GPETLCGAELVDALQFVCGDRGFYFNKPGIVDECCFRSCDLRRLEMYCAPLKPAKSA
1bqtA00 1 10 100 GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
1efeA00 1 10 100 FVNQHLCGSHLVEALYLVCGERGFFYTPKTRRYPGDVKRGIVEQCCTSICSLYQLENYCN

1.10.101.txt
Code:
1eakA01 1 10 101 TDKELAVQYLNTFYGCPKESCNLFVLKDTLKKMQKFFGLPQTGDLDQNTIETMRKPRCGNPDV
1eakB01 1 10 101 TDKELAVQYLNTFYGCPKESCNLFVLKDTLKKMQKFFGLPQTGDLDQNTIETMRKPRCGNPDV

Columns 2, 3, and 4 all vary, so I need to split it based on all three values. I know how to do this using awk and close for one column, but I don't know how to extend it to three columns. Thank you so much in advance!!!
# 2  
Old 04-02-2013
One way:

Code:
$ awk '{print > ($2"."$3"."$4".txt")}' file

After running the above awk command:
Code:
$ ls 1.10*
1.10.100.txt  1.10.1000.txt  1.10.101.txt

Guru.
This User Gave Thanks to guruprasadpr For This Post:
# 3  
Old 04-02-2013
This just creates a bunch of empty text files. I'd like those text files to include the information from the original file as indicated in the original post. Thanks.
# 4  
Old 04-02-2013
It does create files with content. Which OS are you using?

Guru.
# 5  
Old 04-03-2013
I'm using a UNIX terminal. The code doesn't do what is requested.
# 6  
Old 04-03-2013
Quote:
Originally Posted by viored
I'm using a UNIX terminal. The code doesn't do what is requested.
It would help if you showed us exactly what you ran and the output you got, in code tags, rather than just saying "the code doesn't do what is requested".

Guru's code should work fine; I don't see any issues in it!

That said, I would also recommend closing each file: if too many files are open at once, awk may eventually exceed the system limit on the number of open files per process.

It is best to close each file once the program has finished writing to it. Because a closed file would be truncated if it were reopened with >, the code below appends with >> instead:
Code:
awk '{F=$2"."$3"."$4".txt";print >> F;close(F)}' inputfile
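
If the input is already grouped by columns 2-4, as in the sample data, one possible refinement is to close a file only when the output name changes, instead of after every line. The following is a minimal sketch under that assumption, not a tested replacement for the one-liner above; the scratch directory, the shortened sequences, and the variable name prev are made up for illustration.

```shell
# Work in a scratch directory so the demo does not touch real data.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: same layout as the thread's data, sequences shortened.
cat > inputfile <<'EOF'
1bc9A02 1 10 1000 FTDLNLVQ
1ku1A02 1 10 1000 DFSGLRVD
1b9gA00 1 10 100 GPETLCGA
1bqtA00 1 10 100 GPETLCGA
1efeA00 1 10 100 FVNQHLCG
1eakA01 1 10 101 TDKELAVQ
1eakB01 1 10 101 TDKELAVQ
EOF

awk '{
    F = $2 "." $3 "." $4 ".txt"      # output name built from columns 2, 3 and 4
    if (F != prev) {                 # name changed: release the previous file
        if (prev != "") close(prev)
        prev = F
    }
    print >> F                       # append, so a reopened file is not truncated
}' inputfile
```

Since >> appends, this still gives correct output when the same key shows up again later in the input; it just reopens the file. For the same reason, output files left over from an earlier run should be removed before rerunning, or the lines will be duplicated.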

This User Gave Thanks to Yoda For This Post:
# 7  
Old 04-03-2013
Yoda's code works, Thanks!!

---------- Post updated at 07:00 PM ---------- Previous update was at 10:57 AM ----------

What if, instead of outputting the entire line, I wanted to output just the last column to the text files, but with the same file names? For example:
1.10.1000.txt
Code:
FTDLNLVQALRQFLWSFRLPGEAQKIDRMMEAFAQRYCQCNNGVFQSTDTCYVLSFAIIMLNTSLHNPNVKDKPTVERFIAMNRGINDGGDLPEELLRNLYESIKNEPFKIPELEHHHHHH
DFSGLRVDEAIRILLTKFRLPGESQQIERIIEAFSSAYCENQDYDPSKISDNAEDDISTVQPDADSVFILSYSIIMLNTDLHNPQVKEHMSFEDYSGNLKGCCNHKDFPFWYLDRVYCSIRDKEIVMPEEHHGNE

and so on?
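
One way this could be done, as a sketch rather than a confirmed answer from the thread: keep the same file name expression and print $NF (the last field of the current line) instead of the whole line. The scratch directory and the shortened sequences below are made up for illustration.

```shell
# Work in a scratch directory so the demo does not touch real data.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: same layout as the thread's data, sequences shortened.
cat > inputfile <<'EOF'
1bc9A02 1 10 1000 FTDLNLVQ
1ku1A02 1 10 1000 DFSGLRVD
1b9gA00 1 10 100 GPETLCGA
1eakA01 1 10 101 TDKELAVQ
EOF

awk '{
    F = $2 "." $3 "." $4 ".txt"   # same file name as before
    print $NF >> F                # write only the last column
    close(F)                      # keep the number of open files low
}' inputfile

cat 1.10.1000.txt
```

If the sequence is always the fifth column, print $5 would do the same thing; $NF simply avoids hard-coding the column number.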
