Split file by column value, each with header


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Split file by column value, each with header
# 1  
Old 11-14-2016
Split file by column value, each with header

Hello all,

I have a csv with with different testcase values in column 5.


Code:
year,min,max,Instrument,Testcase
2016,201,1003,GEOTROPH-02116,TATA7980
2016,53,1011,GEOTROPH-01963,TATA7980
2016,3,1024,GEOTROPH-02067,TATA7980
2016,203,1027,GEOTROPH-02011,TATA7980
2016,135,1016,GEOTROPH-01932,TATA3405
2016,71,1036,GEOTROPH-02030,TATA7980
2016,173,1046,GEOTROPH-02033,TATA7980
2016,183,1015,GEOTROPH-02120,TATA7980
2016,47,1037,GEOTROPH-02154,TATA7980
2016,63,1017,GEOTROPH-01959,TATA7980
2016,73,1019,GEOTROPH-01966,TATA7980
2016,65,1023,GEOTROPH-01931,TATA3405
2016,317,1012,GEOTROPH-02077,TATA7980
2016,9,1027,GEOTROPH-02089,TATA7980
2016,61,1018,GEOTROPH-01947,TATA7980
2016,341,1029,GEOTROPH-02087,TATA7980
2016,341,1015,GEOTROPH-02004,TATA7980
2016,93,1027,GEOTROPH-02066,TATA7980
2016,205,1007,GEOTROPH-02021,TATA7980
2016,79,1013,GEOTROPH-02143,TATA7980
2016,51,1050,GEOTROPH-02000,TATA7980
2016,85,1010,GEOTROPH-02048,TATA7980
2016,47,1036,GEOTROPH-02048,TATA7980
2016,259,1079,GEOTROPH-01909,TATA3405
2016,55,1003,GEOTROPH-02147,TATA7980
2016,205,1031,GEOTROPH-02037,TATA7980
2016,21,1036,GEOTROPH-02060,TATA7980
2016,207,1032,GEOTROPH-01984,TATA7980
2016,51,1037,GEOTROPH-02154,TATA7980
2016,55,1021,GEOTROPH-02011,TATA7980
2016,341,1018,GEOTROPH-01968,TATA7980
2016,43,1020,GEOTROPH-02089,TATA7980
2016,181,1016,GEOTROPH-02047,TATA7980
2016,89,1032,GEOTROPH-02158,TATA7980
2016,71,1032,GEOTROPH-01923,TATA3405
2016,175,1025,GEOTROPH-01906,TATA3405
2016,43,1066,GEOTROPH-01925,TATA3405
2016,201,1028,GEOTROPH-01961,TATA7980
2016,201,1042,GEOTROPH-01916,TATA3405
2016,31,1021,GEOTROPH-02184,TATA7980
2016,39,1017,GEOTROPH-02141,TATA7980
2016,223,1014,GEOTROPH-02119,TATA7980
2016,41,1040,GEOTROPH-02052,TATA7980
2016,57,1034,GEOTROPH-02118,TATA7980
2016,79,1061,GEOTROPH-01909,TATA3405
2016,35,1006,GEOTROPH-02147,TATA7980
2016,341,1032,GEOTROPH-02176,TATA7980
2016,33,1020,GEOTROPH-02116,TATA7980
2016,167,1043,GEOTROPH-01929,TATA3405


I want to split this file into as many files as are testcases in column 5 ( 2 files in this case TATA34052016__testcase.csv and TATA79802016__testcase.csv), each containing the header line

Here is my code

Code:
awk -F, -v fl="testcase"  'NR == 1 {hdr = $0
       next
}
f != $5"2016__"fl".csv" {
        if(f) close(f)
        f = $5"2016__"fl".csv"
        print hdr > f
}
{       print >> f
}' tmp



which produces the 2 files but deletes some records.


My desired output is
Code:
TATA79802016__testcase.csv

year,min,max,Instrument,Testcase
2016,201,1003,GEOTROPH-02116,TATA7980
2016,53,1011,GEOTROPH-01963,TATA7980
2016,3,1024,GEOTROPH-02067,TATA7980
2016,203,1027,GEOTROPH-02011,TATA7980
2016,71,1036,GEOTROPH-02030,TATA7980
2016,173,1046,GEOTROPH-02033,TATA7980
2016,183,1015,GEOTROPH-02120,TATA7980
2016,47,1037,GEOTROPH-02154,TATA7980
2016,63,1017,GEOTROPH-01959,TATA7980
2016,73,1019,GEOTROPH-01966,TATA7980
2016,317,1012,GEOTROPH-02077,TATA7980
2016,9,1027,GEOTROPH-02089,TATA7980
2016,61,1018,GEOTROPH-01947,TATA7980
2016,341,1029,GEOTROPH-02087,TATA7980
2016,341,1015,GEOTROPH-02004,TATA7980
2016,93,1027,GEOTROPH-02066,TATA7980
2016,205,1007,GEOTROPH-02021,TATA7980
2016,79,1013,GEOTROPH-02143,TATA7980
2016,51,1050,GEOTROPH-02000,TATA7980
2016,85,1010,GEOTROPH-02048,TATA7980
2016,47,1036,GEOTROPH-02048,TATA7980
2016,55,1003,GEOTROPH-02147,TATA7980
2016,205,1031,GEOTROPH-02037,TATA7980
2016,21,1036,GEOTROPH-02060,TATA7980
2016,207,1032,GEOTROPH-01984,TATA7980
2016,51,1037,GEOTROPH-02154,TATA7980
2016,55,1021,GEOTROPH-02011,TATA7980
2016,341,1018,GEOTROPH-01968,TATA7980
2016,43,1020,GEOTROPH-02089,TATA7980
2016,181,1016,GEOTROPH-02047,TATA7980
2016,89,1032,GEOTROPH-02158,TATA7980
2016,201,1028,GEOTROPH-01961,TATA7980
2016,31,1021,GEOTROPH-02184,TATA7980
2016,39,1017,GEOTROPH-02141,TATA7980
2016,223,1014,GEOTROPH-02119,TATA7980
2016,41,1040,GEOTROPH-02052,TATA7980
2016,57,1034,GEOTROPH-02118,TATA7980
2016,35,1006,GEOTROPH-02147,TATA7980
2016,341,1032,GEOTROPH-02176,TATA7980
2016,33,1020,GEOTROPH-02116,TATA7980

and TATA34052016__testcase.csv

year,min,max,Instrument,Testcase
2016,135,1016,GEOTROPH-01932,TATA3405
2016,65,1023,GEOTROPH-01931,TATA3405
2016,259,1079,GEOTROPH-01909,TATA3405
2016,71,1032,GEOTROPH-01923,TATA3405
2016,175,1025,GEOTROPH-01906,TATA3405
2016,43,1066,GEOTROPH-01925,TATA3405
2016,201,1042,GEOTROPH-01916,TATA3405
2016,79,1061,GEOTROPH-01909,TATA3405
2016,167,1043,GEOTROPH-01929,TATA3405



Please assist with correcting my code.
# 2  
Old 11-14-2016
Hi,

Can you try this one ?

Code:
awk -F, 'NR==1{a=$0;next} $5 ~ /TATA7980/ { if (NR==2) print a > $5".csv";else print > $5".csv"} $5 ~ /TATA3405/ { print > $5"_f1.csv"} ' OFS=","file

Modify for filenames a bit.

Last edited by greet_sed; 11-14-2016 at 07:06 PM.. Reason: Update code
# 3  
Old 11-14-2016
Code:
yr=$(date +%Y)
rm -f *_testcase.csv
while read line
do
   [[ -z "$line1" ]] && { line1=$line ; continue ; }
   fl=${line##*,}${yr}__testcase.csv
   [[ -f "$fl" ]] || echo "$line1" > "$fl"
   echo "$line" >> "$fl"
done < infile.csv

This User Gave Thanks to rdrtx1 For This Post:
# 4  
Old 11-14-2016
Can you do Perl?

This one has the advantage of taking in consideration the year of the data.

Save as process_input.pl
Run as perl process_input.pl testcase senhia83.input
Make sure you enter two arguments:
The suffix to the filename
The input filename

Code:
#!/usr/bin/perl
use strict;
use warnings;

my %data;
my $file_concat = shift;

my $header = <>;
while(<>) {
    chomp;
    my @fields = split ',';
    push @{$data{$fields[-1]}{$fields[0]}}, $_;
}

for my $fname (keys %data) {
    for my $year (keys $data{$fname}) {
        open my $fw, '>', "$fname${year}__$file_concat.csv" || die;
        print $fw $header;
        for my $value (@{$data{$fname}{$year}}) {
            print $fw "$value\n";
        }
        close $fw;
    }
}

This User Gave Thanks to Aia For This Post:
# 5  
Old 11-14-2016
This is great, is it too much trouble to modify Aia and rdrtx1 code, if I want to specify the column number to key off on? The testcase column may not be the last column in the actual dataset.

So how do I change the code if the key column is col62 among 128 columns? Thanks in advance.
# 6  
Old 11-14-2016
Instead of:
Code:
push @{$data{$fields[-1]}{$fields[0]}}, $_;

Change to:
Code:
push @{$data{$fields[61]}{$fields[0]}}, $_;

Always, one less than what you count. The amount of total columns is not consequential. It can be an arbitrary amount of columns.

Last edited by Aia; 11-14-2016 at 10:26 PM..
# 7  
Old 11-14-2016
Aia,

This is the error i get. Am I running it wrong?
Code:
$ perl spltwheader.pl testcase tmp6

Type of arg 1 to keys must be hash (not hash element) at spltwheader.pl line 16, near "}) "
Execution of spltwheader.pl aborted due to compilation errors.

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Split large xml into mutiple files and with header and footer in file

Split large xml into mutiple files and with header and footer in file tried below it splits unevenly and also i need help in adding header and footer command : csplit -s -k -f my_XML_split.xml extrfile.xml "/<Document>/" {1} sample xml <?xml version="1.0" encoding="UTF-8"?><Recipient>... (36 Replies)
Discussion started by: karthik
36 Replies

2. Shell Programming and Scripting

Split a file based on encountering header

I need to split a file based on headers found Input file file1 content: ADD john mickey DROP matt sam output of file F1 john mickey output of file F2 matt sam (5 Replies)
Discussion started by: Diddy
5 Replies

3. Shell Programming and Scripting

Sort and Split file with header and custom name

Hi, I am using SUN SOLARIS (SunOS sun4v sparc SUNW, T5240). I have a huge data file with header and trailer. This file gets used into an ETL process. ETL skips the header record (which is the first record of the file) and loads the rest of the record. The file can be delimited (comma,... (5 Replies)
Discussion started by: Saanvi1
5 Replies

4. Shell Programming and Scripting

Split and add header and trailer from input file

I need to split the file based on pattern from position 34-37 while retaining the header and trailer records in each individual split file Also is it possible to output the TOM and PAT records in the same output file ? I need the output file names same as xyz_pattern_Datetimestamp.txt ... (23 Replies)
Discussion started by: techedipro
23 Replies

5. UNIX for Dummies Questions & Answers

Column Header in the Spool file

Hi All, I have a problem with the column heading while spooling the data from the database. Since i want the column header, therefore i am not using SET HEADING OFF, and i am getting the header, but for the longer column name, some of the character from the column name is missing. for... (7 Replies)
Discussion started by: Pramod_009
7 Replies

6. UNIX for Dummies Questions & Answers

Rename a header column by adding another column entry to the header column name

Hi All, I have a file example.csv which looks like this GrpID,TargetID,Signal,Avg_Num CSCH74_1_1,2007,61,256 CSCH74_1_1,212007,647,679 CSCH74_1_1,12007,3,32 CSCH74_1_1,207,299,777 I want the output as GrpID,TragetID,Signal-CSCH74_1_1,Avg_Num CSCH74_1_1,2007,61,256... (1 Reply)
Discussion started by: Vavad
1 Replies

7. Shell Programming and Scripting

Rename a header column by adding another column entry to the header column name URGENT!!

Hi All, I have a file example.csv which looks like this GrpID,TargetID,Signal,Avg_Num CSCH74_1_1,2007,61,256 CSCH74_1_1,212007,647,679 CSCH74_1_1,12007,3,32 CSCH74_1_1,207,299,777 I want the output as GrpID,TragetID,Signal-CSCH74_1_1,Avg_Num CSCH74_1_1,2007,61,256... (4 Replies)
Discussion started by: Vavad
4 Replies

8. UNIX for Dummies Questions & Answers

split header row into one column

So, I have a massive file with thousands of columns I want a list of the headers in one column in another file. So I need to strip off the top line (can use head-1) But how can I convert from this format: A B C D E F G to A B C D E F G (6 Replies)
Discussion started by: polly_falconer
6 Replies

9. Shell Programming and Scripting

Split large file and add header and footer to each small files

I have one large file, after every 200 line i have to split the file and the add header and footer to each small file? It is possible to add different header and footer to each file? (7 Replies)
Discussion started by: ashish4422
7 Replies

10. Shell Programming and Scripting

Split large file and add header and footer to each file

I have one large file, after every 200 line i have to split the file and the add header and footer to each small file? It is possible to add different header and footer to each file? (1 Reply)
Discussion started by: ashish4422
1 Replies
Login or Register to Ask a Question