Split file by column value, each with header Post: 302985797

Sponsored Content

Top Forums Shell Programming and Scripting Split file by column value, each with header Post 302985797 by drl on Tuesday 15th of November 2016 11:18:05 AM

11-15-2016

Registered User

Hi.

At our shop, we often try to generalize problem solutions, rather than having a number of separate solutions for similar problems.

In this kind of case we created a code gate to handle these kind of situations. The center of this would be the single command:

Code:

gate -f=5 $FILE

Here is how we would start to solve this:

Code:

#!/usr/bin/env bash

# @(#) s1       Demonstrate collection of matching lines to separate files, gate.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C dixf gate
pe
dixf gate

FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE, lines = $( wc -l < $FILE):"
head $FILE

pl " Expected output:"
grep csv $E

pl " Results, dry run:"
gate --dryrun --field=5 $FILE

pl " Results, real run:"
gate -f=5 $FILE
wc -l Testcase TATA*

pl " Documentation for gate:"
gate -h

exit 0

producing:

Code:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.6 (jessie) 
bash GNU bash 4.3.30
dixf (local) 1.12
gate (local) 1.10

gate    Group, collect, sift lines, separate lines into buckets. (what)
Path    : ~/bin/gate
Length  : 278 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Help    : probably available with [     ]-h

-----
 Input data file data1, lines = 50:
year,min,max,Instrument,Testcase
2016,201,1003,GEOTROPH-02116,TATA7980
2016,53,1011,GEOTROPH-01963,TATA7980
2016,3,1024,GEOTROPH-02067,TATA7980
2016,203,1027,GEOTROPH-02011,TATA7980
2016,135,1016,GEOTROPH-01932,TATA3405
2016,71,1036,GEOTROPH-02030,TATA7980
2016,173,1046,GEOTROPH-02033,TATA7980
2016,183,1015,GEOTROPH-02120,TATA7980
2016,47,1037,GEOTROPH-02154,TATA7980

-----
 Expected output:
TATA79802016__testcase.csv
and TATA34052016__testcase.csv

-----
 Results, dry run:
     9 entries in group (good) TATA3405
    40 entries in group (good) TATA7980
     1 entries in group (good) Testcase

-----
 Results, real run:
   1 Testcase
   9 TATA3405
  40 TATA7980
  50 total

-----
 Documentation for gate:
NAME
    gate - Group And Transfer Entries

VERSION
    1.0

USAGE
    gate [-d] [-c m,n] [-f n] [-h] [-s c] [files]

REQUIRED ARGUMENTS
    None.

OPTIONS
    -d <off>
        dry-run, determine filenames and counts. A basic sanity check on
        filenames is performed. This is recommended for a first run.

    -c m,n <0>
        Select column mode: define columns m through n as the filenames to
        which lines will be written. The columns are numbered beginning with
        1 on the left.

    -f number <0>
        Choose field number as the group. Data will be written to filenames
        corresponding to the contents of this field. The field numbering
        begins with 1 on the left.

    -h <off>
        Print this man page.

    -s character string <,>
        Select one or more characters as the separator character.

    files <STDIN>
        Specify files to process.

DESCRIPTION
    For each line read, the id value from the field or columns specified is
    isolated from each and used as a filename, to which that line is
    written. As many files are opened and written as are needed.

This may not be the final approach, but it would be a start for us.

Best wishes ... cheers, drl

---------- Post updated at 10:18 ---------- Previous update was at 09:46 ----------

Hi.

Quote:

Originally Posted by RavinderSingh13

... I would like to mention here wouldn't be it good to close the opened files or if there are too many files opened it may be an issue with it. ...

I seem to recall testing the timing between closing and not closing files in this kind of operation. My recollection is that it added a large amount of time to do the close-open-close... operation.

While there may be situations where this might be needed. I'd be tempted to consider splitting the input file, and do separate operations on the split pieces (for example). Multiple cores/CPU can help with parallelization in such cases, say with command parallel.

Best wishes ... cheers, drl

This User Gave Thanks to drl For This Post:

drl

View Public Profile for drl

Find all posts by drl

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Split large file and add header and footer to each file

I have one large file, after every 200 line i have to split the file and the add header and footer to each small file? It is possible to add different header and footer to each file?

2. Shell Programming and Scripting

Split large file and add header and footer to each small files

I have one large file, after every 200 line i have to split the file and the add header and footer to each small file? It is possible to add different header and footer to each file?

3. UNIX for Dummies Questions & Answers

split header row into one column

So, I have a massive file with thousands of columns I want a list of the headers in one column in another file. So I need to strip off the top line (can use head-1) But how can I convert from this format: A B C D E F G to A B C D E F G

4. Shell Programming and Scripting

Rename a header column by adding another column entry to the header column name URGENT!!

Hi All, I have a file example.csv which looks like this GrpID,TargetID,Signal,Avg_Num CSCH74_1_1,2007,61,256 CSCH74_1_1,212007,647,679 CSCH74_1_1,12007,3,32 CSCH74_1_1,207,299,777 I want the output as GrpID,TragetID,Signal-CSCH74_1_1,Avg_Num CSCH74_1_1,2007,61,256...

5. UNIX for Dummies Questions & Answers

Rename a header column by adding another column entry to the header column name

6. UNIX for Dummies Questions & Answers

Column Header in the Spool file

Hi All, I have a problem with the column heading while spooling the data from the database. Since i want the column header, therefore i am not using SET HEADING OFF, and i am getting the header, but for the longer column name, some of the character from the column name is missing. for...

7. Shell Programming and Scripting

Split and add header and trailer from input file

I need to split the file based on pattern from position 34-37 while retaining the header and trailer records in each individual split file Also is it possible to output the TOM and PAT records in the same output file ? I need the output file names same as xyz_pattern_Datetimestamp.txt ...

8. Shell Programming and Scripting

Sort and Split file with header and custom name

Hi, I am using SUN SOLARIS (SunOS sun4v sparc SUNW, T5240). I have a huge data file with header and trailer. This file gets used into an ETL process. ETL skips the header record (which is the first record of the file) and loads the rest of the record. The file can be delimited (comma,...

9. Shell Programming and Scripting

Split a file based on encountering header

I need to split a file based on headers found Input file file1 content: ADD john mickey DROP matt sam output of file F1 john mickey output of file F2 matt sam

10. Shell Programming and Scripting

Split large xml into mutiple files and with header and footer in file

Split large xml into mutiple files and with header and footer in file tried below it splits unevenly and also i need help in adding header and footer command : csplit -s -k -f my_XML_split.xml extrfile.xml "/<Document>/" {1} sample xml <?xml version="1.0" encoding="UTF-8"?><Recipient>...

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Split large file and add header and footer to each file

Discussion started by: ashish4422

2. Shell Programming and Scripting

Split large file and add header and footer to each small files

Discussion started by: ashish4422

3. UNIX for Dummies Questions & Answers

split header row into one column

Discussion started by: polly_falconer

4. Shell Programming and Scripting

Rename a header column by adding another column entry to the header column name URGENT!!

Discussion started by: Vavad

5. UNIX for Dummies Questions & Answers

Rename a header column by adding another column entry to the header column name

Discussion started by: Vavad

6. UNIX for Dummies Questions & Answers

Column Header in the Spool file

Discussion started by: Pramod_009

7. Shell Programming and Scripting

Split and add header and trailer from input file

Discussion started by: techedipro

8. Shell Programming and Scripting

Sort and Split file with header and custom name

Discussion started by: Saanvi1

9. Shell Programming and Scripting

Split a file based on encountering header

Discussion started by: Diddy

10. Shell Programming and Scripting

Split large xml into mutiple files and with header and footer in file

Discussion started by: karthik