KSH script for split a txt file


 
# 1  
Old 12-22-2011

I have a problem that I would like to solve with the power of UNIX and the inspired minds around the world. Here is the problem:

I have a text file with data like this:
Code:
1X.....................1234567890123456789T1234598765XT1 (header)
1Z01............(sub HEADER)
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
S1.....
P100002.........
.
.
.
.
.
.
P110000..............
.
.
.
.
1Z502
.
.
.
.
P110003.......
Q1.....
R1...
S1....
4C.........(TRAILER)


So, between the header and the trailer, there can be any number of records.

I need to split the file at every 10000 P1 records. Each new file needs its own header, and that header must carry the number of 1Z records and the number of P1 records in that file, as in the first line above, where 12345 (1Z) and 98765 (P1) appear at the end.

The other requirement is that the 19-digit number in the header must be unique in each file.

How can I write such a script in ksh?


Thank you for your help and support.

The output files should look like this:

File 1:
Code:
1X.....................1234567890123456789T0050510000XT1 (header)
1Z01............(sub HEADER)
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
P100002.........
.
.
.
.
.
.
1Z505.............
P110000..............
Q1......
R1.......
4C............(Trailer static)

File 2:
Code:
1X.....................1234567890123456790T0000200003XT1 (header)
 1Z01............(sub HEADER)
 P100001............
 Q1........
 R1.................
 P100002.........
 Q1..................
 R1......................
 1Z02............
 P100001..........
 Q1.............
 R1...........
4C.........(TRAILERstatic)

Chowhan

Last edited by Franklin52; 12-23-2011 at 03:53 AM.. Reason: Please use code tags for code and data samples, thank you
# 2  
Old 12-23-2011
Try adapting the following script:

Code:
#!/usr/bin/ksh

InFile=./chowhan.dat        # Input file
OutSpec='./chowhan_%d.dat'  # Outfiles specification, %d for file number
Pmax=5                      # Split every Pmax P1 record

awk -v PMAX="${Pmax}" -v OUT="${OutSpec}" '
    #----------------------
    # Functions
    #-----------------------
    
    function newOutFile() {
        ++FilesCount;
        Files[FilesCount      ] = sprintf(OUT ".tmp", FilesCount);
        Files[FilesCount, "1Z"] = 0;
        Files[FilesCount, "P1"] = 0;
        return FilesCount;
    }
    function writeSubHeader() {
        if (! (FileIndx in Files)) newOutFile();
        FileName = Files[FileIndx];
        Files[FileIndx, "1Z"  ]++;
        print SubHeader > FileName;
    }

    #----------------------
    # Actions
    #----------------------

    /^1X/ { 
        Header   = $0;
        FileIndx = 1;
        next;
    }
    /^1Z/ { 
        SubHeader = $0;
        FileIndx  = 1;
        Pcount    = 0;
        writeSubHeader();
        next 
    }
    /^P1/ {
        if (++Pcount > PMAX) {
            Pcount = 1;            # this P1 is already the first record of the next file
            ++FileIndx;
            writeSubHeader();
        }
        FileName = Files[FileIndx];
        Files[FileIndx, "P1"]++;
        print $0 > FileName;       
        next;
    }           
    /^4C/   {
        Trailer = $0;
        exit;
    }
    {
        print $0 > FileName;
        next;
    }
    
    END {
        for (i=1; i<=FilesCount; i++) {
            FileName = Files[i];
            print Trailer > FileName;
            close(FileName);
            ResultFileName = sprintf(OUT, i)
            printf("%s%0.5d%0.5d%s\n", substr(Header,1,length(Header)-13), 
                                     Files[i,"1Z"], 
                                     Files[i,"P1"], 
                                     substr(Header, length(Header)-2,3)) > ResultFileName;
            close(ResultFileName);
            system(sprintf("/usr/bin/cat %s >>%s && /usr/bin/rm %s", FileName, ResultFileName, FileName));
        }
    }
' ${InFile}
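
For readers adapting the script: the END block rebuilds the header by slicing it at fixed offsets from the end of the line. Below is a minimal sketch of just that step, assuming the layout seen in the sample data (the line ends with a 5-digit 1Z count, a 5-digit P1 count and a 3-character tail). The helper name `rebuild_header` is hypothetical and not part of the script above.

```shell
# Sketch only: rebuild a 1X header with new 1Z / P1 counts.
# Assumption: the header ends with <5-digit 1Z count><5-digit P1 count><3-char tail>.
rebuild_header() {   # usage: rebuild_header HEADER Z_COUNT P_COUNT
    awk -v H="$1" -v Z="$2" -v P="$3" 'BEGIN {
        n = length(H)
        # keep everything before the two counters, then the new counters, then the tail
        printf "%s%05d%05d%s\n", substr(H, 1, n - 13), Z, P, substr(H, n - 2)
    }'
}

rebuild_header '1X.....................1234567890123456789T0000300013XT1' 1 1
# prints 1X.....................1234567890123456789T0000100001XT1
```

Counting from the end of the line rather than from the start means the sketch does not depend on the exact width of the dotted filler.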

InputFile chowhan.dat :
Code:
1X.....................1234567890123456789T0000300013XT1
1Z01............
P101001............
Q1........
R1.................
P101002.........
Q1..................
R1......................
1Z02............
P102001..........
Q1.............
R1...........
S1.....
P102002..........
Q1.............
R1...........
S1.....
P102003..........
Q1.............
R1...........
S1.....
P102004..........
Q1.............
R1...........
S1.....
P102005..........
Q1.............
R1...........
S1.....
1Z03
P103001..........
Q1.............
R1...........
S1.....
P103002..........
Q1.............
R1...........
S1.....
P103003..........
Q1.............
R1...........
S1.....
P103004..........
Q1.............
R1...........
S1.....
P103005..........
Q1.............
R1...........
S1.....
P103006..........
Q1.............
R1...........
S1.....
4C.........

Output File chowhan_1.dat :
Code:
1X.....................1234567890123456789T0000300012XT1
1Z01............
P101001............
Q1........
R1.................
P101002.........
Q1..................
R1......................
1Z02............
P102001..........
Q1.............
R1...........
S1.....
P102002..........
Q1.............
R1...........
S1.....
P102003..........
Q1.............
R1...........
S1.....
P102004..........
Q1.............
R1...........
S1.....
P102005..........
Q1.............
R1...........
S1.....
1Z03
P103001..........
Q1.............
R1...........
S1.....
P103002..........
Q1.............
R1...........
S1.....
P103003..........
Q1.............
R1...........
S1.....
P103004..........
Q1.............
R1...........
S1.....
P103005..........
Q1.............
R1...........
S1.....
4C.........

Output file chowhan_2.dat :
Code:
1X.....................1234567890123456789T0000100001XT1
1Z03
P103006..........
Q1.............
R1...........
S1.....
4C.........

Jean-Pierre.

Last edited by aigles; 12-23-2011 at 10:33 AM..
# 3  
Old 12-23-2011
You are very close, just a few steps away.

Thank you, Jean, for your quick reply.

Your help on this issue is very useful. Let me restate the requirement so that it is easier to understand and to write the program as needed.

As you assumed, the input file looks like the sample below (the file is fixed length).

All the P1 values are incremental.


1X.....................1234567890123456111T0000300013XT1
1Z01............no of P1's as 2 (sub header 1)
P100001............ (it is 5 digit value for subheader and not 01001)
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............no of P1's as 5 (sub header 2)
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
P100004..........
Q1.............
R1...........
S1.....
P100005..........
Q1.............
R1...........
S1.....
1Z03............no of P1's as 6(sub header 3)
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
P100004..........
Q1.............
R1...........
S1.....
P100005..........
Q1.............
R1...........
S1.....
P100006..........
Q1.............
R1...........
S1.....
4C.........


-------------- The outputs should look like this -----------

If we split every 5 P1's into a separate file, the output should look like this:

File 1:

1X.....................1234567890123456111T0000200005XT1
1Z01............no of P1's as 2 (sub header 1)
P100001............ (it is 5 digit value for subheader and not 01001)
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............no of P1's as 3(sub header 2)
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

File 2:

1X.....................1234567890123456112T0000200005XT1
1Z01(2)............no of P1's as 2(sub header 1(2) recounting again)
P100001(4).......... four should be recounted again
Q1.............
R1...........
S1.....
P100002(5)..........
Q1.............
R1...........
S1.....
1Z02(3)............no of P1's as 3(sub header 3)
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

File 3:

1X.....................1234567890123456113T0000100003XT1
1Z01(3)............(sub header 1(3))
P100001(4)..........
Q1.............
R1...........
S1.....
P100002(5)..........
Q1.............
R1...........
S1.....
P100003(6)..........
Q1.............
R1...........
S1.....
4C.........


I think the above example gives a good idea of what the output files should look like.

1) For every 5 P1's, there should be a new file with its own header and trailer.
2) In each new file, the 1Z numbering starts again at 1. The same rule applies when a single 1Z has more than 5 P1's.
3) The header should state how many 1Z's and P1's the file contains. The last digit of the 19-digit number also needs to be incremented by one (or something similar) so that it is unique for each output file.


I really appreciate your help and support.


Merry Christmas and Happy New Year in advance.

Chowhan
# 4  
Old 01-02-2012
Happy New Year !

Try and adapt the new version of the script (chowhan.ksh):
Code:
#!/usr/bin/ksh

InFile=./chowhan.dat        # Input file
OutSpec='./chowhan_%d.dat'  # Outfiles specification, %d for file number
P1max=5                      # Start a new file after every P1max P1 records

rm -f ./chowhan_*.dat* >/dev/null 2>&1

/usr/xpg4/bin/awk -v P1MAX="${P1max}" -v OUT="${OutSpec}" '

    #----------------------
    # Functions
    #-----------------------

    function closeOutFile() {
        if (FileIndx > 0) close(FileName);
    }

    function openNewOutFile() {
        FileName = sprintf(OUT ".tmp", ++FileIndx);
        Files[FileIndx      ] = FileName;
        Files[FileIndx, "1Z"] = 0;
        Files[FileIndx, "P1"] = 0;
        SubHeaderCount        = 0;
        P1Zcount              = 0;
        P1count               = 0;
    }

    function switchOutFile() {
        closeOutFile();
        openNewOutFile();
    }

    function writeSubHeader() {
        if (! FileIndx) openNewOutFile();
        Files[FileIndx, "1Z"]++;
        printf "1Z%0.2d%s\n", ++SubHeaderCount, SubHeader > FileName;
    }

    function writeP1() {
        if (P1count >= P1MAX) {
            switchOutFile();
            writeSubHeader();
        }
        P1Zcount++;
        P1count++;
        Files[FileIndx, "P1"]++;
        printf "P1%0.5d%s\n", P1Zcount, substr($0,8) > FileName;
    }

    #----------------------
    # Actions
    #----------------------

    /^1X/ {
        Header   = $0;
        HeaderHead = substr($0, 1, length($0)-15);
        HeaderMid  = substr($0, length($0)-13, 1);
        HeaderTail = substr($0, length($0)-2);
        FileIndx = 0;
        next;
    }

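    /^1Z/ {
        SubHeader = substr($0, 5);
        P1Zcount   = 0;
        # If the current file is already full, start this sub header in a
        # new file so the full file does not end with an empty 1Z record.
        if (FileIndx && P1count >= P1MAX) switchOutFile();
        writeSubHeader();
        next;
    }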

    /^P1/ {
        writeP1();
        next;
    }
    /^4C/   {
        Trailer = $0;
        exit;
    }
    {
        print $0 > FileName;
        next;
    }

    END {
        closeOutFile();
        HeaderId = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";   # 36 distinct IDs, one per output file
        for (i=1; i<=FileIndx; i++) {
            FileName = Files[i];
            print Trailer >> FileName;
            close(FileName);
            ResultFileName = sprintf(OUT, i)
            printf("%s%1.1s%s%0.5d%0.5d%s\n",   HeaderHead,
                                                substr(HeaderId, i, 1) ,
                                                HeaderMid,
                                                Files[i,"1Z"],
                                                Files[i,"P1"],
                                                HeaderTail ) > ResultFileName;
            close(ResultFileName);
            system(sprintf("/usr/bin/cat %s >>%s && /usr/bin/rm %s", FileName, ResultFileName, FileName));
        }
    }
' ${InFile}
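
Note that the script above replaces the last digit of the 19-digit ID with a per-file sequence character instead of incrementing the number. If a true increment is wanted, ordinary awk arithmetic is risky, because awk numbers are normally double-precision floats and cannot hold 19 digits exactly. A minimal sketch that adds 1 digit by digit on the string instead (`incr_id` is a hypothetical helper name, and dropping the final carry to keep the width fixed is an assumption):

```shell
# Sketch only: add 1 to a decimal string of any length without numeric conversion.
incr_id() {   # usage: incr_id DIGITS
    awk -v ID="$1" 'BEGIN {
        out = ""; carry = 1
        for (i = length(ID); i >= 1; i--) {      # walk from the last digit to the first
            d = substr(ID, i, 1) + carry
            carry = (d > 9)                      # 1 when this position overflows
            out = (d % 10) out                   # prepend the new digit
        }
        print out    # a carry out of the first digit is dropped, so the width stays fixed
    }'
}

incr_id 1234567890123456111
# prints 1234567890123456112
```

To produce the ID for the n-th output file, the helper could be applied n-1 times to the ID from the original header, or the loop could be moved into the END block as an awk function.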

Input file (chowhan.dat) :
Code:
1X.....................1234567890123456111T0000300013XT1
1Z01............
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
P100004..........
Q1.............
R1...........
S1.....
P100005..........
Q1.............
R1...........
S1.....
1Z03............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
P100004..........
Q1.............
R1...........
S1.....
P100005..........
Q1.............
R1...........
S1.....
P100006..........
Q1.............
R1...........
S1.....
4C.........

Running the script :
Code:
$ ls chowhan*
chowhan.dat       chowhan.ksh
$ ./chowhan.ksh
$ ls chowhan*
chowhan_1.dat     chowhan_2.dat     chowhan_3.dat     chowhan.dat       chowhan.ksh

Output file 1 (chowhan_1.dat) :
Code:
1X.....................1234567890123456110T0000200005XT1
1Z01............
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

Output file 2 (chowhan_2.dat) :
Code:
1X.....................1234567890123456111T0000200005XT1
1Z01............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
1Z02............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

Output file 3 (chowhan_3.dat) :
Code:
1X.....................1234567890123456112T0000100003XT1
1Z01............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........
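
The output files above can be checked mechanically by recounting the 1Z and P1 records in each file and comparing the result with the counts stored in its header. A minimal sketch, assuming the same header layout as in the samples (5-digit 1Z count, 5-digit P1 count, 3-character tail at the end of the first line); `check_counts` is a hypothetical helper name:

```shell
# Sketch only: compare the counts in each file header with the records actually present.
check_counts() {   # usage: check_counts FILE...
    awk '
        function report() {
            if (file == "") return
            printf "%s: header %d/%d, counted %d/%d -> %s\n", file, hz, hp, z, p,
                   ((hz == z && hp == p) ? "OK" : "MISMATCH")
        }
        FNR == 1 {                           # header line of each file
            report()                         # report the previous file, if any
            file = FILENAME; z = 0; p = 0
            n  = length($0)
            hz = substr($0, n - 12, 5) + 0   # 1Z count stored in the header
            hp = substr($0, n - 7, 5) + 0    # P1 count stored in the header
            next
        }
        /^1Z/ { z++ }
        /^P1/ { p++ }
        END   { report() }
    ' "$@"
}
```

It would be called as `check_counts ./chowhan_*.dat` after running the script; each file is reported on one line as either OK or MISMATCH.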


Last edited by aigles; 01-02-2012 at 10:52 AM..
# 5  
Old 01-09-2012
Thank you