Sponsored Content
Top Forums Shell Programming and Scripting Split Large Files Based On Row Pattern.. Post 302876627 by aimy on Monday 25th of November 2013 10:21:37 PM
Old 11-25-2013
Split Large Files Based On Row Pattern..

Hi all.

I've tried searching the web but could not find similar problem to mine.

I have one large file to be splitted into several files based on the matching pattern found in each row.

For example, let's say the file content:
Code:
MONTH,ACCOUNT_NO,CUSTOMER_NAME,SEGMENT_CODE,COST_CENTER,BILLED_DATE,BILL_FREQ,BILL_START_DATE,BILL_END_DATE,BILLED_RENTAL,BILLED_EARNED,PREVIOUS_BILLED_EARNED,REMAINING_UNEARNED_BALANCE,TOTAL_REVENUE_EARNED
08,D919518500104,HENG POH MING,R20,YRAC33,BP 04,Monthly,04/08/13,03/09/13,25,22.58,0,2.42,22.58
08,A100027860305,PANG KAM SENG,S10,YNDSAA,BP 13,Monthly,13/07/13,12/08/13,25,15.32,9.68,0,25
08,A100026920403,LIM YOKE TIN,S10,YNDSAA,BP 13,Monthly,13/07/13,12/08/13,111,68.03,42.97,0,111
08,D925038340109,SITI SHARAH BINTI OTHMAN,R20,YRAC33,BP 04,Monthly,04/07/13,03/08/13,25,22.58,2.42,0,25
08,D217414580206,ABDOL ADI BIN BABA,R20,YRACMM,BP 19,Monthly,19/08/13,18/09/13,85,35.64,0,49.36,35.64
08,A100015330204,CHUA THIAN SONG,S10,YNDSAA,BP 13,Monthly,13/08/13,12/09/13,45,27.58,0,17.42,27.58
08,A100011760104,TOYO ENTERPRISE,S10,YNDSAA,BP 13,Monthly,13/08/13,12/09/13,45,27.58,0,17.42,27.58
08,A100036551202,LEE HUA FONG,S10,YNDSAA,BP 13,Monthly,13/07/13,12/08/13,113,69.26,43.74,0,113
08,A600022365604,LIM HOOI LENG,R30,YRACPP,BP 19,Monthly,19/07/13,18/08/13,135,56.61,78.39,0,135
08,A100036551202,LEE HUA FONG,S10,YNDSAA,BP 13,Monthly,13/08/13,12/09/13,113,69.26,0,43.74,69.26
08,N351670650204,TING SIEW KIONG,R30,YRAUSI,BP 04,Monthly,04/07/13,03/08/13,115,103.87,11.13,0,115
08,N351670650204,TING SIEW KIONG,R30,YRAUSI,BP 04,Monthly,04/08/13,03/09/13,115,103.87,0,11.13,103.87
08,D919518500104,HENG POH MING,R20,YRAC33,BP 04,Monthly,04/07/13,03/08/13,25,22.58,2.42,0,25
08,Y102551020201,LIM LIAN HOCK,S10,YNDSJJ,BP 04,Monthly,04/08/13,03/09/13,20,18.06,0,1.94,18.06
08,D925038340109,SITI SHARAH BINTI OTHMAN,R20,YRAC33,BP 04,Monthly,04/08/13,03/09/13,25,22.58,0,2.42,22.58
08,D214091570207,CHUAH ANG TUAN,R60,YRACAA,BP 19,Monthly,19/08/13,18/09/13,113,47.38,0,65.62,47.38
08,Y502630980403,BAN SENG CHAN SDN BHD,S10,YNDSPP,BP 04,Monthly,04/07/13,03/08/13,45,40.65,4.35,0,45
08,D214091570207,CHUAH ANG TUAN,R60,YRACAA,BP 19,Monthly,19/07/13,18/08/13,113,47.38,65.62,0,113
08,A100015330204,CHUA THIAN SONG,S10,YNDSAA,BP 13,Monthly,13/07/13,12/08/13,45,27.58,17.42,0,45
08,Y502630980403,BAN SENG CHAN SDN BHD,S10,YNDSPP,BP 04,Monthly,04/08/13,03/09/13,45,40.65,0,4.35,40.65
08,F221456710106,NORHAZLINA BINTI ABDUL HAMID,R10,YRACAA,BP 19,Monthly,19/07/13,18/08/13,91,38.16,52.84,0,91
08,A100011760104,TOYO ENTERPRISE,S10,YNDSAA,BP 13,Monthly,13/07/13,12/08/13,45,27.58,17.42,0,45
08,A600022365604,LIM HOOI LENG,R30,YRACPP,BP 19,Monthly,19/08/13,18/09/13,135,56.61,0,78.39,56.61
08,D208631350106,ZAILAN BIN KHASRAN,R30,YRACNN,BP 19,Monthly,19/08/13,18/09/13,25,10.48,0,14.52,10.48
08,D217414580206,ABDOL ADI BIN BABA,R20,YRACMM,BP 19,Monthly,19/07/13,18/08/13,85,35.64,49.36,0,85
08,A100027860305,PANG KAM SENG,S10,YNDSAA,BP 13,Monthly,13/08/13,12/09/13,25,15.32,0,9.68,15.32
08,F221456710106,NORHAZLINA BINTI ABDUL HAMID,R10,YRACAA,BP 19,Monthly,19/08/13,18/09/13,91,38.16,0,52.84,38.16
08,A100026920403,LIM YOKE TIN,S10,YNDSAA,BP 13,Monthly,13/08/13,12/09/13,111,68.03,0,42.97,68.03

So, I want to split based on that 6th column value (comma delimited) and thus
Code:
08,D919518500104,HENG POH MING,R20,YRAC33,BP 04,Monthly,04/08/13,03/09/13,25,22.58,0,2.42,22.58
08,D925038340109,SITI SHARAH BINTI OTHMAN,R20,YRAC33,BP 04,Monthly,04/07/13,03/08/13,25,22.58,2.42,0,25
08,N351670650204,TING SIEW KIONG,R30,YRAUSI,BP 04,Monthly,04/07/13,03/08/13,115,103.87,11.13,0,115
08,N351670650204,TING SIEW KIONG,R30,YRAUSI,BP 04,Monthly,04/08/13,03/09/13,115,103.87,0,11.13,103.87
08,D919518500104,HENG POH MING,R20,YRAC33,BP 04,Monthly,04/07/13,03/08/13,25,22.58,2.42,0,25
08,Y102551020201,LIM LIAN HOCK,S10,YNDSJJ,BP 04,Monthly,04/08/13,03/09/13,20,18.06,0,1.94,18.06
08,D925038340109,SITI SHARAH BINTI OTHMAN,R20,YRAC33,BP 04,Monthly,04/08/13,03/09/13,25,22.58,0,2.42,22.58
08,Y502630980403,BAN SENG CHAN SDN BHD,S10,YNDSPP,BP 04,Monthly,04/07/13,03/08/13,45,40.65,4.35,0,45
08,Y502630980403,BAN SENG CHAN SDN BHD,S10,YNDSPP,BP 04,Monthly,04/08/13,03/09/13,45,40.65,0,4.35,40.65

---> BP 04.TXT
Code:
MONTH,ACCOUNT_NO,CUSTOMER_NAME,SEGMENT_CODE,COST_CENTER,BILLED_DATE,BILL_FREQ,BILL_START_DATE,BILL_END_DATE,BILLED_RENTAL,BILLED_EARNED,PREVIOUS_BILLED_EARNED,REMAINING_UNEARNED_BALANCE,TOTAL_REVENUE_EARNED
08,D217414580206,ABDOL ADI BIN BABA,R20,YRACMM,BP 19,Monthly,19/08/13,18/09/13,85,35.64,0,49.36,35.64
08,A600022365604,LIM HOOI LENG,R30,YRACPP,BP 19,Monthly,19/07/13,18/08/13,135,56.61,78.39,0,135
08,D214091570207,CHUAH ANG TUAN,R60,YRACAA,BP 19,Monthly,19/08/13,18/09/13,113,47.38,0,65.62,47.38
08,D214091570207,CHUAH ANG TUAN,R60,YRACAA,BP 19,Monthly,19/07/13,18/08/13,113,47.38,65.62,0,113
08,F221456710106,NORHAZLINA BINTI ABDUL HAMID,R10,YRACAA,BP 19,Monthly,19/07/13,18/08/13,91,38.16,52.84,0,91
08,A600022365604,LIM HOOI LENG,R30,YRACPP,BP 19,Monthly,19/08/13,18/09/13,135,56.61,0,78.39,56.61
08,D208631350106,ZAILAN BIN KHASRAN,R30,YRACNN,BP 19,Monthly,19/08/13,18/09/13,25,10.48,0,14.52,10.48
08,D217414580206,ABDOL ADI BIN BABA,R20,YRACMM,BP 19,Monthly,19/07/13,18/08/13,85,35.64,49.36,0,85
08,F221456710106,NORHAZLINA BINTI ABDUL HAMID,R10,YRACAA,BP 19,Monthly,19/08/13,18/09/13,91,38.16,0,52.84,38.16

---> BP 19.TXT

and so on.

Also, I want to retain the heading
Code:
MONTH,ACCOUNT_NO,CUSTOMER_NAME,SEGMENT_CODE,COST_CENTER,BILLED_DATE,BILL_FREQ,BILL_START_DATE,BILL_END_DATE,BILLED_RENTAL,BILLED_EARNED,PREVIOUS_BILLED_EARNED,REMAINING_UNEARNED_BALANCE,TOTAL_REVENUE_EARNED

in each file

Thank you very much for helping.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

split large file based on field criteria

I have a file containing date/time sorted data of the form ... 2009/06/10,20:59:59.950,XAG/USD,Q,1,1115, 14.3025,100,1,1 2009/06/10,20:59:59.950,XAG/USD,Q,1,1116, 14.3026,125,1,1 2009/06/10,20:59:59.950,XAG/USD,R,0,0, , 0,0,0 2009/06/10,20:59:59.950,XAG/USD,R,1,0, 14.1910,100,1,1... (6 Replies)
Discussion started by: asriva
6 Replies

2. Shell Programming and Scripting

Split large file based on last digit from a column

Hello, What's the best way to split a large into multiple files based on the last digit in the first column. input file: f 2738483300000x0y03772748378831x1y13478378358383x2y23743878383802x3y33787828282820x4y43748838383881x5y5 Desired Output: f0 3738483300000x0y03787828282820x4y4 f1... (9 Replies)
Discussion started by: alain.kazan
9 Replies

3. Shell Programming and Scripting

Split a file into multiple files based on the input pattern

I have a file with lines something like. ...... 123_start ...... ....... 123_end .... ..... 456_start ...... ..... 456_end .... ..... 789_start .... .... 789_end (6 Replies)
Discussion started by: abinash
6 Replies

4. Shell Programming and Scripting

Splitting large file into multiple files in unix based on pattern

I need to write a shell script for below scenario My input file has data in format: qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26 qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28 qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43 qwerty0101CFG 12345... (19 Replies)
Discussion started by: jimmy12
19 Replies

5. Shell Programming and Scripting

split XML file into multiple files based on pattern

Hello, I am using awk to split a file into multiple files using command: nawk '{ if ( $1 == "<process" ) { n=split($2, arr, "\""); file=arr } print > file }' processes.xml <process name="Process1.process"> ... (3 Replies)
Discussion started by: chiru_h
3 Replies

6. Shell Programming and Scripting

Problem with splitting large file based on pattern

Hi Experts, I have to split huge file based on the pattern to create smaller files. The pattern which is expected in the file is: Master..... First... second.... second... third.. third... Master... First.. second... third... Master... First... second.. second.. second..... (2 Replies)
Discussion started by: saisanthi
2 Replies

7. Shell Programming and Scripting

Help needed - Split large file into smaller files based on pattern match

Help needed urgently please. I have a large file - a few hundred thousand lines. Sample CP START ACCOUNT 1234556 name 1 CP END ACCOUNT CP START ACCOUNT 2224444 name 1 CP END ACCOUNT CP START ACCOUNT 333344444 name 1 CP END ACCOUNT I need to split this file each time "CP START... (7 Replies)
Discussion started by: frustrated1
7 Replies

8. UNIX for Dummies Questions & Answers

Split a huge 7 GB File Based on Pattern into 4 files

Hi, I have a Huge 7 GB file which has around 1 million records, i want to split this file into 4 files to contain around 250k messages each. Please help me as Split command cannot work here as it might miss tags.. Format of the file is as below <!--###### ###### START-->... (6 Replies)
Discussion started by: KishM
6 Replies

9. UNIX for Advanced & Expert Users

Split one file to many based on pattern

Hello All, I have records in a file in a pattern A,B,B,B,B,K,A,B,B,K Is there any command or simple logic I can pull out records into multiple files based on A record? I want output as File1: A,B,B,B,B,K File2: A,B,B,K (9 Replies)
Discussion started by: deal1dealer
9 Replies

10. Shell Programming and Scripting

Split files based on row delimiter count

I have a huge file (around 4-5 GB containing 20 million rows) which has text like: <EOFD>11<EOFD>22<EORD>2<EOFD>2222<EOFD>3333<EORD>3<EOFD>44<EOFD>55<EORD>66<EOFD>888<EOFD>9999<EORD> Actually above is an extracted file from a Sql Server with each field delimited by <EOFD> and each row ends... (8 Replies)
Discussion started by: amvip
8 Replies
All times are GMT -4. The time now is 12:01 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy