

How to Split File based on String?


 
# 1  
Old 08-15-2013

Hi,

The scenario is like this:

I have large text files (up to 5 MB each, about 5,000 files per day). Almost every line of a file contains the tag 3100.2.22.1 (which represents Call_Type). I need to generate one file per distinct 3100.2.22.1 Call_Type value, plus one more file that collects all lines without the 3100.2.22.1 tag.

The question is: how can I split such a file using bash/sed/awk?

Sample content of the file hd_auto_22700123_0021 (there are a lot of Call_Type values):
Code:
! HISTORICAL DATA ! ONE FILE DECODING REPORT ! SERVICE : ce20 ! FILE : /osp/spm/svc/ !
! TICKET NBR : 1 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665004551 ! 3100.2.22.8 Browsing !
! TICKET NBR : 2 ! GSI : 102 ! 3100.2.137.4 665017728 !3100.2.22.2 7 ! 3100.2.70.8 1050 ! 3100.2.22.1 189 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 3 ! GSI : 102 ! 3100.2.137.4 665017728 ! 3100.2.97.1 192.168.0.12 ! 3100.2.19.2 665017728 ! 3100.2.22.2 7 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 4 ! GSI : 102 ! 3100.2.137.4 665002105 ! 3100.2.97.1 192.168.0.12 ! 3100.2.19.2 665002105 ! 3100.2.22.1 410 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 5 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665009058 ! 3100.2.97.1 192.168.0.12 ! 3100.2.22.1 164 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 6 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665012633 ! 3100.2.97.1 192.168.0.12 ! 3100.2.18.1 0 ! 3100.2.22.1 189 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 7 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665019277 ! 3100.2.97.1 192.168.0.12 ! 3100.2.22.1 164 ! 3100.2.70.11 016c6f63000431333000 !   
! TICKET NBR : 8 ! GSI : 102 ! 3100.2.112.1 15/08/2013 10:42:43 ! 3100.2.22.8 Free_Traffic ! 3100.2.97.1 192.168.0.12  ! 3100.2.22.11 2 !
.
.
.
! RESULT = successfull 1657 tickets treated !


The result of the split should look like this:

Code:
hd_auto_22700123_0021_without_tag  (without 3100.2.22.1 tag)
! TICKET NBR : 1 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665004551 ! 3100.2.22.8 Browsing !
! TICKET NBR : 3 ! GSI : 102 ! 3100.2.137.4 665017728 ! 3100.2.97.1 192.168.0.12 ! 3100.2.19.2 665017728 ! 3100.2.22.2 7 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 8 ! GSI : 102 ! 3100.2.112.1 15/08/2013 10:42:43 ! 3100.2.22.8 Free_Traffic ! 3100.2.97.1 192.168.0.12  ! 3100.2.22.11 2 !
! RESULT = successfull 3 tickets treated !

Code:
hd_auto_22700123_0021_189 (with tag 3100.2.22.1 189)
! TICKET NBR : 2 ! GSI : 102 ! 3100.2.137.4 665017728 !3100.2.22.2 7 ! 3100.2.70.8 1050 ! 3100.2.22.1 189 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 6 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665012633 ! 3100.2.97.1 192.168.0.12 ! 3100.2.18.1 0 ! 3100.2.22.1 189 ! 3100.2.70.11 016c6f63000431333000 !
! RESULT = successfull 2 tickets treated !

Code:
hd_auto_22700123_0021_410 (with tag 3100.2.22.1 410)
! TICKET NBR : 4 ! GSI : 102 ! 3100.2.137.4 665002105 ! 3100.2.97.1 192.168.0.12 ! 3100.2.19.2 665002105 ! 3100.2.22.1 410 ! 3100.2.70.11 016c6f63000431333000 !
! RESULT = successfull 1 tickets treated !

Code:
hd_auto_22700123_0021_164 (with tag 3100.2.22.1 164)
! TICKET NBR : 5 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665009058 ! 3100.2.97.1 192.168.0.12 ! 3100.2.22.1 164 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 7 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665019277 ! 3100.2.97.1 192.168.0.12 ! 3100.2.22.1 164 ! 3100.2.70.11 016c6f63000431333000 !
! RESULT = successfull 2 tickets treated !


Last edited by OTNA; 08-15-2013 at 11:06 AM..
# 2  
Old 08-15-2013
Try
Code:
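awk -F! 'match($0, /3100\.2\.22\.1 +[^ !]+/) {                    # escaped dots; the space keeps 3100.2.22.11 out
             split(substr($0, RSTART, RLENGTH), tag, / +/)    # tag[2] is the Call_Type value
             print > (FILENAME "_" tag[2]); next
         }
         { print > (FILENAME "_without_tag") }
        ' hd_auto_*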
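
The expected output in post #1 also ends each split file with its own "! RESULT = ... tickets treated !" line and leaves out the report header, which the command above does not attempt. A minimal sketch of one way to add that, assuming every ticket line starts with "! TICKET NBR" and that the footer text should be copied as-is from the sample; the finish() helper, the count array, and the shortened sample records are made up for this illustration:

```shell
# Scratch directory with a shortened sample file modelled on the post.
cd "$(mktemp -d)"
cat > hd_auto_22700123_0021 <<'EOF'
! HISTORICAL DATA ! ONE FILE DECODING REPORT ! SERVICE : ce20 ! FILE : /osp/spm/svc/ !
! TICKET NBR : 1 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.22.8 Browsing !
! TICKET NBR : 2 ! GSI : 102 ! 3100.2.22.2 7 ! 3100.2.22.1 189 ! 3100.2.70.11 016c !
! TICKET NBR : 4 ! GSI : 102 ! 3100.2.19.2 665002105 ! 3100.2.22.1 410 !
! TICKET NBR : 6 ! GSI : 102 ! 3100.2.18.1 0 ! 3100.2.22.1 189 !
! TICKET NBR : 8 ! GSI : 102 ! 3100.2.22.8 Free_Traffic ! 3100.2.22.11 2 !
! RESULT = successfull 5 tickets treated !
EOF

awk '
function finish(   f) {                  # write one footer per output file, then reset
    for (f in count) {
        print "! RESULT = successfull " count[f] " tickets treated !" > f
        close(f)
    }
    split("", count)                     # portable way to empty the array
}
FNR == 1 && NR > 1 { finish() }          # a new input file starts: close out the previous one
/^! TICKET NBR/ {                        # only ticket lines are copied (assumption)
    if (match($0, /3100\.2\.22\.1 +[^ !]+/)) {
        split(substr($0, RSTART, RLENGTH), tag, / +/)
        out = FILENAME "_" tag[2]        # tag[2] is the Call_Type value
    } else {
        out = FILENAME "_without_tag"
    }
    print > out
    count[out]++
}
END { finish() }
' hd_auto_22700123_0021

ls
```

The input file is named explicitly here because a glob such as hd_auto_* would also pick up the split files from an earlier run in the same directory.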

This User Gave Thanks to RudiC For This Post:
# 3  
Old 08-15-2013
Wow, thank you!
Can you please explain how this script does its magic?
# 4  
Old 08-16-2013
It tries to match each record against your 3100.2.22.1 tag plus the call type, expressed as a regex. If the match succeeds, RSTART and RLENGTH (see man awk) are sufficient to locate the matched string and extract it for use in the output filename, to which the entire record is then printed. If there is no match, the record is printed to the "without_tag" file.
I see now that the -F! is not needed at all...
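
To see what match() leaves behind, here is a small sketch run on two records shortened from the sample. It also compares a loose pattern of the form "3100.2.22.1[^!]*", where the dots match any character and nothing separates the tag from a longer one, with a stricter pattern that escapes the dots and requires a space after the tag. The variable names t4, t8, hit, loose and strict are made up for the illustration:

```shell
# Ticket 4 carries the 3100.2.22.1 tag; ticket 8 only carries 3100.2.22.11.
t4='! TICKET NBR : 4 ! GSI : 102 ! 3100.2.19.2 665002105 ! 3100.2.22.1 410 !'
t8='! TICKET NBR : 8 ! GSI : 102 ! 3100.2.22.8 Free_Traffic ! 3100.2.22.11 2 !'

# match() returns the start position (0 if there is no match) and sets RSTART and RLENGTH.
printf '%s\n' "$t4" |
awk 'match($0, /3100\.2\.22\.1 +[^ !]+/) { print "RSTART=" RSTART, "RLENGTH=" RLENGTH }'

# substr($0, RSTART, RLENGTH) is exactly the text that matched.
hit=$(printf '%s\n' "$t4" |
      awk 'match($0, /3100\.2\.22\.1 +[^ !]+/) { print substr($0, RSTART, RLENGTH) }')

# The loose pattern also fires on the 3100.2.22.11 tag of ticket 8; the strict one does not.
loose=$(printf '%s\n' "$t8" |
        awk 'match($0, "3100.2.22.1[^!]*") { print substr($0, RSTART, RLENGTH) }')
strict=$(printf '%s\n' "$t8" |
         awk 'match($0, /3100\.2\.22\.1 +[^ !]+/) { print substr($0, RSTART, RLENGTH) }')

printf 'hit=[%s]\nloose=[%s]\nstrict=[%s]\n' "$hit" "$loose" "$strict"
```

With the loose pattern, a record like ticket 8 would be filed under a 3100.2.22.11 name instead of going to the "without_tag" file, so the stricter form is probably what is wanted here.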
