File split question


 
# 1  
Old 02-19-2009
File split question

I have a flat file in UNIX and I have to perform two tasks based on the data below. The data shown here is just a sample; the original file is much longer.

Positions 110 to 111 hold a two-digit value that gives the record type (in the sample it is the two digits right after the leading 8 of the third field, for example the 32 in 832I20090109). In the sample below the record types are 32, 32, 31, 31 and 35. The real data contains thousands more records, and there are more than 100 record types in a file. I have to split the file based on the record type in positions 110 through 111.

000000008101 000011000700000000000000000000000001234567454002000 832I20090109 1234567097009967
123450007101 000000000000000000007856343446560000007856454540000 832I20090109 9864536670002456
957645465778 000011000700000000000000000000000067645333567743355 831I20090109 7854536670005647
676767497101 000011000700000000000000000000000008898675335767676 831I20090109 4565767665545469
767865444567 000011000700000000000000000000000007876564454676877 835I20090109 8786756656677887

TASK1: I have to split the file based on the record types, so the output in this case will be three files:

File1 RecordType32
000000008101 000011000700000000000000000000000001234567454002000 832I20090109 1234567097009967
123450007101 000000000000000000007856343446560000007856454540000 832I20090109 9864536670002456

File2 RecordType31
957645465778 000011000700000000000000000000000067645333567743355 831I20090109 7854536670005647
676767497101 000011000700000000000000000000000008898675335767676 831I20090109 4565767665545469

File3 RecordType35
767865444567 000011000700000000000000000000000007876564454676877 835I20090109 8786756656677887

Can anybody help me with a solution for this? I am not good at UNIX shell scripting.


TASK2: I need to get a unique list of the record types in a file. For my sample the result should be:
32
31
35
# 2  
Old 03-05-2009
Man this is pretty easy. Surprised no one followed up:
Code:
awk 'length($0) > 111 { type=substr($0,110,2); ofile="type-" type ".dat"; print $0 > ofile; }
       length($0) <= 111 { print $0 >"type-short.dat" }'


Last edited by otheus; 03-09-2009 at 05:39 AM.. Reason: corrected per zTodd
# 3  
Old 03-08-2009
Hi otheus,
Thank you for your reply. I appreciate it. I will try this at work tomorrow, as I do not have access from home.

Have a nice day
# 4  
Old 03-08-2009
Assuming the record types are in the 67th position as in your example and not in position 110, this should be sufficient:

Code:
awk '{print > ("RecordType" substr($0,67,2))}' file

Regards
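
Since the first post mentions more than 100 record types, one thing to watch for is that some awk implementations limit how many output files can be open at once. Below is a rough sketch, not a tested production script, that appends to each output file and closes it again after every record. The column is passed in as a variable so the same command covers both the 110 of the description and the 67 of the pasted sample. The names pos and records.dat, the RecordType-short file, and the generated stand-in data are assumptions for illustration only; the RecordType prefix follows the command above.

```shell
#!/bin/sh
# Stand-in data (an assumption, not the poster's file): 65 filler characters,
# an '8' in column 66, then the two-digit record type in columns 67-68.
for t in 32 32 31 31 35; do
  printf '%065d8%sI20090109\n' 0 "$t"
done > records.dat

pos=67   # column where the record type starts; 110 for the real file per post #1

# The awk script appends with >>, so clear output left over from earlier runs.
rm -f RecordType[0-9][0-9] RecordType-short

awk -v pos="$pos" '
# Lines too short to contain both type digits go to a separate file.
length($0) < pos + 1 {
    print >> "RecordType-short"
    close("RecordType-short")
    next
}
{
    out = "RecordType" substr($0, pos, 2)   # one output file per record type
    print >> out                             # append this record
    close(out)                               # never more than one output file open
}' records.dat

# Show how many records landed in each file.
wc -l RecordType[0-9][0-9]
```

Closing after every record keeps the number of open files at one but costs speed on large inputs. If the awk in use copes with a hundred or so open files, the one-liner above is simpler and faster.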
# 5  
Old 03-09-2009
Quote:
Originally Posted by otheus
Man this is pretty easy. Surprised no one followed up:
Code:
awk 'length($0) > 111 { type=substr($0,110,2); ofile="type-" type ".dat"; print $0 > type; }
       length($0) <= 111 { print $0 >"type-short.dat" }'

Is the type right after the > in the first line (the part I highlighted in red) a typo? Was it supposed to be ofile instead of type? I didn't test it, it just seemed so...

Last edited by otheus; 03-09-2009 at 05:39 AM.. Reason: oops!
# 6  
Old 03-09-2009
For task 2, I believe you can use the sed or awk command, piped to the sort command with the -u option. A Google search for "sed" and for "unix sort" should turn up lots of good info for you to learn from.
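
A minimal sketch of that idea for task 2, assuming the record type is two characters wide and starts at the column held in pos (67 for the sample as pasted in this thread, 110 for the real file as described in the first post). The names pos, records.dat, types_sorted.txt and types_first_seen.txt, and the generated stand-in data, are placeholders and not from the original posts. The cut and sort -u pipeline returns the list in sorted order; the awk variant keeps the order of first appearance, which matches the 32, 31, 35 asked for in the first post.

```shell
#!/bin/sh
# Stand-in data (an assumption, not the poster's file): 65 filler characters,
# an '8' in column 66, then the two-digit record type in columns 67-68.
for t in 32 32 31 31 35; do
  printf '%065d8%sI20090109\n' 0 "$t"
done > records.dat

pos=67   # column where the record type starts; 110 for the real file per post #1

# Variant 1: cut out the two type columns, then sort and drop duplicates.
# Gives 31, 32, 35 (sorted order).
cut -c"${pos}-$((pos + 1))" records.dat | sort -u > types_sorted.txt

# Variant 2: awk only, printing each type the first time it is seen.
# Gives 32, 31, 35 (order of first appearance).
awk -v pos="$pos" '{ t = substr($0, pos, 2) } !seen[t]++ { print t }' records.dat > types_first_seen.txt

cat types_sorted.txt
cat types_first_seen.txt
```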
# 7  
Old 03-09-2009
zTodd is correct. The red 'type' should have been 'ofile' (since corrected). Thanks...

Last edited by otheus; 03-09-2009 at 05:38 AM..
 