compare & split files


 
# 1  
Old 08-11-2009

Hi All,

I have one big file like this:

Code:
cat nid_lec_rej_20090804_merged
10084MOCLEC         0408090061480739nid090804132259.03.148990533               
2526716790000008947850036448540401014 R030007150692000                         
2535502720000000010100036165742685000 R030007150354000                         
2554132380000000298300036428156061013 R030007150082000                         
2608117990000000145250036428153472007 R030007148586000                         
2612547640000000055750036452910607010 R030007148076000                         
2511131960000000100000036008715245008 R030007133681000                         
2587377210000000171100036182913145003 R030007131966000                         
2588157990000000190200036459337192005 R030007131975000                         
2599294600000000179600036181101445019 R030007131676000                         
2626160970000000075500036165716085005 R030007131171000                         
2939008270000001725100036182920694027 R030007106040000                         
2941677890000000068000036001629351020 R030007105976000                         
2954673550000000234200036001620285029 R030007105655000                         
2956336840000000038650036001620285029 R030007105697000                         
2956389380000000048000036001620285029 R030007105593000                         
3000150000012287605000001994675

and three small files:
Code:
cat CC29072009_CXXXCU01.rnd
10020MOCLEC         2907090061480739nid090729181916.03.147814552               
2526716790000008947850036448540401014  030007150692000                         
2535502720000000010100036165742685000  030007150354000                         
2554132380000000298300036428156061013  030007150082000                         
2608117990000000145250036428153472007  030007148586000                         
2612547640000000055750036452910607010  030007148076000                         
300005000000945725                                                             

cat CC04082009_CXXXCU04.rnd
10020MOCLEC         0408090061480739nid090804132259.06.148990533               
2511131960000000100000036008715245008  030007133681000                         
2587377210000000171100036182913145003  030007131966000                         
2588157990000000190200036459337192005  030007131975000                         
2599294600000000179600036181101445019  030007131676000                         
2626160970000000075500036165716085005  030007131171000                         
300005000000071640                                                             

cat CC25072009_CXXXCU07.rnd
10020MOCLEC         2507090061480739nid090725021957.09.146887198               
2939008270000001725100036182920694027  030007106040000                         
2941677890000000068000036001629351020  030007105976000                         
2954673550000000234200036001620285029  030007105655000                         
2956336840000000038650036001620285029  030007105697000                         
2956389380000000048000036001620285029  030007105593000                         
300005000001440155

I am now comparing the big file with the three small files on the basis of the id. In the big file this field is in the 2nd column, from the 39th to the 79th position of each detail record (the records whose first digit is 2).

In the three small files the same field is in the 2nd column, from the 39th to the 80th position of each detail record (again, the records whose first digit is 2).
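Judging only from the sample records in this thread, the 15-digit id appears to start at character 40 in both layouts: position 39 holds the `R` flag in the big file and a second blank in the small files. A minimal sketch of extracting that key, assuming the layout really is fixed-width as the samples suggest (the variable names are made up for illustration):

```shell
# Sample detail records copied from the thread (trailing padding dropped).
big='2526716790000008947850036448540401014 R030007150692000'
small='2526716790000008947850036448540401014  030007150692000'

# Assumption: in both layouts the 15-digit id starts at character 40.
big_key=$(printf '%s\n' "$big" | awk '{print substr($0, 40, 15)}')
small_key=$(printf '%s\n' "$small" | awk '{print substr($0, 40, 15)}')

echo "big:   $big_key"
echo "small: $small_key"
```

If both commands print the same value, that substring can serve as the join key between the big file and the small files.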

Right now I am writing three while loops to compare the big file with the three small files, but that scans the big file three times. I want to do it in one go, i.e. the big file should be scanned only once.

Code:
cat nid_lec_rej_20090804_merged|grep ^2 | while read i

do

x=`echo $i |awk '{print substr($2,2,40)}'`

y=`awk '/[[:digit:]]{37}[[:space:]]{2}'$x'/' /data/output/TEMP/toRND/CC04082009_CXXXCU04.rnd`

[ "$y" !=  "" ] && echo $i >> rnd.out4

done

The same is repeated for the other two small files.

Please suggest a more efficient way to do this.

Thanks
# 2  
Old 08-11-2009
What should be the desired output?
# 3  
Old 08-12-2009
The output should be three files split from the big file:
Code:
cat rnd.out1
2526716790000008947850036448540401014 R030007150692000                         
2535502720000000010100036165742685000 R030007150354000                         
2554132380000000298300036428156061013 R030007150082000                         
2608117990000000145250036428153472007 R030007148586000                         
2612547640000000055750036452910607010 R030007148076000                         

cat rnd.out4
2511131960000000100000036008715245008 R030007133681000                         
2587377210000000171100036182913145003 R030007131966000                         
2588157990000000190200036459337192005 R030007131975000                         
2599294600000000179600036181101445019 R030007131676000                         
2626160970000000075500036165716085005 R030007131171000                         

cat rnd.out7
2939008270000001725100036182920694027 R030007106040000                         
2941677890000000068000036001629351020 R030007105976000                         
2954673550000000234200036001620285029 R030007105655000                         
2956336840000000038650036001620285029 R030007105697000                         
2956389380000000048000036001620285029 R030007105593000

# 4  
Old 08-12-2009
This should work:

Code:
awk -F" |_" 'NR==FNR && /^2/{a[substr($0,40,15)]=$0;next}
FILENAME=="CC29072009_CXXXCU01.rnd" && /^2/ && a[$3]{print a[$3] > "rnd.out1"}
FILENAME=="CC04082009_CXXXCU04.rnd" && /^2/ && a[$3]{print a[$3] > "rnd.out4"}
FILENAME=="CC25072009_CXXXCU07.rnd" && /^2/ && a[$3]{print a[$3] > "rnd.out7"}
' file CC29072009_CXXXCU01.rnd CC04082009_CXXXCU04.rnd CC25072009_CXXXCU07.rnd

Use nawk or /usr/xpg4/bin/awk on Solaris if you get errors.
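If holding the whole big file in memory is a concern, the lookup can also be turned around. This is a different technique from the one posted above, shown only as a sketch: read the small files first, remember which output file each id belongs to, then stream the big file exactly once. It assumes the id sits at characters 40-54 (as in the samples) and that the output suffix can be derived from the `CUnn` part of the small file name. `big.txt` is a hypothetical name for the merged file.

```shell
# Work in a scratch directory with cut-down copies of the thread's samples.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' \
  '10084MOCLEC         0408090061480739nid090804132259.03.148990533' \
  '2526716790000008947850036448540401014 R030007150692000' \
  '2511131960000000100000036008715245008 R030007133681000' \
  '2939008270000001725100036182920694027 R030007106040000' \
  '3000150000012287605000001994675' > big.txt
printf '%s\n' '2526716790000008947850036448540401014  030007150692000' > CC29072009_CXXXCU01.rnd
printf '%s\n' '2511131960000000100000036008715245008  030007133681000' > CC04082009_CXXXCU04.rnd
printf '%s\n' '2939008270000001725100036182920694027  030007106040000' > CC25072009_CXXXCU07.rnd

awk -v big="big.txt" '
  # Small files: map every detail-record id to its output file.
  FILENAME != big {
      if (FNR == 1) {
          n = FILENAME
          sub(/\.rnd$/, "", n)     # drop the extension
          sub(/^.*CU0*/, "", n)    # keep the digits after CU, e.g. 04 -> 4
          target = "rnd.out" n
      }
      if (/^2/) out[substr($0, 40, 15)] = target
      next
  }
  # Big file, read once: route each matching detail record.
  /^2/ {
      key = substr($0, 40, 15)
      if (key in out) print > (out[key])
  }
' CC29072009_CXXXCU01.rnd CC04082009_CXXXCU04.rnd CC25072009_CXXXCU07.rnd big.txt
```

With this ordering only the ids are kept in memory, and the records in each output file follow the order of the big file.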

Regards
# 5  
Old 08-12-2009
I'm a bit confused: do I need to put this in a while loop?

Secondly, the big file and the three small files are in different paths.
# 6  
Old 08-12-2009
You can copy and paste the code in a file and make it executable:

Code:
#!/bin/sh

awk -F" |_" 'NR==FNR && /^2/{a[substr($0,40,15)]=$0;next}
FILENAME=="CC29072009_CXXXCU01.rnd" && /^2/ && a[$3]{print a[$3] > "rnd.out1"}
FILENAME=="CC04082009_CXXXCU04.rnd" && /^2/ && a[$3]{print a[$3] > "rnd.out4"}
FILENAME=="CC25072009_CXXXCU07.rnd" && /^2/ && a[$3]{print a[$3] > "rnd.out7"}
' file CC29072009_CXXXCU01.rnd CC04082009_CXXXCU04.rnd CC25072009_CXXXCU07.rnd

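Use full path names if the files are in different paths, both on the command line and in the FILENAME comparisons: FILENAME holds each name exactly as it was given as an argument.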
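One way to keep the command-line argument and the FILENAME test in step is to pass the path in once through a shell variable. A minimal sketch with a single small file, assuming the id is at characters 40-54 as in the samples; the directory layout and the names `big`, `s1` and `big.txt` are hypothetical:

```shell
dir=$(mktemp -d) || exit 1
mkdir -p "$dir/in" "$dir/small"
big="$dir/in/big.txt"
s1="$dir/small/CC29072009_CXXXCU01.rnd"

# One detail record per file, copied from the thread's samples.
printf '%s\n' '2526716790000008947850036448540401014 R030007150692000' > "$big"
printf '%s\n' '2526716790000008947850036448540401014  030007150692000' > "$s1"

awk -v s1="$s1" '
  # First file (the big one): store each detail record under its id.
  NR == FNR { if (/^2/) a[substr($0, 40, 15)] = $0; next }
  # FILENAME equals the argument string, so compare it with the same variable.
  FILENAME == s1 && /^2/ {
      key = substr($0, 40, 15)
      # The comparison itself: is this id a key stored from the big file?
      if (key in a) print a[key]
  }
' "$big" "$s1" > "$dir/rnd.out1"

cat "$dir/rnd.out1"
```

The other two small files would get their own variable and rule in the same way.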

Regards
# 7  
Old 08-12-2009
Could you please tell me where the comparison with the big file happens in the code you provided?

And here is my complete code, with your lines added:

Code:
#!/bin/sh

awk -F" |_" 'NR==FNR && /^2/{a[substr($0,40,15)]=$0;next}
FILENAME=="/arbor/FX/data/remote/cpm/output/WORK_TEMP/toDINER/CC29072009_CELPCU01.dnr" && /^2/ && a[$3]{print a[$3] > "rnd.out1"}
FILENAME=="/arbor/FX/data/remote/cpm/output/WORK_TEMP/toDINER/CC04082009_CELPCU04.dnr" && /^2/ && a[$3]{print a[$3] > "rnd.out4"}
FILENAME=="/arbor/FX/data/remote/cpm/output/WORK_TEMP/toDINER/CC25072009_CELPCU07.dnr" && /^2/ && a[$3]{print a[$3] > "rnd.out7"}
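' file /arbor/FX/data/remote/cpm/output/WORK_TEMP/toDINER/CC29072009_CELPCU01.dnr /arbor/FX/data/remote/cpm/output/WORK_TEMP/toDINER/CC04082009_CELPCU04.dnr /arbor/FX/data/remote/cpm/output/WORK_TEMP/toDINER/CC25072009_CELPCU07.dnr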

total_amnt_01=`awk '{a += (substr($1,10,12))}END{printf a}' rnd.out1`
total_amnt_06=`awk '{a += (substr($1,10,12))}END{printf a}' rnd.out4`
total_amnt_09=`awk '{a += (substr($1,10,12))}END{printf a}' rnd.out7`

rec_cnt_01=`(awk 'END{print NR}' rnd.out1_CU01)`
rec_cnt_06=`(awk 'END{print NR}' rnd.out4_CU04)`
rec_cnt_09=`(awk 'END{print NR}' rnd.out7_CU07)`

sed -n '2p' /arbor/FX/data/remote/cpm/output/WORK_TEMP/CTRL/ctrl_DINER >>  /arbor/FX/data/remote/cpm/input/WORK_TEMP/frDINER/tmp.1
sed -n '4p' /arbor/FX/data/remote/cpm/output/WORK_TEMP/CTRL/ctrl_DINER >>  /arbor/FX/data/remote/cpm/input/WORK_TEMP/frDINER/tmp.4
sed -n '6p' /arbor/FX/data/remote/cpm/output/WORK_TEMP/CTRL/ctrl_DINER >>  /arbor/FX/data/remote/cpm/input/WORK_TEMP/frDINER/tmp.7

cat tmp.1 rnd.out1_CU01 >> din_cel_rej_20090804_CU01
cat tmp.4 rnd.out4_CU04 >> din_cel_rej_20090804_CU04
cat tmp.7 rnd.out7_CU07 >> din_cel_rej_20090804_CU07

rm tmp.1 tmp.4 tmp.7 rnd.out1_CU01 rnd.out4_CU04 rnd.out7_CU07

Thanks & Regards
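The record count and the total amount computed in the script above can probably be taken in a single awk pass per output file. A sketch using the five records expected in rnd.out1, assuming (as the script does) that the amount is the 12 digits starting at character 10 of the first field:

```shell
dir=$(mktemp -d) || exit 1

# The five records expected in rnd.out1, copied from the thread.
printf '%s\n' \
  '2526716790000008947850036448540401014 R030007150692000' \
  '2535502720000000010100036165742685000 R030007150354000' \
  '2554132380000000298300036428156061013 R030007150082000' \
  '2608117990000000145250036428153472007 R030007148586000' \
  '2612547640000000055750036452910607010 R030007148076000' > "$dir/rnd.out1"

# One pass: NR is the record count, s accumulates the 12-digit amount.
summary=$(awk '{ s += substr($1, 10, 12) } END { printf "%d %d\n", NR, s }' "$dir/rnd.out1")
rec_cnt_01=${summary% *}      # text before the blank
total_amnt_01=${summary#* }   # text after the blank

echo "records=$rec_cnt_01 total=$total_amnt_01"
```

For these five records the result seems to agree with the trailer of CC29072009_CXXXCU01.rnd (300005000000945725), if that trailer is read as a 5-digit count followed by a 12-digit amount; that reading of the trailer is an assumption, not something stated in the thread.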
