I hope the Perl script below helps.
Code:
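use strict;
use warnings;

# Map each key to the queue of values listed for it under __DATA__.
my %queue;
while (<DATA>) {
    chomp;
    my ($key, $value) = split /,/;
    push @{ $queue{$key} }, $value;
}

open my $fh, '<', 'a.txt' or die "Cannot open a.txt: $!";
while (<$fh>) {
    chomp;
    if (/KEY=([0-9]+)/) {
        # Take the next unused value for this key; empty if none remain.
        my $value = shift @{ $queue{$1} || [] };
        print $_, "<RESULT>", $value // '', "\n";
    }
}
close $fh;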
__DATA__
000000000160191837,00140000637006925269
000000000160191837,00140000637006925270
000000000160191838,00140000637006925271
000000000160191840,00140000637006925272
|
|
||||
|
Thanks, Cherry! I don't know Perl, so this is somewhat out of scope for me, but I've heard Perl is very fast. I'll have to learn it in the future.
The awk code below runs fine for a small number of records, but I'm now running it on 200K records and it is taking a long time: after 25 minutes it has written just 50 records to the output file. I'm not sure how much longer it will take to complete. Is there anything wrong with the code that makes it run so long? Generally awk is very fast, right? This code is already available in C++, and I'm trying to rewrite it in awk because of performance issues, as awk is faster.
Code:
awk -f king.awk FS=, file1 FS='#KEY=' file2
Code:
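# First file (FS=,): queue every value under its key, joined by SUBSEP.
FNR==NR { f1[$1] = ($1 in f1) ? f1[$1] SUBSEP $2 : $2; default_num = $2; next }
# Second file (FS='#KEY='): the key is the text between "#KEY=" and the next ">".
{
  x = index($2, ">")
  key = substr($2, 1, x-1)
}
key in f1 {
  n = split(f1[key], a, SUBSEP)
  delete f1[key]
  printf("%s<RESULT>%s\n", $0, a[1])
  # Rebuild the queue from the remaining values a[2..n].
  for (i=2; i<=n; i++)
    f1[key] = (i==2) ? a[i] : f1[key] SUBSEP a[i]
  next
}
# Key absent or its values used up: emit a default built from the last value read.
{
  print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}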
---------- Post updated at 05:00 PM ---------- Previous update was at 11:37 AM ----------
My bad. I used the wrong data: it had only 7 unique records and all the rest were duplicates, which will not happen in the real world. So I'm good. With 200K unique records and 1 duplicate for each record, it ran in ~3 minutes. Thanks for all your support!
Last edited by King Kalyan; 06-18-2009 at 12:43 PM.
|
|||||
|
I'm not sure why you changed the invocation from:
Code:
nawk -f king.awk FS=, file1 FS='(#KEY=|>)' file2
to:
Code:
awk -f king.awk FS=, file1 FS='#KEY=' file2
Also, if you take my last version, it should be a little faster, as I don't rebuild the array if I have just 1 entry in it (probably the majority of your records in file2). You could probably think of a different implementation that doesn't require rebuilding the array altogether. This is left as an exercise for the OP.
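One way to approach that exercise, offered only as a rough sketch: keep each value in its own array element, indexed by key and position, and track how many values have been consumed per key. Taking the next value is then a counter increment rather than a split and rebuild. The file names, the sample records, the array names val, head and tail, and the NONE fallback are all made up for illustration; the real script would substitute its own default value.

```shell
#!/bin/sh
# Sketch only: sample data and names below are hypothetical.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
100,AAA
100,BBB
200,CCC
EOF

cat > "$tmp/file2" <<'EOF'
<REC>#KEY=100>first
<REC>#KEY=100>second
<REC>#KEY=200>third
<REC>#KEY=200>fourth
EOF

cat > "$tmp/queue.awk" <<'EOF'
# file1: append each value to the end of its key's queue.
FNR == NR { val[$1, ++tail[$1]] = $2; next }
# file2: the key is the text between "#KEY=" and the next ">".
{
  key = substr($2, 1, index($2, ">") - 1)
  if (head[key] + 0 < tail[key] + 0) {
    # Consume the next unused value; no string is split or rebuilt.
    i = ++head[key]
    print $0 "<RESULT>" val[key, i]
    delete val[key, i]
  } else {
    # Queue empty or key unknown: placeholder default.
    print $0 "<RESULT>NONE"
  }
}
EOF

result=$(awk -f "$tmp/queue.awk" FS=, "$tmp/file1" FS='#KEY=' "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data above, the first three lines of file2 receive AAA, BBB and CCC in that order, and the fourth gets the NONE placeholder because key 200 has only one value.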
|
||||
|
Thanks for giving me a new direction!
I changed the invocation because the #KEY field can appear anywhere in a record; it's not always in the second position. Sorry I did not mention that in my first post. Yes, I saw your last code and forgot to include it in mine; I have now included it (skipping the array rebuild if there is only one entry). After you said that, I thought of a different implementation, and here it is. It is even faster.
Code:
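# First file (FS=,): queue every value under its key, joined by SUBSEP.
FNR==NR { f1[$1] = ($1 in f1) ? f1[$1] SUBSEP $2 : $2; default_num = $2; next }
# Second file (FS='#KEY='): the key is the text between "#KEY=" and the next ">".
{
  x = index($2, ">")
  key = substr($2, 1, x-1)
}
key in f1 {
  n = split(f1[key], a, SUBSEP)
  printf("%s<RESULT>%s\n", $0, a[1])
  # A single remaining value is left in place, so later matches reuse it.
  if (n == 1) next
  # Otherwise drop the first value by cutting the string after the first SUBSEP.
  y = index(f1[key], SUBSEP)
  f1[key] = substr(f1[key], y+1)
  next
}
# Key not found in the first file: emit a default built from the last value read.
{
  print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}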
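Since the #KEY field can sit anywhere in a record, another option is to leave FS alone and pull the key out with awk's match() function, which sets RSTART and RLENGTH. This is only a sketch of that alternative, not what the scripts in this thread do; the function name extract_key, the sample lines and the "(no key)" marker are made up for illustration.

```shell
#!/bin/sh
# Sketch only: extract_key and the sample lines are hypothetical.
extract_key() {
  awk '{
    # "#KEY=" is 5 characters and the closing ">" is 1, so skip 5 and trim 6.
    if (match($0, /#KEY=[^>]*>/))
      print substr($0, RSTART + 5, RLENGTH - 6)
    else
      print "(no key)"
  }'
}

keys=$(printf '%s\n' 'a<x>#KEY=123>b' '#KEY=456>tail' 'nothing here' | extract_key)
printf '%s\n' "$keys"
```

For the three sample lines this prints 123, 456 and (no key), one per line.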