The UNIX and Linux Forums  


#1
06-18-2009
summer_cherry (Forum Advisor)
I hope the Perl script below helps.

Code:
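use strict;
use warnings;

my %values;    # key => list of values, in input order
while (<DATA>) {
	chomp;
	my ($key, $value) = split /,/;
	push @{ $values{$key} }, $value;
}
open my $fh, '<', 'a.txt' or die "Cannot open a.txt: $!";
while (<$fh>) {
	chomp;
	if (/KEY=([0-9]+)/) {
		# take the next unused value for this key, if any is left
		my $value = shift @{ $values{$1} || [] };
		print $_, "<RESULT>", (defined $value ? $value : ''), "\n";
	}
}
close $fh;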
__DATA__
000000000160191837,00140000637006925269
000000000160191837,00140000637006925270
000000000160191838,00140000637006925271
000000000160191840,00140000637006925272
#2
06-18-2009
King Kalyan
Thanks Cherry! I don't know Perl, so this is out of scope for me, but I've heard Perl is very fast. I'll have to learn it in the future.

The awk code below runs fine for a small number of records, but now I'm running it on 200K records and it is taking a long time: after 25 minutes it has written just 50 records to the output file. I'm not sure how much longer it will take to complete.

Is there anything wrong with the code that makes it run so long?
Generally awk is very fast, right?

Actually this code already exists in C++, and I'm trying to rewrite it in awk because of performance issues, as awk is faster.

awk -f king.awk FS=, file1 FS='#KEY=' file2

Code:
 
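# file1 (FS=,): collect the values for each key, joined by SUBSEP
FNR==NR { f1[$1] = ($1 in f1) ? f1[$1] SUBSEP $2 : $2; default_num = $2; next }
# file2 (FS='#KEY='): the key is the text of $2 up to the first ">"
{
   x = index($2, ">")
   key = substr($2, 1, x-1)
}
key in f1 {
   n = split(f1[key], a, SUBSEP)
   delete f1[key]
   printf("%s<RESULT>%s\n", $0, a[1])
   # put the unused values back for the next record with this key
   for (i=2; i<=n; i++)
      f1[key] = (i==2) ? a[i] : f1[key] SUBSEP a[i]
   next
}
# no value left for this key: fall back to a default built from the last file1 value
{
   print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}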


---------- Post updated at 05:00 PM ---------- Previous update was at 11:37 AM ----------

My bad. I used the wrong data: it had only 7 unique records and all the rest were duplicates, which will not happen in the real world. So I'm good.

With 200K unique records and 1 duplicate for each record, it ran in ~3 minutes.

Thanks for all your support!

Last edited by King Kalyan; 06-18-2009 at 12:43 PM..
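
A rough sketch of why the duplicate-heavy test data was so slow, assuming the cost is dominated by the split-and-rebuild step in the script above. Every matching record splits the whole remaining list for its key and then joins all but the first element back together, so consuming a list of n values touches about n(n+1)/2 elements in total. The function name count_rebuild_work and the values v1..vn are made up for illustration.

```shell
# Count how many list elements the split-and-rebuild approach touches
# while consuming a list of n values one at a time.
count_rebuild_work() {
    awk -v n="$1" 'BEGIN {
        list = ""
        for (i = 1; i <= n; i++)
            list = (i == 1) ? ("v" i) : (list SUBSEP "v" i)
        work = 0
        while (list != "") {
            m = split(list, a, SUBSEP)   # touches every remaining element
            work += m
            list = ""
            for (i = 2; i <= m; i++)     # rebuild without the first element
                list = (i == 2) ? a[i] : (list SUBSEP a[i])
        }
        print work
    }'
}

count_rebuild_work 4      # 4 + 3 + 2 + 1, prints 10
count_rebuild_work 1000   # prints 500500
```

With only 7 unique keys spread over roughly 200K rows, each list holds tens of thousands of values, so this quadratic term dominates. With mostly unique keys the lists have one or two elements and the rebuild is cheap.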
#3
06-18-2009
vgersh99 (Forum Staff, Moderator)
Or better yet, to handle a mismatched number of keys in either of the files:
Code:
FNR==NR{f1[$1]=($1 in f1)? f1[$1] SUBSEP $2 : $2;default_num=$2;next}
$3 in f1 {
   n=split(f1[$3], a, SUBSEP)
   printf("%s<RESULT>%s\n", $0, a[1])
   if (n==1) next;
   delete f1[$3]
   for(i=2;i<=n;i++)
      f1[$3]=(i==2)?a[i]:f1[$3] SUBSEP a[i]
   next
}
{
  print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}
#4
06-18-2009
vgersh99 (Forum Staff, Moderator)
I'm not sure why you changed the invocation:
Code:
nawk -f king.awk FS=, file1 FS='(#KEY=|>)' file2
to
Code:
awk -f king.awk FS=, file1 FS='#KEY=' file2
and now do the index/substr for every record in file2. That definitely adds to the execution time.
Also, if you take my last version, it should be a little faster, as I don't rebuild the array entry when it holds just 1 value (probably the case for the majority of your records in file2).
You could probably think of a different implementation that doesn't require rebuilding the array altogether. This is left as an exercise for the OP.
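
To illustrate the difference between the two invocations, here is a small sketch on a made-up record. The real file2 layout is not shown in the thread, so the record below and the function names are assumptions. With the regular-expression separator the key arrives as a field of its own. With FS='#KEY=' the second field still carries the trailing text and has to be trimmed with index/substr on every record.

```shell
# Hypothetical record; the actual file2 format is not shown in the thread.
record='<REC>#KEY=000000000160191837>payload'

# FS='(#KEY=|>)': splitting on either "#KEY=" or ">" isolates the key.
# For this layout the fields are "<REC", "", the key, "payload".
key_by_fs() {
    printf '%s\n' "$1" | awk -F'(#KEY=|>)' '{ print $3 }'
}

# FS='#KEY=': $2 is "key>payload", so the key is cut out by hand.
key_by_substr() {
    printf '%s\n' "$1" | awk -F'#KEY=' '{ x = index($2, ">"); print substr($2, 1, x - 1) }'
}

key_by_fs "$record"       # prints 000000000160191837
key_by_substr "$record"   # prints 000000000160191837
```

Note that the field number in the first form depends on how many separators precede the key, so it only works when the key sits at a fixed position in every record.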
#5
06-19-2009
King Kalyan
Thanks for giving me a new direction!

I changed the invocation because the <#KEY> field can be anywhere in the record; it is not always in the second position. Sorry I did not mention that in my first post.

Yes, I saw your last code and forgot to include that change in mine. I have now included it (skipping the rebuild when there is only one entry).

After you said that, I thought of a different implementation, and here it is. This one is faster:

Code:
 
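# file1 (FS=,): collect the values for each key, joined by SUBSEP
FNR==NR { f1[$1] = ($1 in f1) ? f1[$1] SUBSEP $2 : $2; default_num = $2; next }
# file2 (FS='#KEY='): the key is the text of $2 up to the first ">"
{
   x = index($2, ">")
   key = substr($2, 1, x-1)
}
key in f1 {
   n = split(f1[key], a, SUBSEP)
   printf("%s<RESULT>%s\n", $0, a[1])
   # a single remaining value is kept and reused for later matches
   if (n==1) { next }
   # otherwise drop the first value by cutting the string after the first SUBSEP
   y = index(f1[key], SUBSEP)
   f1[key] = substr(f1[key], y+1)
   next
}
# key not present in file1: fall back to a default built from the last file1 value
{
   print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}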
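
One possible way to avoid both the split and the string rebuild, sketched here under the assumption that the output should stay the same as with the script above: store each value under a (key, position) subscript and keep a per-key read position. The array names val, cnt and pos and the wrapper function lookup_with_queue are made up for this sketch and do not come from the thread.

```shell
# Usage: lookup_with_queue file1 file2
lookup_with_queue() {
    awk '
    # file1: store the i-th value of each key as val[key, i]
    FNR == NR {
        val[$1, ++cnt[$1]] = $2
        default_num = $2
        next
    }
    # file2: the key is the text of $2 up to the first ">"
    {
        x = index($2, ">")
        key = substr($2, 1, x - 1)
    }
    key in cnt {
        # advance the read position, but stay on the last value once
        # the list is used up (same as keeping a single remaining entry)
        i = (pos[key] < cnt[key]) ? ++pos[key] : cnt[key]
        printf("%s<RESULT>%s\n", $0, val[key, i])
        next
    }
    # key not present in file1: default built from the last file1 value
    {
        print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
    }
    ' FS=, "$1" FS='#KEY=' "$2"
}
```

Called as lookup_with_queue file1 file2, each lookup is then a couple of array accesses regardless of how many values a key has, at the cost of one array element per value instead of one string per key.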