Read a file, search for a value in another file, and create a third file using AWK

06-18-2009

Hope the Perl script below helps.

Code:

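use strict;
use warnings;

# Queue up the values for each key from the DATA section (key,value per line).
my %values;
while (my $line = <DATA>) {
	chomp $line;
	my ($key, $value) = split /,/, $line;
	push @{ $values{$key} }, $value;
}

# For every line of a.txt containing KEY=<digits>, append the next unused value.
open my $fh, '<', 'a.txt' or die "Cannot open a.txt: $!";
while (<$fh>) {
	chomp;
	if (/KEY=([0-9]+)/) {
		my $result = exists $values{$1} ? shift @{ $values{$1} } : undef;
		print $_, '<RESULT>', (defined $result ? $result : ''), "\n";
	}
}
close $fh;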
__DATA__
000000000160191837,00140000637006925269
000000000160191837,00140000637006925270
000000000160191838,00140000637006925271
000000000160191840,00140000637006925272

summer_cherry

06-18-2009

Or, better yet, to take care of a mismatch in the NUMBER of occurrences of a key between the two files:

Code:

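# file1 (FS=","): collect all values per key, joined with SUBSEP
FNR==NR{f1[$1]=($1 in f1)? f1[$1] SUBSEP $2 : $2;default_num=$2;next}
# file2: the key is expected in $3; hand out the first queued value
$3 in f1 {
   n=split(f1[$3], a, SUBSEP)
   printf("%s<RESULT>%s\n", $0, a[1])
   # only one value left: keep it for any further duplicates of this key
   if (n==1) next;
   # otherwise remove the value just used and rebuild the queue from the rest
   delete f1[$3]
   for(i=2;i<=n;i++)
      f1[$3]=(i==2)?a[i]:f1[$3] SUBSEP a[i]
   next
}
# key not found in file1: build a default from the last value read from file1
{
  print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}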

vgersh99

06-18-2009

Thanks, Cherry! I don't know Perl, so this is a bit out of scope for me, but I've heard Perl is very fast. I'll have to learn it in the future.

The awk code below runs fine for a small number of records, but now I'm running it on 200K records and it's taking a lot of time: after 25 minutes it has written just 50 records to the output file. I'm not sure how much longer it will take to complete.

Is there anything wrong with the code that makes it run so long?
Generally awk is very fast, right?

Actually, this code already exists in C++, and I'm trying to rewrite it in awk because of performance issues, as awk is faster.

awk -f king.awk FS=, file1 FS='#KEY=' file2

Code:

 
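# file1 (FS=","): collect all values per key, joined with SUBSEP
FNR==NR{f1[$1]=($1 in f1)? f1[$1] SUBSEP $2 : $2; default_num=$2;next}
# file2 (FS="#KEY="): the key runs from the start of $2 up to the first ">"
{
  x=index($2,">");
  key=substr($2,1,x-1);
}
# known key: print the first queued value, then rebuild the queue from the rest
key in f1 {
  n=split(f1[key], a, SUBSEP)
  delete f1[key]
  printf("%s<RESULT>%s\n", $0, a[1])
  # if that was the only value, the key is now gone and later duplicates get the default
  for(i=2;i<=n;i++)
    f1[key]=(i==2)?a[i]:f1[key] SUBSEP a[i]
  next
}
# key not found in file1: build a default from the last value read from file1
{
  print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}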
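The `awk -f king.awk FS=, file1 FS='#KEY=' file2` command line above works because awk treats a `var=value` operand as an assignment that takes effect just before the next file operand is read, so `FS=,` is in effect for file1 and `FS='#KEY='` for file2. Here is a minimal sketch of just that mechanism; `show_fields` and the two throwaway files are made up for illustration.

```shell
#!/bin/sh
# Show how each file is split when FS is reassigned between file operands.
# The helper name and the file contents are hypothetical sample data.
show_fields() {
    dir=$(mktemp -d) || return 1
    printf 'k1,v1\n' > "$dir/file1"
    printf 'abc#KEY=k1>rest\n' > "$dir/file2"
    # FS=, applies while file1 is read; FS='#KEY=' applies while file2 is read.
    (cd "$dir" && awk '{ print FILENAME ": $1=[" $1 "] $2=[" $2 "]" }' \
        FS=, file1 FS='#KEY=' file2)
    rm -r "$dir"
}

show_fields
```

It should print `file1: $1=[k1] $2=[v1]` followed by `file2: $1=[abc] $2=[k1>rest]`, showing that each file was split with its own FS.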

---------- Post updated at 05:00 PM ---------- Previous update was at 11:37 AM ----------

My bad. I used the wrong data: it had only 7 unique records and the rest were all duplicates, which will not happen in the real world. So I'm good.

With 200K unique records and one duplicate for each record, it ran in about 3 minutes.

Thanks for all your support!

Last edited by King Kalyan; 06-18-2009 at 12:43 PM.

King Kalyan

06-18-2009

I'm not sure why you changed the invocation from:

Code:

nawk -f king.awk FS=, file1 FS='(#KEY=|>)' file2

to:

Code:

awk -f king.awk FS=, file1 FS='#KEY=' file2

and now do the index/substr on every record of file2. That's definitely adding to the execution time.
Also, my last version should be a little faster, as it doesn't rebuild the array when a key has just one entry (probably the majority of your records in file2).
You could probably come up with a different implementation that doesn't require rebuilding the array altogether. This is left as an exercise for the OP.

vgersh99

06-19-2009

Thanks for giving me a new direction!

I changed the invocation because the #KEY field can be anywhere in the line; it's not always in the second position. Sorry I didn't mention that in my first post.
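A small sketch can make this concrete by splitting two records with each separator from the thread. The record layout is an assumption (only the `#KEY=<key>>` part comes from the posts), and `compare_fs` is a made-up helper: in the second record an extra `>` appears before `#KEY=`, so under `FS='(#KEY=|>)'` the key is no longer in `$3`, while `FS='#KEY='` followed by `index`/`substr` still finds it.

```shell
#!/bin/sh
# Split two hypothetical records with each field separator used in this thread.
# Only the "#KEY=<key>>" part of the layout comes from the thread.
compare_fs() {
    records='<rec>x#KEY=123>tail
<rec><sub>x#KEY=123>tail'

    echo "FS='(#KEY=|>)', key taken from \$3:"
    printf '%s\n' "$records" | awk -F '(#KEY=|>)' '{ print "  " $3 }'

    echo "FS='#KEY=', key cut from \$2 up to the first '>':"
    printf '%s\n' "$records" | awk -F '#KEY=' '{
        x = index($2, ">")
        print "  " substr($2, 1, x - 1)
    }'
}

compare_fs
```

Under `FS='(#KEY=|>)'` the printed keys should be `123` and `x`; under `FS='#KEY='` both should be `123`. The cost of that robustness is the extra `index`/`substr` call on every record of file2, which is the overhead vgersh99 pointed out.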

Yes, I saw your last version but forgot to include that part in my code. I've now included it (skipping the array rebuild when there is only one entry).

After you mentioned that, I thought of a different implementation, and here it is. This one is even faster:

Code:

 
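# file1 (FS=","): collect all values per key, joined with SUBSEP
FNR==NR{f1[$1]=($1 in f1)? f1[$1] SUBSEP $2 : $2; default_num=$2;next}
# file2 (FS="#KEY="): the key runs from the start of $2 up to the first ">"
{
  x=index($2,">");
  key=substr($2,1,x-1);
}
key in f1 {
  n=split(f1[key], a, SUBSEP)
  printf("%s<RESULT>%s\n", $0, a[1])
  # only one value left: keep it for any further duplicates of this key
  if (n==1) {next}
  # drop the first value by cutting just past the first SUBSEP
  # (y+1 works because SUBSEP is a single character by default)
  y=index(f1[key],SUBSEP);
  f1[key]=substr(f1[key],y+1)
  next
}
# key not found in file1: build a default from the last value read from file1
{
  print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}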
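vgersh99's exercise (an implementation that never rebuilds the array) can be sketched by giving every value its own slot, the way the Perl script earlier in the thread uses `push` and `shift` on one array per key. This is a hypothetical variant, not code from the thread: `match_keys`, `val`, `cnt`, and `pos` are made-up names. It aims to keep the behaviour of the script above (the last value is reused once a key runs out, and unknown keys get the default built from the last value in file1), so treat it as a starting point to check against real data.

```shell
#!/bin/sh
# Hypothetical "no rebuild" variant: each value gets its own slot val[key, i],
# cnt[key] counts the values read for a key and pos[key] is the next slot to use.
# Usage: match_keys file1 file2   (file1: key,value lines; file2: ...#KEY=key>...)
match_keys() {
    awk '
    # file1 (FS=","): store values in arrival order in slots 1..cnt[key]
    FNR == NR {
        val[$1, ++cnt[$1]] = $2
        if (cnt[$1] == 1) pos[$1] = 1
        default_num = $2
        next
    }
    # file2 (FS="#KEY="): the key runs from the start of $2 up to the first ">"
    {
        x = index($2, ">")
        key = substr($2, 1, x - 1)
    }
    # known key: print the current value, then advance unless it is the last one
    key in cnt {
        printf("%s<RESULT>%s\n", $0, val[key, pos[key]])
        if (pos[key] < cnt[key]) pos[key]++
        next
    }
    # unknown key: same default as the scripts in this thread
    { print $0 "<RESULT>" substr(default_num, 1, 11) "000000000" }
    ' FS=, "$1" FS='#KEY=' "$2"
}

# Demo with made-up records (keys and values borrowed from the Perl example).
dir=$(mktemp -d) || exit 1
printf '%s\n' \
    000000000160191837,00140000637006925269 \
    000000000160191837,00140000637006925270 \
    000000000160191838,00140000637006925271 > "$dir/file1"
printf '%s\n' \
    'a#KEY=000000000160191837>x' \
    'b#KEY=000000000160191837>x' \
    'c#KEY=000000000160191837>x' \
    'd#KEY=000000000160191838>x' \
    'e#KEY=000000000160191839>x' > "$dir/file2"
match_keys "$dir/file1" "$dir/file2"
rm -r "$dir"
```

With the sample data, the three `...837` records should get `...269`, `...270` and `...270`, the `...838` record `...271`, and the unknown `...839` record the default `00140000637000000000`. Each lookup is a couple of array accesses, so the per-record cost should no longer grow with the number of values queued for a key.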

King Kalyan
