The UNIX and Linux Forums  
#1  06-17-2009
King Kalyan (Registered User)
Read a file and search a value in another file create third file using AWK

Hi,

I have two files in the format shown below. For each record in file 1, I need to take the first field (the value before the comma), find the record in file 2 whose "KEY=" field has the same value, and write that complete file 2 record, followed by the corresponding field 2 from file 1, to a result file.

File 1:

000000000160191837,00140000637006925269
000000000160191837,00140000637006925270
000000000160191838,00140000637006925271
000000000160191840,00140000637006925272

File 2:

<DATA1><#KEY=000000000160191837><DATA2>
<DATA3><#KEY=000000000160191837><DATA4>
<DATA5><#KEY=000000000160191838><DATA6>
<DATA6><#KEY=000000000160191840><DATA8>

Result File:

<DATA1><#KEY=000000000160191837><DATA2><RESULT>00140000637006925269
<DATA3><#KEY=000000000160191837><DATA4><RESULT>00140000637006925270
<DATA5><#KEY=000000000160191838><DATA6><RESULT>00140000637006925271
<DATA6><#KEY=000000000160191840><DATA8><RESULT>00140000637006925272

I wrote an awk command for it, but my code doesn't handle duplicate records. Please look at the first two records of File 1 in the example above: field 1 is the same, but field 2 is different. In the same way, File 2 will have two entries with the same KEY value, and each one needs to be assigned a different value.

My code:

Code:
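awk '{
  if (FNR==NR) {
    # Split explicitly: assigning FS inside an action only affects the next record.
    split($0, f, ",")
    sample_array[f[1]]=f[2]   # a repeated key overwrites the earlier value
    next
  }
  split($0, k, "KEY=")
  x=index(k[2],">")
  sample_num=substr(k[2],1,x-1)
  if (sample_num in sample_array)
      print $0 "<RESULT>" sample_array[sample_num]   # awk names are case-sensitive

 } ' file1 file2 > result_file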
Thanks in advance!
#2  06-17-2009
vgersh99 (Moderator)
nawk -f king.awk FS=, file1 FS='(#KEY=|>)' file2

king.awk:
Code:
FNR==NR{f1[$1];next}
$3 in f1 {out[$3]=($3 in out)?$0:out[$3] $0}
END {
  for (i in out)
    print out[i]
}
#3  06-17-2009
King Kalyan (Registered User)
Thanks for the quick response!

The code suppresses duplicates, and it doesn't put the corresponding field 2 from file 1 in the output. I need every record in the output, with a different field 2 value for each duplicate, as shown in the example.

Does this require a multi-dimensional array to store the different values for duplicates? I'm not sure, as I'm not good at using multi-dimensional arrays.
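On the multi-dimensional array question: awk simulates them by joining subscripts with SUBSEP, so one possible approach is to store the n-th value seen for a key under `val[key, n]` and keep a second counter while reading file 2. The following is only a sketch under that assumption, using the sample data from the first post; the names `val`, `stored` and `used` are made up for illustration.

```shell
# Sample data copied from the first post.
cat > file1 <<'EOF'
000000000160191837,00140000637006925269
000000000160191837,00140000637006925270
000000000160191838,00140000637006925271
000000000160191840,00140000637006925272
EOF

cat > file2 <<'EOF'
<DATA1><#KEY=000000000160191837><DATA2>
<DATA3><#KEY=000000000160191837><DATA4>
<DATA5><#KEY=000000000160191838><DATA6>
<DATA6><#KEY=000000000160191840><DATA8>
EOF

# val[key, n] holds the n-th value seen for key in file1.
# stored[key] counts values saved; used[key] counts values already printed.
awk '
FNR==NR { val[$1, ++stored[$1]] = $2; next }
($3, used[$3] + 1) in val { print $0 "<RESULT>" val[$3, ++used[$3]] }
' FS=, file1 FS='(#KEY=|>)' file2 > result_file

cat result_file
```

With the sample data, the two records for key 000000000160191837 receive the values ending in 269 and 270, in file 1 order.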
#4  06-17-2009
vgersh99 (Moderator)
Sorry, I misread the data samples.

This assumes each key appears the same number of times in file1 and file2.

king.awk:
Code:
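# file1: queue every $2 under its key, joined with SUBSEP
FNR==NR{f1[$1]=($1 in f1)? f1[$1] SUBSEP $2 : $2;next}
# file2: take the first queued value for this key
$3 in f1 {
   n=split(f1[$3], a, SUBSEP)
   delete f1[$3]
   printf("%s<RESULT>%s\n", $0, a[1])
   # put any remaining values back for the next record with the same key
   for(i=2;i<=n;i++)
    f1[$3]=(i==2)?a[i]:f1[$3] SUBSEP a[i]
}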
#5  06-17-2009
King Kalyan (Registered User)

Perfect!! Thanks a lot!!!
It works great!! I never thought of it from that angle.

I added one more part; please check it and let me know if I did it right.
If a record in file 2 has no match, I need to take the first 11 digits of any value, append zeros to them, and output the record.

It was working fine before, but now it isn't, and I'm not sure where I went wrong.

Addition:

FNR==NR{f1[$1]=($1 in f1)? f1[$1] SUBSEP $2 : $2;next}
$3 in f1 {
n=split(f1[$3], a, SUBSEP)
delete f1[$3]
printf("%s<RESULT>%s\n", $0, a[1])
for(i=2;i<=n;i++)
f1[$3]=(i==2)?a[i]:f1[$3] SUBSEP a[i] ; next}
for ( temp in f1) {
tmp_value=substr(f1[temp],1,11)
print $0 "<RESULT>" tmp_value "000000000"
}
#6  06-17-2009
vgersh99 (Moderator)
Code:
FNR==NR{f1[$1]=($1 in f1)? f1[$1] SUBSEP $2 : $2;next}
$3 in f1 {
   n=split(f1[$3], a, SUBSEP)
   delete f1[$3]
   printf("%s<RESULT>%s\n", $0, a[1])
   for(i=2;i<=n;i++)
      f1[$3]=(i==2)?a[i]:f1[$3] SUBSEP a[i]
   next
}
{
   for( i in f1) {
      print $0 "<RESULT>" substr(f1[i], 1, 11) "000000000"
      break
  }
}
#7  06-17-2009
King Kalyan (Registered User)

Thanks!! You are the best!!
BTW, thanks for naming the awk code king.awk.

This does not give the desired result if the missing record is the last one in file 2. I figured out why: we delete the array element every time we use it, so by the time we reach the last record all the array elements have been deleted, and the last record is not printed.

I changed the code a little, and it's working fine now.

FNR==NR{f1[$1]=($1 in f1)? f1[$1] SUBSEP $2 : $2; default_num=$2;next}
$3 in f1 {
n=split(f1[$3], a, SUBSEP)
delete f1[$3]
printf("%s<RESULT>%s\n", $0, a[1])
for(i=2;i<=n;i++)
f1[$3]=(i==2)?a[i]:f1[$3] SUBSEP a[i]
next
}
{
print $0 "<RESULT>" substr(default_num, 1, 11) "000000000"
}

This is my first post on this forum, and I'm really astonished by the quality and speed of the responses.
Tags
array, awk, dulplicate, search, two files

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT -4. The time now is 08:33 PM.


Powered by: vBulletin, Copyright ©2000 - 2006, Jelsoft Enterprises Limited. Language Translations Powered by .
vBCredits v1.4 Copyright ©2007 - 2008, PixelFX Studios
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.Ad Management by RedTyger

Content Relevant URLs by vBSEO 3.2.0