AWK: matching patterns in 2 different files


 
# 1  
Old 09-14-2010

A directory contains files with two different extensions (*.txt and *.xyz) whose base names are matching numeric strings. There are 5000 *.txt files and 5000 *.xyz files. Each file has around 4000 rows and 8 columns, with several unique string patterns in the 5th column.

Files *.txt
Code:
1.txt
2.txt
3.txt

Files *.xyz
Code:
1.xyz
2.xyz
3.xyz

3.txt
Code:
OT   3328   CA    CT   268       5.800      7.500      4.700      
OT   3329   HA    CT   268       8.500      8.900      3.600      
OT   3330   NB    CT   268       6.700      5.500      7.600      
OT   3331   O     AT   269       1.200      7.700      5.500      
OT   3332   C1    AT   269       3.800      5.800      5.200 
OT   3333   C2    AT   269       8.800      0.800      0.200 
OT   3334   O     VT   270       9.800      2.800      5.600 
OT   3335   C1    VT   270       5.200      5.132      2.031
OT   3336   C2    VT   270       0.236      5.234      8.351

3.xyz
Code:
OT   3328   CA    CT   268       5.800      7.500      4.700        
OT   3329   NB    CT   268       6.700      5.500      7.600      
OT   3330   O     AT   269       1.200      7.700      5.500

Tasks:

(Step-1) For each value in the 5th column of the '3.xyz' file, find all rows of the '3.txt' file that have the same value in the 5th column.

(Step-2) Write each matching row, in full, to a new file.

This is what the output should look like:

newfile.dat

Code:
OT   3328   CA    CT   268       5.800      7.500      4.700      
OT   3329   HA    CT   268       8.500      8.900      3.600      
OT   3330   NB    CT   268       6.700      5.500      7.600      
OT   3331   O     AT   269       1.200      7.700      5.500      
OT   3332   C1    AT   269       3.800      5.800      5.200 
OT   3333   C2    AT   269       8.800      0.800      0.200

I tried awk to get the expected output. The idea is to build an array keyed on the 5th field, compare the *.txt file with the *.xyz file, and print the matching rows to a new file.

Code:
awk  '{FS="|"} NF==5 {acc[$5]=5} NF>1 {if( ( $5 in acc ) ) {print  $1"|"$2"|"$3"|"$4"|"$5"|"$6"|"$7|"$8} }' 3.xyz 3.txt

And, to iterate over multiple files in the directory:

Code:
#!/bin/bash

for  d in `ls *`
do
  awk '{FS="|"} NF==5 {acc[$5]=5} NF>1 {if( (  $5 in acc ) ) {print $1"|"$2"|"$3"|"$4|"$5"|"$6"|"$7|"$8} }' $.xyz $.txt  $d > EF_$d
done

Awk reports an 'unterminated string' error, yet I have checked the code and couldn't find a solution. Please help.

Thank you for your time and attention.

-A
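
A note on the error, offered as a likely explanation rather than a definitive diagnosis: in the first command the separator after $7 is written as |" instead of "|", so the last double quote opens a string that is never closed, which is consistent with awk reporting an unterminated string. Separately, the sample rows are whitespace-separated, so FS="|" would leave each row as a single field. If '|'-separated output is what is wanted, setting OFS avoids the long chain of quotes. The row below is taken from 3.txt; the variable names row and joined are only for illustration.

```shell
# One whitespace-separated sample row from 3.txt.
row='OT   3328   CA    CT   268       5.800      7.500      4.700'

# OFS sets the output field separator. Assigning $1 to itself makes awk
# rebuild the record with OFS between the fields before printing it.
joined=$(printf '%s\n' "$row" | awk 'BEGIN { OFS = "|" } { $1 = $1; print }')

echo "$joined"
# prints: OT|3328|CA|CT|268|5.800|7.500|4.700
```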
# 2  
Old 09-14-2010
Try this:
Code:
 awk 'NR==FNR{a[$5]=++i;next} { if ( $5 in a) {print $0}}' 3.xyz 3.txt
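
For anyone unsure why this works, here is a small self-contained sketch of the same two-file idiom, using the sample rows from post #1. NR counts records across all input files while FNR restarts at 1 for each file, so NR==FNR is true only while the first file (3.xyz) is being read. The scratch directory and the array name seen are assumptions made for the demonstration, not part of the original command.

```shell
#!/bin/bash
# Work in a scratch directory so no real data files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample rows from 3.txt in post #1.
cat > 3.txt <<'EOF'
OT   3328   CA    CT   268       5.800      7.500      4.700
OT   3329   HA    CT   268       8.500      8.900      3.600
OT   3330   NB    CT   268       6.700      5.500      7.600
OT   3331   O     AT   269       1.200      7.700      5.500
OT   3332   C1    AT   269       3.800      5.800      5.200
OT   3333   C2    AT   269       8.800      0.800      0.200
OT   3334   O     VT   270       9.800      2.800      5.600
OT   3335   C1    VT   270       5.200      5.132      2.031
OT   3336   C2    VT   270       0.236      5.234      8.351
EOF

# Sample rows from 3.xyz in post #1.
cat > 3.xyz <<'EOF'
OT   3328   CA    CT   268       5.800      7.500      4.700
OT   3329   NB    CT   268       6.700      5.500      7.600
OT   3330   O     AT   269       1.200      7.700      5.500
EOF

# While reading the first file (NR==FNR), record each 5th-column value as
# an array key and skip to the next line. For the second file, print every
# row whose 5th column was recorded.
awk 'NR==FNR { seen[$5]; next } $5 in seen' 3.xyz 3.txt > newfile.dat

cat newfile.dat
```

With this sample data, newfile.dat holds the six rows whose 5th column is 268 or 269. One caveat: if the first file is empty, NR==FNR stays true for the second file as well, so nothing is printed.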

These 2 Users Gave Thanks to pravin27 For This Post:
# 3  
Old 09-14-2010
Thanks, Pravin, for the helpful reply. The code works perfectly at the command prompt. Would you please help further with an iteration script for multiple files in a directory, using the code you gave? I tried this:

Code:
#!/bin/bash

for d in `ls *`
do
  awk 'NR==FNR{a[$5]=++i;next} { if ( $5 in a) {print $0}}' $.xyz $.txt  > EF_$d
done

The $.xyz and $.txt files share the same numeric string; only the file extension differs.
EF_$d is where the corresponding output should be written.

Thanks so much for your kind help.

-A

Last edited by asanjuan; 09-14-2010 at 07:02 AM..
# 4  
Old 09-14-2010
Try this:

Code:
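#!/bin/bash
# Brace expansion ({1..4}) is a bash feature; a strictly POSIX sh may not expand it.

for i in {1..4}
do
  # The shell's $i is expanded in the file names; the i inside the quoted awk program is awk's own variable.
  awk 'NR==FNR{a[$5]=++i;next} { if ( $5 in a) {print $0}}' "$i.xyz" "$i.txt" > "EF_$i"
done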
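
The loop above assumes the files are numbered 1 to 4. As a sketch for a directory with thousands of pairs, the base name can instead be derived from whichever *.xyz files are present, assuming every N.xyz should be paired with an N.txt in the same directory. The scratch directory, the tiny sample files, and the variable names xyz, base and txt are invented purely for the demonstration.

```shell
#!/bin/bash
# Scratch directory with a few made-up file pairs for the demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'OT 1 CA CT 268 1.0 2.0 3.0\n' > 1.xyz
printf 'OT 1 CA CT 268 1.0 2.0 3.0\nOT 2 O VT 270 4.0 5.0 6.0\n' > 1.txt
printf 'OT 1 O AT 269 1.0 2.0 3.0\n' > 2.xyz
printf 'OT 1 CA CT 268 1.0 2.0 3.0\nOT 2 O AT 269 4.0 5.0 6.0\n' > 2.txt
printf 'OT 1 O AT 269 1.0 2.0 3.0\n' > 3.xyz   # deliberately has no 3.txt

# Let the shell glob the file names instead of parsing the output of ls.
for xyz in *.xyz
do
  base=${xyz%.xyz}     # strip the extension: 1.xyz -> 1
  txt=$base.txt

  # Skip any .xyz file that has no matching .txt partner.
  if [ ! -f "$txt" ]; then
    echo "skipping $xyz: $txt not found" >&2
    continue
  fi

  awk 'NR==FNR { seen[$5]; next } $5 in seen' "$xyz" "$txt" > "EF_$base"
done
```

Quoting "$xyz" and "$txt" keeps the loop safe if a file name ever contains spaces, and the -f test avoids awk failing on a missing partner file.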

# 5  
Old 09-14-2010
I used this command, which gets quite close to the output I want, but not exactly.

Code:
gzmore dealerlistservice_LogS[1-5]_2010*.log.gz |  grep -i ClientError.700 |  cut -b2470-2590 | awk -RS '/<fault(c|s).*>.*<\/fault.*>/'

So here is the output from the above command.

Quote:

Line 1: V2.xsd"><SOAP-ENV:Fault><faultcode>ClientError.700</faultcode><faultstring>Application processing error</faultstring><faultact

Line 2: -ENV:Fault><faultcode>ClientError.700</faultcode><faultstring>Application processing error</faultstring><faultactor>5PS</fault

Can this output be fine-tuned further?
# 6  
Old 09-14-2010
Thanks so much, pravin! It works well.

-A