Awk array doesn't match for substring


 
# 1  
Old 06-02-2009
Quote:
file1
cluster1,565,345
cluster2,345,345
cluster3,345,564
cluster4,345,5643
xyz.cluster1,345,64
xyz.cluster2,345,434

Quote:
file2
458,xyz.cluster1
123,cluster1
456,cluster2
767,cluster3int
Code:
nawk -F"," 'FNR==NR{a[$1]=$2 OFS $3;next} a[$2]{print $1,$2,a[$2]}' OFS="," file1 file2

I want cluster3 in file1 to match cluster3int in file2.
Output I am getting:
Quote:
458,xyz.cluster1,345,64
123,cluster1,565,345
456,cluster2,345,345
Output required:
Quote:
458,xyz.cluster1,345,64
123,cluster1,565,345
456,cluster2,345,345
767,cluster3int,345,564
Help is appreciated
# 2  
Old 06-02-2009
where's 'cluster3int' in file1??
# 3  
Old 06-03-2009
Quote:
Originally Posted by vgersh99
where's 'cluster3int' in file1??

'cluster3int' is not in file1, but I want the code to match on a substring.

cluster3 exists in file1 and is a leading substring of cluster3int from file2.
I want to match cluster3 from file1 with 'cluster3int', which exists in file2.


Appreciate help

Thanks
# 4  
Old 06-03-2009
Quote:
Originally Posted by pinnacle
'cluster3int' is not in file1, but I want the code to match on a substring.

cluster3 exists in file1 and is a leading substring of cluster3int from file2.
I want to match cluster3 from file1 with 'cluster3int', which exists in file2.


Appreciate help

Thanks
Also, 'cluster2' exists in file2. And 'cluster2' is a 'substring' of 'cluster2' and 'xyz.cluster2' from file1.
What's the algorithm?
# 5  
Old 06-03-2009
Quote:
Originally Posted by vgersh99
Also, 'cluster2' exists in file2. And 'cluster2' is a 'substring' of 'cluster2' and 'xyz.cluster2' from file1.
What's the algorithm?

Basically, file1 will have names that are incomplete at the trailing end compared with the entries in file2.

So
xyz.cluster2 in file2
can match only file1 entries such as

xyz.clust
or
xyz.cluste
or
xyz.clus

---------------------------------------------------------
For 'xyz.cluster2' in file2,
cluster2 in file1 has no "xyz" on the leading side, so it's a different entry.


Hope this clarifies the requirement
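
The rule described above (a file1 name counts only when it is a leading part of the file2 name) can be sketched with awk's index() function. This is just an illustration of the requirement, not the solution posted later in the thread; the variable names full and cand are made up for the example.

```shell
# index(full, x) returns the position where x first occurs in full (0 if absent),
# so a result of 1 means "full starts with x".
prefix_demo=$(awk 'BEGIN {
    full = "xyz.cluster2"     # name as it appears in file2
    n = split("xyz.clus xyz.cluste cluster2 xyz.cluster2", cand, " ")
    for (k = 1; k <= n; k++)
        print cand[k], (index(full, cand[k]) == 1 ? "matches" : "no match")
}')
printf '%s\n' "$prefix_demo"
```

Here cluster2 is reported as "no match" because it starts at position 5 of xyz.cluster2, not at position 1.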
# 6  
Old 06-03-2009
Code:
nawk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2

Now... could you please explain how it works, step by step, with code comments?

Last edited by vgersh99; 06-03-2009 at 04:24 PM.. Reason: ooops - wrong index in 'print'
# 7  
Old 06-03-2009
Quote:
Originally Posted by vgersh99
Code:
nawk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2

Now... could you please explain how it works, step by step, with code comments?

Thanks vgersh99
Here is the explanation

Code:
nawk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2

Code:
FNR==NR

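True only while awk is reading the first file, i.e. file1: FNR (the record number within the current file) equals NR (the total record number) only there.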
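
A small sketch of how FNR and NR behave across two input files. The files first and second are hypothetical and created only for this demonstration; plain awk is assumed here instead of nawk.

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\n' > "$tmp/second"

# For every input line print: the line, FNR, NR, and which branch FNR==NR selects.
fnr_demo=$(awk '{ print $0, FNR, NR, (FNR == NR ? "first" : "later") }' "$tmp/first" "$tmp/second")
printf '%s\n' "$fnr_demo"
rm -rf "$tmp"
```

FNR restarts at 1 for the second file while NR keeps counting, so the test is false from the second file onwards. Note that the idiom relies on the first file not being empty.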

Code:
a["^" $1]=$2 OFS $3

Creates an array indexed by the first field of file1 with "^" prepended to it; the stored value is fields 2 and 3 joined by OFS.
So the array indexes will be like
a["^cluster1"]
a["^cluster2"]
a["^cluster3"]

Code:
next

The code after the next statement is not executed while awk is processing the first file, file1; awk moves straight on to the next input line.

Code:
{for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}

Code:
for (i in a)

Loops over every index of the array using awk's for (i in a) loop; the order in which the indexes are visited is unspecified.

Code:
if ($2 ~ i)

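Tests whether the second field of the current file2 line matches the regular expression stored in array index i. Because every index starts with "^", the match is anchored at the start of the field, so an index matches when the file1 name is a leading substring of the file2 name.
If it matches:
print $1,$2,a[i];next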


Code:
print $1,$2,a[i]

prints columns 1 and 2 from file2, followed by the array content stored for the matching index (fields 2 and 3 from file1)

Code:
next

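With this sample data it makes no difference, because each file2 line matches at most one index. In general it is useful: it stops the loop after the first match, so a file2 line is printed only once even if several indexes match.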
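
To see what the trailing next does, here is a sketch with overlapping names. The file1 entry clus,1,2 is invented for this test and is not part of the original data; plain awk is assumed instead of nawk.

```shell
tmp=$(mktemp -d)
printf 'clus,1,2\ncluster3,345,564\n' > "$tmp/file1"   # both names are prefixes of cluster3int
printf '767,cluster3int\n' > "$tmp/file2"

# Same program as in the thread, with the trailing next ...
with_next=$(awk -F, -v OFS=, 'FNR==NR{a["^" $1]=$2 OFS $3;next}
    {for (i in a) if ($2 ~ i) {print $1,$2,a[i];next}}' "$tmp/file1" "$tmp/file2")

# ... and the same program without it.
without_next=$(awk -F, -v OFS=, 'FNR==NR{a["^" $1]=$2 OFS $3;next}
    {for (i in a) if ($2 ~ i) {print $1,$2,a[i]}}' "$tmp/file1" "$tmp/file2")

printf 'with next:\n%s\nwithout next:\n%s\n' "$with_next" "$without_next"
rm -rf "$tmp"
```

With next, the file2 line is printed once; without it, it is printed once per matching index. Which index wins in the first case depends on the traversal order of for (i in a), which awk does not define.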

Code:
OFS=","

output field separator is a comma
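
Putting the pieces together, here is a runnable sketch on the sample data from the first post. It is a variation on the command above, not the exact one: it assumes plain awk instead of nawk, uses index() for a literal prefix test (in the ~ version the "." in xyz.cluster1 acts as a regex wildcard), and keeps the longest matching name so the result does not depend on array traversal order. The variable best is introduced only for this example.

```shell
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
cluster1,565,345
cluster2,345,345
cluster3,345,564
cluster4,345,5643
xyz.cluster1,345,64
xyz.cluster2,345,434
EOF
cat > "$tmp/file2" <<'EOF'
458,xyz.cluster1
123,cluster1
456,cluster2
767,cluster3int
EOF

result=$(awk -F, -v OFS=, '
FNR == NR { a[$1] = $2 OFS $3; next }   # file1: name -> "field2,field3"
{
    best = ""
    for (k in a)
        # literal prefix test; remember the longest name that $2 starts with
        if (index($2, k) == 1 && length(k) > length(best))
            best = k
    if (best != "")
        print $1, $2, a[best]
}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

On the sample files this prints the four lines listed under "Output required" in the first post.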