Match substring from a column of the second file


 
# 8  
Old 09-08-2017
I meant A[1] = $0 for the mapping part, since I thought A is the array filled by split(), so that later A[$2] would return what I want, with $2 as the key/subscript of the array. What did I miss?
# 9  
Old 09-08-2017
Quote:
Originally Posted by yifangt
I meant A[1] = $0 for the mapping part, since I thought A is the array filled by split(), so that later A[$2] would return what I want, with $2 as the key/subscript of the array. What did I miss?
Hello yifangt,

Here A[1] is the element of array A whose index is 1 (the digit one), and A[1]=$0 sets its value to the current line of Input_file1. When you later print A[$2] or A[$1], awk takes the value of $2 or $1 from the current line of Input_file2 and looks it up as an index in A (e.g. A["S00739A"]). No such index exists in A, so nothing is printed. Please let me know if this is not clear and I will try to explain further.

Thanks,
R. Singh
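To illustrate the point above, here is a minimal sketch. The file name is a made-up example (the real data from post #1 is not reproduced here); it only assumes names with four "_"-separated parts.

```shell
# Hypothetical file name, assumed for illustration only.
name='S00739A_S1_L001_R1.fq.gz'

# split() fills A with numeric subscripts 1..n and returns n.
indices=$(printf '%s\n' "$name" |
  awk '{n = split($1, A, "_"); for (i = 1; i <= n; i++) printf "A[%d]=%s\n", i, A[i]}')
printf '%s\n' "$indices"

# A string key that was never stored yields an empty string,
# even after A[1] has been set to the whole line.
lookup=$(printf '%s\n' "$name" |
  awk '{split($1, A, "_"); A[1] = $0; print "[" A["S00739A"] "]"}')
printf '%s\n' "$lookup"
```

The first command lists A[1] through A[4]; the second prints empty brackets, because "S00739A" is a value stored in the array and not one of its subscripts.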
# 10  
Old 09-08-2017
split() creates A[1], A[2], ... with numeric indices.
There is nothing in A that you can look up via a column value.
Your initial attempt would need a second array that is indexed by string.
Code:
awk 'FNR==NR {split($1, A, "_"); B[A[1]]=$0; next} {print B[$1], $2}' file1 file2

But now the output corresponds to file2.
# 11  
Old 09-08-2017
Quote:
Originally Posted by yifangt
I meant A[1] = $0 for the mapping part, since I thought A is the array filled by split(), so that later A[$2] would return what I want, with $2 as the key/subscript of the array. What did I miss?
Code:
awk 'FNR==NR {split($1, A, "_"); A[1]=$0; next} {print A[$1], $2}' file1 file2

Yes, A is the receiving array of the split() function. It has index values 1 .. 4 (which will never match $1 or $2 in file2) and is overwritten for every line read from the input file. After reading the entire file1 it will hold the last line in A[1] and the residual fields in A[2] through A[4], never to be matched by the following records from file2.
Plus, with file2 being the last file worked upon, the output (should any be generated at all) would have only four lines.
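The overwriting can be checked with a small sketch. The two input lines are hypothetical stand-ins for file1, assumed to have four "_"-separated parts each.

```shell
# Two hypothetical file1 lines; A is rebuilt by split() on every record.
last=$(printf '%s\n' 'S00739A_S1_L001_R1.fq.gz' 'S00740B_S2_L001_R2.fq.gz' |
  awk '{split($1, A, "_"); A[1] = $0}
       END {print A[1]; print A[2]}')
printf '%s\n' "$last"
```

In the END block only the data from the last record survives: A[1] holds the whole last line and A[2] its second piece. Nothing from the first record is left.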
# 12  
Old 09-08-2017
I figured out the problem; the command should be:
Code:
awk 'FNR==NR {split($1, A, "_"); B[A[1]]=$0; next} {print B[$1],$2} ' file1 file2

Thanks again!
Aha, I am soooooo glad I got the same as MadeInGermany!

Thanks for all your input!
# 13  
Old 09-08-2017
Are you aware that you don't get your desired output from post#1 with your approach in post#12?
It would yield only four lines, and all R1.fq.gz entries would have disappeared.
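A small sketch of why the R1 entries vanish. Both input files are hypothetical (two samples, each with an R1 and an R2 file); the real files from post #1 are not reproduced here.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs: two read files per sample in file1, one row per sample in file2.
printf '%s\n' 'S00739A_S1_L001_R1.fq.gz' 'S00739A_S1_L001_R2.fq.gz' \
              'S00740B_S2_L001_R1.fq.gz' 'S00740B_S2_L001_R2.fq.gz' > "$tmp/file1"
printf '%s\n' 'S00739A sampleA' 'S00740B sampleB' > "$tmp/file2"

# The command from post#12: R1 and R2 share the key A[1], so the later
# R2 line overwrites the R1 line in B before file2 is read.
result=$(awk 'FNR==NR {split($1, A, "_"); B[A[1]]=$0; next} {print B[$1], $2}' \
  "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data the output has one line per file2 row, and each line shows only the R2 file name.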
# 14  
Old 09-08-2017
Thanks RudiC!
I was too excited to notice the problem, which is a serious bug for sure.
It seems the file order must be changed, because the _R1/_R2.fq.gz file names share the same prefix.
Code:
awk 'FNR==NR {B[$1]=$2; next} split($1, A, "_"){print $0, B[A[1]]}' file2 file1

However, if I do not change the file order, how can this bug be fixed, if that is possible?
Thanks again!
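One possible way to keep the file1 file2 order, offered only as a sketch: buffer the file1 lines and their keys in arrays, build the lookup from file2, and print everything in the END block. The sample data and the array names key and line are assumptions made for this illustration, and the whole of file1 is held in memory.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs, same shape as assumed earlier in the thread.
printf '%s\n' 'S00739A_S1_L001_R1.fq.gz' 'S00739A_S1_L001_R2.fq.gz' \
              'S00740B_S2_L001_R1.fq.gz' 'S00740B_S2_L001_R2.fq.gz' > "$tmp/file1"
printf '%s\n' 'S00739A sampleA' 'S00740B sampleB' > "$tmp/file2"

fixed=$(awk '
  # file1: remember every line and its prefix, in input order
  FNR==NR { split($1, A, "_"); key[FNR] = A[1]; line[FNR] = $0; n = FNR; next }
  # file2: build the string-indexed lookup
  { B[$1] = $2 }
  # after both files: print each file1 line with its matching file2 value
  END { for (i = 1; i <= n; i++) print line[i], B[key[i]] }
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$fixed"
rm -rf "$tmp"
```

This prints all four file1 lines, R1 and R2 alike, each followed by the second column of the matching file2 row. Note that the FNR==NR test misfires if file1 is empty.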