Match substring from a column of the second file


 
# 1  
Old 09-08-2017
Match substring from a column of the second file

I want to merge lines from two files by matching a substring of each line in the first file against the first column of the second file.
Code:
file1:

S00739A_ACAGTG_L001_R1.fq.gz
S00739A_ACAGTG_L001_R2.fq.gz
S00739B_GCCAAT_L001_R1.fq.gz
S00739B_GCCAAT_L001_R2.fq.gz
S00739D_GTGAAA_L001_R1.fq.gz
S00739D_GTGAAA_L001_R2.fq.gz
S00739E_ATCACG_L001_R1.fq.gz
S00739E_ATCACG_L001_R2.fq.gz

Code:
file2:

S00739A WT-1
S00739B WT-2
S00739D mt-1
S00739E mt-2

output:
Code:
S00739A_ACAGTG_L001_R1.fq.gz WT-1
S00739A_ACAGTG_L001_R2.fq.gz WT-1
S00739B_GCCAAT_L001_R1.fq.gz WT-2
S00739B_GCCAAT_L001_R2.fq.gz WT-2
S00739D_GTGAAA_L001_R1.fq.gz mt-1
S00739D_GTGAAA_L001_R2.fq.gz mt-1
S00739E_ATCACG_L001_R1.fq.gz mt-2
S00739E_ATCACG_L001_R2.fq.gz mt-2

This is my attempt, but the A[1]=$0 part is wrong and breaks the script:
Code:
awk 'FNR==NR {split($1, A, "_"); A[1]=$0; next} {print A[$1], $2}' file1 file2

Thanks for any help!

Last edited by yifangt; 09-08-2017 at 02:10 PM.. Reason: explanation
# 2  
Old 09-08-2017
Hello yifangt,

Could you please try the following and let me know if it helps?
Code:
awk 'FNR==NR{a[$1]=$2;next} ($1 in a){print $0,a[$1]}'  Input_file2  FS="_"  Input_file1

Output will be as follows.
Code:
S00739A_ACAGTG_L001_R1.fq.gz WT-1
S00739A_ACAGTG_L001_R2.fq.gz WT-1
S00739B_GCCAAT_L001_R1.fq.gz WT-2
S00739B_GCCAAT_L001_R2.fq.gz WT-2
S00739D_GTGAAA_L001_R1.fq.gz mt-1
S00739D_GTGAAA_L001_R2.fq.gz mt-1
S00739E_ATCACG_L001_R1.fq.gz mt-2
S00739E_ATCACG_L001_R2.fq.gz mt-2

Thanks,
R. Singh
# 3  
Old 09-08-2017
No, there is no output.
I am using GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
# 4  
Old 09-08-2017
Hello yifangt,

Not sure; the command in post #2 worked perfectly fine for me, so I can think of two possibilities.

I- There could be carriage return characters in your Input_file. Check with cat -v Input_file; if you see carriage returns (shown as ^M at the end of lines), remove them with tr -d '\r' < Input_file > temp_file && mv temp_file Input_file.
II- If you are on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
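
As an alternative to rewriting the file with tr, the carriage returns can also be dropped inside awk itself. The following is only a sketch under assumptions: the scratch directory and the file names map.txt and names.txt are made up for illustration, and the sample data is a shortened copy of the thread's input with CRLF line endings added on purpose.

```shell
# Build two small CRLF-terminated sample files in a scratch directory.
# The directory and both file names are hypothetical.
tmpdir=$(mktemp -d)
printf 'S00739A WT-1\r\nS00739B WT-2\r\n' > "$tmpdir/map.txt"
printf 'S00739A_ACAGTG_L001_R1.fq.gz\r\nS00739B_GCCAAT_L001_R1.fq.gz\r\n' > "$tmpdir/names.txt"

# The first rule runs for every record of both files and removes a
# trailing carriage return; modifying $0 makes awk re-split the fields.
cleaned=$(awk '
    { sub(/\r$/, "") }
    FNR == NR { a[$1] = $2; next }
    ($1 in a) { print $0, a[$1] }
' "$tmpdir/map.txt" FS="_" "$tmpdir/names.txt")

printf '%s\n' "$cleaned"
rm -rf "$tmpdir"
```

With this rule in place the input files do not have to be converted first, which may be convenient when they are regenerated regularly on a Windows machine.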

Kindly do let me know how it goes then.

Thanks,
R. Singh
# 5  
Old 09-08-2017
Did you give the input files in the correct order?
It is important to process the "mapping file" first, and then the file whose lines are to be printed.
And FS="_" must be placed after the "mapping file", so that it applies only to the file that follows it.
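
To illustrate the point about operand order, here is a small sketch. It assumes a scratch directory created with mktemp and uses a shortened copy of the thread's sample data; the variable names merged and swapped are only for this example.

```shell
# Shortened sample data in a hypothetical scratch directory.
tmpdir=$(mktemp -d)
printf 'S00739A WT-1\nS00739B WT-2\n' > "$tmpdir/file2"
printf 'S00739A_ACAGTG_L001_R1.fq.gz\nS00739B_GCCAAT_L001_R1.fq.gz\n' > "$tmpdir/file1"

# awk handles its operands left to right: file2 is read with the default
# FS, then the assignment FS="_" takes effect, then file1 is read.
merged=$(awk '
    FNR == NR { a[$1] = $2; next }   # first file: sample ID -> label
    ($1 in a) { print $0, a[$1] }    # later file: $1 is the text before the first "_"
' "$tmpdir/file2" FS="_" "$tmpdir/file1")

# Same program with the two files swapped: the array is now indexed by
# whole file names, and $1 of file2 is the whole line, so nothing matches.
swapped=$(awk '
    FNR == NR { a[$1] = $2; next }
    ($1 in a) { print $0, a[$1] }
' "$tmpdir/file1" FS="_" "$tmpdir/file2")

printf 'correct order:\n%s\n' "$merged"
printf 'swapped order: [%s]\n' "$swapped"
rm -rf "$tmpdir"
```

In the swapped call the program produces no output at all, which is one way to end up with the "no output" symptom even when the files themselves are clean.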
# 6  
Old 09-08-2017
Thanks Ravi!
The problem was a stray carriage return in one of the files. It works fine now!
I understand that you swapped the order of the files and set FS="_" for the second file only, which I hope also addresses what MadeinGermany reminded me of.
But I still have one question: why does the A[1]=$0 part of my script not work?
# 7  
Old 09-08-2017
Hello yifangt,

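Your code fails for two reasons. First, while file1 is being read, $1 is the whole line (e.g. S00739A_ACAGTG_L001_R1.fq.gz), because the default field separator is whitespace. split($1, A, "_") then refills array A with the pieces of that line under the numeric indices 1 to 4, and A[1]=$0 overwrites index 1 with the whole line, so A is never indexed by the sample ID and only holds data from the last line read. Second, when file2 is being read, $1 is S00739A, which is not an index of A, so A[$1] is always empty and only the label from file2 gets printed. Kindly let me know if you have any further queries.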
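
If you prefer to keep the split() idea from your own attempt, one possible variant is sketched below. It is not the command from post #2: it assumes the mapping file is read first and uses hypothetical array names label and part, with a shortened copy of the sample data in a scratch directory.

```shell
# Shortened sample data in a hypothetical scratch directory.
tmpdir=$(mktemp -d)
printf 'S00739A WT-1\nS00739D mt-1\n' > "$tmpdir/file2"
printf '%s\n' \
    'S00739A_ACAGTG_L001_R1.fq.gz' \
    'S00739A_ACAGTG_L001_R2.fq.gz' \
    'S00739D_GTGAAA_L001_R1.fq.gz' > "$tmpdir/file1"

# Read the mapping file first and index the labels by sample ID.
# For each line of file1, split() extracts the ID (the text before the
# first "_") and uses it as the lookup key; FS is left unchanged.
joined=$(awk '
    FNR == NR { label[$1] = $2; next }
    {
        split($1, part, "_")
        if (part[1] in label) print $0, label[part[1]]
    }
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

The difference from your script is that the array is indexed by the sample ID taken from file2, and the pieces produced by split() are used only as a lookup key and never stored.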

Thanks,
R. Singh