Search substring in a column of file

# 1  
Old 11-21-2013

Hi all,

I have 2 files: the first contains a list of ids and the second is a master file. I want to search for each id from the first file in the 5th column of the second file. The 5th column of the master file holds either a single value or several values separated by ';'.

Each id must occur just once as a sub-string of $5 in the master file (which is a 20GB file).
I realize that I can use
grep -m 1 -f idfile masterfile

but it takes awfully long and it will give me all other strings that contain the id.
I can also do
awk -F"\t" ' { if ($5 ~ /one_value/) print $0} ' masterfile

but I do not know how to pass a file instead of just a single value in awk. Also, it does not solve the problem of returning every string that contains the id instead of just the one I want.

Is there a smarter and faster way to do this? Using join, maybe?


masterfile:
a b c d 12;34;1 e
d e f g 67;2;567 h
i j k l 3;123 m
n o p q 321;231 r
s t u v 1223 x

Desired output:
1 a b c d 12;34;1 e
2 d e f g 67;2;567 h
3 i j k l 3;123 m

Last edited by ritakadm; 11-21-2013 at 01:32 PM..
# 2  
Old 11-21-2013
Why are you setting -F"\t" in your awk script when there aren't any tabs in your input files?

The following seems to produce the output you want for your sample input:
awk '
NR == FNR {
        id[$1]                  # first file: remember each id
        next
}
{       n = split($5, x, /;/)   # master file: split column 5 on ";"
        for(i = 1; i <= n; i++)
                if(x[i] in id)
                        print x[i], $0
}' idfile masterfile

If you want to try this on a Solaris/SunOS system, change awk to nawk, /usr/xpg4/bin/awk, or /usr/xpg6/bin/awk.
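Since each id is expected to match only once and the master file is 20GB, one possible refinement is to drop an id after its first hit and stop reading as soon as none are left. The following is only a minimal sketch of that idea. The temporary directory, the idfile contents (1, 2, 3) and the counter name left are assumptions made for illustration, because the real idfile was not shown in post #1.

```shell
# Hypothetical sample data, modelled on the example in post #1.
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 > "$tmp/idfile"
cat > "$tmp/masterfile" <<'EOF'
a b c d 12;34;1 e
d e f g 67;2;567 h
i j k l 3;123 m
n o p q 321;231 r
s t u v 1223 x
EOF

result=$(awk '
NR == FNR {                     # first file: remember each id once
        if (!($1 in id)) { id[$1]; left++ }
        next
}
{       n = split($5, x, /;/)   # master file: split column 5 on ";"
        for (i = 1; i <= n; i++)
                if (x[i] in id) {
                        print x[i], $0
                        delete id[x[i]]         # report each id only once
                        if (--left == 0) exit   # nothing left to look for
                }
}' "$tmp/idfile" "$tmp/masterfile")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample input the script stops after the third master line, because all three ids have been found by then. How much time that saves on the real file depends on where the last id happens to sit.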
These 2 Users Gave Thanks to Don Cragun For This Post:
# 3  
Old 11-21-2013
Or this string trick
awk '
NR == FNR {
        id[";" $1 ";"] = $1     # key: id wrapped in ";", value: bare id
        next
}
{      x = ";"$5";"
        for (i in id)
                if (index(x,i))
                        print id[i], $0
}' idfile masterfile

It would be interesting to see whether this is faster than Don's version, or whether index(x,i) is faster than x ~ i...
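One way to look into that question is to run both variants on the same generated input and confirm they print the same lines before timing them. The sketch below does only the comparison. The synthetic data, the function names split_version and index_version, and the choice to store the bare id as the array value in the index() variant are all assumptions made for illustration.

```shell
tmp=$(mktemp -d)

# Synthetic input (an assumption, not the poster's data): 200 ids and
# 20000 master lines whose 5th column holds three ";"-separated numbers.
awk 'BEGIN { for (i = 1; i <= 200; i++) print i * 7 }' > "$tmp/idfile"
awk 'BEGIN { for (i = 1; i <= 20000; i++)
        printf "a b c d %d;%d;%d e\n", i, i + 20000, i + 40000 }' > "$tmp/masterfile"

# Variant 1: split column 5 and look each piece up in the id array.
split_version() {
        awk 'NR == FNR { id[$1]; next }
        {       n = split($5, x, /;/)
                for (i = 1; i <= n; i++)
                        if (x[i] in id) print x[i], $0
        }' "$1" "$2"
}

# Variant 2: wrap both the ids and column 5 in ";" and scan with index().
index_version() {
        awk 'NR == FNR { id[";" $1 ";"] = $1; next }
        {       x = ";" $5 ";"
                for (i in id)
                        if (index(x, i)) print id[i], $0
        }' "$1" "$2"
}

out_split=$(split_version "$tmp/idfile" "$tmp/masterfile")
out_index=$(index_version "$tmp/idfile" "$tmp/masterfile")
rm -rf "$tmp"

if [ "$out_split" = "$out_index" ]; then
        echo "both variants print the same lines"
else
        echo "variants disagree" >&2
fi
```

To compare speed, each call can be prefixed with time (a keyword in bash and ksh) and pointed at a larger file. Variant 1 does one array lookup per value in column 5, while variant 2 calls index() once per id for every line, so variant 2 would be expected to slow down as the id list grows. Only a measurement on the real data can settle it.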
This User Gave Thanks to MadeInGermany For This Post: