Search for string in column using variable: awk


 
# 1  
Old 02-03-2018

I want to match a pattern in one column with awk, taking the value from an external (shell) variable, for data like this:

Code:
-9	1:751343:T:A	-9	0	T	A	0.726	-5.408837e-03	9.576603e-03	7.967536e-01	5.722312e-01
-9	1:751756:T:C	-9	0	T	C	0.727	-5.360458e-03	9.579447e-03	7.966977e-01	5.757858e-01
-9	1:752566:G:A	-9	0	G	A	0.331	6.583382e-03	8.958503e-03	7.950995e-01	4.624419e-01
-9	1:753425:T:C	-9	0	T	C	0.295	7.321481e-03	
-9	3:60197:G:A	-9	0	G	A	0.918	1.480658e-03	1.554497e-02	7.950968e-01	9.241192e-01
-9	3:60202:C:G	-9	0	C	G	0.989	-2.318091e-02	2.707507e-02	7.947803e-01	5.699114e-01
-9	22:51228888:T:G	-9	0	T	G	0.737	-7.274594e-03	1.073497e-02	7.928675e-01	4.980153e-01
-9	22:51228910:G:A	-9	0	G	A	0.791	-6.978814e-03	1.147448e-02	7.936905e-01	5.430739e-01
-9	22:51229455:G:C	-9	0	G	C	0.965	4.726587e-03	2.609153e-02	7.949339e-01	8.562523e-01
-9	22:51229491:G:A	-9	0	G	A	0.970	5.992810e-02	2.711477e-02	7.917267e-01	2.712828e-02
-9	22:51229591:A:G	-9	0	A	G	0.988	1.235893e-01	4.360370e-02	7.923663e-01	4.605606e-03
-9	22:51229717:A:T	-9	0	A	T	0.791	-4.919975e-03	1.159186e-02	7.941657e-01	6.712634e-01

The data are stored in the file more.txt.

Code:
for i in {1..22} 
do  

printf "$i\n" 

awk -v chr=$i '{

if ($2 ~ /^chr:/ )
{
print $0 
} 
}' more.txt

done

Issue:
The match pattern doesn't pick up the chr variable. It works fine if I hard-code the number, and the chr variable itself prints correctly.

Only the numbers 1 through 22 from the for loop are printed; no data lines appear.

Darwin 17.3.0 Darwin Kernel Version 17.3.0: Thu Nov 9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64 x86_64
OSX 10.13.2

Last edited by genome; 02-03-2018 at 06:47 PM.. Reason: edited data
# 2  
Old 02-03-2018
Putting an awk variable name inside double quotes or between slashes in an awk script turns it into literal text, not the name of a variable to be expanded. Although extended regular expressions are usually written with slashes as delimiters, all awk really requires on the right-hand side of ~ is a string: a constant string between double quotes, a constant pattern between slashes, or a variable containing a string.

Try:
Code:
for i in {1..22} 
do
	printf "$i\n" 

	awk -v chr=$i '
	{
		if ($2 ~ ("^" chr ":") )
		{
			print $0 
		} 
	}' more.txt
done

or, very slightly more efficiently:
Code:
for i in {1..22} 
do
	printf "$i\n" 

	awk -v ERE="^$i:" '
	{
		if ($2 ~ ERE)
		{
			print $0 
		} 
	}' more.txt
done
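
A minimal sketch of the difference, assuming a throwaway three-line sample (the temp file and its rows are made up for illustration, not the poster's more.txt): the slash form looks for the literal characters "chr:", while the string form builds the pattern from the variable's value.

```shell
# Hypothetical sample: column 2 holds chromosome:position:ref:alt, as in the thread.
tmp=$(mktemp)
printf '%s\n' '-9 1:751343:T:A' '-9 22:51228888:T:G' 'x chr:1:A:C' > "$tmp"

# Inside /.../ the text "chr" is literal, so only a $2 starting with "chr:" matches.
literal=$(awk -v chr=22 '$2 ~ /^chr:/' "$tmp")

# String concatenation builds the regex "^22:" from the value of the variable.
dynamic=$(awk -v chr=22 '$2 ~ ("^" chr ":")' "$tmp")

printf 'literal: %s\ndynamic: %s\n' "$literal" "$dynamic"
rm -f "$tmp"
```

With this sample, the literal form should select only the artificial "chr:" row and the dynamic form only the chromosome 22 row.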

# 3  
Old 02-03-2018
Thank you.

Code:
for i in {1..22}
do

awk -v chr="^$i:" '{

if ($2 ~ chr )
{
print $0
}
}' more.txt

done

I was about to post my modified code. :)
# 4  
Old 02-04-2018
Did you consider using awk's default behaviour to condense the script to:
Code:
for i in {1..22}
  do    printf "$i\n"
        awk -v chr="^$i:" '$2 ~ chr' more.txt
  done
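
The short form relies on the rule that an awk pattern with no action prints the matching record. A small check that both spellings give the same output, assuming an invented three-line sample rather than the real more.txt:

```shell
# Hypothetical sample rows; only column 2 matters for the match.
tmp=$(mktemp)
printf '%s\n' '-9 1:751343:T:A' '-9 3:60197:G:A' '-9 3:60202:C:G' > "$tmp"

# Explicit if/print inside an action block.
explicit=$(awk -v chr='^3:' '{ if ($2 ~ chr) { print $0 } }' "$tmp")

# Pattern only: the default action is to print the whole record.
implicit=$(awk -v chr='^3:' '$2 ~ chr' "$tmp")

[ "$explicit" = "$implicit" ] && echo "same output"
rm -f "$tmp"
```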

EDIT: Are you aware that, in either version, your script creates 22 processes to run awk, opening and reading more.txt 22 times? That's quite expensive, resource-wise. How about one single awk invocation and one single file read for all:
Code:
awk '
        {TMP = $2
         sub (/:.*$/, "", TMP)
         BUF[TMP, ++CNT[TMP]] = $0
        }

END     {for (i=1; i<=22; i++)  {print i
                                 for (c=1; c<=CNT[i]; c++) print BUF[i, c]
                                }
        }
'  more.txt


Last edited by RudiC; 02-04-2018 at 05:23 AM..
# 5  
Old 02-04-2018
Quote:
Originally Posted by RudiC
Did you consider using awk's default behaviour to condense the script to:
Code:
for i in {1..22}
  do    printf "$i\n"
        awk -v chr="^$i:" '$2 ~ chr' more.txt
  done

Oh, no print needed? Thanks. It's pretty.


Quote:
Originally Posted by RudiC
EDIT: Are you aware that, in either version, your script creates 22 processes to run awk, opening and reading more.txt 22 times? That's quite expensive, resource-wise. How about one single awk invocation and one single file read for all:
Code:
awk '
        {TMP = $2
         sub (/:.*$/, "", TMP)
         BUF[TMP, ++CNT[TMP]] = $0
        }

END     {for (i=1; i<=22; i++)  {print i
                                 for (c=1; c<=CNT[i]; c++) print BUF[i, c]
                                }
        }
'  more.txt

Yes, I was loading and reading the file 22 times. Sorry, I can't understand your code.
What logic is being implemented, and how?

Thank you as always for your reply.
# 6  
Old 02-04-2018
Code:
awk '
        {BUF[TMP=substr($2,1,index($2,":")-1), ++CNT[TMP]] = $0                 # store the input file in memory with index based on $2 and in increasing order
        }

END     {for (i=1; i<=22; i++)  {print i                                        # create sequence No. (1 .. 22) and print it
                                 for (c=1; c<=CNT[i]; c++) print BUF[i, c]      # print the input for this sequence number - if exists - in increasing order
                                }                                               # if it does not exist, CNT defaults to zero, and loop is not entered.
        }
' more.txt

# 7  
Old 02-14-2018
Code:
BUF[TMP=substr($2,1,index($2,":")-1), ++CNT[TMP]]

I can't make sense of this code snippet. Before the comma I can see the splitting with substr, but what does the part after the comma, ++CNT[TMP], do?
Could you please help?
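
One way to read that expression is to split it into separate statements. The sketch below is meant to behave the same as the one-liner, run on an invented four-row sample; the names key and n are introduced only for illustration, and the outer loop stops at 3 instead of 22 to fit the sample.

```shell
# Hypothetical sample: two rows for chromosome 1, two for chromosome 3, none for 2.
tmp=$(mktemp)
printf '%s\n' '-9 1:751343:T:A' '-9 3:60197:G:A' '-9 1:751756:T:C' '-9 3:60202:C:G' > "$tmp"

grouped=$(awk '
{
    key = substr($2, 1, index($2, ":") - 1)   # text before the first ":" (the chromosome)
    n = ++CNT[key]                            # rows seen so far for this chromosome: 1, 2, 3, ...
    BUF[key, n] = $0                          # store the row under the pair (chromosome, n)
}
END {
    for (i = 1; i <= 3; i++) {                # 3 instead of 22 to fit the small sample
        print i
        for (c = 1; c <= CNT[i]; c++)         # CNT[i] counts as 0 when chromosome i never appeared
            print BUF[i, c]
    }
}' "$tmp")

printf '%s\n' "$grouped"
rm -f "$tmp"
```

So ++CNT[TMP] is a per-chromosome row counter: it numbers each row within its chromosome, and that number becomes the second part of the BUF index, which is what lets the END block print the rows back in their original order.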