awk to match keyword and return matches and unique fields


 
# 1  
Old 12-29-2015

Trying to use awk to find a keyword and return the rows that match it, along with $1 and $2 of the header row above them. Those two fields are the unique IDs, and they appear only once per group, on the header row. Thank you.


file
Code:
name	31	Index	Chromosomal Position	Gene	Inheritance
		122	2106725	TSC2	AD
		124	2115481	TSC2	AD
		121	2105400	TSC2	AD
		82	135782221	TSC1	AD
		81	135782026	TSC1	AD
		126	2138218	TSC2	AD
		123	2113107	TSC2	AD
		125	2126142	TSC2	AD
name2	12	Index	Chromosomal Position	Gene	Inheritance
		1	43396568	SLC2A1	AD, AR
name3	20	Index	Chromosomal Position	Gene	Inheritance
		188	2135240	TSC1	AD
		179	2103379	TSC1 AD
		191	2137899	TSC2	AD
		181	2110617	TSC2	AD
		190	2137857	TSC2	AD
		189	2137806	TSC2	AD
		186	2133798	TSC2	AD
		187	2135074	TSC2	AD
		180	2105400	TSC2	AD
		183	2122822	TSC2	AD
		192	2138218	TSC2	AD
		185	2125937	TSC2	AD
		184	2125788	TSC2	AD
		193	2138269	TSC2	AD
		182	2112981	TSC2	AD

Desired output
Code:
name	  31	Index	Chromosomal Position	Gene	Inheritance
                  82	135782221	TSC1	AD
                  81	135782026	TSC1	AD
name3  20	Index	Chromosomal Position	Gene	Inheritance
                  188	2135240	TSC1	AD
                  179	2103379	TSC1	AD
                  191	2137899	TSC1	AD


awk
Code:
awk '/TSC1/{ print $1,$2,$0 }' file.txt > output.txt


Last edited by cmccabe; 12-29-2015 at 07:59 PM.. Reason: corrected input
# 2  
Old 12-29-2015
This seems to come close to what you said you wanted to do:
Code:
awk '
/^[[:alnum:]]/ {
	h = $0
	np = 0
}
$3 == "TSC1" {
	if(np++ == 0)
		print h
	print
}' file

but with the sample input you provided, it only prints:
Code:
name      31    Index   Chromosomal Position    Gene    Inheritance
                  82    135782221       TSC1    AD
                  81    135782026       TSC1    AD

Since TSC1 does not appear anywhere in your input file after name3, I have no idea how you got the rest of the output you said you desired.

As always, if anyone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk (not nawk for this script).
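
The same header-caching idea can be parameterized so the gene name is not hard-coded. What follows is only a minimal sketch, assuming a POSIX awk and tab-indented data rows: the function name print_groups and the variable key are hypothetical names chosen for illustration, and the header test uses [A-Za-z0-9] as a stand-in for [[:alnum:]] so that it also runs on older awks without POSIX character classes.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print each header line once,
# followed by the rows under it whose third field equals the keyword.
# Usage: print_groups KEYWORD [file ...]   (reads stdin if no file is given)
print_groups() {
	key=$1
	shift
	awk -v key="$key" '
	/^[A-Za-z0-9]/ {	# header rows start in column 1; data rows are indented
		h = $0		# remember the current header
		np = 0		# no row printed for this group yet
	}
	$3 == key {
		if (np++ == 0)	# first match in the group: print the header first
			print h
		print
	}' "$@"
}

# Three lines modeled on the sample input in post #1 (tabs assumed).
printf 'name\t31\tIndex\tChromosomal Position\tGene\tInheritance\n\t\t122\t2106725\tTSC2\tAD\n\t\t82\t135782221\tTSC1\tAD\n' |
print_groups TSC1
```

On this three-line sample it prints the name header followed by the single TSC1 row. To run it against a real file, something like print_groups TSC1 file should work.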
# 3  
Old 12-29-2015
I corrected the typo in the input; my apologies. Thank you.
# 4  
Old 12-29-2015
Your updated sample input now has two lines containing TSC1 after name3, yet your desired output still has three?

Did my suggestion do what you want done?
# 5  
Old 12-29-2015
I'm not in the office now and will post back tomorrow. I'm sure that will work. Thank you.
# 6  
Old 12-30-2015
Code:
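awk '
NF == 7         {HD = $0 RS}              # header row: save it with a trailing newline
$3 == "TSC1"    {printf "%s%s\n", HD, $0  # print the pending header (if any), then the row
                 HD = ""                  # header is printed only once per group
                }
' file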

Still only two output lines for name3, not three...
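
The NF == 7 test relies on awk's default whitespace splitting: "Chromosomal Position" counts as two fields, so a header row has seven fields while the data rows have four or five. A quick way to check this on your own data is to print NF for each line. This is just a sketch; the helper name show_nf is hypothetical, and the tab separators in the sample lines are an assumption about the original file.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the number of fields awk
# sees on each input line when the default field separator is used.
show_nf() {
	awk '{ print NF }' "$@"
}

# Header, a plain data row, and the "AD, AR" row from post #1 (tabs assumed).
printf 'name\t31\tIndex\tChromosomal Position\tGene\tInheritance\n\t\t82\t135782221\tTSC1\tAD\n\t\t1\t43396568\tSLC2A1\tAD, AR\n' |
show_nf
```

On these three lines it prints 7, 4 and 5. A header with a different number of words would not satisfy NF == 7, so the test would need adjusting for other files.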
# 7  
Old 12-31-2015
Thank you both, works great. Thank you.