

Grep or awk a unique and specific word across many fields

 
# 1  
Old 06-15-2017

Hi there,

I have data with a structure similar to this:
Code:
CHR	START-SNP	END-SNP	REF	ALT	PATIENT1	PATIENT2	PATIENT3	PATIENT4
chr1	69511	69511	A	G	homo	hetero	homo	hetero
chr2	69513	69513	T	C	.	hetero	homo	hetero
chr3	69814	69814	G	C	.	.	homo	homo
chr4	69815	69815	C	A	hetero	.	.	hetero

Is there a way to print the whole line if only one of the words homo or hetero is found across the patient columns, ignoring fields with dots (.), which mean unknown? The output should look like this:
Code:
CHR	START-SNP	END-SNP	REF	ALT	PATIENT1	PATIENT2	PATIENT3	PATIENT4
chr3	69814	69814	G	C	.	.	homo	homo
chr4	69815	69815	C	A	hetero	.	.	hetero


Thanks
# 2  
Old 06-15-2017
Not clear. You want lines to be printed to stdout if "words" (is that field contents?) other than "." occur twice (or more) / exactly twice in that line? Is that any field or fields starting from $6? Is that any "words" or just the "words" "homo" and "hetero"?
# 3  
Old 06-15-2017
Lines should be printed if the field contents homo or hetero are constant across that line (occurring 2 or more times), ignoring ".", starting from $6, and only for the words homo and hetero.

Thanks
# 4  
Old 06-15-2017
In the line starting with "chr1", the "homo" count is 2, as is the "hetero" count. Should that print or not, i.e. is more than one kind of item allowed?
# 5  
Old 06-15-2017
It shouldn't, as the next field has hetero. It should look at all fields from the 6th column onward.

I am trying to create two separate files, one with hetero and one with homo, if it's easier to code that way.

input:
Code:
CHR	START-SNP	END-SNP	REF	ALT	PATIENT1	PATIENT2	PATIENT3	PATIENT4
chr1	69511	69511	A	G	homo	hetero	homo	hetero
chr2	69513	69513	T	C	.	hetero	homo	hetero
chr3	69814	69814	G	C	.	.	homo	homo
chr4	69815	69815	C	A	hetero	.	.	hetero

When I grep/awk for hetero:
output 1:
Code:
CHR	START-SNP	END-SNP	REF	ALT	PATIENT1	PATIENT2	PATIENT3	PATIENT4
chr4	69815	69815	C	A	hetero	.	.	hetero

When I grep/awk for homo:
output 2:
Code:
CHR	START-SNP	END-SNP	REF	ALT	PATIENT1	PATIENT2	PATIENT3	PATIENT4
chr3	69814	69814	G	C	.	.	homo	homo

BTW, the file I have has many patient columns (PATIENT1 to PATIENT10000).
# 6  
Old 06-15-2017
Please become accustomed to providing decent context info for your problem.
It is always helpful to support a request with system info like OS and shell, related environment (variables, options), preferred tools, and adequate (representative) sample input and desired output data, plus the detailed (!) logic connecting the two, to avoid ambiguities and keep people from guessing.

For your above problem, try
Code:
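awk '
        {split ("",  C)                 # reset the counts for each line
         for (i=6; i<=NF; i++) C[$i]++  # count the values in fields 6..NF
         CM = C["homo"]                 # number of "homo" fields
         CR = C["hetero"]               # number of "hetero" fields
        }
# print lines with 2+ homo and no hetero, or 2+ hetero and no homo, plus the header
(CM > 1) && !CR ||
(CR > 1) && !CM ||
NR == 1
' file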
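
Since the goal mentioned in post #5 is two separate files, the same counting idea can be pointed at two outputs. The following is only a sketch under assumptions: the input is taken to be tab-separated, and the file names sample.tsv, homo.tsv and hetero.tsv as well as the variable names cm and cr are made up for illustration.

```shell
# Recreate the sample input from this thread (assumed tab-separated).
printf 'CHR\tSTART-SNP\tEND-SNP\tREF\tALT\tPATIENT1\tPATIENT2\tPATIENT3\tPATIENT4\n' > sample.tsv
printf 'chr1\t69511\t69511\tA\tG\thomo\thetero\thomo\thetero\n' >> sample.tsv
printf 'chr2\t69513\t69513\tT\tC\t.\thetero\thomo\thetero\n' >> sample.tsv
printf 'chr3\t69814\t69814\tG\tC\t.\t.\thomo\thomo\n' >> sample.tsv
printf 'chr4\t69815\t69815\tC\tA\thetero\t.\t.\thetero\n' >> sample.tsv

# Output file names are hypothetical; adjust to taste.
awk -F'\t' '
NR == 1 { print > "homo.tsv"; print > "hetero.tsv"; next }  # header goes to both files
{
    cm = cr = 0                          # per-line counts of homo and hetero
    for (i = 6; i <= NF; i++) {          # patient columns start at field 6
        if ($i == "homo") cm++
        else if ($i == "hetero") cr++    # "." and anything else is ignored
    }
}
cm > 1 && !cr { print > "homo.tsv" }     # 2+ homo and no hetero
cr > 1 && !cm { print > "hetero.tsv" }   # 2+ hetero and no homo
' sample.tsv
```

With the sample data, homo.tsv should end up with the header plus the chr3 line, and hetero.tsv with the header plus the chr4 line. The loop runs up to NF, so the number of patient columns does not need to be known in advance.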

This User Gave Thanks to RudiC For This Post:
# 7  
Old 06-16-2017
Worked like a charm.

Just one question: what if I want the separation to include counts equal to or greater than 1?

Would I have to modify the code like this?
Code:
awk '
        {split ("",  C)
         for (i=6; i<=NF; i++) C[$i]++
         CM = C["homo"]
         CR = C["hetero"]
        }
(CM > 0) && !CR ||
(CR > 0) && !CM ||
NR == 1
' file

Thanks
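
For what it is worth, changing `> 1` to `> 0` does turn the test into "at least one". A possible way to avoid editing the script for each threshold is to pass the minimum count in as a variable. The sketch below assumes tab-separated input; the function name filter_rows, the variable min, the file names, and the extra chr5 row (which is not part of the original data) are all made up for illustration.

```shell
# Sample input from this thread, plus one hypothetical row (chr5) with a single homo.
printf 'CHR\tSTART-SNP\tEND-SNP\tREF\tALT\tPATIENT1\tPATIENT2\tPATIENT3\tPATIENT4\n' > sample2.tsv
printf 'chr1\t69511\t69511\tA\tG\thomo\thetero\thomo\thetero\n' >> sample2.tsv
printf 'chr2\t69513\t69513\tT\tC\t.\thetero\thomo\thetero\n' >> sample2.tsv
printf 'chr3\t69814\t69814\tG\tC\t.\t.\thomo\thomo\n' >> sample2.tsv
printf 'chr4\t69815\t69815\tC\tA\thetero\t.\t.\thetero\n' >> sample2.tsv
printf 'chr5\t69900\t69900\tA\tT\t.\t.\thomo\t.\n' >> sample2.tsv

# filter_rows MIN FILE: keep the header and every line whose patient columns
# hold at least MIN homo and no hetero, or at least MIN hetero and no homo.
filter_rows() {
    awk -F'\t' -v min="$1" '
    NR == 1 { print; next }              # always keep the header
    {
        cm = cr = 0
        for (i = 6; i <= NF; i++) {      # patient columns start at field 6
            if ($i == "homo") cm++
            else if ($i == "hetero") cr++
        }
    }
    (cm >= min && !cr) || (cr >= min && !cm)
    ' "$2"
}

filter_rows 2 sample2.tsv > min2.tsv     # same rule as "> 1"
filter_rows 1 sample2.tsv > min1.tsv     # same rule as "> 0"
```

With a minimum of 2 the chr5 row is dropped, and with a minimum of 1 it is kept, while chr1 and chr2 are excluded in both cases because they mix homo and hetero.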
