awk unique count of partial match with semi-colon


 
# 1  
Old 06-02-2016

Trying to get the unique count of $5 in the input below, but if the text at the beginning of $5 is a partial match to $5 on another line in the file, then that line is not unique.

awk
Code:
awk '!seen[$5]++ {n++} END {print n}' input
7

input
Code:
chr1    159174749    159174770    chr1:159174749-159174770    ACKR1
chr1    159175223    159176240    chr1:159175223-159176240    ACKR1
chr2    149225899    149228040    chr2:149225899-149228040    AK025127;MBD5
chr2    200213413    200213906    chr2:200213413-200213906    AK025127;SATB2
chr3    196050574    196050878    chr3:196050574-196050878    AK124973;TM4SF19;TM4SF19-TCTEX1D2
chr10    5042568    5042687    chr10:5042568-5042687    AKR1C2
chr10    5043696    5043883    chr10:5043696-5043883    AKR1C2
chr10    5043695    5043883    chr10:5043695-5043883    AKR1C2;AKR1C3

The desired output (the correct count) is 4, since $5 in lines 1 and 2 is the same, $5 in lines 3 and 4 is the same, and $5 in lines 6, 7, and 8 is the same. I can only seem to count each line, and the ; is causing problems that I cannot seem to fix. Thank you :).

Last edited by cmccabe; 06-02-2016 at 04:06 PM. Reason: added details
# 2  
Old 06-02-2016
$5 in lines 3 and 4 is NOT the same, nor is it in lines 6, 7, and 8. Add split($5, T, ";") and then use T[1] as the key.
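
To see what split does to a single field, here is a small sketch on one value taken from the sample input. The variable name `result` is only for illustration and is not from the thread; split returns the number of pieces and fills the array T starting at index 1.

```shell
# Split one sample $5 value on ";" and show the piece count and the first piece.
# "result" is a hypothetical variable name used only for this illustration.
result=$(echo 'AKR1C2;AKR1C3' | awk '{ n = split($1, T, ";"); print n, T[1] }')
echo "$result"   # prints: 2 AKR1C2
```

So T[1] holds only the text before the first semicolon, which is the part that should decide whether a line is unique.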
# 3  
Old 06-02-2016
Perhaps like so? Modifying your post:

Code:
awk '{split($5,F,/;/)} !seen[F[1]]++ {n++} END {print n}' file
4

-- edit --
Ow RudiC already gave the exact same answer...
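
For anyone who wants to check this without creating a file, here is a self-contained sketch that feeds the sample input from post #1 to the same awk logic through a here-document. The variable name `count` is an assumption for illustration, and the `n + 0` is added only so that an empty input would print 0 instead of a blank line.

```shell
# Count distinct values of the text before the first ";" in field 5.
# The data is the sample input from the first post; "count" is a
# hypothetical variable name used only for this illustration.
count=$(awk '
    { split($5, F, ";") }        # F[1] is the part of $5 before the first ";"
    !seen[F[1]]++ { n++ }        # count each F[1] the first time it appears
    END { print n + 0 }          # n + 0 prints 0 rather than "" on empty input
' <<'EOF'
chr1    159174749    159174770    chr1:159174749-159174770    ACKR1
chr1    159175223    159176240    chr1:159175223-159176240    ACKR1
chr2    149225899    149228040    chr2:149225899-149228040    AK025127;MBD5
chr2    200213413    200213906    chr2:200213413-200213906    AK025127;SATB2
chr3    196050574    196050878    chr3:196050574-196050878    AK124973;TM4SF19;TM4SF19-TCTEX1D2
chr10    5042568    5042687    chr10:5042568-5042687    AKR1C2
chr10    5043696    5043883    chr10:5043696-5043883    AKR1C2;AKR1C3
EOF
)
echo "$count"   # prints: 4
```

Note that this treats "partial match" as "same text before the first semicolon", which is how both replies read the question. A name that appears only after a semicolon (MBD5, SATB2, and so on) is not considered.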