Unique extraction of rows


 
# 1  
Old 11-14-2013

I have a tab-delimited file in the following format:
Code:
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
543 hkwuy NA NA NA NA 6 NA NA NA NA 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
Each row contains 16 columns, and missing values are indicated as NA. I want to extract all rows that contain one or more numeric values (from 2 to 15) that I specify.

Suppose I want to extract the rows that contain only 2. Below is the output I need:
Code:
432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA

If I want to specify more than one number, for example rows that contain only 3 10 11:
Code:
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA

I tried the following using awk to get the row containing 2:
Code:
awk -F"\t" '$3 == "2" { print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14"\t"$15"\t"$16 }' file.in

But I don't know how to specify rows that contain only "2", nor how to specify more than one number. Please let me know the best way to do this extraction in awk.
# 2  
Old 11-14-2013
Do you insist on awk, or would grep help as well?
Code:
grep -E "^[^ ]* [^ ]*.* (3|10|11) .*$" file
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
543 hkwuy NA NA NA NA 6 NA NA NA NA 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA

BTW, what if the patterns appear in reverse order, like 11 - 10 - 3? Would that have to be a hit or not? Also, is that an AND condition (all three patterns must show up) or an OR (any one would be sufficient)?
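To make the AND/OR difference concrete, here is a rough sketch using the sample rows from post #1. It assumes a grep that supports `-w` (whole-word match), which GNU and BSD grep do. The temporary file and the variable names are only for illustration. Note that `-w` looks at the whole line, so a value in the ID column would count as a hit as well.

```shell
#!/bin/sh
# Sample rows from post #1; spaces are turned into tabs to mimic the real file.
sample=$(mktemp)
printf '%s\n' \
  '431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15' \
  '432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA' \
  '433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA' \
  '542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA' \
  '543 hkwuy NA NA NA NA 6 NA NA NA NA 11 NA NA NA NA' \
  '633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA' |
  tr ' ' '\t' > "$sample"

# OR: a row qualifies if any one of the values appears as a whole word.
or_rows=$(grep -wE '3|10|11' "$sample")

# AND: each grep in the chain keeps only rows that also contain the next value,
# so the order of the values within a row does not matter.
and_rows=$(grep -w 3 "$sample" | grep -w 10 | grep -w 11)

printf 'OR:\n%s\n\nAND:\n%s\n' "$or_rows" "$and_rows"
rm -f "$sample"
```

On the sample data the OR variant keeps row 543 (it has an 11 but neither 3 nor 10), while the AND variant drops it.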
# 3  
Old 11-14-2013
Reverse order is OK. But the AND condition (all three patterns must show up) is a must when extracting rows with more than one number.
# 4  
Old 11-14-2013
This seems to work, but it only matches when the values appear in the order 3, 10, 11, and I doubt it covers all imaginable constellations:
Code:
grep -E "^[^ ]* [^ ]*.* 3( .*|.* )10( .*|.* )11 *.*$" file
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA

Maybe you need to run awk over every single field from the third one on.
# 5  
Old 11-14-2013
Would it help if we removed all the NAs and left those fields blank?
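A small sketch of what blanking the NAs would do to one row, assuming the file really is tab-delimited. The variable names are made up for illustration. The point is that the column positions survive only as long as tab is set as the field separator; with awk's default whitespace splitting, runs of tabs collapse.

```shell
#!/bin/sh
# One row from post #1, written with real tabs.
row=$(printf '433\tKATe\tNA\t3\tNA\tNA\t6\tNA\tNA\tNA\t10\t11\tNA\tNA\tNA\tNA')

# Blank every field from column 3 on that is exactly "NA"; the tabs stay in place.
blanked=$(printf '%s\n' "$row" |
  awk 'BEGIN { FS = OFS = "\t" } { for (i = 3; i <= NF; i++) if ($i == "NA") $i = "" } 1')

# With tab as the separator the row still has all of its columns.
tab_fields=$(printf '%s\n' "$blanked" | awk -F'\t' '{ print NF }')

# With default whitespace splitting the empty columns disappear.
ws_fields=$(printf '%s\n' "$blanked" | awk '{ print NF }')

printf 'tab-separated: %s fields, whitespace-separated: %s fields\n' "$tab_fields" "$ws_fields"
```

So blanking is only safe if every later command also uses `-F'\t'`. For a pure "is this value present" test, keeping NA as it is seems simpler.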
# 6  
Old 11-14-2013
Something to start with, assuming a number may appear only once on a line:
Code:
awk -f kan.awk myFile
or
awk -v nums='10 3 11' -f kan.awk myFile

where kan.awk is:
Code:
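BEGIN {
  if (!(nums)) nums="2"            # default when no -v nums=... is given
  numsN=split(nums, tA, " ")       # wanted values are space-separated, whatever FS is
  for(i=1;i<=numsN;i++)
    numsA[tA[i]]                   # referencing an element is enough to create it
}
{
  found=0
  for(i=3;i<=NF;i++)               # data starts in column 3; skip the ID and name
    if ($i in numsA)
     found++
}
found==numsN                       # print the row when every wanted value was seen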
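One thing to watch: with the default `nums="2"` this also prints row 431, because that row contains a 2 among its other values, whereas the expected output in post #1 lists only rows 432 and 542. If "only" is meant strictly, a variant along these lines might be closer. It is just a sketch: the function name `only_rows` and the temporary file are invented for illustration, and it keeps the assumption that a number appears at most once per row.

```shell
#!/bin/sh
# only_rows NUMS FILE: print rows whose non-NA data columns (3..NF) hold
# exactly the values listed in NUMS and nothing else.
only_rows() {
  awk -F'\t' -v nums="$1" '
    BEGIN { n = split(nums, t, " "); for (i = 1; i <= n; i++) want[t[i]] }
    {
      found = 0
      for (i = 3; i <= NF; i++) {
        if ($i == "NA") continue      # missing value, ignore it
        if (!($i in want)) next       # a value outside the list: reject the row
        found++
      }
      if (found == n) print           # every wanted value was present
    }
  ' "$2"
}

# Sample rows from post #1, converted to tab-delimited form.
sample=$(mktemp)
printf '%s\n' \
  '431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15' \
  '432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA' \
  '433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA' \
  '542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA' \
  '543 hkwuy NA NA NA NA 6 NA NA NA NA 11 NA NA NA NA' \
  '633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA' |
  tr ' ' '\t' > "$sample"

rows_2=$(only_rows '2' "$sample")               # rows 432 and 542
rows_3_10_11=$(only_rows '3 10 11' "$sample")   # empty: rows 433 and 633 also hold a 6
printf 'only 2:\n%s\nonly 3 10 11:\n%s\n' "$rows_2" "$rows_3_10_11"
rm -f "$sample"
```

Under this strict reading the "3 10 11" example from post #1 matches nothing, since rows 433 and 633 also contain a 6, so it is not clear from the thread which reading is actually wanted.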

# 7  
Old 11-14-2013
I'm afraid we're getting nowhere with those regexes. Try this awkthingy and come back with results:
Code:
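awk '
{
    n = split(PARA, PATT)                        # reload the wanted values for every row
    for (i = 3; i <= NF; i++)                    # data starts in column 3
        for (j = 1; j <= n; j++)
            if ($i == PATT[j]) delete PATT[j]    # tick off a value once it is seen
    P = 1
    for (k in PATT) if (PATT[k]) P = 0           # a value left over means it was missing
}
P                                                # print the row only if nothing was missing
' PARA="11 3 10" file | less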
