Unique extraction of rows


 
# 8  
Old 11-14-2013
I tested it with many numbers. But the output file is exactly the same as the input file and there is no extraction happening nor any error messages.
# 9  
Old 11-14-2013
Quote:
Originally Posted by Kanja
I tested it with many numbers. But the output file is exactly the same as the input file and there is no extraction happening nor any error messages.
Works for me with your sample file, with both RudiC's suggestion and mine.
Check your input file to see if it jibes with what you posted here as a sample.
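If the file came from a spreadsheet or a Windows editor, one possible culprit is DOS line endings, which would make the last field read `NA\r` or `11\r`. That is only a guess, since the thread does not say what was actually wrong. A minimal sketch to check for it; the function name `count_cr_lines` is made up for illustration:

```shell
# Hypothetical helper (not from the thread): print the number of lines
# in file $1 that end with a carriage return, i.e. DOS line endings.
count_cr_lines() {
  awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}
```

Running `count_cr_lines yourfile` prints 0 for a clean Unix text file. `od -c yourfile | head` shows the raw bytes if you want to look for other stray characters.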
# 10  
Old 11-14-2013
I think I got it right now. Thanks guys.
# 11  
Old 11-15-2013
Hi.

This bash-awk solution is similar to those of vgersh99 and RudiC. Their code could easily be modified to take the arguments from the command line, as this one does. This script builds the awk match string in a shell function and then has awk evaluate the patterns against each line, which makes the shell part more complex but the awk part simpler.

This also makes the patterns match only exact strings by using the word-boundary patterns `\<` and `\>` (these are GNU awk extensions).
Code:
#!/usr/bin/env bash

# @(#) s3	Demonstrate multiple matches on a line, awk

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && "$C" awk

create_awk_args() {
  if [ $# -le 0 ]
  then
    pe " Nothing selected on command line, exit." >&2
    exit 1
  fi
  
  v1=$(
    p1='\<'
    p2='\>'
    for (( i=1;i<=$#;i++))
    do
      if [ $i -gt 1 ]
      then
        printf " && "
      fi
      printf "%s" "/$p1${!i}$p2/"
    done
  )
  db " Note: parameters are [$v1]"
}

FILE=data1

pl " Input data file $FILE:"
cat "$FILE"

pl " Results $*"
create_awk_args "$@"
awk '
'"$v1"'
' "$FILE"

exit 0

producing:
Code:
$ ./s3 3 10 11

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Input data file data1:
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
543 hkwuy NA NA NA NA 6 NA NA NA NA 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
990 Klip NA 3 NA NA 6 NA NA NA 10 111 NA NA NA NA

-----
 Results 3 10 11
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA

and for a single parameter:
Code:
$ ./s3 111

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Input data file data1:
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
543 hkwuy NA NA NA NA 6 NA NA NA NA 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
990 Klip NA 3 NA NA 6 NA NA NA 10 111 NA NA NA NA

-----
 Results 111
990 Klip NA 3 NA NA 6 NA NA NA 10 111 NA NA NA NA
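For reference, with the arguments `3 10 11` the function above builds the awk program `/\<3\>/ && /\<10\>/ && /\<11\>/`. Since `\<` and `\>` are GNU awk extensions, here is a rough sketch of the same idea for other awks. It assumes whitespace-separated fields and compares whole fields instead of using word boundaries; the function name `match_all` and the variable `want` are made up for illustration:

```shell
# Hypothetical helper (not part of the s3 script): print the lines of
# file $1 that contain every remaining argument as a complete field.
match_all() {
  file=$1; shift
  awk -v want="$*" '
    BEGIN { n = split(want, w, " ") }
    {
      for (k = 1; k <= n; k++) {
        found = 0
        for (i = 1; i <= NF; i++)
          # Appending "" forces a string comparison, so 10 != 10.0.
          if (($i "") == (w[k] "")) { found = 1; break }
        if (!found) next          # one value is missing: skip this line
      }
      print
    }
  ' "$file"
}
```

`match_all data1 3 10 11` should select the same rows as `./s3 3 10 11` on the sample data. Like the regex version, it also looks at the first two columns, so searching for `431` would match the row id.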

Best wishes ... cheers, drl
# 12  
Old 11-16-2013
I wrote a lengthy piece of code that you may try :)

Code:
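awk '

# Fixed-width float formatting, e.g. 3 -> "03.00", to avoid partial
# matches such as 1 inside 10.
function i2f(val) {
  return sprintf("%05.2f", val)
}

NR == 1 {
  # Turn search="3 6 10 11" into "03.00|06.00|10.00|11.00".
  # d keeps the number of search values so c can be reset on every row.
  c = d = split(search, A)
  for (i = 1; i <= c; i++)
    search = i == 1 ? i2f(A[i]) : search "|" i2f(A[i])
}

{
  c = d
  # Fields 1 and 2 are the row id and the name; values start at field 3.
  for (i = 3; i <= NF; i++) {
    if ($i ~ /[[:digit:]]/) {
      # "o" (only):  a hit decrements c, any other number increments it.
      # "m" (multi): a hit decrements c, other numbers are ignored.
      c = tolower(pattern_type) == "o" ? match(search, i2f($i)) ? --c : ++c : \
          tolower(pattern_type) == "m" ? match(search, i2f($i)) ? --c : c   : ""
    }

    # We matched enough fields using the multiple search option
    if (c == 0 && tolower(pattern_type) == "m") break
  }
  # c == 0 means the row qualifies; turn that into a 1/0 print flag.
  c = c == 0 ? 1 : 0
}
c

' search="3 6 10 11" pattern_type="o"   file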

Code:
$ cat file
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
543 hkwuy NA NA NA NA 6 NA NA NA NA 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA

Search for rows whose only number is 2 ("o" stands for "only"): search="2" pattern_type="o"
Code:
$ sh search.sh 
432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA

Search for rows that contain 2, possibly among other numbers ("m" stands for "multi"): search="2" pattern_type="m"
Code:
$ sh search.sh 
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
432 kat2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
542 Kaed 2 NA NA NA NA NA NA NA NA NA NA NA NA NA

Search for rows that contain 3, 6, 10 and 11 and no other number, with pattern_type "o": search="3 6 10 11" pattern_type="o"
Code:
$ sh search.sh 
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA

Search for rows that contain 3, 6, 10 and 11, possibly among other numbers, with pattern_type "m": search="3 6 10 11" pattern_type="m"
Code:
$ sh search.sh 
431 kat1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
433 KATe NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA
633 KAT1 NA 3 NA NA 6 NA NA NA 10 11 NA NA NA NA

Set pattern_type to "o" or "m" according to your need.
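The two modes can also be sketched with exact lookups in an awk array instead of the float-formatted regex. This is a different technique from the code above, not a drop-in copy of it. It assumes the values are plain integers in fields 3 onward; the function name `search_rows` and its argument order are made up for illustration:

```shell
# Hypothetical helper: search_rows MODE FILE VALUE...
#   MODE o: all VALUEs present and no other number in fields 3..NF
#   MODE m: all VALUEs present, other numbers allowed
search_rows() {
  mode=$1; file=$2; shift 2
  awk -v mode="$mode" -v search="$*" '
    BEGIN {
      n = split(search, s, " ")
      # need = number of distinct search values
      for (k = 1; k <= n; k++)
        if (!(s[k] in want)) { want[s[k]] = 1; need++ }
    }
    {
      split("", seen); hits = 0; extra = 0     # reset per-row state
      for (i = 3; i <= NF; i++) {
        if ($i !~ /^[0-9]+$/) continue         # skip NA and other non-integers
        if ($i in want) {
          if (!($i in seen)) { seen[$i] = 1; hits++ }
        } else
          extra++
      }
      if (hits == need && (mode == "m" || extra == 0)) print
    }
  ' "$file"
}
```

For example, `search_rows o file 3 6 10 11` should give the same two rows as the "o" run above, and `search_rows m file 2` the same three rows as the "m" run.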

Last edited by Akshay Hegde; 11-16-2013 at 09:45 AM.. Reason: typo