filter the uniq record problem


 
# 8  
Old 05-13-2009
I am assuming that rows sharing the same first field are adjacent, as your input sample suggests.
If they are not, just prefix the command with "sort inputfile.txt |".
Under this assumption we only need a buffer r as big as the longest input row.
Without it, a single pass would have to keep all the rows in memory, because we cannot know whether a row has a duplicate until the whole file has been read.

Code:
nawk -F'|' '$1==f{c=0;next}{if(c)print r;c=1;r=$0;f=$1}END{if(c)print r}'

I believe it cannot be shortened...
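
For readability, here is the same logic written out long-hand. Take it as a sketch only: the wrapper name uniq_adjacent and the sample rows are made up for illustration, and plain awk is used on the assumption that it behaves like nawk for this program.

```shell
# uniq_adjacent is a hypothetical wrapper name; it reads stdin or the files given.
uniq_adjacent() {
  awk -F'|' '
  $1 == f { c = 0; next }      # same key as the previous row: discard the held row
  {
    if (c) print r             # the held row was never repeated, so it is unique
    c = 1; r = $0; f = $1      # hold the current row as the new candidate
  }
  END { if (c) print r }       # flush the last candidate, if any
  ' "$@"
}

# Made-up sample: key 2 occurs twice on adjacent rows, keys 1 and 3 once.
printf '%s\n' '1|a' '2|b' '2|c' '3|d' | uniq_adjacent
```

On this sample only the rows for keys 1 and 3 are printed.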

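As a Korn shell script:
Code:
#!/usr/bin/ksh

c=false   # true while $r holds a row whose key has not repeated yet

while IFS='|' read -r f1 fr; do
  # "$f" is quoted so it is compared literally, not as a pattern
  [[ $f1 = "$f" ]] && { c=false; continue; }
  $c && print -r -- "$r"
  c=true; f=$f1; r="$f1|$fr"
done

$c && print -r -- "$r"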


Last edited by colemar; 05-13-2009 at 08:45 AM..
# 9  
Old 05-13-2009
Another possible solution is reading the input file twice:

Code:
awk -F\| 'NR==FNR{_[$1]++;next}_[$1]<2' infile infile
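
To see what the two passes do, here is a small sketch. The sample data, the temporary file and the function name two_pass are made up for illustration, and count stands in for the array _ used above.

```shell
# two_pass is a hypothetical helper: it hands the same file to awk twice.
two_pass() {
  awk -F'|' '
  NR == FNR { count[$1]++; next }   # first copy of the file: count each key
  count[$1] < 2                     # second copy: print rows whose key occurred once
  ' "$1" "$1"
}

# Made-up input; the two rows with key 2 are deliberately NOT adjacent.
sample=$(mktemp)
printf '%s\n' '2|b' '1|a' '2|c' '3|d' > "$sample"

two_pass "$sample"
rm -f "$sample"
```

NR == FNR is true only while the first copy is being read, because FNR restarts at 1 for the second copy while NR keeps counting. No sorting is needed, and the surviving rows come out in their original order.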

Or, naming the file only once:

Code:
awk 'BEGIN { 
  # queue the file name a second time so awk reads it twice
  ARGV[ARGC] = ARGV[ARGC-1]; ARGC++
  FS = "|"
  }
NR == FNR {
  _[$1]++; next
  }
_[$1] < 2' infile

Otherwise, in a single pass (the output order is unspecified):

Code:
awk -F\| '{ k[$1]++; r[$1] = $0 }
END { for (K in k) if (k[K] == 1) print r[K] }
' infile
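
If the original row order matters, one possible variation is to remember the order in which the keys first appear. This is only a sketch: the function name keep_unique_in_order, the array names and the sample rows are all made up for illustration.

```shell
# keep_unique_in_order is a hypothetical helper; it reads stdin or the files given.
keep_unique_in_order() {
  awk -F'|' '
  !($1 in n) { order[++keys] = $1 }   # remember each key the first time it appears
  { n[$1]++; row[$1] = $0 }           # count the key and store its latest row
  END {
    for (i = 1; i <= keys; i++)       # walk the keys in first-seen order
      if (n[order[i]] == 1) print row[order[i]]
  }' "$@"
}

# Made-up input: key 2 occurs three times, scattered through the file.
printf '%s\n' '2|b' '1|a' '2|c' '3|d' '2|e' | keep_unique_in_order
```

Like the single-pass version above, this keeps one row per distinct key in memory, so it trades memory for not having to sort or read the file twice.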


Last edited by radoulov; 05-13-2009 at 09:09 AM..