Sponsored Content
Top Forums Shell Programming and Scripting filter the uniq record problem Post 302315730 by colemar on Wednesday 13th of May 2009 06:22:53 AM
Old 05-13-2009
I am assuming that duplicate rows are adjacent, as it seems from your input sample.
If not, then just add a prefix "sort inputfile.txt |" on the command line.
Under this assumption we need only a buffer r as big as the longest input row.
Without this assumption we need to store in memory all the rows, because it is not known if any row has a duplicate until the end of the file is read.

Code:
nawk -F'|' '$1==f{c=0;next}{if(c)print r;c=1;r=$0;f=$1}END{if(c)print r}'

I believe it cannot be shortened...

As a korn shell script:
Code:
#!/usr/bin/ksh

c=false

while IFS='|' read -r f1 fr; do
  [[ $f1 = $f ]] && { c=false; continue; }
  $c && print -r -- $r
  c=true; f=$f1; r="$f1|$fr"
done

$c && print -r -- $r


Last edited by colemar; 05-13-2009 at 08:45 AM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Help in writing a KSH script to filter the latest record?

Hi All, I have a text file with the folowing content. BANGALORE|1417|2010-02-04 08:41:04.174|dob|xxx BANGALORE|1416|2010-02-04 08:23:19.566|dob|yyy BANGALORE|1415|2010-02-04 08:20:14.497|dob|aaa BANGALORE|1414|2010-02-04 08:19:40.065|dob|vvv BANGALORE|1413|2010-02-04... (4 Replies)
Discussion started by: Karpak
4 Replies

2. Shell Programming and Scripting

filter record from a file reading another file

Hi, I want to filter record from a file if the records in the second column matches the data in another file. I tried the below awk command but it filters the records in the filter file. I want the opposite, to include only the records in the filter file. I tried this: awk -F'|'... (8 Replies)
Discussion started by: gpaulose
8 Replies

3. Shell Programming and Scripting

Filter record from a file

Reposting since I didnt not get any reply. I have a problem while filtering records from a file. Can somebody help please? For eg: Consider the below files Record file: 0003@00000000000190@20100401@201004012010040120100401@003@... (1 Reply)
Discussion started by: gpaulose
1 Replies

4. Shell Programming and Scripting

Keep the last uniq record only

Hi folks, Below is the content of a file 'tmp.dat', and I want to keep the uniq record (key by first column). However, the uniq record should be the last record. 302293022|2|744124889|744124889 302293022|3|744124889|744124889 302293022|4|744124889|744124889 302293022|5|744124889|744124889... (4 Replies)
Discussion started by: ChicagoBlues
4 Replies

5. Shell Programming and Scripting

sort & uniq on specific fields problem

Hello; I have the output data set from: egrep -i 'warning| error| fail' /var/adm/syslog/syslog.log Jan 31 12:02:18 fidsrv vmunix: LVM: WARNING: VG 128 0x001000: LV 5: Some I/O requests to this LV are waiting Jan 31 12:02:23 fidsrv vmunix: Asynchronous write failed on LUN (dev=0x100000f)... (3 Replies)
Discussion started by: delphys
3 Replies

6. Shell Programming and Scripting

awk-filter record by another file

I have file1 3049 3138 4672 22631 45324 112382 121240 125470 130289 186128 193996 194002 202776 228002 253221 273523 284601 284605 641858 (8 Replies)
Discussion started by: biomed
8 Replies

7. Shell Programming and Scripting

Fetching record based on Uniq Key from huge file.

Hi i want to fetch 100k record from a file which is looking like as below. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ... (17 Replies)
Discussion started by: lathigara
17 Replies

8. Shell Programming and Scripting

Delete record filter by column

Dear friend, I have a file 2 files with column wise FILE_A ------------------------------ x,1,@ y,3,$ x,5,% FILE_B -------------------- x,1,@ i like to delete the all lines in FILE_A ,if first column available in FILE_B. output (in FILE_A) y,3,$ x,5,% (10 Replies)
Discussion started by: Jewel
10 Replies

9. Shell Programming and Scripting

Filter uniq field values (non-substring)

Hello, I want to filter column based on string value. All substring matches are filtered out and only unique master strings are picked up. infile: 1 abcd 2 abc 3 abcd 4 cdef 5 efgh 6 efgh 7 efx 8 fgh Outfile: 1 abcd 4 cdef 5 efgh 7 efxI have tried awk '!a++; match(a, $2)>0'... (32 Replies)
Discussion started by: yifangt
32 Replies

10. Shell Programming and Scripting

CSV File:Filter duplicate records from column1 & another column having unique record

Hi Experts, I have csv file with 30, 40 columns Pasting just 2 column for problem description. Need to print error if below combination is not present in file check for column-1 (DocumentNumber) and filter columns where value in DocumentNumber field is same. For all such rows, the field... (7 Replies)
Discussion started by: as7951
7 Replies
UNIQ(1) 							   User Commands							   UNIQ(1)

NAME
uniq - report or omit repeated lines SYNOPSIS
uniq [OPTION]... [INPUT [OUTPUT]] DESCRIPTION
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output). With no options, matching lines are merged to the first occurrence. Mandatory arguments to long options are mandatory for short options too. -c, --count prefix lines by the number of occurrences -d, --repeated only print duplicate lines -D, --all-repeated[=delimit-method] print all duplicate lines delimit-method={none(default),prepend,separate} Delimiting is done with blank lines -f, --skip-fields=N avoid comparing the first N fields -i, --ignore-case ignore differences in case when comparing -s, --skip-chars=N avoid comparing the first N characters -u, --unique only print unique lines -z, --zero-terminated end lines with 0 byte, not newline -w, --check-chars=N compare no more than N characters in lines --help display this help and exit --version output version information and exit A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars. Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use `sort -u' without `uniq'. Also, comparisons honor the rules specified by `LC_COLLATE'. AUTHOR
Written by Richard M. Stallman and David MacKenzie. REPORTING BUGS
Report uniq bugs to bug-coreutils@gnu.org GNU coreutils home page: <http://www.gnu.org/software/coreutils/> General help using GNU software: <http://www.gnu.org/gethelp/> Report uniq translation bugs to <http://translationproject.org/team/> COPYRIGHT
Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. SEE ALSO
comm(1), join(1) The full documentation for uniq is maintained as a Texinfo manual. If the info and uniq programs are properly installed at your site, the command info coreutils 'uniq invocation' should give you access to the complete manual. GNU coreutils 8.5 February 2011 UNIQ(1)
All times are GMT -4. The time now is 10:42 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy