Removing Duplicates from file


 
# 1  
Old 09-02-2011
Removing Duplicates from file

Hi Experts,

Please check the following new requirement. I have data like the following in a file.

Code:
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbc275d2c122| 3478234| WORK| 1
01cbbe4362743da| 3496386| Rich Spare| 1
01cbc275d2c122| 3478234| WORK| 1
This is a pipe-separated file with columns 2 and 3 as the key columns. The file should be split into the following output files:

1) All records other than the duplicates

Code:
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbbe4362743da| 3496386| Rich Spare| 1
2) The duplicate key file

Code:
3478234| WORK
Any thoughts on this?

Note: the 'FILE_HEADER' line should be present in the first file only.
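
To make it concrete, this is the kind of two-pass awk I have in mind (just a rough, untested sketch; infile, clean.txt and dups.txt are placeholder names):

Code:
awk -F'|' '
NR == FNR { if (FNR > 1) k[$2,$3]++; next }    # pass 1: count how often each key occurs (skip header)
FNR == 1  { print > "clean.txt"; next }        # pass 2: header goes to the first file only
k[$2,$3] > 1 { if (!seen[$2,$3]++) print $2 "|" $3 > "dups.txt"; next }
{ print > "clean.txt" }' infile infile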
# 2  
Old 09-02-2011
Hi Tinu,

Please try the command below:

Code:
sort -t "|" +1 -3 test_file |uniq -u

or
Code:
sed '1d' test_file | sort -t'|' -k2,3 | uniq -u    # remove the header line first
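
And for the duplicate key file, something like this may work (assuming the duplicates are identical full lines, so uniq -d catches them):

Code:
sed '1d' test_file | sort -t'|' -k2,3 | uniq -d | awk -F'|' '{ print $2 "|" $3 }'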

~jimmy

# 3  
Old 09-06-2011
See a better solution for separating the unique and duplicate records in a file:

Code:
# assumes $HEADER holds the header line, saved beforehand
sed '1d' "$FILE1" | sort -t'|' -k2,3 > temp1    # drop the header, sort on the key columns
awk -F'|' '{ b[NR] = $2 $3; c[NR] = $0 }
END { for (i = 1; i < NR; i++)                  # adjacent records sharing a key are duplicates
        if (b[i] == b[i+1]) print c[i] "\n" c[i+1]
}' temp1 | uniq > temp2
cat temp1 temp2 > temp3        # duplicated records now occur more than once, unique ones exactly once
sort temp3 | uniq -u > temp4   # keep only the records that occur once
echo "$HEADER" > "$FILE1"
cat temp4 >> "$FILE1"
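
To also get the duplicate key file (requirement 2), the keys can be pulled out of temp2, along these lines (dup_keys.txt is just a placeholder name):

Code:
awk -F'|' '!seen[$2,$3]++ { print $2 "|" $3 }' temp2 > dup_keys.txt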


# 4  
Old 09-06-2011
Another one with awk:

Code:
awk -F\| 'NR == 1 {
  print > uniq                  # the header goes to the unique file only
  next
  }
{
  c[$2, $3]++; d[NR] = $0       # count each key, remember each record
  }
END {
  for (i = 2; i <= NR; i++) {   # record 1 was the header
    split(d[i], t)              # re-split the saved record on FS
    if (c[t[2], t[3]] > 1) {
      if (!s[t[2], t[3]]++)     # print each duplicated key only once
        print t[2], t[3] > dups
      }
    else
      print d[i] > uniq
    }
  }' OFS=\| dups=dups.txt uniq=uniq.txt infile
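
With the sample data from the first post this should give the following (note the fields keep their leading spaces, since -F\| does not trim them):

Code:
$ cat uniq.txt
FILE_HEADER
01cbbfde7898410| 3477945| home| 1
01cbbe4362743da| 3496386| Rich Spare| 1
$ cat dups.txt
 3478234| WORK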
