Remove duplicates from a file

05-09-2013

Thanks Vidyadhar, this is working fine. I am getting the expected records, but their order has changed. The output I am getting is:

Code:

104       MNO       17/04/1984      12000
107       XYZ       24/09/1978      6000
101       ABC       11/01/1991      5000
102       DEF       15/04/1998      8000

But what I am expecting is:

Code:

101       ABC       11/01/1991      5000
104       MNO       17/04/1984      12000
102       DEF       15/04/1998      8000
107       XYZ       24/09/1978      6000

Is there any way to get this result without changing the order of the records?

Last edited by radoulov; 05-09-2013 at 06:14 AM..

saga20

05-09-2013

Code:

awk '!X[$0]++' file
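
For readers wondering why this one-liner keeps the original order, here is a minimal sketch run against the sample data from this thread. The function name dedupe_keep_order is only a hypothetical wrapper added for illustration; the awk program itself is the one posted above.

```shell
# dedupe_keep_order is a hypothetical helper name, not part of the thread.
# X[$0]++ yields the number of times the line has been seen so far
# (0 on first sight) and then increments it, so !X[$0]++ is true only
# for the first occurrence. With no action given, awk prints that line.
dedupe_keep_order() {
    awk '!X[$0]++' "$@"
}

# Sample records from the thread, in their original order.
printf '%s\n' \
    '101 ABC 11/01/1991 5000' \
    '104 MNO 17/04/1984 12000' \
    '102 DEF 15/04/1998 8000' \
    '101 ABC 11/01/1991 5000' \
    '107 XYZ 24/09/1978 6000' \
    '104 MNO 17/04/1984 12000' \
    '101 ABC 11/01/1991 5000' |
    dedupe_keep_order
# Prints the records 101, 104, 102, 107, each once, in input order.
```

Because lines are printed as they are first read, nothing is sorted. The trade-off is that awk holds every distinct line in memory, which may matter for very large files.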

pamu

05-09-2013

Quote:

Originally Posted by saga20

But sort -u will show the original records. As per the sample data I posted above, it will get only the 2 records that are unique. But I want just the duplicates to be removed, so that it gets 4 records.

No. The command:

Code:

sort -ur -o Emp.txt Emp.txt

when Emp.txt contains:

Code:

Empid Empname Joining_Date Salary
101 ABC 11/01/1991 5000
104 MNO 17/04/1984 12000
102 DEF 15/04/1998 8000
101 ABC 11/01/1991 5000
107 XYZ 24/09/1978 6000
104 MNO 17/04/1984 12000
101 ABC 11/01/1991 5000

replaces the contents of Emp.txt with:

Code:

Empid Empname Joining_Date Salary
107 XYZ 24/09/1978 6000
104 MNO 17/04/1984 12000
102 DEF 15/04/1998 8000
101 ABC 11/01/1991 5000

I used reverse sort order to keep the header at the start of the file. If it is important to keep the output in the same (unsorted) order as the input, you could try:

Code:

awk '! ($0 in o) {o[$0];print}' Emp.txt > Emp$$.txt && mv Emp$$.txt Emp.txt
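
A slightly more defensive variant of the same temp-file approach might look like the sketch below. It is only an illustration under stated assumptions: dedupe_in_place is a hypothetical function name, and it relies on mktemp, which is widely available on GNU and BSD systems but is not specified by POSIX.

```shell
# dedupe_in_place is a hypothetical helper, not from the thread.
# It writes the de-duplicated lines to a temporary file next to the
# original and replaces the original only if awk exited successfully.
dedupe_in_place() {
    file=$1
    # Assumption: mktemp is installed and accepts a template argument.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk '!($0 in seen) { seen[$0]; print }' "$file" > "$tmp"; then
        # Note: the replaced file takes on the temp file's permissions.
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage (commented out so that sourcing this block changes nothing):
# dedupe_in_place Emp.txt
```

Redirecting to a separate file matters here: writing with > directly to the input file would truncate it before awk reads anything.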

Don Cragun

05-09-2013

As a side note, the latest GNU awk (Awk 4.1.0 beta) supports in-place editing.
The code would be:

Code:

awk -i inplace '!seen[$0]++' Emp.txt

radoulov

05-09-2013

Quote:

Originally Posted by pamu

Code:

awk '!X[$0]++' file

Thanks Pamu, it's working fine as per my requirement.

---------- Post updated at 02:58 PM ---------- Previous update was at 02:54 PM ----------

Quote:

Originally Posted by Don Cragun

Code:

awk '! ($0 in o) {o[$0];print}' Emp.txt > Emp$$.txt && mv Emp$$.txt Emp.txt

Thanks, Don Cragun, for your response, but after executing this, my file becomes empty. Anyway, I got the solution.

---------- Post updated at 02:59 PM ---------- Previous update was at 02:58 PM ----------

Quote:

Originally Posted by radoulov

As a side note, the latest GNU awk (Awk 4.1.0 beta) supports in-place editing.
The code would be:

Code:

awk -i inplace '!seen[$0]++' Emp.txt

Thanks, Radoulov, for your response, but after executing this, my file becomes empty. Anyway, I got the solution.

saga20
