Deleting all occurrences of a duplicate row

07-10-2008

Hi,

I need to delete all occurrences of repeated lines from a file and retain only the lines that are not repeated elsewhere in the file. As seen below, the first two lines are the same except for the strings "From BaseLine" and "From SMS". I shouldn't consider the "From SMS" and "From BaseLine" prefixes when checking for repeated lines. I want to retain only the third line.

From BaseLine - 0T001 000 999999999 00101 20080411000000T1023.27
From SMS - 0T001 000 999999999 00101 20080411000000T1023.27
From BaseLine - 0T001 000 999999999 00101 20080411000000T109.019

My output should be the third line alone.

These files range in size from 100 MB to 900 MB, so performance also has to be considered. Can you please help me out?

Regards,

Ragav.

ragavhere

07-10-2008

Use nawk or /usr/xpg4/bin/awk on Solaris:

Code:

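awk -F- 'END {
  # after all input is read: print the stored line for every key seen exactly once
  for (p in r)
    if (u[p] == 1)
      print r[p]
}
!u[$2] ++ {
  # first time this key ($2, the text after the -) is seen: remember the whole line
  r[$2] = $0
}' input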
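A minimal sketch for trying the program above on the three sample lines from the question. The function name `unique_after_dash` is hypothetical; like the original, it assumes every line contains exactly one `-`, because with `-F-` the key `$2` is only the text between the first and second `-`.

```shell
# Hypothetical wrapper around the awk program above.
# Prints each line whose key ($2, the text after "-") occurs exactly once.
unique_after_dash() {
  awk -F- '
    !u[$2] ++ { r[$2] = $0 }        # count the key; keep the line the first time
    END {
      for (p in r)
        if (u[p] == 1) print r[p]   # key seen exactly once: print its line
    }' "$@"
}

# Sample data from the question; prints only the third line.
printf '%s\n' \
  'From BaseLine - 0T001 000 999999999 00101 20080411000000T1023.27' \
  'From SMS - 0T001 000 999999999 00101 20080411000000T1023.27' \
  'From BaseLine - 0T001 000 999999999 00101 20080411000000T109.019' |
  unique_after_dash
```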

radoulov

07-10-2008

Thanks. Can you please explain?

Regards,

Ragav.

ragavhere

07-10-2008

Which part of the code is not obvious?

radoulov

07-10-2008

Can you please explain the entire code???

Regards
Ragav

ragavhere

07-10-2008

Code:

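# uniq only compares adjacent lines, so sort on the compared part (field 4 onward) first
sort -k4 file | uniq -u -f 3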
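`uniq` only compares adjacent lines, so `uniq -u -f 3` on its own will miss duplicates that are not next to each other, which is likely in a 100 to 900 MB file. The sketch below (the function names `unsorted_uniq`, `sorted_uniq` and `sample` are hypothetical) runs both forms on the question's data with the repeated line moved to the end. `-f 3` skips the blank-separated fields `From`, `BaseLine`/`SMS` and `-`, and `sort -k4` sorts on field 4 to the end of the line so that matching lines end up adjacent.

```shell
# uniq -u prints only lines that have no identical adjacent line; -f 3 skips
# the first three blank-separated fields ("From", "BaseLine"/"SMS", "-").
unsorted_uniq() {
  uniq -u -f 3
}

# sort -k4 orders lines by field 4 to end of line, so lines whose
# remainder is identical become adjacent before uniq sees them.
sorted_uniq() {
  sort -k4 | uniq -u -f 3
}

# Hypothetical test data: the question's lines with the duplicate moved last.
sample() {
  printf '%s\n' \
    'From BaseLine - 0T001 000 999999999 00101 20080411000000T1023.27' \
    'From BaseLine - 0T001 000 999999999 00101 20080411000000T109.019' \
    'From SMS - 0T001 000 999999999 00101 20080411000000T1023.27'
}

sample | unsorted_uniq   # all three lines: the two copies are not adjacent
sample | sorted_uniq     # only the line ending in T109.019
```

GNU sort, for example, falls back to temporary files for large inputs, so this pipeline does not have to hold the whole file in memory the way the awk arrays do.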

ghostdog74

07-10-2008

OK.

Code:

awk -F- ...

Use '-' as a field separator.
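To see exactly what ends up in `$1` and `$2`, here is a small sketch (the function name `show_fields` is hypothetical) that prints the fields of one sample line in brackets. Note that `$2` keeps the leading space, and that a line containing more than one `-` would be split into more than two fields, so `$2` would no longer be the whole remainder of the line.

```shell
# Show how awk -F- splits a line: NF, then $1 and $2 in brackets
# so that leading and trailing spaces are visible.
show_fields() {
  awk -F- '{ printf "NF=%d\n$1=[%s]\n$2=[%s]\n", NF, $1, $2 }'
}

echo 'From SMS - 0T001 000 999999999 00101 20080411000000T1023.27' | show_fields
# NF=2
# $1=[From SMS ]
# $2=[ 0T001 000 999999999 00101 20080411000000T1023.27]
```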

The following expression/action pair is executed for every input line:

Code:

!u[$2] ++ { 
  r[$2] = $0
  }

When the string in the second field is seen for the first time, the corresponding element of the associative array u has not been set yet, so it evaluates to 0 (false in AWK) because of implicit variable initialization. In general form, this common AWK idiom is written as:

Code:

!array[key] ++

Which actually means:

Code:

array[key] ++ == 0
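As an illustration of the idiom on its own (not the thread's task: this keeps the first copy of each line instead of dropping every copy), a short sketch showing that both forms select the same lines. The function names `first_copies` and `first_copies_long` are hypothetical.

```shell
# !seen[$0]++ is true only the first time a line appears: seen[$0] starts
# as 0 (uninitialized), the test uses that old value, then it is incremented.
first_copies() {
  awk '!seen[$0]++'
}

# The same test written out in full.
first_copies_long() {
  awk 'seen[$0]++ == 0'
}

printf '%s\n' a b a c b | first_copies        # prints a, b, c on separate lines
printf '%s\n' a b a c b | first_copies_long   # same output
```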

When !u[$2]++ is true (the old value was 0, and !0 is true), the action is run: it fills another associative array r (r for record, because it holds the entire record), with $2 as the key and $0 as the value. This way we store one copy (the first one) of each distinct $2, while the expression part, u[$2]++, counts how many times each value of $2 occurs.
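To inspect the counts that `u[$2]++` builds up, this sketch (the function name `count_keys` is hypothetical) runs only the counting part of the program on the sample data and prints each key with its count. The output is piped through `sort` because the `for (k in u)` order is unspecified.

```shell
# Run only the counting part, u[$2]++, and print each key with its count.
# Keys are shown in brackets so the leading space kept by -F- is visible.
count_keys() {
  awk -F- '{ u[$2]++ } END { for (k in u) print u[k], "[" k "]" }'
}

printf '%s\n' \
  'From BaseLine - 0T001 000 999999999 00101 20080411000000T1023.27' \
  'From SMS - 0T001 000 999999999 00101 20080411000000T1023.27' \
  'From BaseLine - 0T001 000 999999999 00101 20080411000000T109.019' |
  count_keys | sort   # for (k in u) order is unspecified, so sort the output
```

On the question's three lines, the key shared by the first two lines gets a count of 2 and the third line's key a count of 1, which is why only the third line passes the u[p] == 1 test at the end.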

Code:

END {
  for (p in r)
    if (u[p] == 1)
      print r[p]
      }

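After all the input has been read, the END block is executed.
For every key p in the r array, it checks whether the element of the u array with the same key, u[p], equals 1 (that is, the value occurred only once in the entire input); if so, it prints the corresponding record r[p]. Note that for (p in r) visits the keys in an unspecified order, so the output lines may not appear in their original input order.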
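If the original order matters, or memory is tight for a 900 MB file, one alternative (not from the thread) is to read the file twice: the first pass only counts keys and the second pass prints the lines whose key was counted once. This is a sketch under the same one-`-`-per-line assumption; the function name `unique_two_pass` is hypothetical, and the input must be a regular file rather than a pipe, since it is read twice.

```shell
# Hypothetical two-pass alternative (not the code from the thread).
# Pass 1 (NR == FNR, first read of the file): only count each key.
# Pass 2 (second read): print lines whose key was counted exactly once,
# in their original order. Needs a regular file, since it is read twice.
unique_two_pass() {
  awk -F- '
    NR == FNR { u[$2]++; next }   # first pass: count occurrences of each key
    u[$2] == 1                    # second pass: default action prints the line
  ' "$1" "$1"
}

# Usage with the sample data written to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'From BaseLine - 0T001 000 999999999 00101 20080411000000T1023.27' \
  'From SMS - 0T001 000 999999999 00101 20080411000000T1023.27' \
  'From BaseLine - 0T001 000 999999999 00101 20080411000000T109.019' > "$tmp"
unique_two_pass "$tmp"
rm -f "$tmp"
```

It trades a second read of the file for not keeping an r array of full records in memory, and the output follows the input order.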

That's all.

radoulov

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Delete duplicate row based on criteria

Discussion started by: shash

2. Shell Programming and Scripting

Delete duplicate row

Discussion started by: aav1307

3. Shell Programming and Scripting

In php, Moving a new row to another table and deleting old row

Discussion started by: jazzyzha

4. Shell Programming and Scripting

Moving new row and deleting old row to another table

Discussion started by: jazzyzha

5. Shell Programming and Scripting

deleting dupes in a row

Discussion started by: gimley

6. Shell Programming and Scripting

REMOVE DUPLICATE IN a ROW AFTER CHECKING THE FIRST SIMILAR NAME

Discussion started by: manigrover

7. Shell Programming and Scripting

Deleting Duplicate Records

Discussion started by: DFr0st

8. Shell Programming and Scripting

how to identify duplicate columns in a row

Discussion started by: suresh3566

9. Shell Programming and Scripting

Delete a row that has a duplicate column

Discussion started by: spartan22

10. Shell Programming and Scripting

sort and semi-duplicate row - keep latest only

Discussion started by: LisaS