Deleting only 2nd and third duplicates in field 2


 
# 1  
Old 12-17-2013
Deleting only 2nd and third duplicates in field 2

# 2  
Old 12-17-2013
Sorry, I didn't understand your requirement. What is the expected output? Can you show a sample output for the input you provided?
# 3  
Old 12-17-2013
This is the expected output.

Actually, the last command did not quite work. The problem is that there are
3 occurrences of /lst/enc_hal_0416, and all the commands I've tried will not
let me keep the one that has "deleting" in the first field. And I have more
than one list, so I can't just single out enc_hal_0416; next time there may
be 3 occurrences of enc_hal_0888.

What I want is:
Code:
rearranging            /lst/enc_hal_0363
rearranging             /lst/enc_hal_0365
rearranging             /lst/enc_hal_0393
rearranging             /lst/enc_hal_0394
rearranging             /lst/enc_hal_0399
rearranging             /lst/enc_hal_0494
rearranging             /lst/enc_hal_0501
rearranging            /lst/volume_root
deleting                /lst/enc_hal_0416

# 4  
Old 12-17-2013
Quote:
Originally Posted by newbie2010
What I want is:

Code:
rearranging            /lst/enc_hal_0363
rearranging             /lst/enc_hal_0365
rearranging             /lst/enc_hal_0393
rearranging             /lst/enc_hal_0394
rearranging             /lst/enc_hal_0399
rearranging             /lst/enc_hal_0494
rearranging             /lst/enc_hal_0501
rearranging            /lst/volume_root
deleting                /lst/enc_hal_0416


Why is "deleting /lst/enc_hal_0501" not in the output?

Last edited by Akshay Hegde; 12-17-2013 at 02:21 PM. Reason: Confused too much :)
# 5  
Old 12-19-2013
File order rearrangement / duplicate deletion

I have a file that has the following entries:

Code:
+==> FILER LISTING <==
deleting   /vol/icm_wks_0363
deleting   /vol/icm_wks_0365
deleting   /vol/icm_wks_0393
deleting   /vol/icm_wks_0394
deleting   /vol/icm_wks_0399
deleting   /vol/icm_wks_0416
deleting   /vol/icm_wks_0494
deleting   /vol/icm_wks_0501
deleting   /vol/truck_root
rearranging  /vol/icm_wks_0363
rearranging  /vol/icm_wks_0365
rearranging  /vol/icm_wks_0393
rearranging  /vol/icm_wks_0394
rearranging  /vol/icm_wks_0399
rearranging  /vol/icm_wks_0416
rearranging  /vol/icm_wks_0494
rearranging  /vol/icm_wks_0501
rearranging  /vol/truck_root



Here is what the list should look like:

Code:
rearranging  /vol/truck_root
rearranging  /vol/icm_wks_0501
rearranging  /vol/icm_wks_0494
rearranging  /vol/icm_wks_0399
rearranging  /vol/icm_wks_0394
rearranging  /vol/icm_wks_0393
rearranging  /vol/icm_wks_0365
rearranging  /vol/icm_wks_0363
deleting       /vol/icm_wks_0416

What I need is this: when there are two volumes of the same name in the second column, only the "rearranging" line should be printed; but when there are three volumes of the same name in the second column, as with /vol/icm_wks_0416, then "deleting /vol/icm_wks_0416" should be printed instead of the "rearranging" line. The problem is that I have more than one list, so the volume names won't always be the same; the rule is roughly as sketched below.
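
In other words, the selection rule could be sketched in awk roughly like this (reading the file twice; this is just a sketch of the rule I'm after, with test-list as an example name):

Code:
# Sketch only: the first pass counts each volume name, the second pass
# prints the "rearranging" line for names seen twice and the "deleting"
# line for names seen three times.
awk 'NR == FNR { count[$2]++; next }
     count[$2] <= 2 && $1 == "rearranging"
     count[$2] >  2 && $1 == "deleting"' test-list test-list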

I have tried variants of sort:

Code:
cat test-list |sort -ft/ -uk2

Code:
cat test-list |sort -r -ft/ -uk2

The second command enables me to print out this:

Code:
cat test-list |sort -r -ft/ -uk2
rearranging  /vol/truck_root
rearranging  /vol/icm_wks_0501
rearranging  /vol/icm_wks_0494
rearranging  /vol/icm_wks_0416
rearranging  /vol/icm_wks_0399
rearranging  /vol/icm_wks_0394
rearranging  /vol/icm_wks_0393
rearranging  /vol/icm_wks_0365
rearranging  /vol/icm_wks_0363

That is almost right, except that the 0416 is not marked as deleting but as rearranging.


I have tried
Code:
 cat test-list  |gawk '!k[$2]++'

but this then only prints the 2nd column.

Also
Code:
gawk 'BEGIN { FS = " " } {count[$2]++; if (count[$2] == 1) first[$2] = $0;if (count[$2] ==2)print first[$2];if(count[$2] > 1)print}'

which does not work. Can any of you shed light on it?
# 6  
Old 12-19-2013
First of all, I don't see 3 entries of /vol/icm_wks_0416 in the input you posted.

Anyway, based on your requirement, you could try:
Code:
awk '
        NR > 1 {
                C[$2]++
                if ( !(A[$2]) )
                        A[$2] = $1
                else
                {
                        if ( C[$2] > 2 && $1 == "deleting" )
                                A[$2] = $1
                        if ( C[$2] <= 2  && $1 == "rearranging" )
                                A[$2] = $1
                }
        }
        END {
                for ( k in A )
                        print A[k], k
        }
' file
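
To explain the logic: NR > 1 skips the header line, C[$2] counts how many times each volume name has been seen, and A[$2] holds the first field chosen for that name, only switching to "deleting" once a third occurrence turns up. Note that for ( k in A ) visits the array in no particular order, so if the output order matters you could pipe the result through sort, for example (the script name pick.awk is only an illustration):

Code:
# Hypothetical usage: the program above saved as pick.awk, with the
# output sorted on the volume name to get a predictable order.
awk -f pick.awk file | sort -k2,2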

This User Gave Thanks to Yoda For This Post:
# 7  
Old 12-19-2013
Bumping or double posting is not permitted in this forum. Please read the forum rules.

Deleting only 2nd and third duplicates in field 2 | Unix Linux Forums | Shell Programming and Scripting
This User Gave Thanks to Akshay Hegde For This Post: