Remove duplicate rows when >10 based on single column value


 
# 1  
Old 01-16-2012
Hello, I'm trying to delete every row whose first-column value occurs more than 10 times in the file.

For example, with a limit of 2 instead of 10, the input

a 1
a 2
a 3
b 1
c 1

gives

b 1
c 1

In my case, though, a value must occur at least 11 times before its rows are deleted.

Thanks for the help.


Last edited by informaticist; 01-17-2012 at 03:53 PM..
# 2  
Old 01-16-2012
Code:
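awk '
{
  count[$1]++      # rows seen per first-column value
  rec[NR] = $0     # keep every row in input order
}
END {
  for (i = 1; i <= NR; i++) {
    split(rec[i], t)
    if (count[t[1]] <= l)   # keep keys with at most l rows
      print rec[i]
  }
}' l=10 infile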

In some awk implementations the value of NR is not available in the END block.
If that's the case, you could keep your own record counter instead:

Code:
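awk '
{
  count[$1]++      # rows seen per first-column value
  rec[++c] = $0    # c is our own record counter, used instead of NR
}
END {
  for (i = 1; i <= c; i++) {
    split(rec[i], t)
    if (count[t[1]] <= l)   # keep keys with at most l rows
      print rec[i]
  }
}' l=10 infile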
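
An alternative sketch, not taken from the thread: if holding the whole file in memory is a concern, the same filter can be written as a two-pass awk command that reads the input file twice. The file name sample.txt and its contents are made up for illustration, and the limit is set to 2 instead of 10 to keep the example short.

```shell
# Hypothetical sample data in a scratch directory (names are assumptions).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
a 1
a 2
a 3
b 1
c 1
EOF

# Pass 1 (NR == FNR, first copy of the file): count rows per first-column value.
# Pass 2 (second copy): print rows whose key occurs at most l times.
result=$(awk -v l=2 'NR == FNR { count[$1]++; next } count[$1] <= l' \
  "$tmpdir/sample.txt" "$tmpdir/sample.txt")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Here the three "a" rows exceed the limit of 2, so only "b 1" and "c 1" are printed. The trade-off is that the file is read twice, so this form does not work on piped input.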


Last edited by radoulov; 01-17-2012 at 04:44 AM.. Reason: Corrected.
# 3  
Old 01-16-2012
How do I use that in a shell?
# 4  
Old 01-16-2012
Open an editor such as vi, paste the code radoulov gave you, and save it to a file. Make the file executable with the chmod command, then run it with something like ./myscript.sh. Replace the word infile with the name of your input file.
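
The steps above can be sketched as a short shell session. The names myscript.sh and data.txt, the scratch directory, and the test data are assumptions for illustration. As a small variation on the advice above, the script takes the input file name as its first argument ("$1") rather than having infile edited by hand.

```shell
# Scratch directory so nothing existing is overwritten (hypothetical location).
workdir=$(mktemp -d)

# Create the script file; this stands in for pasting the code into vi.
cat > "$workdir/myscript.sh" <<'EOF'
#!/bin/sh
# Usage: ./myscript.sh infile
awk '
{ count[$1]++; rec[NR] = $0 }
END {
  for (i = 1; i <= NR; i++) {
    split(rec[i], t)
    if (count[t[1]] <= l)
      print rec[i]
  }
}' l=10 "$1"
EOF

# Make the script executable.
chmod +x "$workdir/myscript.sh"

# Hypothetical input: key "a" occurs 11 times, key "b" once.
for i in 1 2 3 4 5 6 7 8 9 10 11; do
  echo "a $i"
done > "$workdir/data.txt"
echo "b 1" >> "$workdir/data.txt"

# Run the script, passing the input file name as the first argument.
output=$("$workdir/myscript.sh" "$workdir/data.txt")
echo "$output"

rm -rf "$workdir"
```

With this test data only the line "b 1" is printed, because "a" occurs 11 times and so exceeds the limit of 10.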
# 5  
Old 01-16-2012
I saved the script with vi and ran it, but the only output was the title COL1.

That is, the only output was the first entry (row 1, column 1) of the table.
# 6  
Old 01-17-2012
Please post a sample of the real input file.
The code was wrong anyway; I've corrected my post above.
# 7  
Old 01-17-2012
Neither of those worked. Here is a sample of the input:

Code:
Col1    Col2
2600.m01    194
2600.m01    332
2600.m01    595
2600.m01    664
2600.m01    673
2600.m01    685
2600.m01    6043
2600.m01    6158
2600.m01    6677
2600.m01    6897
2600.m01    6938
2600.m01    6969
2600.m01    7001
2600.m01    7014
2500.m01    7016
2500.m01    7064
2500.m01    7070
2500.m01    8166
2500.m01    9288
2500.m01    9291
2500.m01    9304
2500.m01    9316
2500.m01    9330
2500.m01    9365
2432.m0392    9369
2134.m01234    10525
2827.m033    67
2472.m001234    2643

The correct output would be:
Code:
Col1    Col2
2500.m01    7016
2500.m01    7064
2500.m01    7070
2500.m01    8166
2500.m01    9288
2500.m01    9291
2500.m01    9304
2500.m01    9316
2500.m01    9330
2500.m01    9365
2432.m0392    9369
2134.m01234    10525
2827.m033    67
2472.m001234    2643
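
As a quick sanity check on this sample, the rows per first-column value can be counted. This is only a sketch: it assumes the data is plain whitespace-separated text, and the scratch file name sample.txt is hypothetical.

```shell
# Copy of the sample input from this post, written to a scratch file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
Col1    Col2
2600.m01    194
2600.m01    332
2600.m01    595
2600.m01    664
2600.m01    673
2600.m01    685
2600.m01    6043
2600.m01    6158
2600.m01    6677
2600.m01    6897
2600.m01    6938
2600.m01    6969
2600.m01    7001
2600.m01    7014
2500.m01    7016
2500.m01    7064
2500.m01    7070
2500.m01    8166
2500.m01    9288
2500.m01    9291
2500.m01    9304
2500.m01    9316
2500.m01    9330
2500.m01    9365
2432.m0392    9369
2134.m01234    10525
2827.m033    67
2472.m001234    2643
EOF

counts=$(awk '
  !($1 in n) { order[++k] = $1 }   # remember first-seen order of keys
  { n[$1]++ }                      # count rows per key
  END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
' "$tmpdir/sample.txt")
printf '%s\n' "$counts"

rm -rf "$tmpdir"
```

The counts come out as 14 rows for 2600.m01 and 10 for 2500.m01, with every other value occurring once. Only 2600.m01 exceeds the limit of 10, which matches the expected output above.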


Last edited by radoulov; 01-17-2012 at 04:23 PM..
 