squeeze duplicates from a table


 
# 1  
Old 05-23-2010

I have files with any number of rows, each row having 2 columns separated by the delimiter "|".
A file contains records like the following, for example.

Code:
15|69
15|70
15|71
15|72
15|73
15|74
16|2
16|3
16|4
16|5
16|6
16|7
16|8
16|9
16|10
16|11
16|12
16|13
16|14
16|15
16|16
16|17
16|18
16|19
16|20
16|21
17|2
17|3
19|2
19|3

I want to format this table so that, for each value in the first column, it shows only the row with the largest value in the second column.

For example, using the data above, I want the command to return:

Code:
15|74
16|21
17|3
19|3



Is there any way to return the largest second-column value ($2) together with its first-column value ($1) using awk?

I'd appreciate any help.

Last edited by vgersh99; 05-23-2010 at 02:49 PM.. Reason: code tags, please!
# 2  
Old 05-23-2010
Code:
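nawk '
  BEGIN { FS=OFS="|" }
  # keep the largest $2 per $1; the "in" test also covers keys whose values are all <= 0
  !($1 in max) || $2+0 > max[$1]+0 { max[$1]=$2 }
  # for-in order is unspecified; pipe the output through sort if order matters
  END { for (k in max) print k, max[k] }' myFile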

# 3  
Old 05-23-2010
Code:
sort -t'|' -k2,2nr -s file1 | awk -F'|' '!seen[$1]++'
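
The one-liner above prints rows in descending order of the second column, not in key order. If key order matters, a small variant is to sort on the first column ascending and the second descending, then let awk keep the first row seen for each key. This is only a sketch: the function name `max_by_key` is made up for illustration, and it assumes both columns are plain integers.

```shell
# max_by_key: hypothetical helper name, not from the thread.
# Sort by column 1 ascending, then column 2 descending, so the first
# row of each key is the one with the largest second column.
max_by_key() {
  sort -t'|' -k1,1n -k2,2nr "$@" | awk -F'|' '!seen[$1]++'
}
```

Call it as `max_by_key file1`, or pipe data into it with no arguments.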

# 4  
Old 05-24-2010
Or a bash script, though it's a little big :)

Code:
# cat justdoit
 
#!/bin/bash
 
oldIFS=$IFS
IFS="|"
i=0 ; ix=0 ; in=0
exec <$1
 
while read val1 val2
  do
    array[i]=$val1
    let i=i+1
    tmpval[i]=$val2
     if [ $i -ne 1 ] ; then
        if [ ${array[ix]} -ne ${array[ix+1]} ] ; then
           myval2[in]=${tmpval[i-1]}
             ((++in))
        fi
             ((++ix))
     fi
  done
myval2[in]=${tmpval[i]}
 
IFS=$oldIFS
count=${#array[@]}
in=0  ; inx=0
 
myval[0]=${array[0]}
 
while [ $(( count -=1 )) -gt -1 ]
  do
   same=1
    for val in ${array[@]}
     do
       if [ $val -eq ${array[in]} ] ; then
           ((++same))
       fi
     done
 
  var=ok
 
   if [ $same -gt 2 ] ; then
    for newval in ${myval[@]}
      do
        if [ ${array[in]} -ne $newval ] ; then
          var=notok
        else
          var=ok
        fi
      done
 fi
 
 if [ "$var" == "notok" ] ; then
       myval[inx]=${array[in]}
 fi
 
((++in))
((++inx))
  done
 
inx=0
for val1 in ${myval[@]}
   do
     echo "$val1|${myval2[inx]}"
        ((++inx))
   done

Code:
# cat myfile
15|69
15|70
15|71
15|72
15|73
15|74
16|2
16|3
16|4
16|5
16|6
16|7
16|8
16|9
16|10
16|11
16|12
16|13
16|14
16|15
16|16
16|17
16|18
16|19
16|20
16|21
17|2
17|3
19|2
19|3

Code:
# ./justdoit myfile
15|74
16|21
17|3
19|3
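
As far as I can tell, the script above picks the last row of each run of equal keys, so it depends on the rows being grouped by the first column and ascending in the second, as they are in the sample. A shorter sketch that only needs the grouping, and compares the values itself, could look like this. The function name `max_per_run` is hypothetical, and the values are assumed to be integers.

```shell
# max_per_run: hypothetical helper name, not from the thread.
# Reads key|value rows on stdin and prints the largest value for each
# run of adjacent rows sharing the same key.
max_per_run() {
  local key val cur="" max="" seen=0
  # the "|| [ -n "$key" ]" part handles a last line without a newline
  while IFS='|' read -r key val || [ -n "$key" ]; do
    if [ -z "$key" ]; then
      continue                      # skip blank lines
    fi
    if [ "$seen" -eq 1 ] && [ "$key" = "$cur" ]; then
      # same key as the previous row: keep the larger value
      if [ "$val" -gt "$max" ]; then
        max=$val
      fi
    else
      # key changed: print the finished group, then start a new one
      if [ "$seen" -eq 1 ]; then
        printf '%s|%s\n' "$cur" "$max"
      fi
      cur=$key
      max=$val
      seen=1
    fi
  done
  # print the final group, if there was any input at all
  if [ "$seen" -eq 1 ]; then
    printf '%s|%s\n' "$cur" "$max"
  fi
}
```

Run it as `max_per_run < myfile`. If the rows are not already grouped, sorting first with `sort -t'|' -k1,1n -s myfile | max_per_run` should be enough.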

# 5  
Old 05-25-2010
Wow, thanks all for your answers. They were really helpful and let me complete a program whose purpose was to count the words in a given text file and classify them by the number of characters they have. The file with the repeating numbers was a side effect of a counter I placed in the part of the program that calculates the number of words per character count. The only missing piece was a way to take the highest value among the lines my code produced and print it instead of all the lines leading up to it. My knowledge of awk and bash programming is too limited at the moment for me to have found the way by myself, so thanks again for helping me out.

Much appreciated! :)

Cheers to all who helped! :)
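
For anyone landing here with the same task: the poster's program is not shown, so the following is only a guess at the overall job, not their method. It assumes words are whitespace-separated and that the table columns mean "word length|number of words". Counting in an awk array gives the final totals directly, so no running-counter rows are produced. The function name `count_by_length` is made up for illustration.

```shell
# count_by_length: hypothetical helper name, not from the thread.
# Prints one "length|count" row per word length, sorted by length.
count_by_length() {
  awk '
    # tally every whitespace-separated word by its character count
    { for (i = 1; i <= NF; i++) count[length($i)]++ }
    END { for (len in count) print len "|" count[len] }
  ' "$@" | sort -t'|' -k1,1n
}
```

Use it as `count_by_length textfile`, or pipe text into it with no arguments.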