08-27-2009
How to remove duplicates based on longest row & largest value in a column
Hi, I have a file with data as shown below. I need to remove duplicate rows in such a way that
only columns 2, 3, 4 and 5 are checked for duplicates. When deleting duplicates, the longest row should be retained, i.e. the row with the most columns holding values. Duplicates must also be resolved by checking for the largest value in a specific column, say column 19, in the data given below.
SSR 1901 12 1 0 0 0.00 40.0000 71.2000 14 12 3.00 0 4.60 4.00 0.00 0 0.00 8.60 0
SSR 1901 12 1 0 10 3.00 40.0000 71.0000 30 0 0.00 0 5.80 0.00 5.90 0 5.70 5.90 0
SSR 1902 8 22 3 7 4.40 40.0000 68.5000 35 0 0.00 0 6.00 0.00 6.20 0 5.90 6.20 0 aaaa
BDA 1902 8 22 3 0 0.00 40.0000 77.0000 60 0 8.70 0 0.00 0.00 8.00 0 8.60 8.60 0 cccc
CFR 1903 8 22 3 0 0.00 40.0000 77.0000 25 0 0.00 0 0.00 0.00 0.00 0 8.60 8.60 0 bbbb
RAO 1906 8 16 17 0 0.00 24.4000 72.7000 10 0 0.00 0 4.30 0.00 0.00 0 0.00 4.30 0
RAO 1906 8 16 17 6 0.00 24.4000 72.7000 10 0 0.00 0 4.30 6.00 0.00 0 0.00 4.30 0
LEE 1912 8 22 3 0 0.00 40.0000 76.5000 0 0 0.00 0 0.00 0.00 0.00 0 8.20 8.20 0 ffff
LEE 1912 8 22 3 0 0.00 40.0000 76.5000 0 0 0.00 0 0.00 0.00 0.00 0 8.20 8.20 0 ffff
The output should be:
SSR 1901 12 1 0 0 0.00 40.0000 71.2000 14 12 3.00 0 4.60 4.00 0.00 0 0.00 8.60 0
BDA 1902 8 22 3 0 0.00 40.0000 77.0000 60 0 8.70 0 0.00 0.00 8.00 0 8.60 8.60 0 cccc
CFR 1903 8 22 3 0 0.00 40.0000 77.0000 25 0 0.00 0 0.00 0.00 0.00 0 8.60 8.60 0 bbbb
RAO 1906 8 16 17 6 0.00 24.4000 72.7000 10 0 0.00 0 4.30 6.00 0.00 0 0.00 4.30 0
LEE 1912 8 22 3 0 0.00 40.0000 76.5000 0 0 0.00 0 0.00 0.00 0.00 0 8.20 8.20 0 ffff
Here we are removing duplicate rows based on 2 criteria, i.e.:
1) Check whether columns 2, 3, 4 and 5 are the same; if so, remove one of the duplicate rows.
2) Retain the row that has the largest value in column 19 and the largest set of columns with values.
Please help me out if anyone has an idea; I have been trying this for the past week.
Thanks in advance.
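A minimal Python sketch of one reading of these rules, assuming whitespace-separated fields, a numeric column 19, and that "a column with a value" means any non-numeric field or any number other than zero. The sample output only comes out right if column 19 is compared first and the count of filled columns is used as the tie-breaker: for SSR 1901 the kept row has the larger column 19 (8.60 vs 5.90) but fewer filled columns, while the two RAO 1906 rows tie on column 19 (4.30) and the one with more filled columns wins. The function names `dedupe` and `is_filled` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def is_filled(field: str) -> bool:
    """Assumption: a field 'has a value' if it is non-numeric or a nonzero number."""
    try:
        return float(field) != 0.0
    except ValueError:
        return True  # text such as "aaaa" counts as a value


def dedupe(lines: Iterable[str], key_cols=(2, 3, 4, 5), value_col=19) -> List[str]:
    """Keep one row per key built from the 1-based key_cols.

    Rows are ranked by the number in value_col first, then by how many fields
    are filled; on an exact tie the first row seen is kept.
    """
    best: Dict[Tuple[str, ...], Tuple[Tuple[float, int], str]] = {}
    needed = max(max(key_cols), value_col)
    for line in lines:
        fields = line.split()
        if len(fields) < needed:
            continue  # blank or short line: nothing to compare
        key = tuple(fields[c - 1] for c in key_cols)
        score = (float(fields[value_col - 1]), sum(map(is_filled, fields)))
        if key not in best or score > best[key][0]:
            # Replacing an existing key keeps its original dict position.
            best[key] = (score, line)
    return [line for _, line in best.values()]
```

Because Python dicts keep insertion order, the groups come out in the order their key first appears, which matches the expected output. To run it on a file, something like `for row in dedupe(open("input.txt")): print(row.rstrip("\n"))` would do.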
SGBEQU(l)                                                    SGBEQU(l)
NAME
SGBEQU - compute row and column scalings intended to equilibrate an M-by-N band matrix A and reduce its condition number
SYNOPSIS
SUBROUTINE SGBEQU( M, N, KL, KU, AB, LDAB, R, C, ROWCND, COLCND, AMAX, INFO )
INTEGER INFO, KL, KU, LDAB, M, N
REAL AMAX, COLCND, ROWCND
REAL AB( LDAB, * ), C( * ), R( * )
PURPOSE
SGBEQU computes row and column scalings intended to equilibrate an M-by-N band matrix A and reduce its condition number. R returns the row
scale factors and C the column scale factors, chosen to try to make the largest element in each row and column of the matrix B with
elements B(i,j)=R(i)*A(i,j)*C(j) have absolute value 1.
R(i) and C(j) are restricted to be between SMLNUM = smallest safe number and BIGNUM = largest safe number. Use of these scaling factors is
not guaranteed to reduce the condition number of A but works well in practice.
ARGUMENTS
M (input) INTEGER
The number of rows of the matrix A. M >= 0.
N (input) INTEGER
The number of columns of the matrix A. N >= 0.
KL (input) INTEGER
The number of subdiagonals within the band of A. KL >= 0.
KU (input) INTEGER
The number of superdiagonals within the band of A. KU >= 0.
AB (input) REAL array, dimension (LDAB,N)
The band matrix A, stored in rows 1 to KL+KU+1. The j-th column of A is stored in the j-th column of the array AB as follows:
AB(ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl).
LDAB (input) INTEGER
The leading dimension of the array AB. LDAB >= KL+KU+1.
R (output) REAL array, dimension (M)
If INFO = 0, or INFO > M, R contains the row scale factors for A.
C (output) REAL array, dimension (N)
If INFO = 0, C contains the column scale factors for A.
ROWCND (output) REAL
If INFO = 0 or INFO > M, ROWCND contains the ratio of the smallest R(i) to the largest R(i). If ROWCND >= 0.1 and AMAX is neither
too large nor too small, it is not worth scaling by R.
COLCND (output) REAL
If INFO = 0, COLCND contains the ratio of the smallest C(i) to the largest C(i). If COLCND >= 0.1, it is not worth scaling by C.
AMAX (output) REAL
Absolute value of largest matrix element. If AMAX is very close to overflow or very close to underflow, the matrix should be
scaled.
INFO (output) INTEGER
= 0: successful exit
< 0: if INFO = -i, the i-th argument had an illegal value
> 0: if INFO = i, and i is
<= M: the i-th row of A is exactly zero
> M: the (i-M)-th column of A is exactly zero
LAPACK version 3.0 15 June 2000 SGBEQU(l)
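The band storage rule and the scaling described in PURPOSE can be illustrated with a small pure-Python sketch. It is not LAPACK's code: it uses Python's double-precision floats (with `sys.float_info.min` standing in for SMLNUM, whereas the REAL routine would use the single-precision safe minimum), 0-based indices, `len(ab)` in place of LDAB, and the hypothetical names `to_band` and `gbequ`. It computes each R(i) as the reciprocal of the largest |A(i,j)| in row i, then each C(j) from the row-scaled matrix, clamping to [SMLNUM, BIGNUM] and reporting INFO with the conventions listed for the INFO argument.

```python
import sys
from typing import List, Tuple

# Assumption: Python floats are IEEE doubles, so sys.float_info.min stands in
# for SMLNUM; the single-precision SGBEQU would use the REAL safe minimum.
SMLNUM = sys.float_info.min
BIGNUM = 1.0 / SMLNUM


def to_band(a: List[List[float]], kl: int, ku: int) -> List[List[float]]:
    """Pack a dense m-by-n matrix into band storage (0-based indices):
    ab[ku + i - j][j] = a[i][j] for max(0, j - ku) <= i <= min(m - 1, j + kl)."""
    m = len(a)
    n = len(a[0]) if a else 0
    ab = [[0.0] * n for _ in range(kl + ku + 1)]
    for j in range(n):
        for i in range(max(0, j - ku), min(m - 1, j + kl) + 1):
            ab[ku + i - j][j] = a[i][j]
    return ab


def gbequ(m: int, n: int, kl: int, ku: int, ab: List[List[float]]
          ) -> Tuple[List[float], List[float], float, float, float, int]:
    """Return (r, c, rowcnd, colcnd, amax, info) for the band matrix in ab.

    info: 0 = success, -k = k-th argument illegal, 1..m = that row is zero,
    > m = column (info - m) is zero. If info != 0 the rest is not meaningful.
    """
    for k, bad in enumerate((m < 0, n < 0, kl < 0, ku < 0), start=1):
        if bad:
            return [], [], 0.0, 0.0, 0.0, -k
    if len(ab) < kl + ku + 1:  # len(ab) plays the role of LDAB (6th argument)
        return [], [], 0.0, 0.0, 0.0, -6
    if m == 0 or n == 0:
        return [], [], 1.0, 1.0, 0.0, 0  # quick return for an empty matrix

    def band_rows(j: int) -> range:
        # Row indices i whose A[i][j] lies inside the band of column j.
        return range(max(0, j - ku), min(m - 1, j + kl) + 1)

    # Row pass: r[i] = largest |A[i][j]| in row i.
    r = [0.0] * m
    for j in range(n):
        for i in band_rows(j):
            r[i] = max(r[i], abs(ab[ku + i - j][j]))
    rmin, amax = min(r), max(r)
    if rmin == 0.0:
        return r, [], 0.0, 0.0, amax, r.index(0.0) + 1
    r = [1.0 / min(max(x, SMLNUM), BIGNUM) for x in r]
    rowcnd = max(rmin, SMLNUM) / min(amax, BIGNUM)

    # Column pass on the row-scaled matrix: c[j] = largest |r[i] * A[i][j]|.
    c = [0.0] * n
    for j in range(n):
        for i in band_rows(j):
            c[j] = max(c[j], abs(ab[ku + i - j][j]) * r[i])
    cmin, cmax = min(c), max(c)
    if cmin == 0.0:
        return r, c, rowcnd, 0.0, amax, m + c.index(0.0) + 1
    c = [1.0 / min(max(x, SMLNUM), BIGNUM) for x in c]
    colcnd = max(cmin, SMLNUM) / min(cmax, BIGNUM)
    return r, c, rowcnd, colcnd, amax, 0
```

For example, A = [[4, 1], [2, 8]] with KL = KU = 1 gives R = (1/4, 1/8), C = (1, 1), ROWCND = 0.5 and AMAX = 8; since ROWCND >= 0.1 and AMAX is moderate, by the rule quoted under ROWCND scaling by R would not be worth it.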