How to remove duplicates based on longest row & largest value in a column (Post 302426550)


Posted by reva in UNIX for Dummies Questions & Answers on Wednesday 2nd of June 2010 07:15:12 AM

06-02-2010

Registered User

It's not working if the file contains data like this.
a.dat:

HTML Code:

   BDA 1908 10 23 20 14  6.00  36.5000  70.5000 220.0   0  0.00   0  0.00  0.00  0.00   0  7.00  7.00   0 NULL
   G-R 1908 10 23 20 14  6.00  36.5000  70.5000 220.0   0  0.00   0  0.00  0.00  0.00   0  7.00  7.00   0 NULL
   SIG 1908 10 23 20 14  0.00  36.5000  70.5000 220.0   0  0.00   0  0.00  0.00  0.00   0  7.60  7.60   0 NULL
   SSR 1908 10 23 20 14  6.00  36.5000  70.5000 220.0   0  7.60   0  6.90  0.00  6.10   0  6.80  7.60   0 NULL
   BDA 1908 10 24 21 16 36.00  36.5000  70.5000 220.0   0  0.00   0  0.00  0.00  0.00   0  7.00  7.00   0 NULL
   G-R 1908 10 24 21 16 36.00  36.5000  70.5000 220.0   0  0.00   0  0.00  0.00  0.00   0  7.00  7.00   0 NULL
   SIG 1908 12 12  0  0  0.00  26.5000  97.0000   0.0   0  0.00   0  0.00  0.00  0.00   0  7.50  7.50   0 NULL
   G-R 1908 12 12 12 54 54.00  26.5000  97.0000 100.0   0  0.00   0  0.00  0.00  0.00   0  7.50  7.50   0 NULL
   SIG 1909  7  7  0  0  0.00  36.5000  70.5000 230.0   0  0.00   0  0.00  0.00  0.00   0  7.80  7.80   0 NULL
   SIG 1909  7  7 21 39  0.00  36.5000  70.5000  60.0   0  0.00   0  0.00  0.00  0.00   0  7.60  7.60   0 NULL

The output should be:

HTML Code:

   SSR 1908 10 23 20 14  6.00  36.5000  70.5000 220.0   0  7.60   0  6.90  0.00  6.10   0  6.80  7.60   0 NULL
   BDA 1908 10 24 21 16 36.00  36.5000  70.5000 220.0   0  0.00   0  0.00  0.00  0.00   0  7.00  7.00   0 NULL
   SIG 1908 12 12  0  0  0.00  26.5000  97.0000   0.0   0  0.00   0  0.00  0.00  0.00   0  7.50  7.50   0 NULL
   SIG 1909  7  7  0  0  0.00  36.5000  70.5000 230.0   0  0.00   0  0.00  0.00  0.00   0  7.80  7.80   0 NULL

If I use
sort -k 2,5 -k 19r file_name | awk 'a!=$2$3$4 {a=$2$3$4;print $0}'
I get this output instead:

HTML Code:

   SIG 1908 10 23 20 14  0.00  36.5000  70.5000 220.0   0  0.00   0  0.00  0.00  0.00   0  7.60  7.60   0 NULL
   BDA 1908 10 24 21 16 36.00  36.5000  70.5000 220.0   0  0.00   0  0.00  0.00  0.00   0  7.00  7.00   0 NULL
   SIG 1908 12 12  0  0  0.00  26.5000  97.0000   0.0   0  0.00   0  0.00  0.00  0.00   0  7.50  7.50   0 NULL
   SIG 1909  7  7  0  0  0.00  36.5000  70.5000 230.0   0  0.00   0  0.00  0.00  0.00   0  7.80  7.80   0 NULL

Please help me out.

reva
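
One possible reading of the problem, offered as a sketch rather than a definitive answer: in the first group (1908 10 23) the SIG and SSR rows both have 7.60 in field 19, so the sort key ties and sort falls back to comparing whole lines, which puts SIG ahead of SSR. The awk filter then keeps whichever row comes first. The single-pass awk below avoids sort altogether. Its ranking rule is an assumption inferred from the expected output above: largest field 19 wins, a tie goes to the row with more non-zero values in fields 11-18 (taken here as the meaning of "longest row"), and a further tie keeps the row that appears first in the file. The function name `pick_best` is hypothetical.

```shell
# Hypothetical helper: keep one row per date (fields 2-4).
# Ranking is an ASSUMPTION inferred from the expected output:
#   1. largest value in field 19
#   2. on a tie, more non-zero values in fields 11-18
#   3. on a further tie, the row seen first in the file
pick_best() {
    awk '
    {
        key = $2 SUBSEP $3 SUBSEP $4

        # Count the non-zero values in fields 11-18 of this row.
        filled = 0
        for (i = 11; i <= 18; i++)
            if ($i + 0 != 0) filled++

        if (!(key in best)) {
            order[++n] = key              # remember first-seen order of dates
        } else if ($19 + 0 < mag[key]) {
            next                          # smaller field 19: keep current best
        } else if ($19 + 0 == mag[key] && filled <= cnt[key]) {
            next                          # tie not broken: keep the earlier row
        }

        best[key] = $0                    # $0 is stored unmodified
        mag[key]  = $19 + 0
        cnt[key]  = filled
    }
    END {
        # Print the winners in the order their dates first appeared.
        for (i = 1; i <= n; i++) print best[order[i]]
    }' "$1"
}
```

It would be called as `pick_best a.dat`. Because the rows are stored and printed as `$0`, the original column spacing is preserved, and no pre-sorting of the file is needed. If "longest row" should be measured over different columns, only the bounds of the `for` loop need to change.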
