Rank based on column in Unix


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Rank based on column in Unix
# 1  
Old 09-19-2011
Rank based on column in Unix

Hello guys

I have requirement to rank a flat file based on column separated by delimeter

I/P
Code:
 
1783747|091411|1000
1783747|091411|2000
1783747|091411|2000
1783747|091411|2000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|6000
1783747|091411|7000
1783747|091411|8000
1783747|091411|9000
1783747|091411|10000

Code:
 
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|8000|8
1783747|091411|9000|9
1783747|091411|10000|10

The rightmost column is rank in ascending order
Please let me know is it possible in unix

It can be done easily in oracle via analytic function if the data is in table
Thanks a lot
# 2  
Old 09-19-2011
here it is:
Code:
kent$  echo "1783747|091411|1000
1783747|091411|2000
1783747|091411|2000
1783747|091411|2000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|6000
1783747|091411|7000
1783747|091411|8000
1783747|091411|9000
1783747|091411|10000"|awk -F'|' '{if($3>x){x=$3;i++};print $0"|"i}'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|8000|8
1783747|091411|9000|9
1783747|091411|10000|10


Last edited by sk1418; 09-19-2011 at 09:58 AM..
This User Gave Thanks to sk1418 For This Post:
# 3  
Old 09-19-2011
Hello Prateek,

There might be no "rank" function available. But you can tweak an awk and can do it.

Something like this:

Code:
TESTBOX>awk -F"|" 'NR==1 { a=$NF; $0=$0 OFS $NF/a} NR >1 { $0=$0 OFS $NF/a}1' OFS="|" input_file

It may not be a generic solution but still a begining to explore.
This User Gave Thanks to panyam For This Post:
# 4  
Old 09-19-2011
@panyam

Your $NF/a part is creative. however it will fail if the $3 is like
Code:
...
8000
8100
8200
8300
9000
...

# 5  
Old 09-19-2011
Assuming 3rd field is the one to be sorted
Code:
sort -k3 -n -t"|" inputFile | awk -F"|" '{if(x!=$3){i++}{x=$3}}{print $0 FS i}'

--ahamed
# 6  
Old 09-19-2011
@sk1418

Hello,

I did not remeber how the "rank" works(forgot !!!) and hence posted a generic one.

How ever thanks for your valuable post.

Regards
Ravi
# 7  
Old 09-19-2011
Or you could use Perl -

Code:
$
$
$ cat f25
1783747|091411|2000
1783747|091411|1000
1783747|091411|2000
1783747|091411|6000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|10000
1783747|091411|7000
1783747|091411|2000
1783747|091411|7001
1783747|091411|9000
$
$
$ sort -t"|" -nk3,3 f25 | perl -F"\|" -lane '$F[3]=$F[2]>$p ? ++$i : $i; print join "|",@F; $p=$F[2]'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|3
1783747|091411|4000|4
1783747|091411|5000|5
1783747|091411|6000|6
1783747|091411|7000|7
1783747|091411|7001|8
1783747|091411|9000|9
1783747|091411|10000|10
$
$

tyler_durden

Quote:
Originally Posted by Pratik4891
...The rightmost column is rank in ascending order
...
It can be done easily in oracle via analytic function if the data is in table

Actually, the values of the rightmost column mimic the "DENSE_RANK" analytic function of Oracle.

If you want the values of "RANK" analytic function, then -

Code:
$
$
$ cat f25
1783747|091411|2000
1783747|091411|1000
1783747|091411|2000
1783747|091411|6000
1783747|091411|3000
1783747|091411|4000
1783747|091411|5000
1783747|091411|10000
1783747|091411|7000
1783747|091411|2000
1783747|091411|7001
1783747|091411|9000
1783747|091411|10000
1783747|091411|10000
1783747|091411|19999
$
$
$ sort -t"|" -nk3,3 f25 | perl -F"\|" -lane '++$n;
                                             if ($F[2] > $p) {++$i; $i = $n if $i < $n}
                                             $F[3] = $i; print join "|",@F; $p=$F[2]'
1783747|091411|1000|1
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|2000|2
1783747|091411|3000|5
1783747|091411|4000|6
1783747|091411|5000|7
1783747|091411|6000|8
1783747|091411|7000|9
1783747|091411|7001|10
1783747|091411|9000|11
1783747|091411|10000|12
1783747|091411|10000|12
1783747|091411|10000|12
1783747|091411|19999|15
$
$


Last edited by durden_tyler; 09-19-2011 at 12:06 PM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Awk/sed summation of one column based on some entry in first column

Hi All , I am having an input file as stated below Input file 6 ddk/djhdj/djhdj/Q 10 0.5 dhd/jdjd.djd.nd/QB 01 0.5 hdhd/jd/jd/jdj/Q 10 0.5 512 hd/hdh/gdh/Q 01 0.5 jdjd/jd/ud/j/QB 10 0.5 HD/jsj/djd/Q 01 0.5 71 hdh/jjd/dj/jd/Q 10 0.5 ... (5 Replies)
Discussion started by: kshitij
5 Replies

2. Shell Programming and Scripting

Get maximum per column from CSV file, based on date column

Hello everyone, I am using ksh on Solaris 10 and I'm gathering data in a CSV file that looks like this: 20170628-23:25:01,1,0,0,1,1,1,1,55,55,1 20170628-23:30:01,1,0,0,1,1,1,1,56,56,1 20170628-23:35:00,1,0,0,1,1,2,1,57,57,2 20170628-23:40:00,1,0,0,1,1,1,1,58,58,2... (6 Replies)
Discussion started by: ejianu
6 Replies

3. Shell Programming and Scripting

UNIX command -Filter rows in fixed width file based on column values

Hi All, I am trying to select the rows in a fixed width file based on values in the columns. I want to select only the rows if column position 3-4 has the value AB I am using cut command to get the column values. Is it possible to check if cut -c3-4 = AB is true then select only that... (2 Replies)
Discussion started by: ashok.k
2 Replies

4. Shell Programming and Scripting

Sum column values based in common identifier in 1st column.

Hi, I have a table to be imported for R as matrix or data.frame but I first need to edit it because I've got several lines with the same identifier (1st column), so I want to sum the each column (2nd -nth) of each identifier (1st column) The input is for example, after sorted: K00001 1 1 4 3... (8 Replies)
Discussion started by: sargotrons
8 Replies

5. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format- A_1 2 B_0 4 A_1 1 B_2 5 A_4 1 and looking to print in this output format with headers. can you suggest in awk?awk because i am doing some pattern matching from parent file to print column 1 of my input using awk already.Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123
5 Replies

6. UNIX for Dummies Questions & Answers

Assigning rank to rows of numbers based on the last column

I have a tab delimited text file that looks like the following: ERBB3 0.00097 IL31RA 0.000972 SETD5 0.000972 MCART1 0.000973 CENPJ 0.000973 FNDC6 0.000974 I want to assign a number to each row based on the value in the last column (in the order of increasing value so that the first row... (3 Replies)
Discussion started by: evelibertine
3 Replies

7. Shell Programming and Scripting

Filtering lines for column elements based on corresponding counts in another column

Hi, I have a file like this ACC 2 2 21 aaa AC 443 3 22 aaa GCT 76 1 33 xxx TCG 34 2 33 aaa ACGT 33 1 22 ggg TTC 99 3 44 wee CCA 33 2 33 ggg AAC 1 3 55 ddd TTG 10 1 22 ddd TTGC 98 3 22 ddd GCT 23 1 21 sds GTC 23 4 32 sds ACGT 32 2 33 vvv CGT 11 2 33 eee CCC 87 2 44... (1 Reply)
Discussion started by: polsum
1 Replies

8. Shell Programming and Scripting

to add special tag to a column based on column condition

Hi All, I have following html code <TR><TD>9</TD><TD>AR_TVR_TBS </TD><TD>85000</TD><TD>39938</TD><TD>54212</TD><TD>46</TD></TR> <TR><TD>10</TD><TD>ASCV_SMY_TBS </TD><TD>69880</TD><TD>33316</TD><TD>45698</TD><TD>47</TD></TR> <TR><TD>11</TD><TD>ARC_TBS ... (9 Replies)
Discussion started by: ckwan
9 Replies

9. Web Development

Fix For Google Page Rank: Wordpress List Rank Dashboard Widget

Here is the fix for the recent Google changes to their pagerank API. For example, in the List Rank Dashboard Widget Wordpress Plugin (Version 1.7), in this plugin file: list-rank-dashboard-widget/wp-list-rank-class.php in this function: function getGooglePR($url) Change this line: ... (0 Replies)
Discussion started by: Neo
0 Replies

10. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a
13 Replies
Login or Register to Ask a Question