Ranking data points from multiple files


 
# 22  
06-22-2016
Quote:
Originally Posted by Corona688
What if changing one tie breaks another? Which wins?
Sorry, but I do not follow you.

We are counting the number of records (from both ends), and if the previous value is the same as the current value, the rank stays the same while the record count keeps going.
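One way to read this description is standard competition ("1224") ranking: each record's rank is its 1-based position in sorted order, and a record tied with the previous one reuses that rank. The sketch below assumes one numeric value per line on stdin and reads "from both ends" as ranking once from the smallest value and once from the largest; the function name `rank_both_ends` is hypothetical.

```shell
# Hypothetical helper: competition ("1224") ranking of numeric values,
# counted from the low end and from the high end.
# Input: one value per line on stdin.
# Output: value, rank from the low end, rank from the high end.
rank_both_ends() {
    # LC_ALL=C makes sort -n treat "." as the decimal point.
    LC_ALL=C sort -n | awk '
        { v[NR] = $1 }
        END {
            # Low end: a new value takes its position as its rank;
            # a value equal to the previous one reuses that rank.
            for (i = 1; i <= NR; i++) {
                if (i == 1 || v[i] != v[i-1]) r = i
                lo[i] = r
            }
            # High end: the same rule, counting down from the largest value.
            for (i = NR; i >= 1; i--) {
                if (i == NR || v[i] != v[i+1]) r = NR - i + 1
                hi[i] = r
            }
            for (i = 1; i <= NR; i++) print v[i], lo[i], hi[i]
        }'
}

printf '%s\n' -1.62 -1.56 -1.61 -1.56 -1.52 | rank_both_ends
```

With these five sample values, both -1.56 lines get rank 3 from the low end and 2 from the high end, and the next value up (-1.52) gets rank 5, because the count keeps advancing through the tie.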
# 23  
06-22-2016
Quote:
Originally Posted by ncwxpanther
Sorry, but I do not follow you.
Me neither.

Quote:
We are counting the number of records (from both ends) and if the previous value is the same as the current value, the ranking is the same
Isn't this true by definition, unless the number of records changes from file to file?
# 24  
06-24-2016
Is there a way to rank like values as ties?

For instance:

Below are two records for the same data point and their ranks (100 and 101):
Code:
46.8542 -121.7292 100
46.8542 -121.7292 101

The ranks are different, but they should be the same because the data values for those records are tied:

Code:
46.8542 -121.7292 -1.56
46.8542 -121.7292 -1.56

With the following code, is there a way to handle these ties properly?

Code:
REF="test/190005.pnt"

sort -k3,3 -n test/{1900..2016}05.pnt |
        awk '
        # Read the values you want to rank from the first file.
        # This trick works because NR and FNR are equal only while
        # reading the first file, not the second.
        NR==FNR {  A[$1,$2]=$3 ; next }
        # Read everything sorted by value, counting order per point
        # as it goes.  When a matching value is found, print its order.
        ++C[$1,$2] && A[$1,$2] == $3 { print $1, $2, C[$1,$2] }
' "$REF" -
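A minimal sketch of one way to handle the ties in the script above, assuming the same whitespace-separated "lat lon value" records. It keeps the `NR == FNR` lookup but gives tied values the rank of the first record with that value (competition ranking) and prints only one line per point. The function name `rank_ref` and its argument order are hypothetical.

```shell
# Hypothetical helper: rank each point's reference-year value among all
# years with competition ("1224") ranking, printing one line per point.
# Usage: rank_ref REF_FILE ALL_FILES...   (REF_FILE should be among ALL_FILES)
rank_ref() {
    ref=$1
    shift
    LC_ALL=C sort -k3,3 -n "$@" | awk '
        # First file (the reference year): remember the value for each point.
        NR == FNR { A[$1,$2] = $3; next }
        {
            k = $1 SUBSEP $2
            n[k]++                                           # position within this point
            if (n[k] == 1 || $3 != prev[k]) rank[k] = n[k]   # ties keep the earlier rank
            prev[k] = $3
            # Print only the first record that matches the reference value.
            if ((k in A) && !(k in done) && $3 == A[k]) {
                print $1, $2, rank[k]
                done[k] = 1
            }
        }' "$ref" -
}
```

Usage, with bash brace expansion as in the thread: `rank_ref test/190005.pnt test/{1900..2016}05.pnt`. As with the original, this relies on `NR == FNR`, so an empty reference file would break it.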

# 25  
06-24-2016
Quote:
Originally Posted by ncwxpanther
Is there a way to rank like values as ties?

For instance:

Below are two records for the same data point and their ranks (100 and 101):
Code:
46.8542 -121.7292 100
46.8542 -121.7292 101

The ranks are different, but they should be the same because the data values for those records are tied.
So, don't increment the rank when the same value is repeated? I'll see if I can do that.
# 26  
06-24-2016
Code:
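REF="test/190005.pnt"

sort -k3,3 -n test/{1900..2016}05.pnt |
        awk '
        # First file: remember the reference value for each point.
        NR==FNR {  A[$1,$2]=$3 ; next }

        # Advance the per-point rank only when the value differs from
        # the previous one, so repeated values share a rank.
        LAST[$1,$2] != $3 {
                ++C[$1,$2]
                LAST[$1,$2] = $3
        }

        # Print the rank of every record equal to the reference value.
        A[$1,$2] == $3 { print $1, $2, C[$1,$2] }
' "$REF" -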
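The script above and the single-point script posted later in this thread use different tie rules. This one increments only when the value changes (dense ranking, "1223"), while the later one sets `rank = NR` on each new value (competition ranking, "1224"). A small sketch comparing the two on made-up values; the function name `compare_ranks` is hypothetical.

```shell
# Hypothetical helper: print each value with its dense rank ("1223",
# increment only when the value changes) and its competition rank
# ("1224", rank = position of the first record with that value).
compare_ranks() {
    LC_ALL=C sort -n | awk '
        NR == 1 || $1 != prev { dense++; comp = NR }   # a new distinct value
        { print $1, dense, comp; prev = $1 }'
}

printf '%s\n' 3 1 2 2 5 | compare_ranks
```

The value after the tie (3) gets dense rank 3 but competition rank 4. On the full 122-value list posted later in the thread, eight values below -1.56 occur twice, which is why the dense rank printed for -1.56 (92) is exactly eight less than its competition rank (100).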

# 27  
06-27-2016
I think we are getting close. Thanks for the help on this one!

The rank still needs to be incremented when the same value is repeated, and only one of the tied values needs to be printed.

Currently, every tied value is printed, so the same value appears multiple times whenever it is tied.

Your latest script prints the following (note that the rank of 92 is not correct):
Code:
46.8542 -121.7292 92
46.8542 -121.7292 92

Below, I grep out a single value that I know is tied. The output should be a single line with its rank, whether tied or not.

So when I grep for that single value, only one line should be returned:
Code:
46.8542 -121.7292    100


The following script works for a single point. I need a new script that works for all points.
Code:
for year in test/{1895..2016}05.pnt
do
    cat "$year"
done |
grep "46.8542 -121.7292" | sort -k3,3 -n | awk '$3 != prev { rank = NR } { print $0, "   " rank; prev = $3 }'

Output snippet:
Code:
 46.8542 -121.7292    -1.62    95
 46.8542 -121.7292    -1.61    97
 46.8542 -121.7292    -1.61    97
 46.8542 -121.7292    -1.59    99
 46.8542 -121.7292    -1.56    100
 46.8542 -121.7292    -1.56    100
 46.8542 -121.7292    -1.52    102
 46.8542 -121.7292    -1.51    103
 46.8542 -121.7292    -1.43    104
 46.8542 -121.7292    -1.39    105
 46.8542 -121.7292    -1.39    105
 46.8542 -121.7292    -1.30    107



Entire input for a single point (sorted by data value):
Code:
46.8542	-121.7292	-6.08
46.8542	-121.7292	-5.99
46.8542	-121.7292	-5.66
46.8542	-121.7292	-5.61
46.8542	-121.7292	-5.49
46.8542	-121.7292	-5.48
46.8542	-121.7292	-5.42
46.8542	-121.7292	-5.33
46.8542	-121.7292	-5.33
46.8542	-121.7292	-5.29
46.8542	-121.7292	-5.28
46.8542	-121.7292	-5.15
46.8542	-121.7292	-5.1
46.8542	-121.7292	-5.09
46.8542	-121.7292	-4.93
46.8542	-121.7292	-4.74
46.8542	-121.7292	-4.73
46.8542	-121.7292	-4.62
46.8542	-121.7292	-4.58
46.8542	-121.7292	-4.56
46.8542	-121.7292	-4.55
46.8542	-121.7292	-4.53
46.8542	-121.7292	-4.51
46.8542	-121.7292	-4.47
46.8542	-121.7292	-4.41
46.8542	-121.7292	-4.32
46.8542	-121.7292	-4.3
46.8542	-121.7292	-4.26
46.8542	-121.7292	-4.16
46.8542	-121.7292	-4.14
46.8542	-121.7292	-4.1
46.8542	-121.7292	-4.09
46.8542	-121.7292	-4.09
46.8542	-121.7292	-4
46.8542	-121.7292	-3.99
46.8542	-121.7292	-3.94
46.8542	-121.7292	-3.88
46.8542	-121.7292	-3.87
46.8542	-121.7292	-3.83
46.8542	-121.7292	-3.77
46.8542	-121.7292	-3.76
46.8542	-121.7292	-3.72
46.8542	-121.7292	-3.62
46.8542	-121.7292	-3.61
46.8542	-121.7292	-3.49
46.8542	-121.7292	-3.49
46.8542	-121.7292	-3.46
46.8542	-121.7292	-3.43
46.8542	-121.7292	-3.4
46.8542	-121.7292	-3.37
46.8542	-121.7292	-3.34
46.8542	-121.7292	-3.32
46.8542	-121.7292	-3.31
46.8542	-121.7292	-3.28
46.8542	-121.7292	-3.27
46.8542	-121.7292	-3.27
46.8542	-121.7292	-3.23
46.8542	-121.7292	-3.21
46.8542	-121.7292	-3.2
46.8542	-121.7292	-3.17
46.8542	-121.7292	-3.17
46.8542	-121.7292	-3.12
46.8542	-121.7292	-3.11
46.8542	-121.7292	-3.08
46.8542	-121.7292	-3.06
46.8542	-121.7292	-3.05
46.8542	-121.7292	-3.04
46.8542	-121.7292	-3.02
46.8542	-121.7292	-3.01
46.8542	-121.7292	-2.98
46.8542	-121.7292	-2.93
46.8542	-121.7292	-2.84
46.8542	-121.7292	-2.8
46.8542	-121.7292	-2.77
46.8542	-121.7292	-2.76
46.8542	-121.7292	-2.75
46.8542	-121.7292	-2.7
46.8542	-121.7292	-2.67
46.8542	-121.7292	-2.67
46.8542	-121.7292	-2.62
46.8542	-121.7292	-2.48
46.8542	-121.7292	-2.47
46.8542	-121.7292	-2.46
46.8542	-121.7292	-2.39
46.8542	-121.7292	-2.31
46.8542	-121.7292	-2.29
46.8542	-121.7292	-2.22
46.8542	-121.7292	-2.18
46.8542	-121.7292	-2.15
46.8542	-121.7292	-2.14
46.8542	-121.7292	-2.08
46.8542	-121.7292	-1.8
46.8542	-121.7292	-1.75
46.8542	-121.7292	-1.68
46.8542	-121.7292	-1.62
46.8542	-121.7292	-1.62
46.8542	-121.7292	-1.61
46.8542	-121.7292	-1.61
46.8542	-121.7292	-1.59
46.8542	-121.7292	-1.56
46.8542	-121.7292	-1.56
46.8542	-121.7292	-1.52
46.8542	-121.7292	-1.51
46.8542	-121.7292	-1.43
46.8542	-121.7292	-1.39
46.8542	-121.7292	-1.39
46.8542	-121.7292	-1.3
46.8542	-121.7292	-1.26
46.8542	-121.7292	-1.09
46.8542	-121.7292	-1.08
46.8542	-121.7292	-1.02
46.8542	-121.7292	-1.02
46.8542	-121.7292	-0.93
46.8542	-121.7292	-0.9
46.8542	-121.7292	-0.77
46.8542	-121.7292	-0.68
46.8542	-121.7292	-0.61
46.8542	-121.7292	-0.44
46.8542	-121.7292	-0.3
46.8542	-121.7292	-0.14
46.8542	-121.7292	0.71
46.8542	-121.7292	1.05
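A minimal sketch of generalizing the single-point script above to every point at once, assuming whitespace-separated "lat lon value" records on stdin. Instead of one global `prev` and `NR`, it keeps a counter, previous value, and rank per lat/lon pair, so each point gets its own competition ranking. The function name `rank_all_points` is hypothetical.

```shell
# Hypothetical helper: per-point competition ("1224") ranking for all
# points in one pass. Output: lat lon value rank, in ascending value order.
rank_all_points() {
    LC_ALL=C sort -k3,3 -n | awk '
        {
            k = $1 SUBSEP $2                                 # one counter per lat/lon point
            n[k]++                                           # records seen so far for this point
            if (n[k] == 1 || $3 != prev[k]) rank[k] = n[k]   # new value: rank = position
            prev[k] = $3                                     # ties fall through and keep the rank
            print $1, $2, $3, rank[k]
        }'
}
```

Usage: `cat test/{1895..2016}05.pnt | rank_all_points > ranked.txt`. Because awk's `print` separates fields with single spaces, `grep '^46.8542 -121.7292 ' ranked.txt` then matches regardless of whether the input files use tabs or spaces.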

# 28  
06-27-2016
But the output you show does not increment the rank when the same value is repeated, which contradicts your request.

I don't understand what you want.