Ranking data points from multiple files: post 302976481 by Don Cragun on Wednesday 29th of June 2016 04:36:34 PM
Quote:
Originally Posted by ncwxpanther
Thanks for taking a look at this Don.

Based on your assumptions I have moved all 100+ input files over to my parent directory. I can now reference them as {1895..2016}05.pnt

To clarify your statement at the bottom of your post

I need to base the ranks off of a particular input file. In this case it would be 201605.pnt
So I figured I would gather fields 1 and 2 from that file; search through the other 100+ files for the same set of values in fields 1 and 2; rank the value in field 3 of 201605.pnt based on those other 100+ files.

The result would be a single value of field 1 and 2 with a (value) and rank. Value is optional and, if it's there, would need to be taken from the primary input, 201605.pnt. In other words, the expected output would be 201605.pnt with a rank added as a 4th field. The rank would be based on that value when compared to the other 100+ files with like fields 1 and 2.

I ran your script as
Code:
./grid-ranking.sh {1895..2016}05.pnt

and got output (rather quickly) that has the ranks as "1" and multiple listings of the same fields 1 and 2.
Code:
 25.8125  -80.9375    24.87     1
 25.8125  -80.9375    24.88     1
 25.8125  -80.9375    24.90     1
 25.8125  -80.9375    25.00     1
 25.8125  -80.9375    25.00     1
 25.8125  -80.9375    25.01     1
 25.8125  -80.9375    25.03     1
 25.8125  -80.9375    25.07     1

Hi ncwxpanther,
Instead of moving all of your data files to a parent directory, I would just have executed the script in the child directory where the files were located. But, either way should work.
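Either way, the shell expands {1895..2016}05.pnt before the script starts. In bash, brace expansion is purely textual: unlike * globbing, it produces every name whether or not that file exists in the current directory. A small sketch, assuming bash, that confirms the argument count and counts names with no matching file in the directory you run it from:

```shell
#!/usr/bin/env bash
# Brace expansion generates the names without looking at the file system.
set -- {1895..2016}05.pnt
echo "arguments: $#"             # 2016 - 1895 + 1 = 122 names
echo "first: $1  last: ${!#}"    # ${!#} expands to the last positional parameter

# Count generated names that do not exist in the current directory.
missing=0
for f in "$@"; do
    [ -e "$f" ] || missing=$((missing + 1))
done
echo "missing in $PWD: $missing"
```

A nonzero "missing" count means the script is being run from a directory other than the one holding the .pnt files.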

Your data format seems to change every time you post something. As stated in my last post, my code was designed to work with the sample input you provided in post #27 under the title: "Entire Input for a single value (sorted by data)". Note that in that sample data there are no leading spaces, and the field separator between fields is a single tab character; not sequences of one or more spaces.

In the data shown above, however, there is a leading space character and the field separators are sequences of two, four, or six space characters. If there were no tab characters in your input this time, my script would have only seen one input field; not three. Please make the following changes to the script I suggested:
Change lines 2-4 from:
Code:
sort -n -k1,1 -k2,2 -k3,3 "$@" | awk '
# Set input and output field separators to <tab>.
BEGIN { FS = OFS = "\t"

to:
Code:
sort -bn -k1,1 -k2,2 -k3,3 "$@" | awk '
# Set output field separator to <tab>.
BEGIN {	OFS = "\t"

and change line 51 from:
Code:
	d[++cnt] = $0

to:
Code:
	d[++cnt] = $1 OFS $2 OFS $3

and try again. This will normalize the output to use a single tab as the output field separator, no matter how many spaces and tabs appeared before the 1st field or between other fields in your input files. It does, however, still assume that fields 1 and 2 in your real input data are numeric (this might or might not hold for some of your early sample data, which had uppercase alphabetic values in the 1st two fields). If the 1st two fields are alphanumeric instead of numeric, you could change line 2 in the script I suggested above to:
Code:
sort -b -k1,1 -k2,2 -k3,3n "$@" | awk '
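To see why the FS change matters, here is a small sketch using a made-up line shaped like the posted data (a leading space, runs of spaces, no tabs). It is not the full script; it only compares how awk splits that line with FS set to a tab versus the default FS, and shows the $1 OFS $2 OFS $3 rebuild producing tab-separated output. The function name normalize is hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample line: a leading space and runs of spaces, no tabs.
line=' 25.8125  -80.9375    24.87'

# With FS set to a tab, a line containing no tabs is a single field.
printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print "FS=tab: NF=" NF }'

# With the default FS, leading blanks are ignored and any run of blanks
# separates fields; rebuilding from $1..$3 puts exactly one tab between fields.
normalize() {
    awk 'BEGIN { OFS = "\t" } { print $1 OFS $2 OFS $3 }'
}
printf '%s\n' "$line" | normalize
```

Storing $0 instead, as the original line 51 did, keeps the input's original spacing, because awk only rebuilds $0 when a field is assigned.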

As I said before, this code provides individual rankings for each different pair of field 1 and field 2 values in a single output file. If each of your 122 sample input files contains five different pairs of field 1 and 2 values and the same pairs appear in all of your input files, you will get five rank-ordered lists in the output, sorted by the field 1 and 2 values, with each list containing 122 entries. If 5 of your 122 input files contain an additional line with another pair of field 1 and 2 values, there will be another list in the output with only 5 entries. If you really need to limit the output to field 1 and field 2 value combinations that appear in a specific file, I can add code to my script to make that happen. But, of course, I still find your continually changing descriptions of your desired output confusing, and I may have completely misinterpreted what you are trying to do.
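For the restricted case (only the pairs that appear in one reference file, such as 201605.pnt), a rough sketch follows. It is not the script discussed above. The function name rank_against_ref is hypothetical, and it assumes that fields 1 and 2 are written identically in every file and that rank 1 means the largest field 3 value, with tied values sharing the better rank. It prints each reference line's first three fields plus a rank equal to one more than the number of lines, across the other named files, with the same fields 1 and 2 and a larger field 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank_against_ref REF FILE...
# Rank = 1 + number of lines in FILE... with the same fields 1 and 2 as a
# REF line and a larger field 3 (largest value ranks 1; ties share a rank).
rank_against_ref() {
    local ref=$1
    shift
    awk -v ref="$ref" '
    BEGIN {
        OFS = "\t"
        # Read the reference file first, keyed on its fields 1 and 2.
        while ((getline rec < ref) > 0) {
            if (split(rec, f) < 3)
                continue                      # skip blank or short lines
            key = f[1] SUBSEP f[2]
            order[++n] = key
            line[key] = f[1] OFS f[2] OFS f[3]
            val[key] = f[3] + 0
        }
        close(ref)
    }
    {
        # Every line of the other files: count values above the reference value.
        key = $1 SUBSEP $2
        if ((key in val) && $3 + 0 > val[key])
            above[key]++
    }
    END {
        # Print reference lines in their original order with the rank appended.
        for (i = 1; i <= n; i++)
            print line[order[i]], above[order[i]] + 1
    }' "$@"
}
```

Usage might look like rank_against_ref 201605.pnt {1895..2016}05.pnt. Since a value is never larger than itself, listing the reference file among the other files does not change its ranks; to rank the smallest value first, change > to < in the comparison.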

Quote:
Originally Posted by Corona688
No matter what I do, I cannot get sort to do anything sensible here. It does not give sort priority to columns 1 and 2, it just ignores the first two -k and obeys the third.

Hi Corona688,
With the data I was using field 1 never had any leading spaces and the field separator was always a single tab character, so:
Code:
sort -n -k1,1 -k2,2 -k3,3 file...

worked with the data I was using. But, with the above sort command, leading spaces and tabs are part of the data being sorted. To ignore leading blanks, we also need the -b option to be specified before any of the -k sort key options.

Hopefully,
Code:
sort -bn -k1,1 -k2,2 -k3,3 file...

or:
Code:
sort -b -k1,1 -k2,2 -k3,3n file...

will work better for you (depending on whether the 1st two fields are numeric or alphanumeric, respectively).
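One way to check the numeric variant on space-padded data is to feed it a few made-up lines in the same layout as the data posted above. Setting LC_ALL=C is an added assumption here, used so the decimal point and collation do not depend on the local locale:

```shell
#!/usr/bin/env bash
# Made-up lines with a leading space and runs of spaces between fields.
sample=' 25.8125  -80.9375    25.07
 25.8125  -80.9375    24.87
 25.6875  -80.9375    24.90
 25.8125  -81.0625    25.00'

# -b given before the -k options applies to every key, so leading blanks in
# each field are skipped; -n compares keys numerically, including negatives.
printf '%s\n' "$sample" | LC_ALL=C sort -bn -k1,1 -k2,2 -k3,3
```

The lines come out ordered by field 1, then field 2 (so -81.0625 sorts before -80.9375), then field 3, with their original spacing left intact.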

Note also that the standards say that if the -t char option is not specified, sort uses strings of one or more adjacent blanks (i.e., <space>s and <tab>s) as a field separator. But, if you include a -t option on the command line, each occurrence of char shall be treated as a field separator. So, with -t ' ', if an input line has a leading space and two spaces between the 1st and 2nd "fields" (as recognized by awk with the default FS), sort would see field 1 as the empty string before the 1st space, field 2 as the 1st non-empty string, and field 3 as the empty string between the next two spaces.
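The field numbering that -t ' ' produces can be reproduced with awk by setting FS to the one-character regular expression [ ], which, like sort -t ' ', treats every single space as a separator. A small sketch using a made-up line in the posted layout:

```shell
#!/usr/bin/env bash
# Made-up line: leading space, two spaces, then four spaces between values.
line=' 25.8125  -80.9375    24.87'

# FS = "[ ]" splits at every single space, so adjacent spaces create empty fields.
printf '%s\n' "$line" |
awk 'BEGIN { FS = "[ ]" } { for (i = 1; i <= NF; i++) printf "field %d: \"%s\"\n", i, $i }'
```

With this splitting the line has eight fields: the value awk's default FS calls field 3 (24.87) becomes field 8, and field 3 is empty, so a key such as -k3,3 given together with -t ' ' would compare empty strings.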

Are we having fun yet?