Ranking data points from multiple files


 
# 36  
Old 07-12-2016
Quote:
Originally Posted by Don Cragun
If you really need to limit the output to only contain field 1 and field 2 value combinations that appear in a specific file, I can add code to my script to make that happen.
Don

Sorry, I am not explaining myself clearly enough for you. It would be much easier if we were face to face. You hit the nail on the head with your comment above.

Your latest suggestions worked as designed, printing out the rank of every field 1 and field 2 data point for all 100+ files.

However I am only interested in the fields 1 and 2 of a specific file to be printed. In this case it would be 201605.pnt, but it will be called with a variable each month.

I guess one way would be to print out the file names associated with each data point and then grep out the needed files.

Last edited by ncwxpanther; 07-12-2016 at 12:33 PM..
# 37  
Old 07-12-2016
Or, you could CLEARLY state your requirements and give us examples that show what your inputs actually are and what output you actually want so we might all be able to understand what you are trying to do.

Every post you make changes something or adds a new requirement. We now know that your script will be called with a variable each month. Great! What is the name of this variable? What is the value of this variable? What should your script do with this variable?

Please stop making us guess at what your requirements might be! Please help us help you.
# 38  
Old 07-13-2016
The variable input will be read from a file named control.status

This file contains the year and month as YYYY MM. In this case it's 2016 05

The variables are set using the following syntax:

Code:
curmo=`awk '{print $2; }' /some/dir/control.status `
curyr=`awk '{print $1; }' /some/dir/control.status `

So these variables can be used to determine the name of the specific file to use. In this case it would be $curyr$curmo.pnt
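
As a minimal sketch of the step described above, both values can also be read with a single shell read instead of two awk calls. The temporary control file is only a stand-in for /some/dir/control.status, and the name basefile is an assumption made for illustration.

```shell
#!/bin/sh
# Stand-in for /some/dir/control.status (hypothetical temporary copy).
ctl=$(mktemp)
printf '2016 05\n' > "$ctl"

# Read the year and month from the single line of the control file.
read curyr curmo < "$ctl"

# Build the name of the file whose points are wanted.
basefile="$curyr$curmo.pnt"
printf '%s\n' "$basefile"

rm -f "$ctl"
```

With the control file contents shown above, this prints 201605.pnt.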
# 39  
Old 07-14-2016
OK. So you say you are passing variables to your script, but you aren't; you're extracting values from a configuration file. And the variable names you chose (curmo and curyr) would seem to imply that they are intended to contain the current month and year, but if that is what they were, they would be set by invoking the date utility to get the current month and year. But, of course, you say that the current month is 05, which makes no sense in July.

And, now that we have two values defined in a configuration file, you still haven't explained exactly what is supposed to be done with those values. But I made a guess after rereading this entire thread. I offer the following guess at a script that might do what you want. Obviously, the text shown in red (the pathnames assigned to CF and DataDir) will have to be modified to fit your environment. If it does what you want, I'll consider myself very lucky. If it doesn't, maybe you can use it as a base on which you can build something that will do what you want.

Code:
#!/bin/ksh
# Final component of script name.
IAm=${0##*/}

# Absolute pathname of control file.
CF='/some/dir/control.status'


# Absolute pathname of directory containing the *.pnt files to be processed.
DataDir='/some/same_or_other/directory'

if ! cd "$DataDir"
then	exit 1
fi
if ! read BaseYear BaseMonth < "$CF"
then	exit 2
fi
BaseFile="$BaseYear$BaseMonth.pnt"
if [ ! -r "$BaseFile" ]
then	printf "%s: Can't read base file (%s).\n" "$IAm" "$DataDir/$BaseFile" >&2
	exit 3
fi

sort -bn -k1,1 -k2,2 -k3,3 *"$BaseMonth.pnt" | awk '
# Set output field separator to <tab>.
BEGIN {	OFS = "\t"
}

# Function to print a group of elements that all have identical values in the
# first and second input fields.
function print_group() {
	# Check to see if we have data to process...
	if(cnt) {
		# Look for the 1st change in values after the midpoint for
		# this group.
		for(i = int((cnt + 1) / 2) + 1; i <= cnt; i++)
			if(d[i] != d[i - 1])
				break
		# For each set of duplicate values after the midpoint, reset
		# the rank for those points to the end of the set instead of
		# the start of the set.
		while(i < cnt) {
			if(c[i] > 1)
				for(j = i; j <= i + c[i] - 1; j++)
					r[j] += c[i] - 1
			i += c[i]
		}
		# Print the data and rank for each element of the set.
		for(i = 1; i <= cnt; i++)
			print d[i], r[i]
	}
	# Reset variables for next group.
	cnt = con = 0
}

# Gather points to process from the 1st input file...
FNR == NR {
	L[$1 OFS $2]
	next
}

# Skip points not found in the 1st input file...
!(($1 OFS $2) in L) {
	next
}

# Look for a change in the first two input fields...
$1 != l1 || $2 != l2 {
	# We have found a change in values.  Print the results from the
	# previous group, if there was one.
	print_group()

	# Note first two field values so we notice the next change.
	l1 = $1
	l2 = $2

	# Clear the remembered 3rd field value to prevent contamination from
	# the previous group.
	l3 = ""
}

# Gather data for this group...
{	# Save the data for this line.
	d[++cnt] = $1 OFS $2 OFS $3

	# Calculate the rank for this line.  (At this point, we do not know
	# what the midpoint will be for this group, so all of these are saved
	# with the rank being the lowest rank for the set of lines with
	# identical third field values.  The print_group() function will make
	# adjustments for sets of ranks after the midpoint in the group.)
	if($3 != l3 || cnt == 1) {
		# A change in field 3 values has been found.  Save the value
		# and rank for this set.
		l3 = $3
		lr = cnt
		# Clear the count of the consecutive number of lines with the
		# same value.
		con = 0
	} 

	# Set the rank for this line.
	r[cnt] = lr

	# Set number of consecutive lines that have this third field value.
	for(i = cnt - con++; i <= cnt; i++)
		c[i] = con
}

# We have found EOF.
END {	# Print the data for the last group.
	print_group()
}' "$BaseFile" -

As always, if someone wants to try this on a Solaris/SunOS system, change awk in the script to /usr/xpg4/bin/awk or nawk. This was written and tested using a Korn shell, but should work with any shell that performs the basic parameter expansions required by the POSIX standards (such as ksh, bash, ash, dash, or zsh; but not a pure Bourne shell and not a shell based on csh).
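
For anyone following along, here is a stripped-down sketch of just the filtering part of the script above (the FNR == NR rule and the rule after it), with the ranking left out. The two tiny input files, the temporary directory, and the names keep and filtered are made up for illustration; LC_ALL=C is added only to make the sort order predictable.

```shell
#!/bin/sh
# Hypothetical miniature inputs; the real files hold many more points.
dir=$(mktemp -d)
printf '24.5625 -81.8125 39.16\n24.6042 -81.6458 62.22\n' > "$dir/201606.pnt"
printf '24.5625 -81.8125 74.28\n30.0000 -90.0000 10.00\n' > "$dir/201506.pnt"

# awk reads the base file first (FNR == NR) and remembers its points,
# then reads the sorted data on standard input ("-") and keeps only
# lines whose first two fields match a remembered point.
filtered=$(LC_ALL=C sort -bn -k1,1 -k2,2 -k3,3 "$dir"/*06.pnt | awk '
FNR == NR { keep[$1 "\t" $2]; next }
($1 "\t" $2) in keep' "$dir/201606.pnt" -)

printf '%s\n' "$filtered"
rm -rf "$dir"
```

The point at 30.0000 -90.0000 appears only in 201506.pnt, so it is dropped; the three lines for points that appear in 201606.pnt are kept. A point that appears in both files is kept for every year, which is what the ranking needs.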
# 40  
Old 08-02-2016
Don, All of your assumptions are spot on. But as an update:

The output is still printing out the rank of every field 1 and field 2 data point for all 100+ files, even though I am only interested in the field 1 and field 2 points in the file for the month and year given in the control file.
# 41  
Old 08-02-2016
The script I provided has code to prevent that from happening.

Please show us the contents of your control file, the contents of a sample base file and a sample additional file for a different year but the same month as your base file that exhibits the problem you are experiencing so we can track down what is wrong. (I'm not asking for entire files and you can sanitize any private data, but we need to see enough input in both files to get output that contains unwanted rankings for at least one pair of points.)
# 42  
Old 08-02-2016
control.file
Code:
2016 06

Sample base file 201606.pnt. I also tried using files that had only xx.xx values in the 3rd field. This sample has both xx.xx and xxx.xx in the 3rd field, if that matters.
Code:
 
 24.5625  -81.8125    39.16
 24.5625  -81.7708    40.81
 24.5625  -81.7292    46.73
 24.5625  -81.6875    52.67
 24.6042  -81.6458    62.22
 24.6458  -81.5625    66.18
 24.6458  -81.4792    68.19
 24.6875  -81.5625    67.32
 24.6875  -81.3958    71.72
 24.7292  -81.3958    73.26
 24.7708  -80.9375    90.29
 25.1458  -81.1042   116.34
 25.1458  -81.0625   117.04
 25.1458  -81.0208   119.01
 25.1458  -80.9792   118.53
 25.1458  -80.9375   118.07
 25.1458  -80.7708   142.98
 25.1458  -80.7292   149.23
 25.1458  -80.4375   171.91
 25.1458  -80.3958   172.67
 25.1875  -81.1042   122.42
 25.1875  -81.0625   125.46
 25.1875  -81.0208   125.53
 25.1875  -80.9792   125.67
 25.1875  -80.9375   127.46
 25.1875  -80.8958   130.94

Sample additional file 201506.pnt
Code:
 
 24.5625  -81.8125    74.28
 24.5625  -81.7708    72.68
 24.5625  -81.7292    66.90
 24.5625  -81.6875    61.92
 24.6042  -81.6458    57.16
 24.6458  -81.5625    60.11
 24.6458  -81.4792    62.80
 24.6875  -81.5625    62.20
 24.6875  -81.3958    68.01
 24.7292  -81.3958    69.86
 24.7708  -80.9375    85.71
 25.1458  -81.1042   159.11
 25.1458  -81.0625   161.78
 25.1458  -81.0208   162.54
 25.1458  -80.9792   163.41
 25.1458  -80.9375   169.29
 25.1458  -80.7708   150.50
 25.1458  -80.7292   145.82
 25.1458  -80.4375   122.51
 25.1458  -80.3958   120.30
 25.1875  -81.1042   168.44
 25.1875  -81.0625   170.80
 25.1875  -81.0208   173.15
 25.1875  -80.9792   176.74
 25.1875  -80.9375   176.95
 25.1875  -80.8958   176.87
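
One way to check whether two files actually differ in their field 1 and field 2 points is to compare the sorted pairs with comm. This is only a sketch: it reproduces the first three rows of each sample above, and the temporary directory, the .keys files, and the name extra are assumptions made for illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)
# First three rows of each sample file above.
printf ' 24.5625  -81.8125    39.16\n 24.5625  -81.7708    40.81\n 24.5625  -81.7292    46.73\n' > "$dir/201606.pnt"
printf ' 24.5625  -81.8125    74.28\n 24.5625  -81.7708    72.68\n 24.5625  -81.7292    66.90\n' > "$dir/201506.pnt"

# Reduce each file to its sorted, unique "field1 field2" pairs.
for f in 201606 201506
do	awk 'NF >= 2 { print $1, $2 }' "$dir/$f.pnt" | LC_ALL=C sort -u > "$dir/$f.keys"
done

# comm -13 prints lines found only in the second file: points in
# 201506.pnt that are absent from the base file 201606.pnt.
extra=$(LC_ALL=C comm -13 "$dir/201606.keys" "$dir/201506.keys" | wc -l | tr -d ' ')
printf 'points only in 201506.pnt: %s\n' "$extra"
rm -rf "$dir"
```

For the sample rows posted here the count is 0: every field 1 and field 2 pair in 201506.pnt also appears in 201606.pnt, so a filter keyed on the base file has nothing to remove from this particular pair of samples.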
