Ranking data points from multiple files (Shell Programming and Scripting)
Post 302978708 by Don Cragun on Tuesday 2nd of August 2016 09:31:34 PM
The following seems to do what I think you want...
Code:
#!/bin/ksh
# Final component of script name.
IAm=${0##*/}

# Absolute pathname of control file.
CF='/some/dir/control.status'

# Absolute pathname of directory containing the *.pnt files to be processed.
DataDir='/some/same_or_other/directory'

if ! cd "$DataDir"
then	exit 1
fi
if ! read BaseYear BaseMonth < "$CF"
then	exit 2
fi
BaseFile="$BaseYear$BaseMonth.pnt"
if [ ! -r "$BaseFile" ]
then	printf "%s: Can't read base file (%s).\n" "$IAm" "$DataDir/$BaseFile" >&2
	exit 3
fi

sort -bn -k1,1 -k2,2 -k3,3 *"$BaseMonth.pnt" | awk '
# Set output field separator to <tab>.
BEGIN {	OFS = "\t"
}

# Function to print a group of elements that all have identical values in the
# first and second input fields.
function print_group() {
	# Check to see if we have data to process...
	if(cnt) {
		# Look for the 1st change in values after the mid-point for
		# this group.
		for(i = int((cnt + 1) / 2) + 1; i <= cnt; i++)
			if(d[i] != d[i - 1])
				break
		# For each set of duplicate values after the midpoint, reset
		# the rank for those points to the end of the set instead of
		# the start of the set.
		while(i < cnt) {
			if(c[i] > 1)
				for(j = i; j <= i + c[i] - 1; j++)
					r[j] += c[i] - 1
			i += c[i]
		}
		# Print the data and rank for each element of the set in the
		# base file.
		for(i = 1; i <= cnt; i++)
			if(d[i] in P) {
				print d[i], r[i]
				delete P[d[i]]
			}
	}
	# Reset variables for next group.
	cnt = con = 0
}

# Gather points to process from 1st input file...
FNR == NR {
	# Gather data from the base file (given as first file operand)...
	# Gather list of point pairs to be processed.
	L[$1 OFS $2]

	# Gather list of points and value triples to be printed.
	P[$1 OFS $2 OFS $3]
	next
}

# Skip points not found in the 1st input file...
!(($1 OFS $2) in L) {
	next
}

# Look for a change in the first two input fields...
$1 != l1 || $2 != l2 || NR == 1 {
	# We have found a change in values.  Print the results from the
	# previous group, if there was one.
	print_group()

	# Note first two field values so we notice the next change.
	l1 = $1
	l2 = $2

	# Clear the remembered 3rd field value to prevent contamination from
	# the previous group.
	l3 = ""
}

# Gather data for this group...
{	# Save the data for this line.
	d[++cnt] = $1 OFS $2 OFS $3

	# Calculate the rank for this line.  (At this point, we do not know
	# what the midpoint will be for this group, so all of these are saved
	# with the rank being the lowest rank for the set of lines with
	# identical third field values.  The print_group() function will make
	# adjustments for sets of ranks after the midpoint in the group.)
	if($3 != l3 || cnt == 1) {
		# A change in field 3 values has been found.  Save the value
		# and rank for this set.
		l3 = $3
		lr = cnt
		# Clear the count of the consecutive number of lines with the
		# same value.
		con = 0
	} 

	# Set the rank for this line.
	r[cnt] = lr

	# Set number of consecutive lines that have this third field value.
	for(i = cnt - con++; i <= cnt; i++)
		c[i] = con
}

# We have found EOF.
END {	# Print the data for the last group.
	print_group()
}' "$BaseFile" -
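
If it helps to see the tie-handling rule on its own, here is a stripped-down sketch. It is not the script above: it ranks a single column of values instead of groups of point pairs, and the function name rank_group is made up for illustration. As in print_group(), a run of equal values that starts at or before the midpoint keeps the lowest rank in the run, and a run that starts after the midpoint gets the highest rank in the run.
Code:
```shell
#!/bin/sh
# rank_group (hypothetical name): read one numeric value per line on stdin
# and print "value rank" for each value in ascending order.
rank_group() {
	sort -n | awk '
	{	v[++n] = $1		# save the sorted values
	}
	END {	mid = int((n + 1) / 2)	# same midpoint as print_group()
		for(i = 1; i <= n; i = j + 1) {
			# Find the last element j of the run of equal values
			# that starts at element i.
			j = i
			while(j < n && v[j + 1] == v[i])
				j++
			# Runs starting after the midpoint take the rank of
			# their last element; all others take the rank of
			# their first element.
			rank = (i > mid) ? j : i
			for(k = i; k <= j; k++)
				print v[k], rank
		}
	}'
}

# Six values with one tie on each side of the midpoint.
printf '%s\n' 10 20 20 30 40 40 | rank_group
```

With those six values the tied 20s both get rank 2 (start of their run) and the tied 40s both get rank 6 (end of their run).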

This code is written assuming that it is possible for more than one entry for a pair of points to appear in a single *.pnt file. If only one entry for a given pair of points can appear in a *.pnt file, you can make this script run a little bit faster by changing the delete statement in the final loop of the print_group() function from:
Code:
				delete P[d[i]]

to:
Code:
				break
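
Before making that change, you might want to confirm that your files really never repeat a pair of points. The following is only a sketch under my own assumptions: the function name has_duplicate_pairs is hypothetical, fields are whitespace separated as in the script above, and two pairs count as the same only if their first two fields are identical as strings.
Code:
```shell
#!/bin/sh
# has_duplicate_pairs (hypothetical name): exit 0 if the file named by $1
# contains the same first two fields on more than one line, non-zero otherwise.
has_duplicate_pairs() {
	awk '
	# The counter is 0 the first time a pair is seen, non-zero after that.
	seen[$1, $2]++ {
		dup = 1
		exit		# jump to the END section
	}
	END {	exit !dup	# status 0 only when a repeat was found
	}' "$1"
}

# Report every readable *.pnt file in the current directory that repeats a pair.
for f in *.pnt
do	[ -r "$f" ] || continue
	if has_duplicate_pairs "$f"
	then	printf '%s: repeated point pair found; keep the delete\n' "$f"
	fi
done
```

If the loop prints nothing for any of your data files, the break variant should be safe for that data.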

With the sample inputs provided in post #42, it produces the output:
Code:
24.5625	-81.8125	39.16	1
24.5625	-81.7708	40.81	1
24.5625	-81.7292	46.73	1
24.5625	-81.6875	52.67	1
24.6042	-81.6458	62.22	2
24.6458	-81.5625	66.18	2
24.6458	-81.4792	68.19	2
24.6875	-81.5625	67.32	2
24.6875	-81.3958	71.72	2
24.7292	-81.3958	73.26	2
24.7708	-80.9375	90.29	2
25.1458	-81.1042	116.34	1
25.1458	-81.0625	117.04	1
25.1458	-81.0208	119.01	1
25.1458	-80.9792	118.53	1
25.1458	-80.9375	118.07	1
25.1458	-80.7708	142.98	1
25.1458	-80.7292	149.23	2
25.1458	-80.4375	171.91	2
25.1458	-80.3958	172.67	2
25.1875	-81.1042	122.42	1
25.1875	-81.0625	125.46	1
25.1875	-81.0208	125.53	1
25.1875	-80.9792	125.67	1
25.1875	-80.9375	127.46	1
25.1875	-80.8958	130.94	1
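
If you want to try the script on your own data, the sketch below shows how the control file and file names are used to select the input. It rests on assumptions: the control file holds a year and a month separated by whitespace (for example "2016 08"), the data files are named YYYYMM.pnt, and the function name pick_files and the demo file names are invented for this example.
Code:
```shell
#!/bin/sh
# pick_files (hypothetical name): $1 is the data directory, $2 is the absolute
# pathname of the control file.  Runs in a subshell so the cd does not leak.
pick_files() (
	cd "$1" || exit 1
	read BaseYear BaseMonth < "$2" || exit 2
	# The base file is the one for the year and month in the control file.
	printf 'base: %s\n' "$BaseYear$BaseMonth.pnt"
	# Every file for the same month, in any year, is fed to sort.
	printf 'input: %s\n' *"$BaseMonth.pnt"
)

# Demo with made-up, empty files in a scratch directory.
demo=$(mktemp -d) || exit 1
printf '2016 08\n' > "$demo/control.status"
touch "$demo/201408.pnt" "$demo/201508.pnt" "$demo/201607.pnt" "$demo/201608.pnt"
pick_files "$demo" "$demo/control.status"
rm -r "$demo"
```

In the demo, 201608.pnt is the base file and the three August files are the input; 201607.pnt is ignored because its month does not match.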
