The following seems to do what I think you want...
Code:
#!/bin/ksh
# Final component of script name.
IAm=${0##*/}
# Absolute pathname of control file.
CF='/some/dir/control.status'
# Absolute pathname of directory containing the *.pnt files to be processed.
DataDir='/some/same_or_other/directory'
if ! cd "$DataDir"
then    exit 1
fi
if ! read BaseYear BaseMonth < "$CF"
then    exit 2
fi
BaseFile="$BaseYear$BaseMonth.pnt"
if [ ! -r "$BaseFile" ]
then    printf "%s: Can't read base file (%s).\n" "$IAm" "$DataDir/$BaseFile" >&2
        exit 3
fi
sort -bn -k1,1 -k2,2 -k3,3 *"$BaseMonth.pnt" | awk '
# Set output field separator to <tab>.
BEGIN { OFS = "\t"
}
# Function to print a group of elements that all have identical values in the
# first and second input fields.
function print_group() {
        # Check to see if we have data to process...
        if(cnt) {
                # Look for the 1st change in values after the midpoint for
                # this group.
                for(i = int((cnt + 1) / 2) + 1; i <= cnt; i++)
                        if(d[i] != d[i - 1])
                                break
                # For each set of duplicate values after the midpoint, reset
                # the rank for those points to the end of the set instead of
                # the start of the set.
                while(i < cnt) {
                        if(c[i] > 1)
                                for(j = i; j <= i + c[i] - 1; j++)
                                        r[j] += c[i] - 1
                        i += c[i]
                }
                # Print the data and rank for each element of the group that
                # appears in the base file.
                for(i = 1; i <= cnt; i++)
                        if(d[i] in P) {
                                print d[i], r[i]
                                delete P[d[i]]
                        }
        }
        # Reset variables for the next group.
        cnt = con = 0
}
# Gather points to process from the 1st input file...
FNR == NR {
        # Gather data from the base file (given as the first file operand)...
        # Gather list of point pairs to be processed.
        L[$1 OFS $2]
        # Gather list of point and value triples to be printed.
        P[$1 OFS $2 OFS $3]
        next
}
# Skip points not found in the 1st input file...
!(($1 OFS $2) in L) {
        next
}
# Look for a change in the first two input fields...
$1 != l1 || $2 != l2 || NR == 1 {
        # We have found a change in values. Print the results from the
        # previous group, if there was one.
        print_group()
        # Note the first two field values so we notice the next change.
        l1 = $1
        l2 = $2
        # Clear the remembered 3rd field value to prevent contamination from
        # the previous group.
        l3 = ""
}
# Gather data for this group...
{       # Save the data for this line.
        d[++cnt] = $1 OFS $2 OFS $3
        # Calculate the rank for this line. (At this point, we do not know
        # what the midpoint will be for this group, so all of these are saved
        # with the rank being the lowest rank for the set of lines with
        # identical third field values. The print_group() function will make
        # adjustments for sets of ranks after the midpoint in the group.)
        if($3 != l3 || cnt == 1) {
                # A change in field 3 values has been found. Save the value
                # and rank for this set.
                l3 = $3
                lr = cnt
                # Clear the count of the consecutive number of lines with the
                # same value.
                con = 0
        }
        # Set the rank for this line.
        r[cnt] = lr
        # Set the number of consecutive lines that have this third field value.
        for(i = cnt - con++; i <= cnt; i++)
                c[i] = con
}
# We have found EOF.
END {   # Print the data for the last group.
        print_group()
}' "$BaseFile" -
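The FNR == NR idiom the script uses to load the base file before filtering the combined sorted stream can be sketched in isolation. The file name and data below are hypothetical, chosen only to show the mechanics:

```shell
# allow.txt holds the keys to keep; the stream on stdin is then filtered.
printf '%s\n' a b c > allow.txt
printf '%s\n' 'a 1' 'x 2' 'b 3' |
awk 'FNR == NR { L[$1]; next }  # 1st file operand: load keys into L[]
     $1 in L                    # later input: print lines whose key is in L[]
' allow.txt -
# Prints:
# a 1
# b 3
rm -f allow.txt
```

FNR equals NR only while awk is reading the first file operand, so the first rule builds the lookup array and skips to the next line; every later line (read from "-", i.e. standard input) falls through to the membership test.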
This code is written assuming that it is possible for more than one entry for a pair of points to appear in a single *.pnt file. If only one entry for a given pair of points can appear in a *.pnt file, you can make this script run a little bit faster by changing the delete line in the print_group() function from:
Code:
delete P[d[i]]
to:
Code:
break
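The midpoint rank adjustment that print_group() performs can also be seen in isolation with a toy set of third-field values (the numbers below are made up): ties before the midpoint keep the lowest rank of their set, while ties after the midpoint are shifted to the highest rank of their set.

```shell
# Six already-sorted values; cnt=6, so the midpoint is line 3.
printf '%s\n' 10 10 20 20 30 30 |
awk '
{       d[++cnt] = $1
        # Minimum rank for each set of identical values.
        if($1 != l3 || cnt == 1) { l3 = $1; lr = cnt; con = 0 }
        r[cnt] = lr
        for(i = cnt - con++; i <= cnt; i++) c[i] = con
}
END {   # Find the 1st change in values after the midpoint.
        for(i = int((cnt + 1) / 2) + 1; i <= cnt; i++)
                if(d[i] != d[i - 1]) break
        # Shift ties after the midpoint to the highest rank in their set.
        while(i < cnt) {
                if(c[i] > 1)
                        for(j = i; j <= i + c[i] - 1; j++) r[j] += c[i] - 1
                i += c[i]
        }
        for(i = 1; i <= cnt; i++) print d[i], r[i]
}'
# Prints:
# 10 1
# 10 1
# 20 3
# 20 3
# 30 6
# 30 6
```

Note that the two 10s and the two 20s keep their minimum ranks (1 and 3), while the two 30s, which lie entirely after the midpoint, are bumped from rank 5 to rank 6.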
With the sample inputs provided in post #42, it produces the expected output.