finding nearest value in a column


 
# 1  
Old 01-22-2011

Hi,
I have 2 files:
file1:
Code:
 
1 ia 2
1 mn 6
1 sd 11
2 ny 3
2 ma 10
3 wa 7
3 ca 8

file2:
Code:
 
1 mi 3
1 wi 5
2 pa 4
3 id 6

For each line in file2, I want to print the line in file1 that a) matches the first field of the line in file2 and b) has the value in its third field closest to the value in the third field of the line in file2.

That is, given file1 and file2 above, I would like the following output:


Code:
 
1 ia 2
1 mn 6
2 ny 3
3 wa 7

Note that this is a simplified example: file1 and file2 each have thousands of lines. Also, it is very unlikely that there will be ties for the nearest value in field 3.


Thanks in advance!

---------- Post updated 01-22-11 at 12:38 PM ---------- Previous update was 01-21-11 at 05:08 PM ----------

You can think of column 1 as an indicator for a given cluster.
So if the value is 1 in column 1 of a line in file2, I want to print the line in file1 with 2 restrictions:

a) the line has a value of 1 in column 1
AND
b) the value in column 3 of the file1 line is closest to the value in column 3 of the given line in file2.

So, for the line in file2
Code:
 
1 mi 3

the following lines of file1 match in column 1:

Code:
 
1 ia 2
1 mn 6
1 sd 11

Of these 3, the one whose column 3 value is closest to that of the file2 line is
Code:
 
1 ia 2

So, for that line in file2, we print:
Code:
 
1 ia 2


If we continue this algorithm for each line in file2, we get the output I listed above.

Thus, the reason there are 2 lines in the output with value 1 in the first column is that two lines in file2 have a value of 1 there, and the algorithm should only print lines of file1 that match this value in file2, per the first restriction.
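
The walk-through above for the single file2 line "1 mi 3" could be sketched roughly as follows. This is only an illustration of the two restrictions, not a full solution: the variable names (cluster, target, best, line, found) are made up for the example, and it assumes whitespace-separated fields with a numeric third column.

```shell
# Sketch only: for ONE query line, pick the nearest file1 line in the same cluster.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
1 ia 2
1 mn 6
1 sd 11
2 ny 3
2 ma 10
3 wa 7
3 ca 8
EOF

# Query taken from the file2 line "1 mi 3": cluster 1, target value 3.
nearest=$(awk -v cluster=1 -v target=3 '
  $1 == cluster {                  # restriction a): same value in column 1
    d = $3 - target
    if (d < 0) d = -d              # absolute distance in column 3
    if (!found || d < best) {      # restriction b): keep the closest line so far
      best = d; line = $0; found = 1
    }
  }
  END { if (found) print line }
' "$tmp/file1")

echo "$nearest"    # prints: 1 ia 2
rm -rf "$tmp"
```

Because the comparison is a strict "less than", a tie would go to whichever candidate appears first in file1.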

Last edited by peanuts48; 01-22-2011 at 02:05 PM.. Reason: Answer a question
# 2  
Old 01-22-2011
something along these lines....

nawk -f pea.awk file1.txt file2.txt

pea.awk:
Code:
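# Absolute value helper.
function abs(v) {
  return (v < 0) ? -v : v
}

# First file (file1): collect the column 3 values of each cluster,
# and remember column 2 for every (column 1, column 3) pair.
FNR == NR {
  f1[$1] = ($1 in f1) ? f1[$1] SUBSEP $3 : $3
  f1prime[$1, $3] = $2
  next
}

# Second file (file2): among the file1 values of the same cluster,
# pick the one closest to this line's column 3.
$1 in f1 {
  n = split(f1[$1], tA, SUBSEP)
  v3 = tA[1]
  diff = abs(tA[1] - $3)
  for (i = 2; i <= n; i++) {
    d = abs(tA[i] - $3)
    if (d < diff) {
      diff = d
      v3 = tA[i]
    }
  }
  print $1, f1prime[$1, v3], v3
}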
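
For checking against the sample data from post # 1, a self-contained run might look like the sketch below. It is a variant of the same idea rather than pea.awk itself: it stores whole file1 lines per cluster, so any extra columns would be printed too. The array names (cnt, vals, lines) and the temporary file locations are assumptions made for the example, and plain awk is used in case nawk is not installed.

```shell
# Sketch only: recreate the sample files and print the nearest file1 line
# for every file2 line.
tmp=$(mktemp -d)
printf '%s\n' '1 ia 2' '1 mn 6' '1 sd 11' '2 ny 3' '2 ma 10' '3 wa 7' '3 ca 8' > "$tmp/file1.txt"
printf '%s\n' '1 mi 3' '1 wi 5' '2 pa 4' '3 id 6' > "$tmp/file2.txt"

result=$(awk '
  # First file: remember every line and its column 3 value, grouped by column 1.
  FNR == NR {
    n = ++cnt[$1]
    vals[$1, n] = $3
    lines[$1, n] = $0
    next
  }
  # Second file: scan only the file1 lines of the matching cluster.
  $1 in cnt {
    best = 0
    for (i = 1; i <= cnt[$1]; i++) {
      d = vals[$1, i] - $3
      if (d < 0) d = -d                       # absolute difference
      if (best == 0 || d < bestd) { best = i; bestd = d }
    }
    print lines[$1, best]
  }
' "$tmp/file1.txt" "$tmp/file2.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample files this prints the four lines requested in post # 1 (1 ia 2, 1 mn 6, 2 ny 3, 3 wa 7). Each file2 line scans every file1 line of its cluster, which should be fine for a few thousand lines; a tie goes to the line that comes first in file1.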


Last edited by vgersh99; 01-22-2011 at 03:00 PM..
# 3  
Old 01-22-2011
Ok, Thanks, I will give it a try and let you know how it works!
 