Lookup first high value


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Lookup first high value
# 1  
Old 02-01-2016
Lookup first high value

Hello experts,

I have a file looking like

Code:
v1	g1	5.42
v2	g1	2.43
v1	g2	1.24
v3	g2	0.6

I want to lookup the first value in another sorted table which is greater than column 3 value, keying on column 2 on the first table.

The sorted table looks like the following,. where I want to find out the first value in column 3 which is greater than the value in the 3rd column of the first file, based on col2 in 1st file,and return the corresponding value in col2 of sorted file.

Code:
g1	1	1.1
g1	2	1.2
g1	3	1.3
g1	4	1.4
g1	5	1.5
g1	6	1.6
g1	7	1.7
g1	8	1.8
g1	9	1.9
g1	10	2
g1	11	2.1
g1	12	2.2
g1	13	2.3
g1	14	2.4
g1	15	2.5
g1	16	2.6
g1	17	2.7
g1	18	2.8
g1	19	2.9
g1	20	3
g1	21	3.1
g1	22	3.2
g1	23	3.3
g1	24	3.4
g1	25	3.5
g1	26	3.6
g1	27	3.7
g1	28	3.8
g1	29	3.9
g1	30	4
g1	31	4.1
g1	32	4.2
g1	33	4.3
g1	34	4.4
g1	35	4.5
g1	36	4.6
g1	37	4.7
g1	38	4.8
g1	39	4.9
g1	40	5
g1	41	5.1
g1	42	5.2
g1	43	5.3
g1	44	5.4
g1	45	5.5
g1	46	5.6
g1	47	5.7
g1	48	5.8
g1	49	5.9
g1	50	6
g2	1	0
g2	2	0.2
g2	3	0.4
g2	4	0.6
g2	5	0.8
g2	6	1
g2	7	1.2
g2	8	1.4
g2	9	1.6
g2	10	1.8
g2	11	2
g2	12	2.2
g2	13	2.4
g2	14	2.6
g2	15	2.8
g2	16	3
g2	17	3.2
g2	18	3.4
g2	19	3.6
g2	20	3.8
g2	21	4

The desired output is

Code:
v1	g1	5.42	45
v2	g1	2.43	15
v1	g2	1.24	8
v3	g2	0.6	5


This can be achieved by joining pretty easily, but the issue is the sorted file is huge having 500 million records and joining takes a long time.

Can there be an awk solution which is faster?

This works

Code:
 sort -k2,2 file1 -o file1
  
 sort -k1,1 -k3,3n sortedfile -o tmp

 join -1 2 -2 1 -o 1.1 1.2 1.3 2.2 2.3  file1 tmp | awk '$4 < $5 && !a[$1$2]++' > result

Please note that the second column in sorted file is not a serial number, it is just shown like this as an example.

Thanks for your help.
# 2  
Old 02-01-2016
How come that with -o 1.1 1.2 1.3 2.2 2.3 the desired output has just 4 columns?
# 3  
Old 02-01-2016
Quote:
Originally Posted by RudiC
How come that with -o 1.1 1.2 1.3 2.2 2.3 the desired output has just 4 columns?
You are right, I was just copying and pasting from the actual data and not the demo one.

Here it is

Code:
join -1 2 -2 1 -o 1.1 1.2 1.3 2.2 2.3  file1 sortedfile | awk '$3 < $5 && !a[$1$2]++{print  $1,$2,$3,$4}'
v1 g1 5.42 45
v2 g1 2.43 15
v1 g2 1.24 8
v3 g2 0.6 5

# 4  
Old 02-01-2016
Would this help?
Code:
awk '
BEGIN           {FS = OFS = SUBSEP = "\t"
                }
FNR == NR       {K[$1]
                 V[$1,$2] = $3
                 next
                }
                {for (k in K) if (V[k,$1] && V[k,$1] < $3)      {print k, $1, V[k,$1], $2
                                                                 delete V[k,$1]
                                                                }
                }
' file[12]
v2      g1      2.43    15
v1      g1      5.42    45
v3      g2      0.6     5
v1      g2      1.24    8

Not sure if it will be faster than join, though...
These 2 Users Gave Thanks to RudiC For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Lookup Within a file

A text file has two logs of same event & both logs have to be correlated into a single line before same can be processed further RAW OPERATING LOG Date Month Year Time Event 10 JAN 2014 1000 4 11 APR 2013 1230 2 12 FEB 2014 ... (7 Replies)
Discussion started by: newageBATMAN
7 Replies

2. Shell Programming and Scripting

Lookup name from another file

Hi All, I want to lookup name for an id in col2 input from another file and add the name to each line. Input 1 comp100001_c0_seq1 At1g31340 30.40 569 384 11 3 1673 313 834 7e-62 237 comp100003_c0_seq1 At1g35370_2 35.00 80 50 ... (7 Replies)
Discussion started by: gina.lizar
7 Replies

3. Shell Programming and Scripting

Lookup file

Hi, need your help to lookup these 2 files main.txt RNPMS01,PMS717W_Marasi,CXP9016141/1_R7G04,EXECUTING RNPMS01,RAP765W_BakaranBatu,CXP9014346/1_R6AG03,EXECUTING RNPMS01,RNPMS01,CXP9014711/2_R5Z,EXECUTING RNPMS01,TBT510W_Bandar_Utama,CXP9014346/1_R6AG03,EXECUTING... (8 Replies)
Discussion started by: singgih
8 Replies

4. Shell Programming and Scripting

what would a script include to find CPU's %system time high and user time high?

Hi , I am trying to :wall: my head while scripting ..I am really new to this stuff , never did it before :( . how to find cpu's system high time and user time high in a script?? thanks , help would be appreciated ! :) (9 Replies)
Discussion started by: sushwey
9 Replies

5. Shell Programming and Scripting

Ip Location Lookup

Hello everybody; I am working on a script about ip location lookup. I 've a decimal list dec_list: 16777216,17367039,AU,AUS,AUSTRALIA 17367040,17432575,MY,MYS,MALAYSIA 17432576,17498111,AU,AUS,AUSTRALIA 17498112,17563647,KR,KOR,REPUBLIC OF KOREA 17563648,17825791,CN,CHN,CHINA... (2 Replies)
Discussion started by: mustafayilmaz
2 Replies

6. Red Hat

apache high cpu load on high traffic

i have a Intel Quad Core Xeon X3440 (4 x 2.53GHz, 8MB Cache, Hyper Threaded) with 16gig and 1tb harddrive with a 1gb port and my apache is causing my cpu to go up to 100% on all four cores heres my http.config <IfModule prefork.c> StartServers 10 MinSpareServers 10 MaxSpareServers 15... (4 Replies)
Discussion started by: awww
4 Replies

7. Shell Programming and Scripting

lookup

I have a lookup file in unix say /data/lkp.dat (First line is header and space delimited) and the content is shown below. Another file which contains the job_name and rec_count lets say /data/data_file.dat(no header pipe delimited file). Now i want to do a lookup on job_name and my output should... (3 Replies)
Discussion started by: dr46014
3 Replies

8. UNIX for Advanced & Expert Users

Clueless about how to lookup and reverse lookup IP addresses under a file!!.pls help

Write a quick shell snippet to find all of the IPV4 IP addresses in any and all of the files under /var/lib/output/*, ignoring whatever else may be in those files. Perform a reverse lookup on each, and format the output neatly, like "IP=192.168.0.1, ... (0 Replies)
Discussion started by: choco4202002
0 Replies

9. UNIX for Advanced & Expert Users

Sun: High kernel usage & very high load averages

Hi, I am seeing very high kernel usage and very high load averages on my system (Although we are not loading much data to our database). Here is the output of top...does anyone know what i should be looking at? Thanks, Lorraine last pid: 13144; load averages: 22.32, 19.81, 16.78 ... (4 Replies)
Discussion started by: lorrainenineill
4 Replies
Login or Register to Ask a Question