Sponsored Content
Full Discussion: Lookup first high value
Top Forums Shell Programming and Scripting Lookup first high value Post 302965625 by sheetalk on Monday 1st of February 2016 10:32:07 AM
Old 02-01-2016
Lookup first high value

Hello experts,

I have a file looking like

Code:
v1	g1	5.42
v2	g1	2.43
v1	g2	1.24
v3	g2	0.6

I want to lookup the first value in another sorted table which is greater than column 3 value, keying on column 2 on the first table.

The sorted table looks like the following,. where I want to find out the first value in column 3 which is greater than the value in the 3rd column of the first file, based on col2 in 1st file,and return the corresponding value in col2 of sorted file.

Code:
g1	1	1.1
g1	2	1.2
g1	3	1.3
g1	4	1.4
g1	5	1.5
g1	6	1.6
g1	7	1.7
g1	8	1.8
g1	9	1.9
g1	10	2
g1	11	2.1
g1	12	2.2
g1	13	2.3
g1	14	2.4
g1	15	2.5
g1	16	2.6
g1	17	2.7
g1	18	2.8
g1	19	2.9
g1	20	3
g1	21	3.1
g1	22	3.2
g1	23	3.3
g1	24	3.4
g1	25	3.5
g1	26	3.6
g1	27	3.7
g1	28	3.8
g1	29	3.9
g1	30	4
g1	31	4.1
g1	32	4.2
g1	33	4.3
g1	34	4.4
g1	35	4.5
g1	36	4.6
g1	37	4.7
g1	38	4.8
g1	39	4.9
g1	40	5
g1	41	5.1
g1	42	5.2
g1	43	5.3
g1	44	5.4
g1	45	5.5
g1	46	5.6
g1	47	5.7
g1	48	5.8
g1	49	5.9
g1	50	6
g2	1	0
g2	2	0.2
g2	3	0.4
g2	4	0.6
g2	5	0.8
g2	6	1
g2	7	1.2
g2	8	1.4
g2	9	1.6
g2	10	1.8
g2	11	2
g2	12	2.2
g2	13	2.4
g2	14	2.6
g2	15	2.8
g2	16	3
g2	17	3.2
g2	18	3.4
g2	19	3.6
g2	20	3.8
g2	21	4

The desired output is

Code:
v1	g1	5.42	45
v2	g1	2.43	15
v1	g2	1.24	8
v3	g2	0.6	5


This can be achieved by joining pretty easily, but the issue is the sorted file is huge having 500 million records and joining takes a long time.

Can there be an awk solution which is faster?

This works

Code:
 sort -k2,2 file1 -o file1
  
 sort -k1,1 -k3,3n sortedfile -o tmp

 join -1 2 -2 1 -o 1.1 1.2 1.3 2.2 2.3  file1 tmp | awk '$4 < $5 && !a[$1$2]++' > result

Please note that the second column in sorted file is not a serial number, it is just shown like this as an example.

Thanks for your help.
 

9 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Sun: High kernel usage & very high load averages

Hi, I am seeing very high kernel usage and very high load averages on my system (Although we are not loading much data to our database). Here is the output of top...does anyone know what i should be looking at? Thanks, Lorraine last pid: 13144; load averages: 22.32, 19.81, 16.78 ... (4 Replies)
Discussion started by: lorrainenineill
4 Replies

2. UNIX for Advanced & Expert Users

Clueless about how to lookup and reverse lookup IP addresses under a file!!.pls help

Write a quick shell snippet to find all of the IPV4 IP addresses in any and all of the files under /var/lib/output/*, ignoring whatever else may be in those files. Perform a reverse lookup on each, and format the output neatly, like "IP=192.168.0.1, ... (0 Replies)
Discussion started by: choco4202002
0 Replies

3. Shell Programming and Scripting

lookup

I have a lookup file in unix say /data/lkp.dat (First line is header and space delimited) and the content is shown below. Another file which contains the job_name and rec_count lets say /data/data_file.dat(no header pipe delimited file). Now i want to do a lookup on job_name and my output should... (3 Replies)
Discussion started by: dr46014
3 Replies

4. Red Hat

apache high cpu load on high traffic

i have a Intel Quad Core Xeon X3440 (4 x 2.53GHz, 8MB Cache, Hyper Threaded) with 16gig and 1tb harddrive with a 1gb port and my apache is causing my cpu to go up to 100% on all four cores heres my http.config <IfModule prefork.c> StartServers 10 MinSpareServers 10 MaxSpareServers 15... (4 Replies)
Discussion started by: awww
4 Replies

5. Shell Programming and Scripting

Ip Location Lookup

Hello everybody; I am working on a script about ip location lookup. I 've a decimal list dec_list: 16777216,17367039,AU,AUS,AUSTRALIA 17367040,17432575,MY,MYS,MALAYSIA 17432576,17498111,AU,AUS,AUSTRALIA 17498112,17563647,KR,KOR,REPUBLIC OF KOREA 17563648,17825791,CN,CHN,CHINA... (2 Replies)
Discussion started by: mustafayilmaz
2 Replies

6. Shell Programming and Scripting

what would a script include to find CPU's %system time high and user time high?

Hi , I am trying to :wall: my head while scripting ..I am really new to this stuff , never did it before :( . how to find cpu's system high time and user time high in a script?? thanks , help would be appreciated ! :) (9 Replies)
Discussion started by: sushwey
9 Replies

7. Shell Programming and Scripting

Lookup file

Hi, need your help to lookup these 2 files main.txt RNPMS01,PMS717W_Marasi,CXP9016141/1_R7G04,EXECUTING RNPMS01,RAP765W_BakaranBatu,CXP9014346/1_R6AG03,EXECUTING RNPMS01,RNPMS01,CXP9014711/2_R5Z,EXECUTING RNPMS01,TBT510W_Bandar_Utama,CXP9014346/1_R6AG03,EXECUTING... (8 Replies)
Discussion started by: singgih
8 Replies

8. Shell Programming and Scripting

Lookup name from another file

Hi All, I want to lookup name for an id in col2 input from another file and add the name to each line. Input 1 comp100001_c0_seq1 At1g31340 30.40 569 384 11 3 1673 313 834 7e-62 237 comp100003_c0_seq1 At1g35370_2 35.00 80 50 ... (7 Replies)
Discussion started by: gina.lizar
7 Replies

9. Shell Programming and Scripting

Lookup Within a file

A text file has two logs of same event & both logs have to be correlated into a single line before same can be processed further RAW OPERATING LOG Date Month Year Time Event 10 JAN 2014 1000 4 11 APR 2013 1230 2 12 FEB 2014 ... (7 Replies)
Discussion started by: newageBATMAN
7 Replies
COMM(1) 						    BSD General Commands Manual 						   COMM(1)

NAME
comm -- select or reject lines common to two files SYNOPSIS
comm [-123f] file1 file2 DESCRIPTION
The comm utility reads file1 and file2, which should be sorted lexically, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files. The filename ``-'' means the standard input. The following options are available: -1 Suppress printing of column 1. -2 Suppress printing of column 2. -3 Suppress printing of column 3. -f Fold case in line comparisons. Each column will have a number of tab characters prepended to it equal to the number of lower numbered columns that are being printed. For example, if column number two is being suppressed, lines printed in column number one will not have any tabs preceding them, and lines printed in column number three will have one. comm assumes that the files are lexically sorted; all characters participate in line comparisons. EXIT STATUS
comm exits 0 on success, >0 if an error occurred. SEE ALSO
cmp(1), diff(1), sort(1), uniq(1) STANDARDS
The comm utility conforms to IEEE Std 1003.2-1992 (``POSIX.2''). BSD
June 6, 1993 BSD
All times are GMT -4. The time now is 01:28 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy