## Comparison between 2 large lists with Getting VALUES from one into the other

# 1
07-18-2011

hi,

I have 2 large lists:

LIST A: contains 6 fields and many entries (VARIABLE number), like:

2011-07-10 | 18:19:47 | 38037300 | 9647808003122 | 2 | success

LIST B: contains 3 fields & 183 entries (FIXED number), like:

9647805651885 9647805651885 SCP_10

What I want is code for:

For each number (say: X) in the 4th field of "LIST A", a comparison against the complete range of entries in "LIST B" (NUM1 NUM2 VAL), through one for-do-done loop inside another, such that:

(a) If X fulfils the inequality NUM1 < X < NUM2 ==> lock on this entry, take the corresponding VAL (in LIST B), append it as the 7th field of LIST A (after X, on the same line), and break out of the loop. Then take the next value in LIST A (say: Y) and do the same comparison over the whole range of LIST B, and so on until all values of LIST A are done.

(b) Otherwise (if X does not lie between NUM1 & NUM2) ==> run the loop again to check the next entry of LIST B, and so on until a match is found; then BREAK out of the loop and return to LIST A to take the next value.

(c) If no match is found over the whole range of LIST B ==> append "NotFound" as the 7th field of LIST A (after X, on the same line).

The end result is a FINAL LIST A of 7 fields (with the required VALues in the 7th field).

Example of the final output (one found VALue, one NotFound):

2011-07-10 | 18:19:47 | 38037300 | 9647808003122 | 2 | success | SCP_5

2011-07-10 | 16:32:47 | 38045300 | 9647818553444 | 5 | success | NotFound

BR,
Ahmed
Posted by amurib
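
The requirement in post #1 can be sketched literally as one loop inside another in plain shell. This is only an illustration under stated assumptions, not code from the thread: the function name `lookup_nested`, the temporary file names, and the LIST B ranges are all made up here (the ranges were chosen so that one LIST A number falls inside a range and the other does not). The number to test is taken from the 4th `|`-separated field of LIST A.

```shell
#!/bin/sh
# Hypothetical sample data. The two LIST A lines come from the post above;
# the LIST B ranges are invented for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/lista.txt" <<'EOF'
2011-07-10 | 18:19:47 | 38037300 | 9647808003122 | 2 | success
2011-07-10 | 16:32:47 | 38045300 | 9647818553444 | 5 | success
EOF
cat > "$workdir/listb.txt" <<'EOF'
9647805000000 9647805999999 SCP_10
9647808000000 9647808999999 SCP_5
EOF

# Outer loop: one pass per LIST A line.
# Inner loop: scan LIST B from the top until the first matching range.
lookup_nested() {
    while IFS= read -r line; do
        # X is the 4th "|"-separated field, with surrounding spaces removed.
        x=$(printf '%s\n' "$line" | cut -d'|' -f4 | tr -d ' ')
        found=NotFound
        while read -r lo hi val; do
            # Strict inequality NUM1 < X < NUM2, as stated in the post.
            if [ "$x" -gt "$lo" ] && [ "$x" -lt "$hi" ]; then
                found=$val
                break            # stop at the first hit
            fi
        done < "$2"
        printf '%s | %s\n' "$line" "$found"
    done < "$1"
}

lookup_nested "$workdir/lista.txt" "$workdir/listb.txt"
```

Because LIST B is re-read and `cut`/`tr` are spawned for every LIST A line, this form is likely to be slow on a large LIST A; it is shown only because it mirrors the for-do-done description step by step.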
# 2
07-20-2011
Any input guys?

Actually I'm surveying forums, & I think this can be done by NAWK.
I'm trying to figure it out, but your help is highly needed & appreciated.

BR,
Ahmed
Posted by amurib
# 3
07-20-2011
I am not able to understand the requirement, as it is not clear. Please post a sample input file and the desired output file.
Posted by pravin27
# 4
07-20-2011
Is there anything you have tried yourself so far? Also, refrain from bumping up your posts if you don't get an answer immediately - this is no Script-Drive-In. You don't have to pay, so you don't have to demand.
Also, please start using code tags when posting code, data, logs, etc. to enhance readability and keep formatting.
Posted by zaxxon
# 5
07-20-2011
As far as I understand your requirement:

I assume LISTA and LISTB you mentioned are files.

Posted by panyam
# 6
07-20-2011
Respectively:

_Not exactly. I've seen a similar case on the forums, but I'm not sure.
However, I'm trying to tailor it to my case... & see.
_Noted.
_No comment.
_Noted.

Thanks

---------- Post updated at 04:08 PM ---------- Previous update was at 04:05 PM ----------

Many thanks panyam, it worked... & you are right in your assumption (2 files).
But unfortunately, it is SLOW (even though I'm using a test LISTA that is much smaller than the actual one).

However, I've seen the following case in the forums, which is similar to mine. Unfortunately, it didn't work for my case (after making some modifications), but I think this solution is a nice way to do it.

Quote:
Sorry, I didn't elaborate on my problem. field1 and field2 from file1 are legal IP addresses forming a section (e.g. from 111.111.111.0 to 111.111.111.255).
I want to get the 'location' field from file1 given an IP address (the first field from file2) falling within the section.
The IP sections in file1 are sorted.

Code:

    $ cat file1
    41.138.128.0 41.138.159.255 location1
    41.138.160.0 41.138.191.255 location2
    41.138.192.0 41.138.207.255 location3
    41.138.208.0 41.138.223.255 location4
    41.138.224.0 41.138.239.255 location5
    41.138.240.0 41.138.255.255 location6
    41.138.32.0 41.138.63.255 location7
    41.138.64.0 41.138.71.255 location8
    41.138.72.0 41.138.79.255 location9
    41.138.80.0 41.138.87.255 location10

    $ cat file2
    41.138.208.3 information
    41.138.80.23 information
    41.138.11.23 information
    11.138.11.23 information

    awk 'NR==FNR {split($1,s,".");      #file1
                  split($2,e,".");
                  a[NR]=$3;             #loc.
                  b[NR]=s[1] FS s[2];   #1st 2 digits of IP min
                  c[NR]=s[3];           #3rd digit of IP min
                  d[NR]=e[3];           #3rd digit of IP max
                  i=NR; next }
    { split($1,ip,".")                  #IP of file2
      for (j=1;j<=i;j++)
        if (ip[1] FS ip[2]==b[j] && ip[3]>=c[j] && ip[3]<=d[j]) { print $0 FS a[j];break}
    }' file1 file2

    O/P:
    41.138.208.3 information location4
    41.138.80.23 information location10

Any ideas to make this work on my case, with the following simplification (samples below):

Code:

    My LISTA:
    2011-07-10 18:19:47 38037300 9647808003122 2 success
    2011-07-10 18:19:47 38037307 9647800147864 2 success

    My LIST B:
    9647805651885 9647805651885 SCP_10
    9647812649216 9647812649216 SCP_12

Ahmed
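
One way the quoted IP-range approach might be adapted to the simplified samples is sketched below. It is an assumption-laden illustration, not a reply from the thread: the function name `lookup_awk`, the temporary file names, and the second LIST B row (`SCP_5` with its range) are invented so that one line matches. LIST B is read once into arrays, then each LIST A line is compared against them, with the number to test assumed to be the 4th whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical sample data. LISTA and the first LISTB row come from the post
# above; the second LISTB row is invented so that a match exists.
workdir=$(mktemp -d)
cat > "$workdir/LISTA" <<'EOF'
2011-07-10 18:19:47 38037300 9647808003122 2 success
2011-07-10 18:19:47 38037307 9647800147864 2 success
EOF
cat > "$workdir/LISTB" <<'EOF'
9647805651885 9647805651885 SCP_10
9647808000000 9647808999999 SCP_5
EOF

# Usage: lookup_awk LISTB LISTA   (LIST B first, so it is loaded before LIST A)
lookup_awk() {
    awk '
    NR == FNR {                  # first file: LIST B (NUM1 NUM2 VAL)
        lo[NR]  = $1 + 0         # "+ 0" forces a numeric comparison later
        hi[NR]  = $2 + 0
        val[NR] = $3
        n = NR
        next
    }
    {                            # second file: LIST A
        x = $4 + 0
        found = "NotFound"
        for (j = 1; j <= n; j++)
            if (x > lo[j] && x < hi[j]) { found = val[j]; break }
        print $0, found          # append the result as the 7th field
    }' "$1" "$2"
}

lookup_awk "$workdir/LISTB" "$workdir/LISTA"
```

With only 183 entries in LIST B, each LIST A line costs at most 183 in-memory comparisons and no extra processes, which should be considerably faster than nested shell loops. Note that a row whose NUM1 equals NUM2, like the sample `SCP_10` row, can never match under the strict `NUM1 < X < NUM2` test; if the bounds are meant to be inclusive, as in the quoted IP code, `>` and `<` would need to become `>=` and `<=`.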

Quote:
Originally Posted by panyam
As far as I understand your requirement:

I assume LISTA and LISTB you mentioned are files.

Posted by amurib
# 7
07-21-2011
Posted by pravin27

## Count Unique values from multiple lists of files

Looking for a little help here. I have 1000's of text files within a multiple folders. YYYY/ /MM /1000's Files Eg. 2014/01/1000 files 2014/02/1237 files 2014/03/1400 files There are folders for each year and each month, and within each monthly folder there are...

## Simple comparison between two lists.

I have two lists (input) Alpha and Beta. Alpha: Beta: Need the output like this: I would like to get an output like this: Alpha vs Beta | -- | a=1 | |z=3 | z=4 | Is it possible ? :cool:

## Reading off values from a large file

Hi, I have a large output file (star.log), with many lines of the following type *** T vavg unburnt: 723.187 / burnt: 2662.000 What I would like to do is pick the values 723.187 and 2662.000 and What I've got so far is awk '/unburnt:.*burnt:/{Tu=\$6;Tb=\$NF}END{print Tu, Tb}'...

## Comparison of floating point values in shell

Hi Everyone , Need a simple code here , I Have a number in a variable say \$a=145.67 . This value changes everytime loop begins . I need to print a specific message as shown below when the above variable lies in a specific range i.e. 1.if \$a lies within 100 and 200 , it should display...

## How to remove a subset of data from a large dataset based on values on one line

Hello. I was wondering if anyone could help. I have a file containing a large table in the format: marker1 marker2 marker3 marker4 position1 position2 position3 position4 genotype1 genotype2 genotype3 genotype4 with marker being a name, position a numeric...

## csv 4 columns values comparison!

Hi all, i have a csv file which as the following data: 294;F03;2000;40441 294;F03;2000;40443 284;F01;5400;44051 284;F01;5700;45666 the file holds 11689 lines. I was trying to get a script running to output results from this file that for each line with the condition: if a line is found...

## Shell Script to Create non-duplicate lists from two lists

File_A contains Strings: a b c d File_B contains Strings: a c z Need to have script written in either sh or ksh. Derive resultant files (File_New_A and File_New_B) from lists File_A and File_B where string elements in File_New_A and File_New_B are listed below. Resultant...

## compare 2 very large lists of different length

I have two very large datasets (>100MB) in a simple vertical list format. They are of different size and with different order and formatting (e.g. whitespace and some other minor cruft that would thwart easy regex). Let's call them set1 and set2. I want to check set2 to see if it contains...

## Help In Calculation of large values in loop

Hi Gurus, I am writing a shell script in which i need to strip out the numbers from file the values are unknown i. e. the range cannot be predicted.. and in my current program the sum of values is not coming as desired i think the value of calculation is crossing the range i.e. after some...

## How to add two large values

Hi, Gives me wrong value when, \$ echo `expr 2221753117 + 299363384` -1773850795 How to overcome this? Appreciate any help on this. -Om