Comparison between 2 large lists with Getting VALUES from one into the other


 
# 1  
Old 07-18-2011

Hi,

I have 2 large lists:

LIST A: contains 6 fields per entry and a VARIABLE number of entries, like:

2011-07-10 | 18:19:47 | 38037300 | 9647808003122 | 2 | success

LIST B: contains 3 fields (NUM1 NUM2 VAL) & 183 entries (FIXED number), like:

9647805651885 9647805651885 SCP_10

What I want is CODE that:

compares each number (say: X) in the 4th field of "LIST A" against the complete range of entries in "LIST B", using one
for-do-done loop inside another, such that:

(a) If X fulfils the inequality NUM1 < X < NUM2 ==> lock on this entry, take the corresponding VAL (in LIST B), and
append it as the 7th field of LIST A (at the end of the same line as "X")... AND break out of the loop, to take the next value
in LIST A (say: Y) and do the same comparison over the whole range of LIST B... and so on until all values of LIST A are done.

(b) Otherwise (X does not lie between NUM1 & NUM2) ==> run the loop again to check the next entry of LIST B, and
so on... until a match is found... then BREAK out of the loop & return to LIST A to take another value.

(c) If X is not found over the whole range of LIST B ==> append "NotFound"
as the 7th field of LIST A (at the end of the same line as "X").

To end up with a FINAL LIST A of 7 fields (the required VALues in the 7th field).

Example of a SUCCESSFULLY found VAL:

2011-07-10 | 18:19:47 | 38037300 | 9647808003122 | 2 | success | SCP_5

Example of a NOT FOUND VAL:

2011-07-10 | 16:32:47 | 38045300 | 9647818553444 | 5 | success | NotFound



Thanks in advance.

BR,
Ahmed
# 2  
Old 07-20-2011
Any input guys?

Actually I've been searching the forums, & I think this can be done with NAWK.
I'm trying to figure it out, but your help is much needed & appreciated.

BR,
Ahmed
# 3  
Old 07-20-2011
I can't understand the requirement as it is not clear. Please post a sample input file and the desired output file.
# 4  
Old 07-20-2011
Is there anything you have tried yourself so far? Also, please refrain from bumping your posts if you don't get an answer immediately - this is no Script-Drive-In. You don't have to pay so you don't have to demand.
Also please start using code tags when posting code, data or logs etc. to enhance readability and keep the formatting.
# 5  
Old 07-20-2011
As far as I understand your requirement:

I assume LISTA and LISTB you mentioned are files.

Code:
while IFS= read -r line
do
    found=0
    # 4th "|"-separated field of LISTA, with the padding blanks removed
    num=$(echo "$line" | cut -d"|" -f4 | tr -d ' ')
    # let read split each LISTB line into NUM1, NUM2 and VAL
    while read -r num1 num2 val
    do
        if [ "$num" -ge "$num1" ] && [ "$num" -le "$num2" ]
        then
            echo "$line | $val"
            found=1
            break    # first matching range wins, stop scanning LISTB
        fi
    done < LISTB
    if [ "$found" -eq 0 ]
    then
        echo "$line | NotFound"
    fi
done < LISTA

# 6  
Old 07-20-2011
Respectively:

_Not exactly. I've seen a similar case on the forums, but I'm not sure it fits.
However, I'm trying to tailor it to my case... & we'll see.
_Noted.
_No comment.
_Noted.

Thanks

---------- Post updated at 04:05 PM ---------- Previous update was at 01:57 PM ----------

Many thanks panyam, it worked... & you are right in your assumption (2 files).
But unfortunately, it is SLOW (even though I'm using a test LISTA that is much smaller than the actual one).

However, I've seen the following case on the forums, which is similar to mine. Unfortunately, it didn't work for my case (after making some modifications). I think this would be a nice way to do it.

Quote:
Sorry, I didn't elaborate on my problem. field1 and field2 from file1 are legal IP addresses forming a section (e.g. from 111.111.111.0 to 111.111.111.255).
I want to get the 'location' field from file1 for a given IP address (the first field of file2) that falls within the section.
The IP sections in file1 are sorted.
Code:
 
$ cat file1
41.138.128.0    41.138.159.255  location1
41.138.160.0    41.138.191.255  location2
41.138.192.0    41.138.207.255  location3
41.138.208.0    41.138.223.255  location4
41.138.224.0    41.138.239.255  location5
41.138.240.0    41.138.255.255  location6
41.138.32.0     41.138.63.255   location7
41.138.64.0     41.138.71.255   location8
41.138.72.0     41.138.79.255   location9
41.138.80.0     41.138.87.255   location10
$ cat file2
41.138.208.3    information
41.138.80.23    information
41.138.11.23    information
11.138.11.23    information

awk '
NR==FNR {split($1,s,".");        #file1
         split($2,e,".");
         a[NR]=$3;               #loc. 
         b[NR]=s[1] FS s[2];     #1st 2 digits of IP min
         c[NR]=s[3];             #3rd digit of IP min
         d[NR]=e[3];             #3rd digit of IP max
         i=NR;
         next
        }
{  split($1,ip,".")              #IP of file2
   for (j=1;j<=i;j++) 
      if (ip[1] FS ip[2]==b[j] && ip[3]>=c[j] && ip[3]<=d[j]) { print $0 FS a[j];break}
}' file1 file2

O/P:
41.138.208.3    information location4
41.138.80.23    information location10

Any ideas on how to make this work for my case, with the following simplification (samples below)?

Code:
 
My LISTA:
 
2011-07-10  18:19:47  38037300   9647808003122   2   success
2011-07-10  18:19:47  38037307   9647800147864   2   success 
 
My LIST B:
 
9647805651885 9647805651885 SCP_10
9647812649216 9647812649216 SCP_12


Thanks in advance.
Ahmed
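
For what it's worth, the quoted awk idea can be adapted to the simplified samples above. This is only a minimal sketch, assuming each LISTB line is NUM1 NUM2 VAL, the number to test is the 4th whitespace-separated field of LISTA, and the bounds are inclusive (as in the shell loop from post #5). The third LISTB row (SCP_5) is hypothetical, added only so that one sample line matches; the temp directory and the variable names are also just for illustration.

```shell
#!/bin/sh
# Sample data from the post, plus one hypothetical range (SCP_5).
dir=$(mktemp -d)
cat > "$dir/LISTA" <<'EOF'
2011-07-10  18:19:47  38037300   9647808003122   2   success
2011-07-10  18:19:47  38037307   9647800147864   2   success
EOF
cat > "$dir/LISTB" <<'EOF'
9647805651885 9647805651885 SCP_10
9647812649216 9647812649216 SCP_12
9647808000000 9647808999999 SCP_5
EOF

result=$(awk '
# First file (LISTB): store the bounds and the value of every range.
NR == FNR {
    if (NF >= 3) { n++; lo[n] = $1 + 0; hi[n] = $2 + 0; val[n] = $3 }
    next
}
# Second file (LISTA): scan the ranges and stop at the first hit.
{
    x = $4 + 0
    tag = "NotFound"
    for (j = 1; j <= n; j++)
        if (x >= lo[j] && x <= hi[j]) { tag = val[j]; break }
    print $0, tag
}' "$dir/LISTB" "$dir/LISTA")

printf '%s\n' "$result"
rm -rf "$dir"
```

LISTB is read once into arrays and LISTA is streamed through a single awk process, so no `cut` is spawned per line pair, which is probably where most of the time goes in the nested shell loop.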


# 7  
Old 07-21-2011
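How about Perl:
Code:
#!/usr/bin/perl
use strict;
use warnings;

# Load listB once; each entry is [NUM1, NUM2, VAL].
# (Re-reading the LB handle inside the loop would hit EOF after the first listA line.)
open(my $lb, "<", "listB") or die "cannot open file listB: $!\n";
my @ranges = map { [split] } grep { /\S/ } <$lb>;
close($lb);

open(my $la, "<", "listA") or die "cannot open file listA: $!\n";
while (my $line = <$la>) {
    chomp $line;
    my $val = (split /\|/, $line)[3];
    next unless defined $val;
    $val =~ s/\s+//g;                 # strip the blanks around the number
    my $tag = "NotFound";
    foreach my $r (@ranges) {
        if ($val >= $r->[0] and $val <= $r->[1]) {
            $tag = $r->[2];
            last;                     # first matching range wins
        }
    }
    print "$line | $tag\n";
}
close($la);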
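
The same lookup can also be done in one awk pass over the original "|"-delimited LIST A from post #1. Again just a sketch under assumptions: inclusive bounds, blank-separated listB, and a hypothetical SCP_5 range added so that the first sample line has something to match. The file names and the temp directory are for illustration only.

```shell
#!/bin/sh
# Sample lines from post #1; the SCP_5 range in listB is hypothetical.
dir=$(mktemp -d)
cat > "$dir/listA" <<'EOF'
2011-07-10 | 18:19:47 | 38037300 | 9647808003122 | 2 | success
2011-07-10 | 16:32:47 | 38045300 | 9647818553444 | 5 | success
EOF
cat > "$dir/listB" <<'EOF'
9647805651885 9647805651885 SCP_10
9647808000000 9647808999999 SCP_5
EOF

out=$(awk '
# First file (listB): remember every range.
FNR == NR {
    if (NF >= 3) { n++; lo[n] = $1 + 0; hi[n] = $2 + 0; val[n] = $3 }
    next
}
# Second file (listA): cut the line on "|" and use the 4th piece as X.
{
    split($0, f, "[|]")
    x = f[4] + 0              # adding 0 forces a numeric value
    tag = "NotFound"
    for (j = 1; j <= n; j++)
        if (x >= lo[j] && x <= hi[j]) { tag = val[j]; break }
    print $0 " | " tag
}' "$dir/listB" "$dir/listA")

printf '%s\n' "$out"
rm -rf "$dir"
```

With only 183 ranges a linear scan per line should be cheap; if NUM1 < X < NUM2 is really meant to be strict, change `>=` and `<=` to `>` and `<`.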
