getting data from one file based on another


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers getting data from one file based on another
# 8  
Old 01-25-2011
Many thanks for the example. Unfortunately, it doesn't work. It returns file2 unchanged. Sorry...

As for the files - the values of columns 2 and 3 in file1 may overlap, this is why I need to match the files according to column1 first. Also, the files are not sorted, but I can do that before running a script.

Many thanks once again!
# 9  
Old 01-25-2011
Quote:
Originally Posted by zajtat
As for the files - the values of columns 2 and 3 in file1 may overlap....
Why don't you give a more realistic example of your input file and the desired output?
# 10  
Old 01-25-2011
Quick hack at it:
Code:
 
#!/usr/bin/ksh
 
cat file1 | while read a1 a2 a3 a4
do
 cat file2 | while read b1 b2 b3
 do
   if (( $b1 == $a1 && $b2 >= $a2 && $b2 <= $a3 ))
   then
     . . .
   fi
 done
done

Generates new file on stdout.
# 11  
Old 01-25-2011
I am really sorry for the confusion. Here I will try to make a new, more clear example:

file1

HTML Code:
1 1234 1240 abc
1 5678 9078 cdf
6 1223 4560 ogh
6 4567 100679 pog
8 2435 6789 kho
file2

HTML Code:
1 1235 col1
1 3456 col2
1 7890 col3
1 5679 col4
I need to match column1 in file2 to column1 if file2 first, then see if the number in column2 file2 is between the values of column2 and 3 of file1; and if so, ass column4 of file1 as an additional column to file2.
For example, in this file2, line1 - the number in column2 of file2, line1 is 1235, which is in the range of 1234 and 1240 of line1, columns 2 and 3 of file1; but it is also in the range of 1234 and 4560 of line3, columns 2 and 3 of file1. Because the first column of line 1, file2 matches the first column of line1, file1, the added value should be abc (and not ogh of line3, because the value of column1, line3, file1 does not match the value of column1, line1, file2). The output file should be like this:

HTML Code:
1 1235 col1 abc
1 3456 col2
1 7890 col3
1 5679 col4 cdf
What I meant by overlaping values in columns 2 and 3 of file1, is that the region covered by the first line (from 1234 to 1240) is also the part of the region in line3 (from 1223 to 4560), but they've got different values in column1 (1 and 6).

The values in columns 1 and 2 of file1 are sorted ascending. The values in columns1 and 2 of file2 are also sorted ascending.

Hope this is more clear and apologies for the confusion.

Many thanks for spending time on this!!!!

---------- Post updated at 03:57 PM ---------- Previous update was at 03:50 PM ----------

Sorry, I've noticed the typo in this paragraph. Here is the correct one:

For example, in this file2, line1 - the number in column2 of file2, line1 is 1235, which is in the range of 1234 and 1240 of line1, columns 2 and 3 of file1; but it is also in the range of 1223 and 4560 of line3, columns 2 and 3 of file1. Because the first column of line 1, file2 matches the first column of line1, file1, the added value should be abc (and not ogh of line3, because the value of column1, line3, file1 does not match the value of column1, line1, file2). The output file should be like this
# 12  
Old 01-26-2011
Well, my loops do not exploit the sorted order, but they are sufficient, simple and sufficient.

To exploit the sort, you need to know any sides that are many-to-, so after a match you read the many file first looking for more matches. If many to many, then you still cannot solve with a simple merge. That is why the join command needs flat files -- it does seeks back to first line of key value.
# 13  
Old 01-26-2011
Quote:
Originally Posted by DGPickett
Quick hack at it:
Code:
 
#!/usr/bin/ksh
 
cat file1 | while read a1 a2 a3 a4
do
 cat file2 | while read b1 b2 b3
 do
   if (( $b1 == $a1 && $b2 >= $a2 && $b2 <= $a3 ))
   then
     . . .
   fi
 done
done

Generates new file on stdout.
why the need for cat? And this solution is O(n^2). Worse when the files are big.
# 14  
Old 01-26-2011
Well, sometimes N^2 is low enough, and it is simple. The cat's are just me, too lazy to move the input down to a '<file1'; I think left to right when scripting! If the cats add enough overhead to matter, there is no hope for the n^2 !

Now, who has a solution with associative array scripting? Will that work with many to many?

A SQL solution would be great, too, but text tools like better record boundaries, like one row per line.
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

In PErl script: need to read the data one file and generate multiple files based on the data

We have the data looks like below in a log file. I want to generat files based on the string between two hash(#) symbol like below Source: #ext1#test1.tale2 drop #ext1#test11.tale21 drop #ext1#test123.tale21 drop #ext2#test1.tale21 drop #ext2#test12.tale21 drop #ext3#test11.tale21 drop... (5 Replies)
Discussion started by: Sanjeev G
5 Replies

2. UNIX for Beginners Questions & Answers

Create file based on data from two other files

I have looked through several threads regarding merging files with awk and attempted using join however have been unsuccessful likely as I do not fully understand awk. What I am attempting is to take a csv file which could be between 1 and 15,000 lines with 5 colums and another csv file that will... (4 Replies)
Discussion started by: cdubu2
4 Replies

3. UNIX for Dummies Questions & Answers

Extracting data from one file, based on another file (splitting)

Dear All, I have two files but want to extract data from one based on another... can you please help me file 1 David Tom Ellen and file 2 David|0010|testnamez|resultsz David|0004|testnamex|resultsx Tom|0010|testnamez|resultsz Tom|0004|testnamex|resultsx Ellen|0010|testnamez|resultsz... (12 Replies)
Discussion started by: A-V
12 Replies

4. Shell Programming and Scripting

Generate tabular data based on a column value from an existing data file

Hi, I have a data file with : 01/28/2012,1,1,98995 01/28/2012,1,2,7195 01/29/2012,1,1,98995 01/29/2012,1,2,7195 01/30/2012,1,1,98896 01/30/2012,1,2,7083 01/31/2012,1,1,98896 01/31/2012,1,2,7083 02/01/2012,1,1,98896 02/01/2012,1,2,7083 02/02/2012,1,1,98899 02/02/2012,1,2,7083 I... (1 Reply)
Discussion started by: himanish
1 Replies

5. UNIX for Dummies Questions & Answers

Sorting data in file based on field in another file

Hi, I have two files, one of which I would like to sort based on the order of the data in the second. I would like to do this using a simple unix statement. My two files as follows: File 1: 12345 1 2 2 2 0 0 12349 0 0 2 2 1 2 12350 1 2 1 2 2 2 . . . File2: 12350... (3 Replies)
Discussion started by: kasan0
3 Replies

6. Shell Programming and Scripting

Extracting specific lines of data from a file and related lines of data based on a grep value range?

Hi, I have one file, say file 1, that has data like below where 19900107 is the date, 19900107 12 144 129 0.7380047 19900108 12 168 129 0.3149017 19900109 12 192 129 3.2766666E-02 ... (3 Replies)
Discussion started by: Wynner
3 Replies

7. Shell Programming and Scripting

Extract data based on match against one column data from a long list data

My input file: data_5 Ali 422 2.00E-45 102/253 140/253 24 data_3 Abu 202 60.00E-45 12/23 140/23 28 data_1 Ahmad 256 7.00E-45 120/235 140/235 22 data_4 Aman 365 8.00E-45 15/65 140/65 20 data_10 Jones 869 9.00E-45 65/253 140/253 18... (12 Replies)
Discussion started by: patrick87
12 Replies

8. Shell Programming and Scripting

Delete line in file based on data in another file

Hi there I would like to create a shell script to do the following: - delete a line in file1 if it contains the data string in file2 eg: file1 1 100109942004051510601703694 0.00 0.00 2 100109942004051510601702326 0.00 0.00 3 ... (1 Reply)
Discussion started by: earth_goddess
1 Replies

9. UNIX for Dummies Questions & Answers

Rename file based on first 3 characters of data in file

I'm looking to determine if I can use a grep command to read file and rename the file based on the first 3 characters of the data in the file. An example is: Read FileA If the first 3 positions of the data in the file are "ITP", then rename the file as FileA_ITP, else if the first 3... (3 Replies)
Discussion started by: jchappel
3 Replies

10. Shell Programming and Scripting

Extracting data from text file based on configuration set in config file

Hi , a:) i have configuration file with pattren <Range start no>,<Range end no>,<type of records to be extracted from the data file>,<name of the file to store output> eg: myfile.confg 9899000000,9899999999,DATA,b.dat 9899000000,9899999999,SMS,a.dat b:) Stucture of my data file is... (3 Replies)
Discussion started by: suparnbector
3 Replies
Login or Register to Ask a Question