Creating a file with matching records from two other files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Creating a file with matching records from two other files
# 1  
Old 12-17-2009
Creating a file with matching records from two other files

Hi All,

I have 2 files (file1 & file2).

File1 and File2 have m and n columns respectively

I have to compare value in column1 of file1 with file2 and find line(s) from file2 matching column1 value.
The value can be in any column in the matching lines of file2.

The output should be written in a third file.

File3 should be like:

1st line from File1
Matching lines from file2

2nd line from file1
matching lines from file2

etc.

if some line from file1 does not have any matching records in file2, then it should not appear in file3.

eg. File1
Code:
1111111111,abcde,12.10.09,675069AG
2222222222,fghij,09.08.09,948p0

file2
Code:
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
erererer,3840.98,3333333333,11.12.09,uyeriery

file3
Code:
1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy

Thanks in Advance!!!Smilie

Last edited by Scott; 12-17-2009 at 09:34 AM.. Reason: Please use code tags
# 2  
Old 12-17-2009
Code:
gawk -F"," 'FNR==NR{a[$1]=$0;next}
{ for(o=1;o<=NF;o++){ 
       if($o in a) { 
              print a[$o];print 
              break 
       }
   }
}' file1 file2

# 3  
Old 12-17-2009
Code:
> file3
cat file1 | while read line
do
column1=`echo $line | awk -F"," ' { print $1 } '`
grep $column1 file2 > tmpfile
if [ $? -eq 0 ]; then 
echo $line >> file3
cat tmpfile >> file3
rm tmpfile
fi
done


Last edited by Scott; 12-17-2009 at 10:12 AM.. Reason: Code tags, please!
# 4  
Old 12-17-2009
hi

assumption :- the files are located at /home/akshay/temp/Scripts/

Code:
fileLoc="/home/akshay/temp/Scripts/"
while read line
do
        temp=$(echo $line | awk -F ',' '{print $1}')
        grep $temp $fileLoc/file2
        if [ $? -eq 0 ] ; then
                echo $line >> $fileLoc/file3
                grep "$temp" $fileLoc/file2 >> $fileLoc/file3
        fi
done < $fileLoc/file1

cheers
# 5  
Old 12-17-2009
an easy solution below ... you can use gawk,nawk or /usr/xpg4/bin/awk :-

Code:
gawk  '
NR==FNR { a[$1]=$0 ; next}
{ for (i=1;i<=NF;i++) {
        if ( $i in a ) { b[$i]=b[$i]"\n"$0 ; next }
     }
}
END{
for (i in a) { printf "%s %s\n\n" ,a[i],b[i]}     
}
' FS=","  File1.txt FS=","  File2.txt  > File3.txt

SmilieSmilieSmilieSmilieSmilie

---------- Post updated at 17:16 ---------- Previous update was at 16:58 ----------

Quote:
Originally Posted by ichigo
Code:
gawk -F"," 'FNR==NR{a[$1]=$0;next}
{ for(o=1;o<=NF;o++){ 
       if($o in a) { 
              print a[$o];print 
              break 
       }
   }
}' file1 file2

ichigo:-the above code will not give you the correct o/p as below in bold.
Code:
o/p:-
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy
1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
1111111111,abcde,12.10.09,675069AG
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1

Code:
but the desired o/p are:
1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy

# 6  
Old 12-17-2009
Ahmad, the order of the output is not fixed. When I run it using mawk I get:
Code:
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy

1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1



---------- Post updated at 21:55 ---------- Previous update was at 21:54 ----------

Alternative in shell :
Code:
while read line; do
  if match=$(grep "${line%%,*}" file2); then
    printf "$line\n$match\n"
  fi
done < file1 > file3

Output:
Code:
1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy

Not as efficient as awk though for large numbers but perhaps good enough.
# 7  
Old 12-18-2009
Ahamd..i am getting "-bash: gawk: command not found" error.
Please help me how to overcome this..Thanks


Quote:
Originally Posted by ahmad.diab
an easy solution below ... you can use gawk,nawk or /usr/xpg4/bin/awk :-

Code:
gawk  '
NR==FNR { a[$1]=$0 ; next}
{ for (i=1;i<=NF;i++) {
        if ( $i in a ) { b[$i]=b[$i]"\n"$0 ; next }
     }
}
END{
for (i in a) { printf "%s %s\n\n" ,a[i],b[i]}     
}
' FS=","  File1.txt FS=","  File2.txt  > File3.txt

SmilieSmilieSmilieSmilieSmilie

---------- Post updated at 17:16 ---------- Previous update was at 16:58 ----------



ichigo:-the above code will not give you the correct o/p as below in bold.
Code:
o/p:-
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy
1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
1111111111,abcde,12.10.09,675069AG
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1

Code:
but the desired o/p are:
1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy



---------- Post updated at 02:20 AM ---------- Previous update was at 02:16 AM ----------

Hi Akshay,

Thanks for the code...it works good.
But can you give me some other code that is less time consuming.
All my files are several hundred MB's containing millions of records.
So opening up file and reading line by line take alot of time.
Do you have a workaround for this ???
Thanks.

Quote:
Originally Posted by akshay61286
hi

assumption :- the files are located at /home/akshay/temp/Scripts/

Code:
fileLoc="/home/akshay/temp/Scripts/"
while read line
do
        temp=$(echo $line | awk -F ',' '{print $1}')
        grep $temp $fileLoc/file2
        if [ $? -eq 0 ] ; then
                echo $line >> $fileLoc/file3
                grep "$temp" $fileLoc/file2 >> $fileLoc/file3
        fi
done < $fileLoc/file1

cheers
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk for matching fields between files with repeated records

Hello all, I am having trouble with what should be an easy task, but seem to be missing something fundamental. I have two files, with File 1 consisting of a single field of many thousands of records. I also have File 2 with two fields and many thousands of records. My goal is that when $1 of... (2 Replies)
Discussion started by: jvoot
2 Replies

2. UNIX for Beginners Questions & Answers

Matching fields between two files, repeated records

In two previous posts (here) and (here), I received help from forum members comparing multiple fields across two files and selectively printing portions of each as output based upon would-be matches using awk. I had been fairly comfortable populating awk arrays with fields and using awk's special... (3 Replies)
Discussion started by: jvoot
3 Replies

3. Shell Programming and Scripting

How can I retrieve the matching records from data file mentioned?

XYZNA0000778800Z 16123000012300321000000008000000000000000 16124000012300322000000007000000000000000 17234000012300323000000005000000000000000 17345000012300324000000004000000000000000 17456000012300325000000003000000000000000 9 XYZNA0000778900Z 16123000012300321000000008000000000000000... (8 Replies)
Discussion started by: later_troy
8 Replies

4. Shell Programming and Scripting

Shell script to filter records in a zip file that contains matching columns from another file

Not sure if this is the correct forum for this question. I have two files. file1.zip, file2 Input: file1.zip col1, col2 , col3 a , b , 0:0:0:0:0:c436:9346:d40b x, y, 0:0:0:0:0:880:39f9:c9a7 m, n , 0:0:0:0:0:80c7:9161:fe00 file2.txt col1 c4:36:93:46:d4:0b... (1 Reply)
Discussion started by: anil.v
1 Replies

5. Shell Programming and Scripting

Performance of calculating total number of matching records in multiple files

Hello Friends, I've been trying to calculate total number of a certain match in multiple data records files (DRs). Let say I have a daily created folders for each day since the beginning of july like the following drwxrwxrwx 2 mmsuper med 65536 Jul 1 23:59 20150701 drwxrwxrwx 2 mmsuper... (1 Reply)
Discussion started by: EAGL€
1 Replies

6. Shell Programming and Scripting

Listing the file name and no of records in each files for the files created on a specific day

Hi, I want to display the file names and the record count for the files in the 2nd column for the files created today. i have written the below command which is listing the file names. but while piping the above command to the wc -l command its not working for me. ls -l... (5 Replies)
Discussion started by: Showdown
5 Replies

7. Shell Programming and Scripting

Creating single pattern for matching multiple files.

Hi friends, I have a some files in a directory. for example 856-abc 856-def 851-abc 945-def 956-abc 852-abc i want to display only those files whose name starts with 856* 945* and 851* using a single pattern. i.e 856-abc 856-def 851-abc 945-def the rest of the two files... (2 Replies)
Discussion started by: Little
2 Replies

8. Shell Programming and Scripting

Compare two files with different number of records and output only the Extra records from file1

Hi Freinds , I have 2 files . File 1 |nag|HYd|1|Che |esw|Gun|2|hyd |pra|bhe|3|hyd |omu|hei|4|bnsj |uer|oeri|5|uery File 2 |nag|HYd|1|Che |esw|Gun|2|hyd |uer|oi|3|uery output : (9 Replies)
Discussion started by: i150371485
9 Replies

9. Shell Programming and Scripting

Removing non matching records

Hi all I have a file with records starting with "Page" has a first column. some of the records have some other junk characters has first column. so pls help me to remove rows which is not having "Page" has a first column. Thanks, Baski (2 Replies)
Discussion started by: baskivs
2 Replies

10. UNIX for Dummies Questions & Answers

How can you delete records in a file matching a pattern?

I am curious if the following can be done in a file in unix. Let's say I have a flat file with the following data AAA,12,2,,,, BBB,3,1,,,, CCC,,,,, DDD,2,,,,, SQQ,,,,, ASJ,,3,5 I only want to capture the data with values into a new file. If the data contains the pattern ,,,,, as in... (2 Replies)
Discussion started by: mode09
2 Replies
Login or Register to Ask a Question