Creating a file with matching records from two other files

12-17-2009


Hi All,

I have 2 files (file1 & file2).

file1 and file2 have m and n columns, respectively.

I have to compare the value in column 1 of file1 with file2 and find the line(s) in file2 that contain that value.
The value can be in any column of the matching file2 lines.

The output should be written to a third file.

file3 should look like this:

1st line from File1
Matching lines from file2

2nd line from file1
matching lines from file2

etc.

If a line from file1 does not have any matching records in file2, it should not appear in file3.

For example, file1:

Code:

1111111111,abcde,12.10.09,675069AG
2222222222,fghij,09.08.09,948p0

file2:

Code:

sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
erererer,3840.98,3333333333,11.12.09,uyeriery

file3:

Code:

1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy
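
To try the answers in this thread, the sample input above can be recreated with two here-documents. This is a minimal sketch; note that it overwrites any existing file1 and file2 in the current directory.

```sh
# Recreate the sample input files from the question (file names as in the thread).
cat > file1 <<'EOF'
1111111111,abcde,12.10.09,675069AG
2222222222,fghij,09.08.09,948p0
EOF

cat > file2 <<'EOF'
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
erererer,3840.98,3333333333,11.12.09,uyeriery
EOF
```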

Thanks in advance!

Last edited by Scott; 12-17-2009 at 09:34 AM.. Reason: Please use code tags

Swagi


12-17-2009


Code:

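gawk -F"," 'FNR==NR{a[$1]=$0;next}   # file1: index each line by its first field
{ for(o=1;o<=NF;o++){                # file2: check each field against the index
       if($o in a) {
              print a[$o];print      # file1 line, then the matching file2 line
              break
       }
   }
}' file1 file2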

ichigo


12-17-2009


Code:

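> file3
while IFS= read -r line          # read file1 line by line, keeping each line intact
do
  column1=${line%%,*}            # first comma-separated field, no awk call needed
  if grep -- "$column1" file2 > tmpfile; then
    printf '%s\n' "$line" >> file3
    cat tmpfile >> file3
  fi
done < file1
rm -f tmpfile                    # remove the temporary file once, at the end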
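
Note that the grep-based scripts in this thread match the key anywhere in a line, so a key such as 1111111111 would also match a file2 line containing 11111111112. A minimal sketch of a stricter match, assuming comma-separated data and keys without regex metacharacters (digits only, as in the sample); the function name match_exact is hypothetical:

```sh
# match_exact FILE1 FILE2: for each FILE1 line, print it followed by the FILE2
# lines that contain its first field as a complete comma-separated field.
# Assumes the keys contain no ERE metacharacters (the sample keys are digits).
match_exact() {
  while IFS= read -r line; do
    key=${line%%,*}                        # first field of the FILE1 line
    # (^|,)KEY(,|$): KEY preceded by line start or a comma and followed by a
    # comma or line end, so 1111111111 does not match 11111111112
    if matches=$(grep -E "(^|,)$key(,|\$)" "$2"); then
      printf '%s\n%s\n' "$line" "$matches"
    fi
  done < "$1"
}
```

Usage: match_exact file1 file2 > file3. It still runs grep once per file1 line, so it has the same speed limits as the other shell loops.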

Last edited by Scott; 12-17-2009 at 10:12 AM.. Reason: Code tags, please!

vijay_vasanth


12-17-2009


Hi,

Assumption: the files are located in /home/akshay/temp/Scripts/

Code:

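fileLoc="/home/akshay/temp/Scripts"
: > "$fileLoc/file3"                     # start from an empty file3
while IFS= read -r line
do
        temp=${line%%,*}                 # first field of the file1 line
        # -q only tests for a match; without it the matches are also printed to the terminal
        if grep -q -- "$temp" "$fileLoc/file2" ; then
                printf '%s\n' "$line" >> "$fileLoc/file3"
                grep -- "$temp" "$fileLoc/file2" >> "$fileLoc/file3"
        fi
done < "$fileLoc/file1"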

cheers

akshay61286


12-17-2009


An easy solution is below. You can use gawk, nawk or /usr/xpg4/bin/awk:

Code:

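gawk  '
NR==FNR { a[$1]=$0 ; next}                        # file1: index lines by first field
{ for (i=1;i<=NF;i++) {                           # file2: look for a field that is a key
        if ( $i in a ) { b[$i]=b[$i]"\n"$0 ; next }   # append the line under that key
     }
}
END{
for (i in b) { printf "%s%s\n\n" ,a[i],b[i]}      # only keys with matches; order is unspecified
}
' FS="," File1.txt File2.txt > File3.txt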

---------- Post updated at 17:16 ---------- Previous update was at 16:58 ----------


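ichigo: the code above does not give the correct output. As shown below, the file1 line 1111111111,abcde,12.10.09,675069AG is printed again for every matching file2 line, and the groups follow the order of file2 rather than file1.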

Output:

Code:

2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy
1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
1111111111,abcde,12.10.09,675069AG
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1

But the desired output is:

Code:

1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy

ahmad.diab


12-17-2009


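Ahmad, the order of the output is not fixed: for (i in a) visits the array elements in an unspecified order, which can differ between awk implementations. When I run it using mawk I get: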

Code:

2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy

1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
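
If the groups must follow the order of file1, one option is to remember the order in which the keys were read and walk that list in END instead of using for (i in ...). A sketch assuming any POSIX awk (nawk, mawk, gawk) and comma-separated input; group_matches is a hypothetical name:

```sh
# group_matches FILE1 FILE2: print each FILE1 line followed by the FILE2 lines
# in which its first field appears as a whole field, in FILE1 order.
# FILE1 lines without any match are skipped.
group_matches() {
  awk -F',' '
    NR == FNR {                          # FILE1: remember line and key order
      if (!($1 in line)) order[++n] = $1
      line[$1] = $0                      # a repeated key keeps its last line
      next
    }
    {                                    # FILE2: attach line to first matching key
      for (i = 1; i <= NF; i++)
        if ($i in line) { hits[$i] = hits[$i] "\n" $0; break }
    }
    END {                                # walk keys in FILE1 order
      for (k = 1; k <= n; k++)
        if (order[k] in hits) print line[order[k]] hits[order[k]]
    }
  ' "$1" "$2"
}
```

Usage: group_matches file1 file2 > file3. Like Ahmad's version, it keeps all matched file2 lines in memory until END.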

---------- Post updated at 21:55 ---------- Previous update was at 21:54 ----------

Alternative in shell:

Code:

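while IFS= read -r line; do
  # ${line%%,*} is the first field; the if succeeds only when grep found a match
  if match=$(grep -- "${line%%,*}" file2); then
    printf '%s\n%s\n' "$line" "$match"   # data is never used as the printf format
  fi
done < file1 > file3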

Output:

Code:

1111111111,abcde,12.10.09,675069AG
abjkfd689,12.346,1111111111,15.09.09,kjfjlja
fhaie87oikjl,456788.09,1111111111,12.06.09,iieuwfdi1
2222222222,fghij,09.08.09,948p0
sdjkfh343mn,74895495.89,2222222222,02.05.09,uyiuewy

This is not as efficient as awk for large files, since it runs grep once for every line of file1, but perhaps it is good enough.

Scrutinizer


12-18-2009


Ahmad, I am getting a "-bash: gawk: command not found" error.
How can I get around this? Thanks.
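
The error means bash cannot find a gawk executable in PATH. Since Ahmad's post says nawk or /usr/xpg4/bin/awk also work, one way is to pick whichever of them is installed. A sketch (the variable name AWK is arbitrary):

```sh
# Pick the first awk-like interpreter found on this system, trying the ones
# suggested in the thread first and falling back to plain "awk".
AWK=
for cand in gawk nawk /usr/xpg4/bin/awk awk; do
  if command -v "$cand" >/dev/null 2>&1; then
    AWK=$cand
    break
  fi
done
echo "using: ${AWK:-none found}"
```

Then run "$AWK" in place of gawk in the scripts above.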


---------- Post updated at 02:20 AM ---------- Previous update was at 02:16 AM ----------

Hi Akshay,

Thanks for the code, it works well.
But can you give me some other code that is less time-consuming?
All my files are several hundred MB in size and contain millions of records,
so reading them line by line takes a lot of time.
Do you have a workaround for this?
Thanks.
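
For inputs of hundreds of MB, one option (a sketch, not tested on data of that size) is to keep only file1's keys in memory, stream file2 once, tag each match with the file1 line number, and let sort(1), which typically spills to temporary files, put the groups in file1 order. Assumptions: comma-separated fields, no tab characters in the data, and POSIX awk and sort; keyed_join is a hypothetical name.

```sh
# keyed_join FILE1 FILE2: print each FILE1 line followed by the FILE2 lines
# that contain its first field as a whole field, in FILE1 order; FILE1 lines
# without matches are skipped. Only FILE1's keys are held in memory by awk.
keyed_join() {
  tab=$(printf '\t')
  awk -F',' -v OFS="$tab" '
    NR == FNR {                        # FILE1: key -> first line number
      if (!($1 in idx)) idx[$1] = FNR
      print FNR, 0, $0                 # header record, sorts before its matches
      next
    }
    {                                  # FILE2: tag the line with its key position
      for (i = 1; i <= NF; i++)
        if ($i in idx) { print idx[$i], FNR, $0; next }
    }
  ' "$1" "$2" |
  sort -t "$tab" -k1,1n -k2,2n |       # group by FILE1 line, keep FILE2 order
  awk -F"$tab" '
    $2 == 0 { head = $3; shown = 0; next }   # new FILE1 line, not printed yet
    {
      if (!shown) { print head; shown = 1 }  # print the header before its first match
      print $3
    }
  '
}
```

Usage: keyed_join file1 file2 > file3. The trade-off is an extra sort pass over the matched lines in exchange for not holding file2 in memory.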


Swagi

