Simple two file compare with twist


 
# 1  
05-17-2012

I have file1 and file2.
I look up field 3 of file2 in field 1 of file1 and, if there is a match, output fields 2, 4 and 5 of file2.
I now want to add field 2 of file1 to the output.

I suspect I have to read the entire line of file1 into a two-dimensional array? Please help.

Here are my files and my code:


cat file1
Code:
foo,cmd1
bar,cmd2

cat file2
Code:
Hello,World,foo
Alice,Bob,bar
Egg,Spam,ham

The current awk command and its output:

Code:
awk -F, 'FNR==NR {arr[$1];next} $3 in arr {OFS=","; print $2,$4,$5}' file1 file2
World,fi,fom
Bob,bie,doll

Desired output:

Code:
World,fi,fom,cmd1
Bob,bie,doll,cmd2


Last edited by Scrutinizer; 05-17-2012 at 09:03 AM.. Reason: code tags
# 2  
05-17-2012
Try:
Code:
awk -F, 'FNR==NR {arr[$1]=$2; next} $3 in arr {print $2,$4,$5,arr[$3]}' OFS=, file1 file2
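A minimal way to try the suggested one-liner end to end, assuming a POSIX shell with `mktemp -d` available. The file2 rows are partly hypothetical: the posted file2 shows only three fields, so fields 4 and 5 of the two matching rows are filled in from the output shown in the question; the non-matching Egg row is kept as posted.

```shell
#!/bin/sh
# Sketch: run the suggested lookup on sample data in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# file1 exactly as posted.
printf 'foo,cmd1\nbar,cmd2\n' > "$tmp/file1"
# HYPOTHETICAL file2: fields 4 and 5 of the matching rows are inferred from the
# output shown in the question; the non-matching Egg row is left as posted.
printf 'Hello,World,foo,fi,fom\nAlice,Bob,bar,bie,doll\nEgg,Spam,ham\n' > "$tmp/file2"

result=$(awk -F, '
  FNR == NR { arr[$1] = $2; next }         # file1: index = field 1, value = field 2
  $3 in arr { print $2, $4, $5, arr[$3] }  # file2: look up field 3, append the stored value
' OFS=, "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
```

This prints World,fi,fom,cmd1 and Bob,bie,doll,cmd2; the Egg row is skipped because ham is not an index of arr.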

# 3  
05-17-2012
Quote:
Originally Posted by Scrutinizer
Try:
Code:
awk -F, 'FNR==NR {arr[$1]=$2; next} $3 in arr {print $2,$4,$5,arr[$3]}' OFS=, file1 file2

Indeed that works, thanks.
Do you mind explaining {arr[$1]=$2} and the OFS=, at the end? I thought the comma within the print statement did that job.
# 4  
05-17-2012
OK, since you already used arr[$1] to create an empty array element with field 1 as its index, I used that to give the element the value of field 2 instead; that value is then referenced in the second part, when file2 is read. As for OFS: the commas inside print only separate the arguments, and awk joins the printed values with OFS, which defaults to a single space, so OFS still has to be set to a comma. You could leave OFS="," where it was, but then it would be set again for every matching line of file2. Assigning OFS=, on the command line sets it only once, before the input files are read, which is just more efficient.
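A small sketch of the OFS point, assuming any POSIX awk: the same print statement with and without OFS set, plus a command-line assignment like the one in the answer (here reading standard input via `-` instead of real files).

```shell
#!/bin/sh
# The commas in a print statement separate its arguments; awk joins the
# printed values with OFS, which defaults to a single space.
default_ofs=$(awk 'BEGIN { print "World", "fi", "fom" }')
comma_ofs=$(awk 'BEGIN { OFS = ","; print "World", "fi", "fom" }')

# A command-line assignment like OFS=, is applied when awk reaches that operand,
# i.e. after BEGIN but before the file operands that follow it ("-" is stdin here).
cmdline_ofs=$(printf 'World fi fom\n' | awk '{ print $1, $2, $3 }' OFS=, -)

printf '%s\n' "$default_ofs" "$comma_ofs" "$cmdline_ofs"
```

The first line comes out space-separated, the other two comma-separated. If a BEGIN block itself needs the value, `awk -v OFS=,` sets it before BEGIN runs, which a plain OFS=, operand does not.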
# 5  
05-17-2012
Quote:
Originally Posted by Scrutinizer
Ok, since you already used arr[$1] to create an empty array element with the index of field 1, I used that to give it the value of field 2 instead, which then later gets referenced in the second part, when file 2 gets read.
Sorry, I'm still a little confused.

So the sequence is:
1. While FNR==NR, load field 1 of file1 into arr.
2. Test the match condition, i.e. $3 in arr.

At this point my array still contains the value of field 1; are we saying this then gets replaced with the value of field 2?

Quote:
Originally Posted by Scrutinizer
You could leave OFS where it was, but that means it would get set to a comma every time a new line is read from file2. This way it is only set once, before the files are being read, it is just more efficient.
Good tip :)
# 6  
05-17-2012
When FNR==NR (i.e. while the first file is being read), store field 2 ($2) in the array under the index of field 1 ($1). Once the second file starts, FNR restarts at 1 while NR keeps counting, so FNR!=NR and the first section of the script is skipped. While file2 is read line by line, for each line whose field 3 ($3) is an index of the array, print the fields of file2 and recall the previously stored array value, using that field 3 as the index.
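To see why FNR==NR singles out the first file, this sketch prints NR and FNR for every line of the two posted files, assuming a POSIX shell with `mktemp -d` available.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# The two files as posted in the question.
printf 'foo,cmd1\nbar,cmd2\n' > "$tmp/file1"
printf 'Hello,World,foo\nAlice,Bob,bar\nEgg,Spam,ham\n' > "$tmp/file2"

# NR counts records across all input files; FNR restarts at 1 in each file,
# so FNR == NR holds only while the first (non-empty) file is being read.
trace=$(awk '{ print NR, FNR, (FNR == NR ? "first file" : "later file") }' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$trace"
```

The output runs 1 1 and 2 2 for file1, then 3 1, 4 2 and 5 3 for file2. One caveat of the idiom: if file1 were empty, FNR would equal NR throughout file2 as well, so file2 would be loaded into the array instead of being looked up.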

Last edited by Scrutinizer; 05-17-2012 at 11:04 AM..
# 7  
05-17-2012
OK, I'm with you now.
That means that when I load field 1 into the array, I actually create an empty array element with my values from field 1 as its index.
I was under the impression that they were loaded as elements of the array with an incremental count 0,1,2,3,...,n as the index.
So, in effect, by the time the first part of the script completes, the array has the following contents:

Index, Value
foo,cmd1
bar,cmd2

Have I understood this correctly?
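One way to check that picture is to dump the array in an END block. This is a sketch assuming a POSIX shell with `mktemp -d`; because the order of `for (k in arr)` is unspecified in awk, the output is piped through sort.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'foo,cmd1\nbar,cmd2\n' > "$tmp/file1"

# arr[$1] on its own creates an element indexed by field 1 with an empty value.
keys_only=$(awk -F, '{ arr[$1] } END { for (k in arr) print k, "[" arr[k] "]" }' OFS=, "$tmp/file1" | sort)
# arr[$1] = $2 stores field 2 as the value under the same index.
with_values=$(awk -F, '{ arr[$1] = $2 } END { for (k in arr) print k, arr[k] }' OFS=, "$tmp/file1" | sort)
# Numeric indexes only appear when something creates them, e.g. split() numbers its pieces from 1.
split_demo=$(awk 'BEGIN { n = split("foo,bar", parts, ","); print n, parts[1], parts[2] }')

printf '%s\n' "$keys_only" "$with_values" "$split_demo"
```

The first dump shows bar and foo with empty values, the second shows bar,cmd2 and foo,cmd1, matching the table above; there is no element 0 or 1 unless, as with split(), something creates one.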