merging two files where countries are repeated


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting merging two files where countries are repeated
# 1  
Old 01-18-2011
merging two files where countries are repeated

I have gone through various posts in this forum but could not locate where
the repeated columns are there .Hence posted this post.

File 1(colon seperated file)
===============
Code:
Abc : 1234 : London
Def : 2456 : Paris
Efg :1456 : Newyork
Fgh : 1324 : Tokyo
Hjk  : 5829 : California 
Lmn : 7823 : Chicao
Klm : 6472 : Munich
Jkm : 8231 : Franfurt
Acl : 3241 : Hiroshima
Jmp : 2385 : Detroit
Lqm : 4613 : Kansas

File 2(coma seperated file)
===================
Code:
USA , Newyork
USA , California
USA , Sanfransisco
USA , Detroit
USA , Chicago
USA , Kansas
USA , St.Louise
USA , Florida
UK , London
FRANCE , Paris
FRANCE , Marsellie.
INDIA , NewDelhi
INDIA , Mumbai
INDIA , Calcutta
INDIA , Chennai
INDIA , Bangalore
JAPAN , Tokyo
JAPAN , Nagasaki
JAPAN , Hirosima
CANADA , Ottawa
CANADA , Montreal
CANADA , Vancuvor
ARGENTINA , Buenos Aires
BRAZIL , Brassila
BRAZIL , Sao Paulo
ITALY , Rome
NORVEY , Oslo
SPAIN , Madrid
GERMANY , Frankfurt
GERMANY , Munich,
USSR , Moscow
USSR , Laningrad
USSR , Vladvostok

Output File expected as
==================
Code:
Abc , 1234 , London ,UK
Def  , 2456 , Paris , France
Efg  , 3456 , Newyork , USA
Fgh , 1324 , Tokyo , Japan
Hjk  , 5829 , California , USA
Lmn , 7823 , Chicago, USA
Klm , 6472 , Munich , Germany
Jkm , 8231 , Frankfurt , Germany
Acl  , 3241 , Hiroshima , Japan
Jmp , 2385 , Detroit , USA
Lqm , 4613 , Kansas , USA

So far I have used join command but here it is not working as the cities are under the label countries.

Result required is Display the name of the country against the name of the city .
Tried to use
Code:
awk -F":," 'NR==FNR{ key[$1,$2]=$2;next } $1[key] { print $1,$2,$3} ' "OFS=,"  file2 file1 >outputfile

Thanks in advance.

Last edited by vakharia Mahesh; 01-19-2011 at 11:47 AM.. Reason: Code tags, please...
# 2  
Old 01-18-2011
Code:
awk -F'[:,]' 'NR==FNR{key[$2]=$1;next}{print $1,$2,$3," "key[$3]}' OFS=, file2 file1 >outputfile


Last edited by anurag.singh; 01-18-2011 at 12:20 PM..
# 3  
Old 01-19-2011
Anurag ,
Thanks lot for your reply but it is not working , only the 1st instance is
shown and not all the COUNTRIES names are mentioned.I have tested your
code today and found the above things.
i.e. The output shows as :
OUTPUT FILE
=============
Abc , 1234 , London ,UK
Def , 2456 , Paris , France
Efg , 3456 , Newyork , USA
Fgh , 1324 , Tokyo , Japan
Hjk , 5829 , California ?
Lmn , 7823 , Chicago ?
Klm , 6472 , Munich , Germany
Jkm , 8231 , Frankfurt ?
Acl , 3241 , Hiroshima ?
Jmp , 2385 , Detroit ?
Lqm , 4613 , Kansas ?

That is the 1st instance of country will be displayed or joined and not in subsequent events ??

I hope I have made things clear .

Last edited by vakharia Mahesh; 01-19-2011 at 11:46 AM..
# 4  
Old 01-19-2011
Quote:
Originally Posted by vakharia Mahesh
... but it is not working , only the 1st instance is shown and not all the COUNTRIES names are mentioned. ... i.e. The output shows as :
OUTPUT FILE
=============
Abc , 1234 , London ,UK
Def , 2456 , Paris , France
Efg , 3456 , Newyork , USA
Fgh , 1324 , Tokyo , Japan
Hjk , 5829 , California ?
Lmn , 7823 , Chicago ?
Klm , 6472 , Munich , Germany
Jkm , 8231 , Frankfurt ?
Acl , 3241 , Hiroshima ?
Jmp , 2385 , Detroit ?
Lqm , 4613 , Kansas ?

That is the 1st instance of country will be displayed or joined and not in subsequent events ??

...
It is not working because your data doesn't match.

Brownie points for you if you could find the mismatch in each case yourself.

Code:
$
$ # display the contents of file2
$
$ cat file2
USA , Newyork
USA , California
USA , Sanfransisco
USA , Detroit
USA , Chicago
USA , Kansas
USA , St.Louise
USA , Florida
UK , London
FRANCE , Paris
FRANCE , Marsellie.
INDIA , NewDelhi
INDIA , Mumbai
INDIA , Calcutta
INDIA , Chennai
INDIA , Bangalore
JAPAN , Tokyo
JAPAN , Nagasaki
JAPAN , Hirosima
CANADA , Ottawa
CANADA , Montreal
CANADA , Vancuvor
ARGENTINA , Buenos Aires
BRAZIL , Brassila
BRAZIL , Sao Paulo
ITALY , Rome
NORVEY , Oslo
SPAIN , Madrid
GERMANY , Frankfurt
GERMANY , Munich,
USSR , Moscow
USSR , Laningrad
USSR , Vladvostok
$
$ # display the contents of file1
$
$ cat file1
Abc : 1234 : London
Def : 2456 : Paris
Efg :1456 : Newyork
Fgh : 1324 : Tokyo
Hjk  : 5829 : California
Lmn : 7823 : Chicao
Klm : 6472 : Munich
Jkm : 8231 : Franfurt
Acl : 3241 : Hiroshima
Jmp : 2385 : Detroit
Lqm : 4613 : Kansas
$
$ # run a Perl one-liner joining city to country
$
$ perl -lne 'if ($ARGV eq "file2"){/^(\w+)\s*,\s*(\w+)$/; $x{$2}=$1} else {/^.*:\s*(\w+)$/; $_="$_ , $x{$1}"; s/:/,/g; print}' file2 file1
Abc , 1234 , London , UK
Def , 2456 , Paris , FRANCE
Efg ,1456 , Newyork , USA
Fgh , 1324 , Tokyo , JAPAN
Hjk  , 5829 , California  ,
Lmn , 7823 , Chicao ,
Klm , 6472 , Munich ,
Jkm , 8231 , Franfurt ,
Acl , 3241 , Hiroshima ,
Jmp , 2385 , Detroit , USA
Lqm , 4613 , Kansas , USA
$
$

tyler_durden
# 5  
Old 01-19-2011
Try this,

Code:
awk -F"[:,]" 'NR==FNR{a[$2]=$1;next;} a[$3] { print $1,$2,$3,a[$3]}' file2 OFS="," file1

This User Gave Thanks to pravin27 For This Post:
# 6  
Old 01-19-2011
some updates base on pravin27's code, to fix the problem if there are different spaces or tabs before and after city and country name.

Code:
awk -F"[:,]" '
function s(a) {gsub(/^[\t ]*/,"",a);gsub(/[\t ]*$/,"",a);return a} 
NR==FNR{a[s($2)]=s($1);next;} a[s($3)] { print  $1,$2,$3 " , " a[s($3)]}
' file2 OFS="," file1


Last edited by rdcwayx; 01-19-2011 at 09:18 PM..
This User Gave Thanks to rdcwayx For This Post:
# 7  
Old 01-20-2011
Hi
Pravin27 and rdcwayx,
Thanks lot to both you for the code and working perfectly as per my requirement
,One request , can you explain in details the logic for a[$3] ? in details ?
Is it a loop ? Plese if you can explain the code in details ,I really appreciate
your help anyway .

A BIG THANKS ONCE AGAIN FOR THE CODE

With due regards and respect

Vakharia M J Smilie
Login or Register to Ask a Question

Previous Thread | Next Thread

8 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk for matching fields between files with repeated records

Hello all, I am having trouble with what should be an easy task, but seem to be missing something fundamental. I have two files, with File 1 consisting of a single field of many thousands of records. I also have File 2 with two fields and many thousands of records. My goal is that when $1 of... (2 Replies)
Discussion started by: jvoot
2 Replies

2. UNIX for Beginners Questions & Answers

Matching fields between two files, repeated records

In two previous posts (here) and (here), I received help from forum members comparing multiple fields across two files and selectively printing portions of each as output based upon would-be matches using awk. I had been fairly comfortable populating awk arrays with fields and using awk's special... (3 Replies)
Discussion started by: jvoot
3 Replies

3. Shell Programming and Scripting

Compare two files with repeated lines

Hi all, I've been trying to write a script to compare two files. This is what I want: file 1: a 1 2 b 5 9 c 4 7 file 2: a a c a b Output: a 1 2 a 1 2 (2 Replies)
Discussion started by: ernesto561
2 Replies

4. What is on Your Mind?

UNIX field scope in European countries

Hi, I have been working in India since past 13 years and as per Indian IT culture, there will be saturation in Salary which barely matches with rising living costs in India. I am thinking to move/work in some other country for few years at least which help me financially plus technically. As of... (3 Replies)
Discussion started by: nightup2222
3 Replies

5. Shell Programming and Scripting

Find repeated word and take sum of the second field to it ,for all the repeated words in awk

Hi below is the input file, i need to find repeated words and sum up the values of it which is second field from the repeated work.Im trying but getting no where close to it.Kindly give me a hint on how to go about it Input fruits,apple,20,fruits,mango,20,veg,carrot,12,veg,raddish,30... (11 Replies)
Discussion started by: 100bees
11 Replies

6. Forum Support Area for Unregistered Users & Account Problems

Trouble Registering? Countries or Regions Abusing Forums

The forums have been seeing a sharp increase in spam bots, forum robots, and malicious registrations from certain countries. If you have been directed to this thread due to a "No Permission Error" when trying to register please post in this thread and request permission to register, including... (1 Reply)
Discussion started by: Neo
1 Replies

7. Shell Programming and Scripting

[Solved] Parsing countries file

Hi All I would like to take contries for this file http://www.textfixer.com/resources/dropdowns/country-dropdown-iso-html.txt and save it in another file with the following structure . Afghanistan, Åland Islands , Albania, Algeria, American Samoa Can anyone help me to do this ? (3 Replies)
Discussion started by: molwiko
3 Replies

8. Solaris

Huge (repeated Entry) text files

Somebody HELP! I have a huge log file (TEXT) 76298035 bytes. It's a logfile of IMEIs and IMSIS that I get from my EIR node. Here is how the contents of the file look like: 000000, 1 33016382000913 652020100423994 1 33016382002353 652020100430743 1 33017035101003 652020100441736... (4 Replies)
Discussion started by: axl
4 Replies
Login or Register to Ask a Question