Awk: Multiple Replace In Column From Two Different Files

08-02-2013

Registered User

100, 0

Join Date: Mar 2012

Last Activity: 26 July 2017, 2:30 AM EDT

Posts: 100

Thanks Given: 22

Thanked 0 Times in 0 Posts

Awk: Multiple Replace In Column From Two Different Files

Master_1.txt

Code:

2372,MTS,AP
919821,Airtel,DL
0819,MTS,MUM
919849788001,Airtel,AP
1430,Aircel MP,20
405899143999999,MTS,KRL
USSDLIKE,MTS,DEL

Master_2.txt

Code:

919136,DL
9664,RAJ
919143,KOL
9888,PUN

Input File:

Code:

405899143999999,919821900030,919143227420,9664099976,919875089998,S,2
USSDLIKE,161A,91225,8104738198,405899136999995,S,5
2A65,919888799924,919136296645,9888997100,919875089998,S,1
MCA,,50009,8925682714,405899136999995,F,1

Requirement:
Column 1 & 2 to be matched from master_1.txt file & column 3 & 4 to be matched from master_2.txt file.

Output Required

Code:

MTS,KRL,Airtel,DL,KOL,RAJ,919875089998,S,2
MTS,DEL,161A,,91225,8104738198,405899136999995,S,5
2A65,,919888799924,,DL,PUN,919875089998,S,1
MCA,,,,50009,,8925682714,405899136999995,F,1

Code

Code:

awk -F, 'FNR==1 {++counter}
counter==1 {MASTERSPCGT[$1]=$2","$3;next}
counter==2 {MASTERMDNSERIES[$1]=$2;next}
{
 for (counter=1; counter<=2; counter++)
        {
          if ($counter in MASTERSPCGT) $counter=MASTERSPCGT[$counter]
                else if (substr($counter,1,9) in MASTERSPCGT) $counter=MASTERSPCGT[substr($counter,1,9)]
                else if (substr($counter,1,8) in MASTERSPCGT) $counter=MASTERSPCGT[substr($counter,1,8)]
                else if (substr($counter,1,7) in MASTERSPCGT) $counter=MASTERSPCGT[substr($counter,1,7)]
                else if (substr($counter,1,6) in MASTERSPCGT) $counter=MASTERSPCGT[substr($counter,1,6)]
                else if (substr($counter,1,5) in MASTERSPCGT) $counter=MASTERSPCGT[substr($counter,1,5)]
                else if (substr($counter,1,4) in MASTERSPCGT) $counter=MASTERSPCGT[substr($counter,1,4)]
                else $counter=$counter","
        }
        for (counter=3;counter<=4; counter++)
        {
         if (substr($counter,1,7) in MDNSERIES) $counter=MDNSERIES[substr($counter,1,7)]
                else if (substr($counter,1,4) in MDNSERIES) $counter=MDNSERIES[substr($counter,1,4)]
        }
}
1
' FS="," OFS="," master_1.txt master_2.txt inputfile

Required ouput is not coming for changes in column 3 & 4 but it is working fine for column 1 & 2. Can you please suggest changes

siramitsharma

View Public Profile for siramitsharma

Find all posts by siramitsharma

08-02-2013

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

One problem with your code is that after you change field 1, (adding a comma and sometimes making other changes), what used to be fields 2, 3, and 4 are then fields 3, 4, and 5. After that you aren't working on the fields you thought you were working on.

I also didn't understand why you were ignoring the replacement text from some entries in master_1.txt by using no more than 9 characters of the array subscript when making replacements.

You said you want:

Code:

MTS,KRL,Airtel,DL,KOL,RAJ,919875089998,S,2
MTS,DEL,161A,,91225,8104738198,405899136999995,S,5
2A65,,919888799924,,DL,PUN,919875089998,S,1
MCA,,,,50009,,8925682714,405899136999995,F,1

but I don't understand where the comma shown in red in the last line is supposed to come from???
The following script almost produces the output you said you wanted, except that it doesn't produce that extra comma:

Code:

awk '
function FandR(fn, list,        i) {
        for(i in list) {
                if(index($fn, i) == 1) {
                        $fn = list[i]
                        return
                }
        }
        if(fn < 3) $fn = $fn ","
}
BEGIN { FS = OFS = ","
}
FNR == 1 {
        fc++
}       
fc == 1 {
        MASTERSPCGT[$1] = $2 "," $3
        next
}       
fc == 2 {
        MASTERMDNSERIES[$1] = $2
        next
}
{       FandR(4, MASTERMDNSERIES)
        FandR(3, MASTERMDNSERIES)
        FandR(2, MASTERSPCGT)
        FandR(1, MASTERSPCGT)
        print
}' master_1.txt master_2.txt inputfile

As always, if you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of using /bin/awk or /usr/bin/awk.

Note that this script:

uses a function (FandR [for Find and Replace]) to do the work,
the calls to FandR() are set up to work on field 4, 3, 2, and 1 and that order, and
FandR() uses index() rather than repeated calls to substr() to pick the matching data.

I hope this helps.

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

08-02-2013

Registered User

100, 0

Join Date: Mar 2012

Last Activity: 26 July 2017, 2:30 AM EDT

Posts: 100

Thanks Given: 22

Thanked 0 Times in 0 Posts

Thanks don for the inputs. It is almost near & extra comma mentioned by you in red is not required so code is perfect in terms of the near output.

One point, i would like to share is that, i used

Code:

 substr

to check starting characters matching in the input file, so in the code shared by you

Code:

 index

will not be serving the criteria in case there are multiple enteries in the master_1 file as it will check in the complete string(correct me if i am wrong). for e.g entries marked in red color below(as i have these type of entries as well in the master_1 file), so the exact match in the master file(both master_1 & master_2) to be checked in starting chars of the input file. So can you please suggest what needs to be done additionaly?

Master_1 file

Code:

2372,MTS,AP
919821,Airtel,DL
0819,MTS,MUM
919849788001,Airtel,AP
1430,Aircel MP,20
405899143999999,MTS,KRL
USSDLIKE,MTS,DEL
US,Uninor,Silichar

input file

Code:

405899143999999,919821900030,919143227420,9664099976,919875089998,S,2
USSDLIKE,161A,91225,8104738198,405899136999995,S,5
2A65,919888799924,919136296645,9888997100,919875089998,S,1
MCA,,50009,8925682714,405899136999995,F,1
US,261A,919143001234,8104738198,405899136999995,S,5

Rest of the file remains same

Output required will be

Code:

MTS,KRL,Airtel,DL,KOL,RAJ,919875089998,S,2
MTS,DEL,161A,,91225,8104738198,405899136999995,S,5
2A65,,919888799924,,DL,PUN,919875089998,S,1
MCA,,,,50009,8925682714,405899136999995,F,1
Uninor,Silichar,261A,,919143001234,8104738198,405899136999995,S,5

siramitsharma

View Public Profile for siramitsharma

Find all posts by siramitsharma

08-02-2013

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

The problem is that the order of the elements accessed from an array by the awk command:

Code:

for (index in array) {
        do stuff
}

is unspecified.

The index(string1, string2) function will return the position in string1 where string2 first appears (or 0 if string2 does not appear in string1). Since my code is checking that index() is returning 1, it is checking that string2 appears at the start of string1. If there are two entries in either of the master files where one key is an initial portion of another key in the same file, (such as USSDLIKE and US in your new master_1.txt), the following simple fix will work as long as the shorter key always follows the longer key in the master file. If you can't guarantee that the master files will follow this rule, we can change the script to sort the master files in reverse order before giving them to the awk script to get the desired results.

Note however, that it looks to me like your newly added line in inputfile:

Code:

US,261A,919143001234,8104738198,405899136999995,S,5

should be translated to:

Code:

Uninor,Silichar,261A,,KOL,8104738198,405899136999995,S,5

instead of to:

Code:

Uninor,Silichar,261A,,919143001234,8104738198,405899136999995,S,5

as you requested in message #3 in this thread because the field in red above should be converted due to the following entry in master_2.txt:

Code:

919143,KOL

Anyway, the awk script:

Code:

awk '
function FandR(fn, listindex, listindexcnt, list,       i) {
        for(i = 1; i <= listindexcnt; i++) {
                if(index($fn, listindex[i]) == 1) {
                        $fn = list[listindex[i]]
                        return
                }
        }
        if(fn < 3) $fn = $fn ","
}
BEGIN { FS = OFS = ","
}
FNR == 1 {
        fc++
}
fc == 1 {
        MASTERSPCGT[list1[++l1c] = $1] = $2 "," $3
        next
}
fc == 2 {
        MASTERMDNSERIES[list2[++l2c] = $1] = $2
        next
}
{       FandR(4, list2, l2c, MASTERMDNSERIES)
        FandR(3, list2, l2c, MASTERMDNSERIES)
        FandR(2, list1, l1c, MASTERSPCGT)
        FandR(1, list1, l1c, MASTERSPCGT)
        print
}' master_1.txt master_2.txt inputfile

uses list1[] (indexed by its line number in master_1.txt) and list2[] (indexed by its line number in master_2.txt) to store the keys. These lists can be used to check the keys in the order they were presented (instead of in an unspecified order).

The output from this script when given your updated input files is:

Code:

MTS,KRL,Airtel,DL,KOL,RAJ,919875089998,S,2
MTS,DEL,161A,,91225,8104738198,405899136999995,S,5
2A65,,919888799924,,DL,PUN,919875089998,S,1
MCA,,,,50009,8925682714,405899136999995,F,1
Uninor,Silichar,261A,,KOL,8104738198,405899136999995,S,5

Last edited by Don Cragun; 08-02-2013 at 03:53 PM.. Reason: Remove extraneous "]".

This User Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

08-03-2013

Registered User

100, 0

Join Date: Mar 2012

Last Activity: 26 July 2017, 2:30 AM EDT

Posts: 100

Thanks Given: 22

Thanked 0 Times in 0 Posts

Thanks a ton don, it did worked & i have put master file in order to get the required output.

Thanks once again

siramitsharma

View Public Profile for siramitsharma

Find all posts by siramitsharma

Shell Programming and Scripting

Awk: Multiple Replace In Column From Two Different Files

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Extract a column and multiple by 1000 and replace it on same file

Discussion started by: arunkumar_mca

2. UNIX for Beginners Questions & Answers

awk script to extract a column, replace one of the header and replace year(from ddmmyy to yyyy)

Discussion started by: Kunalcurious

3. UNIX for Beginners Questions & Answers

How to copy a column of multiple files and paste into new excel file (next to column)?

Discussion started by: dineshkumarsrk

4. Shell Programming and Scripting

Do replace operation and awk to sum multiple columns if another column has duplicate values

Discussion started by: as7951

5. Shell Programming and Scripting

sed command to replace multiple column in one go

Discussion started by: Preeti Chandra

6. Shell Programming and Scripting

Need to sum up a column value from multiple files into Credit and Debit categories using awk command

Discussion started by: kcdg859

7. Shell Programming and Scripting

awk Print New Column For Every Two Lines and Match On Multiple Column Values to print another column

Discussion started by: jacobs.smith

8. Shell Programming and Scripting

Join multiple files by column with awk

Discussion started by: macsx82

9. Shell Programming and Scripting

awk find and replace in multiple files

Discussion started by: jdhahbi

10. Shell Programming and Scripting

awk/sed column replace using column header - help

Discussion started by: jkl_jkl