Country Codes script faster response; please help

04-24-2008

Registered User

39, 0

Join Date: Feb 2008

Last Activity: 1 August 2009, 8:57 AM EDT

Posts: 39

Thanks Given: 0

Thanked 0 Times in 0 Posts


Dear Era

I tried your Perl script and could not get it to run. I also tried your latest optimized code. When I type it directly at the prompt, the shell reports "Ambiguous output redirect"; when I save it in a separate file and run that as an executable script, this is the output:
./code2
./code2: -f: not found
geo@spiserver /home/geo/cdr/AnsweredCalls 58 > Can't open $a\{ print $0 c1 c2; c1 = c2 = ""; }

I also tried this one, but it is very slow:

echo "{`sed 's=\(.*\) \(.*\)=if($6~/^\2/)print $0\" \1\"=' country-codes.txt`}" >awk_4
head awk_4
tail awk_4
awk -f awk_4 phonelines.txt >first_added
head first_added
tail first_added
echo "{`sed 's=\(.*\) \(.*\)=if($7~/^\2/)print $0\" \1\"=' country-codes.txt`}" >awk_5
head awk_5
tail awk_5
awk -f awk_5 first_added

Please advise. For your information, the required response time for half a million lines is 1 to 1.5 minutes. Note also that the file containing the code mappings will itself contain a large number of entries.

Thanks
Zanetti

zanetti321


04-24-2008

Registered User

3,653, 12

Join Date: Mar 2008

Last Activity: 28 March 2011, 6:41 AM EDT

Location: /there/is/only/bin/sh

Posts: 3,653

Thanks Given: 0

Thanked 12 Times in 10 Posts

If your awk won't accept a script on standard input, you can save the sed output to a file and keep that file (until the list of number codes changes again).

Code:

# Note: the one-line forms of i\ and a\ used below are GNU sed extensions.
sed -e '1i\#!/usr/bin/awk -f' \
    -e 's%\(.*\) \(.*\)%$4 ~ /^\2/ { c1=" \1" } $5 ~ /^\2/ { c2=" \1" }%' \
    -e '$a\{ print $0 c1 c2; c1 = c2 = ""; }' country-codes.txt >country-codes.awk
chmod +x country-codes.awk
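If your sed does not accept the one-line i\ and a\ forms, the same file can be built without them. This is only a sketch under assumptions: the two-entry country-codes.txt, the one-line phonelines.txt and the scratch directory are made up for illustration, and the numbers are assumed to be in fields 4 and 5 as in the script above.

```shell
# Work in a scratch directory so no real files are overwritten.
work=$(mktemp -d) && cd "$work"

# Hypothetical sample data in the "Name code" format.
cat >country-codes.txt <<'EOF'
UK 44
Libya-Libyana 21892
EOF
printf '%s\n' 'Italy ISC 151 925082838 447717254923 b 26' >phonelines.txt

# Emit the header line, one pair of rules per code, then the final print rule.
{
    printf '%s\n' '#!/usr/bin/awk -f'
    sed 's%\(.*\) \(.*\)%$4 ~ /^\2/ { c1=" \1" } $5 ~ /^\2/ { c2=" \1" }%' country-codes.txt
    printf '%s\n' '{ print $0 c1 c2; c1 = c2 = ""; }'
} >country-codes.awk
chmod +x country-codes.awk

cat country-codes.awk                       # inspect the generated script
awk -f country-codes.awk phonelines.txt     # run it without relying on the #! line
```

Running it as awk -f country-codes.awk (or nawk -f country-codes.awk) also sidesteps the #! path, which helps if awk does not live in /usr/bin on your system.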

You could also check if you have nawk or mawk or gawk; I would guess those might be faster than the old awk.

Do you have timing numbers for the Perl script, and are they acceptable?

Just printing 500,000 lines, awk (actually mawk) is much faster than perl on my system, but I believe Perl should have better regex optimizations, so for a script which does a lot of matching, I vaguely speculate that Perl might still be faster.

Code:

vnix$ wc -l phonelines.txt 
4 phonelines.txt
vnix$ cat phonelines.txt 
Belgium1 ISC 3 924556808 393475157928 b 1B
Italy ISC 151 925082838 447717254923 b 26
ISC BT-GlasGow 152 88216900500 218925288836 f 1B
ISC Spain-Telefonica 14 964188208970 218925735325 b 29
vnix$ yes phonelines.txt | head -n 125000 | time xargs perl -pe 1 >/dev/null
1.89user 1.00system 0:03.27elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9393minor)pagefaults 0swaps
vnix$ yes phonelines.txt | head -n 125000 | time xargs awk 1 >/dev/null
0.72user 0.72system 0:01.76elapsed 82%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5880minor)pagefaults 0swaps
vnix$ yes phonelines.txt | head -n 125000 | time xargs cat >/dev/null
0.26user 0.73system 0:01.23elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4072minor)pagefaults 0swaps
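The timings above feed the four-line file name to xargs 125,000 times, so each command reads 4 x 125,000 = 500,000 lines. If you would rather time against one real file of that size, something like the following sketch should work. The file name big.txt and the scratch directory are my own choices, and date +%s only gives whole seconds, which is coarse but enough to judge a one-minute target.

```shell
# Work in a scratch directory so no real files are overwritten.
work=$(mktemp -d) && cd "$work"

# The four sample lines from the listing above.
cat >phonelines.txt <<'EOF'
Belgium1 ISC 3 924556808 393475157928 b 1B
Italy ISC 151 925082838 447717254923 b 26
ISC BT-GlasGow 152 88216900500 218925288836 f 1B
ISC Spain-Telefonica 14 964188208970 218925735325 b 29
EOF

# Repeat the sample 125000 times: 4 * 125000 = 500000 lines.
awk '{ line[NR] = $0 }
     END { for (i = 0; i < 125000; i++) for (j = 1; j <= NR; j++) print line[j] }' \
    phonelines.txt >big.txt
wc -l <big.txt

# Time a baseline run; replace "awk 1" with the script you want to measure.
start=$(date +%s)
awk 1 big.txt >/dev/null
end=$(date +%s)
echo "elapsed: $((end - start)) s"
```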

era


04-24-2008


Did some more tests; at least for a small country-codes.txt, the awk script is faster. The comparison is not entirely fair, because the awk script has the country codes hardcoded, whereas the Perl script reads them in from disk every time.

era


04-24-2008


Dear Era

Please let's concentrate on nawk; I do have nawk. I can't follow the Perl script at all, or maybe I just can't run it: I wrote the same code you gave me into a separate file, changed its mode to +x and ran it, but it would not run.

Regarding your last script: note that country-codes.txt will be created once and then left unchanged.

Regarding the execution of the last script: I also wrote it into a separate file, and the error is "can't open /usr/bin/awk".

Could you please lay out your suggestions point by point so that I can follow them?

Thank you for your care
Zanetti

zanetti321


04-24-2008


Finally Perl Worked :)

Dear Era

I finally succeeded in executing the Perl script you gave me before, but the output, as you said, is not what I expected at all. Here is the output:

218914146348 matches , maps to
f matches , maps to
218925098033 matches , maps to
f matches , maps to
218928892045 matches , maps to
f matches , maps to
218924078768 matches , maps to
f matches , maps to

It maps only $3 and not both $3 and $4. Also, as you can see, the country codes from mappings.txt are not included, and I don't want "f" to be matched and mapped to anything. I only need $3 and $4 to be mapped using the codes in mappings.txt, which looks like this:

Egypt-Vodafone 2010
UK 44
Egypt-Mobinil 2012
Libya-Libyana 21892
France 33

And so on

I think it is good that I now have what you guess is the fastest script, so please help me customize it. Given an input line such as:

ISC Italy 1249 20125199518 218924831978 b 22

The output after running the code should be:

ISC Italy 1249 20125199518 218924831978 b 22 Egypt-Mobinil Egypt Libyana

Thanks
Zanetti

zanetti321


04-24-2008


I meant to mention that if your awk has a different path, you need to change that bit.

I can't test these scripts on real-world data, but you have the data -- can you perform a test run like I did and see what's fast enough?

The Perl script might need some modifications, since the output doesn't look correct any longer. Did you change the field numbers at some point? I notice the awk script had $4 and $5 before, and now you apparently have the phone numbers in $6 and $7.
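A quick way to settle the field-number question is to print every field of one line together with its index. The sketch below uses one of the sample lines from earlier in the thread; with your real file you would run the same awk program on it directly instead of piping in a made-up line.

```shell
# Print each field of the first input line as "$index = value", then stop.
fields=$(printf '%s\n' 'ISC Spain-Telefonica 14 964188208970 218925735325 b 29' |
    awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i; exit }')
printf '%s\n' "$fields"
```

Whichever indexes show the two phone numbers are the ones to use in the generated rules.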

The awk script can easily be converted to Perl using a2p so you can really compare them head to head if you just can get the latest awk script to work. Somewhat to my surprise, your awk script (which uses if inside a single main clause, instead of one little clause for each number code) seemed to be the fastest in my limited test (with some slight modifications I made). Here's my slightly modified version, and the timings:

Code:

vnix$ # This is the script I posted earlier, optimized version of unilover's
vnix$ yes phonelines.txt | head -125000 | time xargs ./country-codes.awk >/dev/null
2.39user 0.74system 0:03.55elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5958minor)pagefaults 0swaps
vnix$ echo '#!/usr/bin/awk -f ' >phonylines.awk
vnix$ echo '{ c1 = c2 = "";' >>phonylines.awk 
vnix$ sed 's%\(.*\) \(.*\)%if ($4~/^\2/) c1 = " \1"; if ($5~/^\2/) c2 = " \1";%' country-codes.txt >>phonylines.awk
vnix$ echo 'print $0 c1 c2; }' >>phonylines.awk
vnix$ chmod +x ./phonylines.awk
vnix$ ./phonylines.awk phonelines.txt
Belgium1 ISC 3 924556808 393475157928 b 1B Italy
Italy ISC 151 925082838 447717254923 b 26 England
ISC BT-GlasGow 152 88216900500 218925288836 f 1B Thuraya Satellite Libyana
ISC Spain-Telefonica 14 964188208970 218925735325 b 29 Lebanon Libyana
vnix$ yes phonelines.txt | head -125000 | time xargs ./phonylines.awk >/dev/null 
2.25user 0.72system 0:03.38elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5956minor)pagefaults 0swaps
vnix$ yes phonelines.txt | head -125000 | time xargs perl ./phonylines.pl >/dev/null
9.78user 1.28system 0:12.44elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10935minor)pagefaults 0swaps
vnix$ # ouch

But it needs to be said that this is a very limited number of phone codes, and the game might change when you have a lot of them.
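With a long code list, one alternative worth timing is to load the codes into an awk array once and probe the leading digits of each number, longest prefix first, rather than testing every code's regex on every line. The following is a minimal sketch, not something I have run on real data. It assumes the numbers are in fields 4 and 5, that country-codes.txt holds "Name code" with the code as the last field, and that your awk supports user-defined functions (nawk, mawk and gawk do; the old awk does not). The file name lookup.awk and the sample data are made up.

```shell
# Work in a scratch directory so no real files are overwritten.
work=$(mktemp -d) && cd "$work"

# Hypothetical sample data; substitute your real files.
cat >country-codes.txt <<'EOF'
Egypt-Vodafone 2010
UK 44
Egypt-Mobinil 2012
Libya-Libyana 21892
France 33
EOF
cat >phonelines.txt <<'EOF'
ISC Italy 1249 20125199518 218924831978 b 22
ISC Spain 14 447717254923 33123456789 b 29
EOF

cat >lookup.awk <<'EOF'
# First file: remember each name by its code, and track the longest code.
NR == FNR {
    code = $NF
    name = $0
    sub(/ [^ ]*$/, "", name)        # strip the last field, leaving the name
    map[code] = name
    if (length(code) > maxlen) maxlen = length(code)
    next
}
# Return " name" for the longest code that is a prefix of num, else "".
function lookup(num,    n) {
    for (n = maxlen; n > 0; n--)
        if (substr(num, 1, n) in map)
            return " " map[substr(num, 1, n)]
    return ""
}
# Second file: append the names found for both number fields.
{ print $0 lookup($4) lookup($5) }
EOF

awk -f lookup.awk country-codes.txt phonelines.txt >mapped.txt
cat mapped.txt
```

Each line then costs at most maxlen array probes per number, regardless of how many codes there are. Note one behavioral difference: when several codes match, this picks the longest one, whereas the generated regex scripts keep whichever matching code comes last in country-codes.txt.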

Last edited by era; 04-24-2008 at 05:07 PM.. Reason: Oops, needed to change sed separator character; reset c1 and c2 at beginning (again )-:

era
