Speeding up substitutions


 
# 1  
Old 10-14-2014

Hi all,

I have a lookup table. For each value in its first column (col1) that appears in another file, I want to replace it with the corresponding value from the second column (col2).

Code:
 
lookup file
 
a,b
c,d

So a should be replaced with b, and c with d.

Code:
 
mainfile
 
a,fvvgeggsegg,dvs
a,fgeggefddddddddddg
b,dfqefeff,dfs
b,fdfsfgggddgg
c,sddddd

Code:
 
output
 
b,fvvgeggsegg,dvs
b,fgeggefddddddddddg
b,dfqefeff,dfs
b,fdfsfgggddgg
d,sddddd

My working, but slow, code:

Code:
 
 
awk -F',' 'NR==FNR{a[$1]=$2;next}{ for (i in a) gsub(i,a[i])}1'  lookup mainfile

I need to speed this up, as the mainfile is 5 GB and it is taking forever.
Right now the code replaces every occurrence of the lookup strings anywhere in the mainfile, but I actually only need to do that for the first column. Is there a way to speed this up?

Thanks a lot. I'm using the latest Cygwin with 24 GB of memory.
# 2  
Old 10-14-2014
Code:
awk -F',' 'NR==FNR{a[$1]=$2;next} $1 in a {$1=a[$1]}1' OFS=,  lookup mainfile

# 3  
Old 10-14-2014
Use gsub(i,a[i], $1) to restrict the substitution to the first column, assuming awk agrees with you on what your columns are. Be sure to set -v OFS="," or awk will rebuild each modified line with spaces in place of your commas.
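
A quick way to see the OFS effect, as a minimal sketch on a made-up one-line input (not the real data):

```shell
# Without OFS: changing $1 makes awk rebuild the line with spaces.
printf 'a,x,a\n' | awk -F',' '{ gsub("a", "b", $1) } 1'
# prints: b x a

# With OFS set to a comma: the separators survive, and only column 1 changes.
printf 'a,x,a\n' | awk -F',' -v OFS=',' '{ gsub("a", "b", $1) } 1'
# prints: b,x,a
```

Note that this only narrows where the substitution happens. Dropped into the original for (i in a) loop, every line is still tested against every lookup key.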

How big is your lookup file? If you could use a program to convert it into one big sed statement, that might be faster.
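
One way the sed idea could look, as a rough sketch only. It assumes the lookup keys and values contain no characters that are special to sed (such as / . * [ or &), and the file names lookup.sed and output are made up for illustration. Each substitution is anchored to the start of the line so only the first column is touched, and the bare t after each one jumps to the end of the script once a replacement has been made, so a line is never rewritten twice.

```shell
# Work in a scratch directory with the sample data from the first post.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'a,b' 'c,d' > lookup
printf '%s\n' 'a,fvvgeggsegg,dvs' 'a,fgeggefddddddddddg' \
              'b,dfqefeff,dfs' 'b,fdfsfgggddgg' 'c,sddddd' > mainfile

# Turn each "old,new" lookup line into an anchored substitution
# followed by a bare "t" (branch to end of script if it matched).
awk -F',' '{ printf "s/^%s,/%s,/\nt\n", $1, $2 }' lookup > lookup.sed

# Run the generated script from a file rather than as an argument.
sed -f lookup.sed mainfile > output
cat output
```

Putting the script in a file and using sed -f avoids passing it on the command line. Whether this beats the awk array lookup is not a given: sed still tries the substitutions one after another for each line, so it is worth timing on a slice of the real data first.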
# 4  
Old 10-14-2014
The lookup file is 10,000 records, so not very big.
# 5  
Old 10-14-2014
Unfortunately, that's plenty big enough to be a problem as a command-line argument, especially when the statement is going to be fairly complex to avoid skipping over columns.
# 6  
Old 10-14-2014
Code:
akshay@nio:/tmp$ cat lookup
a,b
c,d

Code:
akshay@nio:/tmp$ cat main
a,fvvgeggsegg,dvs
a,fgeggefddddddddddg
b,dfqefeff,dfs
b,fdfsfgggddgg
c,sddddd

Code:
akshay@nio:/tmp$ perl -F, -wnlae 'BEGIN{$, = "," }if(++$FNR == $.){ $hash{$F[0]} = $F[1] }else {$F[0] = $hash{$F[0]} if defined $hash{$F[0]}; print @F} $FNR = 0 if eof' lookup main
b,fvvgeggsegg,dvs
b,fgeggefddddddddddg
b,dfqefeff,dfs
b,fdfsfgggddgg
d,sddddd

# 7  
Old 10-14-2014
If your input files are sorted then consider using join...
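
A sketch of how join might be used here, under some assumptions: both files are sorted on the first column in the same collation order (LC_ALL=C below), and it does not matter that the output order differs from the input. The file name output is made up for illustration. Because the mainfile has a varying number of columns, this runs join twice: once for keys found in the lookup file, and once to pass through the lines whose key is not in it.

```shell
# Work in a scratch directory with the sample data from the first post.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'a,b' 'c,d' > lookup
printf '%s\n' 'a,fvvgeggsegg,dvs' 'a,fgeggefddddddddddg' \
              'b,dfqefeff,dfs' 'b,fdfsfgggddgg' 'c,sddddd' > mainfile

# join expects both inputs sorted on the key in the collation order it uses.
export LC_ALL=C

{
  # Keys present in lookup: join emits "old,new,rest"; drop the old key.
  join -t, lookup mainfile | cut -d, -f2-
  # Keys absent from lookup: pass the mainfile lines through unchanged.
  join -t, -v 2 lookup mainfile
} > output
cat output
```

On the sample data this prints the three replaced lines first, followed by the two unchanged b lines. If the 5 GB file is not already sorted, the cost of sorting it would probably outweigh any gain.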