Sponsored Content
Top Forums Shell Programming and Scripting Bash script search, improve performance with large files Post 303033258 by SDohmen on Tuesday 2nd of April 2019 08:11:57 AM
Old 04-02-2019
Quote:
Originally Posted by RudiC
You might want to build an "alternation regex", with not too many keywords, and modify the matching slightly. Compare performance of

Code:
awk '
NR==FNR                 {SRCH=SRCH DL $0
                         DL = "|"
                         next
                        }
tolower($0) ~ SRCH      {print > "'"$PAD/removed_woord.csv"'"
                         next
                        }

                        {print > "'"$PAD/filtered_winnaar_2.csv"'"
                        }
' file3 file4 

real    0m2,328s
user    0m2,318s
sys    0m0,005s

to this


Code:
time awk '
NR==FNR         {id[$0]
                 next
                }
                {for (SP in id) if (tolower($0) ~ SP)    {print > "'"$PAD/removed_woord.csv"'"
                                                 next
                                                }
                }
                {print > "'"$PAD/filtered_winnaar_2.csv"'"
                }
' file3 file4
real    0m17,038s
user    0m16,995s
sys    0m0,025s

seems to make a factor of roughly 7. The output seems to be identical. Please try and report back.



I just did this one again and i got it working. I noticed the -F";" was missing so i added that and it worked flawlessly. The complete script runs in about 20 sec now which was more then 7 min first.
 

9 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Improve Performance

hi someone tell me which ways i can improve disk I/O and system process performance.kindly refer some commands so i can do it on my test machine.thanks, Mazhar (2 Replies)
Discussion started by: mazhar99
2 Replies

2. Shell Programming and Scripting

Any way to improve performance of this script

I have a data file of 2 gig I need to do all these, but its taking hours, any where i can improve performance, thanks a lot #!/usr/bin/ksh echo TIMESTAMP="$(date +'_%y-%m-%d.%H-%M-%S')" function showHelp { cat << EOF >&2 syntax extreme.sh FILENAME Specify filename to parse EOF... (3 Replies)
Discussion started by: sirababu
3 Replies

3. Shell Programming and Scripting

Improve the performance of a shell script

Hi Friends, I wrote the below shell script to generate a report on alert messages recieved on a day. But i for processing around 4500 lines (alerts) the script is taking aorund 30 minutes to process. Please help me to make it faster and improve the performace of the script. i would be very... (10 Replies)
Discussion started by: apsprabhu
10 Replies

4. Shell Programming and Scripting

Want to improve the performance of script

Hi All, I have written a script as follows which is taking lot of time in executing/searching only 3500 records taken as input from one file in log file of 12 GB Approximately. Working of script is read the csv file as an input having 2 arguments which are transaction_id,mobile_number and search... (6 Replies)
Discussion started by: poweroflinux
6 Replies

5. Programming

Help with improve the performance of grep

Input file: #content_1 12314345345 242467 #content_14 436677645 576577657 #content_100 3425546 56 #content_12 243254546 1232454 . . Reference file: content_100 (1 Reply)
Discussion started by: cpp_beginner
1 Replies

6. Shell Programming and Scripting

Performance issue in Grepping large files

I have around 300 files(*.rdf,*.fmb,*.pll,*.ctl,*.sh,*.sql,*.prog) which are of large size. Around 8000 keywords(which will be in the file $keywordfile) needed to be searched inside those files. If a keyword is found in a file..I have to insert the filename,extension,catagoery,keyword,occurrence... (8 Replies)
Discussion started by: millan
8 Replies

7. UNIX for Dummies Questions & Answers

How to improve the performance of this script?

Hi , i wrote a script to convert dates to the formate i want .it works fine but the conversion is tkaing lot of time . Can some one help me tweek this script #!/bin/bash file=$1 ofile=$2 cp $file $ofile mydates=$(grep -Po '+/+/+' $ofile) # gets 8/1/13 mydates=$(echo "$mydates" | sort |... (5 Replies)
Discussion started by: vikatakavi
5 Replies

8. Shell Programming and Scripting

Copying large files in a bash script stops execution

Hello, I'm new to this forum and like to first of all say hello to everyone. I've got a really annoying problem at the moment. I'm trying to rsync some files (about 200MB with one file of 120MB) from a Raspberry PI with raspbian to a debian server via rsync. This procedure is stored in a... (3 Replies)
Discussion started by: wex_storm
3 Replies

9. Programming

Improve the performance of my C++ code

Hello, Attached is my very simple C++ code to remove any substrings (DNA sequence) of each other, i.e. any redundant sequence is removed to get unique sequences. Similar to sort | uniq command except there is reverse-complementary for DNA sequence. The program runs well with small dataset, but... (11 Replies)
Discussion started by: yifangt
11 Replies
IGAWK(1)							 Utility Commands							  IGAWK(1)

NAME
igawk - gawk with include files SYNOPSIS
igawk [ all gawk options ] -f program-file [ -- ] file ... igawk [ all gawk options ] [ -- ] program-text file ... DESCRIPTION
Igawk is a simple shell script that adds the ability to have ``include files'' to gawk(1). AWK programs for igawk are the same as for gawk, except that, in addition, you may have lines like @include getopt.awk in your program to include the file getopt.awk from either the current directory or one of the other directories in the search path. OPTIONS
See gawk(1) for a full description of the AWK language and the options that gawk supports. EXAMPLES
cat << EOF > test.awk @include getopt.awk BEGIN { while (getopt(ARGC, ARGV, "am:q") != -1) ... } EOF igawk -f test.awk SEE ALSO
gawk(1) Effective AWK Programming, Edition 1.0, published by the Free Software Foundation, 1995. AUTHOR
Arnold Robbins (arnold@skeeve.com). Free Software Foundation Nov 3 1999 IGAWK(1)
All times are GMT -4. The time now is 01:37 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy