Bash script search, improve performance with large files
Post 303033258 by SDohmen on Tuesday 2nd of April 2019 08:11:57 AM
Quote:
Originally Posted by RudiC
You might want to build an "alternation regex", with not too many keywords, and modify the matching slightly. Compare performance of

Code:
time awk '
NR==FNR                 {SRCH=SRCH DL $0
                         DL = "|"
                         next
                        }
tolower($0) ~ SRCH      {print > "'"$PAD/removed_woord.csv"'"
                         next
                        }

                        {print > "'"$PAD/filtered_winnaar_2.csv"'"
                        }
' file3 file4 

real    0m2,328s
user    0m2,318s
sys    0m0,005s

to this


Code:
time awk '
NR==FNR         {id[$0]
                 next
                }
                {for (SP in id) if (tolower($0) ~ SP)    {print > "'"$PAD/removed_woord.csv"'"
                                                 next
                                                }
                }
                {print > "'"$PAD/filtered_winnaar_2.csv"'"
                }
' file3 file4
real    0m17,038s
user    0m16,995s
sys    0m0,025s

That seems to make a difference of roughly a factor of 7 (about 2.3 s versus 17 s) in favor of the alternation regex. The output seems to be identical. Please try and report back.
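The alternation-regex idea can be tried on a few lines of throwaway data before running it against the real files. The following is only a minimal sketch: the function name `split_by_keywords`, the sample file names, and the sample contents are made up for illustration and are not from the original script. It passes the output paths with `-v` instead of splicing shell variables into the awk program, and lowercases the keywords while building the pattern.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions for this sketch).
# usage: split_by_keywords KEYWORDS DATA REMOVED_OUT KEPT_OUT
split_by_keywords() {
    awk -v removed="$3" -v kept="$4" '
    # First file: join all keywords into one pattern such as "foo|bar".
    NR == FNR          { srch = srch dl tolower($0); dl = "|"; next }
    # Second file: a single regex test per line instead of one per keyword.
    tolower($0) ~ srch { print > removed; next }
                       { print > kept }
    ' "$1" "$2"
}

# Tiny demo with made-up data.
tmp=$(mktemp -d)
printf '%s\n' 'foo' 'bar' > "$tmp/keywords.txt"
printf '%s\n' '1;Foo shop;x' '2;baz;y' '3;some BAR;z' '4;qux;w' > "$tmp/data.csv"

split_by_keywords "$tmp/keywords.txt" "$tmp/data.csv" "$tmp/removed.csv" "$tmp/kept.csv"

echo "removed:"; cat "$tmp/removed.csv"   # lines 1 and 3
echo "kept:";    cat "$tmp/kept.csv"      # lines 2 and 4
```

Two caveats with this approach: every keyword is interpreted as a regular expression, so characters like `.` or `(` in the keyword file would need escaping, and an empty line in the keyword file produces an empty alternative that matches every record.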



I just tried this one again and got it working. I noticed the -F";" was missing, so I added it and it worked flawlessly. The complete script now runs in about 20 seconds, where it first took more than 7 minutes.
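The snippet quoted above matches against the whole line (`$0`), so the field separator only matters once the match is limited to a single column. As a rough sketch of that case, assuming the full script tests one specific column (the choice of field 2, the function name `filter_field`, and the sample data are assumptions for illustration only):

```shell
#!/bin/sh
# Hypothetical helper; field 2 is an assumed column, not taken from the thread.
# usage: filter_field KEYWORDS DATA
filter_field() {
    awk -F';' '
    # Keyword file: $0 is used, so the separator does not affect it.
    NR == FNR          { srch = srch dl tolower($0); dl = "|"; next }
    # Data file: only the second semicolon-separated field is tested.
    tolower($2) ~ srch { print "removed: " $0; next }
                       { print "kept: " $0 }
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'foo' > "$tmp/keywords.txt"
printf '%s\n' '1;Foo shop;x' '2;baz;foo' > "$tmp/data.csv"

filter_field "$tmp/keywords.txt" "$tmp/data.csv"
# removed: 1;Foo shop;x
# kept: 2;baz;foo
```

Without `-F';'` awk splits on whitespace, so `$2` would be `shop;x` on the first line and empty on the second, and neither line would be matched as intended.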
 
