Shell Programming and Scripting: Bash script search, improve performance with large files
Post 303033012 by SDohmen on 03-28-2019 at 11:21 AM
Quote:
Originally Posted by RudiC
Would you mind to also time the proposal in post #3?

I actually did, but I edited it into the post afterwards.


Code:
awk  prijslijst_filter.csv lowercase_winnaar.csv  9,51s user 0,13s system 99% cpu 9,647 total

Since the difference between the grep and this newer awk is only a matter of seconds, I am not sure which one I am going to use. The awk one is preferred because it is a drop-in replacement for the current one, but the grep one is still quite a lot faster.


grep also has the advantage that it handles the ignore-case part better. I never seem to get that working properly with the awk one, even with the forced lowercase on both files.
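For reference, the grep side of the comparison boils down to something like this (reconstructed from memory, so treat the exact flags as an assumption; -F assumes the filter file holds plain terms rather than regexes):

Code:
# -i ignores case, -f reads the search terms from prijslijst_filter.csv,
# -F treats them as fixed strings instead of regular expressions
grep -i -F -f prijslijst_filter.csv lowercase_winnaar.csv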




I just tried your awk solution again, RudiC, and it seems something is wrong with it. I did not check the first time because I had to leave right after I tested it (the files got overwritten afterwards).


It seems the part you gave does not produce the files that the rest of the script needs to continue.


Code:
awk '
NR==FNR                 {SRCH=SRCH DL $0
                         DL = "|"
                         next
                        }
tolower($0) ~ SRCH      {print > "'"$PAD/removed_woord_blaat33.csv"'"
                         next
                        }

                        {print > "'"$PAD/filtered_winnaar_blaat33.csv"'"
                        }
' prijslijst_filter.csv lowercase_winnaar.csv


I tried it with and without time to see if that caused the issue, but it did not change the outcome. Neither of the two new files is created.
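For completeness, here is an untested variant I could try: it passes the output paths in via -v so the nested shell quoting cannot get in the way, and it lowercases the search patterns as well as the data, in case that is why nothing ever matched. Same $PAD variable and file names as above, so take it as a sketch rather than a confirmed fix.

Code:
# Untested sketch: output paths go in via -v, and tolower() is applied to
# the pattern side as well as to each record of lowercase_winnaar.csv
awk -v removed="$PAD/removed_woord_blaat33.csv" \
    -v filtered="$PAD/filtered_winnaar_blaat33.csv" '
NR==FNR                 {SRCH = SRCH DL tolower($0)
                         DL = "|"
                         next
                        }
tolower($0) ~ SRCH      {print > removed
                         next
                        }
                        {print > filtered
                        }
' prijslijst_filter.csv lowercase_winnaar.csv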

Last edited by SDohmen; 03-28-2019 at 12:29 PM.. Reason: new info
 
