Sponsored Content
Top Forums Shell Programming and Scripting Bash script search, improve performance with large files Post 303032995 by SDohmen on Thursday 28th of March 2019 06:37:00 AM
Old 03-28-2019
Bash script search, improve performance with large files

Hello,


For several of our scripts we are using awk to search patterns in files with data from other files. This works almost perfectly except that it takes ages to run on larger files. I am wondering if there is a way to speed up this process or have something else that is quicker with the searching.


The part that i use is as follows:


Code:
awk -F";" '
NR==FNR         {id[$0]
                 next
                }
                {for (SP in id) if (tolower($0) ~ SP)    {print > "'"$PAD/removed_woord.csv"'"
                                                 next
                                                }
                }
                {print > "'"$PAD/filtered_winnaar_2.csv"'"
                }
' $PAD/prijslijst_filter.csv $PAD/lowercase_winnaar.csv



I got this piece of programming also from this forum but i added the tolower part myself since not always it seem to get all results from the main file. 1 important part is that the results from filtering need to be saved in another file. The filtered file only contains the not found lines of course.

Last edited by joeyg; 04-04-2019 at 09:58 AM..
 

9 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Improve Performance

hi someone tell me which ways i can improve disk I/O and system process performance.kindly refer some commands so i can do it on my test machine.thanks, Mazhar (2 Replies)
Discussion started by: mazhar99
2 Replies

2. Shell Programming and Scripting

Any way to improve performance of this script

I have a data file of 2 gig I need to do all these, but its taking hours, any where i can improve performance, thanks a lot #!/usr/bin/ksh echo TIMESTAMP="$(date +'_%y-%m-%d.%H-%M-%S')" function showHelp { cat << EOF >&2 syntax extreme.sh FILENAME Specify filename to parse EOF... (3 Replies)
Discussion started by: sirababu
3 Replies

3. Shell Programming and Scripting

Improve the performance of a shell script

Hi Friends, I wrote the below shell script to generate a report on alert messages recieved on a day. But i for processing around 4500 lines (alerts) the script is taking aorund 30 minutes to process. Please help me to make it faster and improve the performace of the script. i would be very... (10 Replies)
Discussion started by: apsprabhu
10 Replies

4. Shell Programming and Scripting

Want to improve the performance of script

Hi All, I have written a script as follows which is taking lot of time in executing/searching only 3500 records taken as input from one file in log file of 12 GB Approximately. Working of script is read the csv file as an input having 2 arguments which are transaction_id,mobile_number and search... (6 Replies)
Discussion started by: poweroflinux
6 Replies

5. Programming

Help with improve the performance of grep

Input file: #content_1 12314345345 242467 #content_14 436677645 576577657 #content_100 3425546 56 #content_12 243254546 1232454 . . Reference file: content_100 (1 Reply)
Discussion started by: cpp_beginner
1 Replies

6. Shell Programming and Scripting

Performance issue in Grepping large files

I have around 300 files(*.rdf,*.fmb,*.pll,*.ctl,*.sh,*.sql,*.prog) which are of large size. Around 8000 keywords(which will be in the file $keywordfile) needed to be searched inside those files. If a keyword is found in a file..I have to insert the filename,extension,catagoery,keyword,occurrence... (8 Replies)
Discussion started by: millan
8 Replies

7. UNIX for Dummies Questions & Answers

How to improve the performance of this script?

Hi , i wrote a script to convert dates to the formate i want .it works fine but the conversion is tkaing lot of time . Can some one help me tweek this script #!/bin/bash file=$1 ofile=$2 cp $file $ofile mydates=$(grep -Po '+/+/+' $ofile) # gets 8/1/13 mydates=$(echo "$mydates" | sort |... (5 Replies)
Discussion started by: vikatakavi
5 Replies

8. Shell Programming and Scripting

Copying large files in a bash script stops execution

Hello, I'm new to this forum and like to first of all say hello to everyone. I've got a really annoying problem at the moment. I'm trying to rsync some files (about 200MB with one file of 120MB) from a Raspberry PI with raspbian to a debian server via rsync. This procedure is stored in a... (3 Replies)
Discussion started by: wex_storm
3 Replies

9. Programming

Improve the performance of my C++ code

Hello, Attached is my very simple C++ code to remove any substrings (DNA sequence) of each other, i.e. any redundant sequence is removed to get unique sequences. Similar to sort | uniq command except there is reverse-complementary for DNA sequence. The program runs well with small dataset, but... (11 Replies)
Discussion started by: yifangt
11 Replies
CSV(3pm)						User Contributed Perl Documentation						  CSV(3pm)

NAME
Class::CSV - Class based CSV parser/writer SYNOPSIS
use Class::CSV; my $csv = Class::CSV->parse( filename => 'test.csv', fields => [qw/item qty sub_total/] ); foreach my $line (@{$csv->lines()}) { $line->sub_total('$'. sprintf("%0.2f", $line->sub_total())); print 'Item: '. $line->item(). " ". 'Qty: '. $line->qty(). " ". 'SubTotal: '. $line->sub_total(). " "; } my $cvs_as_string = $csv->string(); $csv->print(); my $csv = Class::CSV->new( fields => [qw/userid username/], line_separator => " "; ); $csv->add_line([2063, 'testuser']); $csv->add_line({ userid => 2064, username => 'testuser2' }); DESCRIPTION
This module can be used to create objects from CSV files, or to create CSV files from objects. Text::CSV_XS is used for parsing and creating CSV file lines, so any limitations in Text::CSV_XS will of course be inherant in this module. EXPORT None by default. METHOD
CONSTRUCTOR parse the parse constructor takes a hash as its paramater, the various options that can be in this hash are detailed below. Required Options o fields - an array ref containing the list of field names to use for each row. there are some reserved words that cannot be used as field names, there is no checking done for this at the moment but it is something to be aware of. the reserved field names are as follows: "string", "set", "get". also field names cannot contain whitespace or any characters that would not be allowed in a method name. Source Options (only one of these is needed) o filename - the path of the CSV file to be opened and parsed. o filehandle - the file handle of the CSV file to be parsed. o objects - an array ref of objects (e.g. Class::DBI objects). for this to work properly the field names provided in fields needs to correspond to the field names of the objects in the array ref. o classdbi_objects - depreciated use objects instead - using classdbi_objects will still work but its advisable to update your code. Optional Options o line_separator - the line seperator to be included at the end of every line. defaulting to " " (unix carriage return). new the new constructor takes a hash as its paramater, the same options detailed in parse apply to new however no Source Options can be used. this constructor creates a blank CSV object of which lines can be added via add_line. ACCESSING lines returns an array ref containing objects of each CSV line (made via Class::Accessor). the field names given upon construction are available as accessors and can be set or get. for more information please see the notes below or the perldoc for Class::Accessor. the lines accessor is also able to be updated/retrieved in the same way as individual lines fields (examples below). Example retrieving the lines: my @lines = @{$csv->lines()}; removing the first line: pop @lines; $csv->lines(@lines); sorting the lines: @lines = sort { $a->userid() <=> $b->userid() } @lines: $csv->lines(@lines); sorting the lines (all-in-one way): $csv->lines([ sort { $a->userid() <=> $b->userid() } @{$csv->lines()} ]); Retrieving a fields value there is two ways to retrieve a fields value (as documented in Class::Accessor). firstly you can call the field name on the object and secondly you can call "get" on the object with the field name as the argument (multiple field names can be specified to retrieve an array of values). examples are below. my $value = $line->test(); OR my $value = $line->get('test'); OR my @values = $line->get(qw/test test2 test3/); Setting a fields value setting a fields value is simmilar to getting a fields value. there are two ways to set a fields value (as documented in Class::Accessor). firstly you can simply call the field name on the object with the value as the argument or secondly you can call "set" on the object with a hash of fields and their values to set (this isn't standard in Class::Accessor, i have overloaded the "set" method to allow this). examples are below. $line->test('123'); OR $line->set( test => '123' ); OR $line->set( test => '123', test2 => '456' ); Retrieving a line as a string to retrieve a line as a string simply call "string" on the object. my $string = $line->string(); new_line returns a new line object, this can be useful for to "splice" a line into lines (see example below). you can pass the values of the line as an ARRAY ref or a HASH ref. Example my $line = $csv->new_line({ userid => 123, domainname => 'splicey.com' }); my @lines = $csv->lines(); splice(@lines, 1, 0, $line); OR splice(@{$csv->lines()}, 1, 0, $csv->new_line({ userid => 123, domainname => 'splicey.com' })); add_line adds a line to the lines stack. this is mainly useful when the new constructor is used but can of course be used with any constructor. it will add a new line to the end of the lines stack. you can pass the values of the line as an ARRAY ref or a HASH ref. examples of how to use this are below. Example $csv->add_line(['house', 100000, 4]); $csv->add_line({ item => 'house', cost => 100000, bedrooms => 4 }); OUTPUT string returns the object as a string (CSV file format). print calls "print" on string (prints the CSV to STDOUT). SEE ALSO
Text::CSV_XS, Class::Accessor AUTHOR
David Radunz, <david@boxen.net> COPYRIGHT AND LICENSE
Copyright 2004 by David Radunz This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.10.0 2007-02-08 CSV(3pm)
All times are GMT -4. The time now is 07:24 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy