Bash script search, improve performance with large files
Post 303033020 by Peasant, 03-28-2019
When processing extremely large files, you might consider using split first.
Then, on a multicore machine, spawn several awk or grep processes from the shell script so the pieces are processed in parallel.
There are also GNU tools, such as GNU parallel, that offer parallelism without writing the shell logic yourself.

It is a bit more work to program, but processing time will be reduced significantly if you have spare cores and the disks are fast enough to keep them fed. A rough sketch is shown below.
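
A minimal sketch of the split-then-parallel-grep idea, assuming GNU split and grep; the file name bigfile.txt, the pattern, and the temporary directory layout are made up for illustration:

Code:
#!/bin/bash
# Search a very large file in parallel by splitting it into per-core chunks first.
infile=bigfile.txt                       # assumed input file
pattern='PATTERN'                        # assumed search pattern
workers=$(nproc 2>/dev/null || echo 4)   # one worker per core, fall back to 4

tmpdir=$(mktemp -d) || exit 1

# GNU split: -n l/N makes N pieces of roughly equal size without breaking lines.
split -n l/"$workers" "$infile" "$tmpdir/chunk."

# Run one grep per chunk in the background, then wait for all of them.
for chunk in "$tmpdir"/chunk.*; do
    grep -- "$pattern" "$chunk" > "$chunk.out" &
done
wait

cat "$tmpdir"/chunk.*.out
rm -rf "$tmpdir"

With GNU parallel installed, the explicit split can be skipped entirely; something like "parallel --pipe --block 10M grep PATTERN < bigfile.txt" chops stdin into blocks and feeds each block to its own grep.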

Memory also comes into play: split has to read the whole file, and the operating system will cache those blocks in memory if enough is available,
which makes the subsequent awk or grep processes much faster on their read operations.

Of course, the limit is the amount of free memory on the system and how file system caching is configured in general.
With default settings, file system caching can use a large portion of the free memory on most Linux / Unix systems I've seen.
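
A quick way to check how much memory is free and how much the page cache is currently holding (assuming the usual procps free and /proc/meminfo on Linux):

Code:
free -h                           # the buff/cache column is largely the page cache
grep -i '^Cached' /proc/meminfo   # cached file data in kB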

Hope that helps
Regards
Peasant.
 
