Shell Programming and Scripting — Bash script search, improve performance with large files. Post by Peasant, Thursday 28 March 2019, 12:40 PM.
When processing extremely large files you might consider using split first.
Then, in multicore environments, spawn several awk or grep processes from the shell script to work on the chunks in parallel.
There are also GNU tools, such as GNU parallel, that offer parallelism without hand-rolled shell logic.

It is a bit harder to program, but processing time will be reduced significantly if you have the cores and your disks are fast enough to service them.

Memory also comes into play: since split will read the files, the operating system will cache them in memory, if enough is available, making those awk or grep processes much faster on read operations.

The limits, of course, are the free memory on the system and the configuration of file system caching in general.
In the default configuration, the file system cache can use a large portion of free memory on most Linux/Unix systems I've seen.
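A minimal sketch of the split-then-parallel idea, assuming GNU split and a box with 4 cores; the file name, chunk count, and search pattern are made up for illustration:

```shell
#!/bin/sh
# Sketch: split a large file into chunks, grep each chunk in parallel,
# then combine the partial results. Names and numbers are illustrative.
set -e
workdir=$(mktemp -d)

# Stand-in for the "large" input file.
seq 1 100000 > "$workdir/big.txt"

# Split by lines into 4 chunks (GNU split), ideally one per core.
split -n l/4 "$workdir/big.txt" "$workdir/chunk."

# Run one grep per chunk in the background; wait for all of them.
for chunk in "$workdir"/chunk.*; do
    grep -c '7$' "$chunk" > "$chunk.count" &
done
wait

# Sum the per-chunk counts.
total=$(cat "$workdir"/chunk.*.count | awk '{s += $1} END {print s}')
echo "lines ending in 7: $total"
rm -rf "$workdir"
```

If GNU parallel is installed, the loop and wait can be replaced with something like `parallel grep -c '7$' ::: "$workdir"/chunk.*`, which also takes care of job scheduling.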

Hope that helps
Regards
Peasant.
 

filecache_max(5)						File Formats Manual						  filecache_max(5)

NAME
filecache_max, filecache_min - maximum or minimum amount of physical memory used for caching file I/O data

VALUES
Failsafe/Default
The defaults are computed and adjusted automatically, based on the amount of physical memory on the system. The current default value, when displayed (for example, by kctune(1M)), is expressed as the value used internally by the system, representing bytes. The displayed value varies as the internal value is automatically adjusted.

Allowed values
filecache_min: not less than the equivalent of 1 megabyte, and not greater than the equivalent of 70% of physical memory.
filecache_max: not less than the equivalent of 1 megabyte, and not greater than the equivalent of 90% of physical memory.

Values can be specified as:
1) Default.
2) A percentage of total physical memory: a positive whole number followed by a percent symbol (for example, 70%).
3) A constant value: a positive whole number that represents a number of bytes of physical memory, optionally followed by a multiplier suffix.

Recommended values
It is recommended that these tunables be left in the automatic (default) state, to allow the system to better balance memory usage among filesystem I/O-intensive processes and other types of processes.
These tunables control the amount of physical memory that can be used for caching file data during file system I/O operations. The amount of physical memory specified by filecache_min is reserved and guaranteed to be available for file caching. The amount actually used for file caching can grow beyond filecache_min, up to filecache_max, depending on I/O load and competing requests for physical memory. When these tunables are set to default, or to a percent value, they adjust automatically with Online Addition or Deletion (OL*) of physical memory, as appropriate.

Who Is Expected to Change These Tunables?
The automatic (default) state should be appropriate for most environments. You must set these tunables to a constant value (not default or percent) if you want to specify file cache limits with finer granularity than a percent of physical memory (for example, a minimum or fixed size of less than 1% of physical memory), or if you do not want the limits of the file cache to adjust with OL* of physical memory. To favor deterministic I/O on systems with large file I/O activity, or, on the contrary, to favor better performance of non-I/O-intensive processes, you can consider changing these tunables, keeping in mind the side effects described below.

To determine a reasonable value for the cache size, consider the file I/O-intensive applications on your system and the size of their working set. Depending on the type of application, the working set size can be based on the size of a transaction, or on the data size in a given unit of time. For example, a conservative value of filecache_min, in megabytes, can be estimated from the number of records in the working set and their size. Only those processes that actively use disk I/O for file data should be included in the calculation; all others can be excluded.
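As a rough illustration of the working-set estimate above (the record count and record size are invented numbers, not part of the original page):

```shell
# Hypothetical working-set sizing for filecache_min; numbers are invented.
records_in_working_set=50000   # active records across I/O-heavy processes
avg_record_size_bytes=4096     # 4 KiB per record

# Integer arithmetic: bytes -> megabytes, rounded down.
working_set_mb=$(( records_in_working_set * avg_record_size_bytes / 1024 / 1024 ))
echo "conservative filecache_min: about ${working_set_mb} MB"
```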
Here are some examples of which processes should be included in or excluded from the calculation. Include: NFS daemons, text formatters, database management applications, text editors, compilers, and so on, that access or use source and/or output files stored in one or more file systems mounted on the system. Exclude: X-display applications, login shells, system daemons, and so on; these processes use very little, if any, disk I/O for file data.

Restrictions on Changing
These tunables are dynamic and automatic. The system rounds the specified tunable value(s) down to the closest physical page boundary. The amount of physical memory represented by filecache_min must be equal to or less than the memory represented by filecache_max. Setting these tunables to a constant value de-couples them from OL* events. Tuning filecache_min up may fail if there is not enough free physical memory to satisfy the request. The two tunables must either both be set to a relative state (default or percent), or both be set to constant values; mixing a relative state with a constant value results in an error. See the other restrictions in the VALUES section above.

When Should the Value of These Tunables Be Raised?
Low system performance at initialization time, and/or on a system with filesystem I/O-intensive processes, may be an indication that the values of these tunables are too low. If there is a large number of processes actively and constantly using file data I/O, you should raise the value of filecache_min for more deterministic I/O. In most cases, especially when file data I/O is expected to peak only occasionally, it is recommended that the maximum limit, filecache_max, be raised instead.

What Are the Side Effects of Raising the Values?
The amount of memory reserved for the minimum file cache size, dictated by filecache_min, cannot be used on the system for other purposes.
Be careful not to raise this value so high that it eventually causes memory pressure and overall system performance degradation.

When Should the Values of These Tunables Be Lowered?
The value of the minimum limit, filecache_min, can be lowered to allow a larger percentage of memory to be used for purposes other than filesystem I/O caching, depending on competing requests. By lowering the value of filecache_max, a larger amount of memory is available for other purposes, without competing with file I/O requests.

What Are the Side Effects of Lowering the Values?
If there are many competing requests for physical memory, and the file cache tunables are set to too low a value, very high demand on file I/O operations can eventually cause filesystem I/O performance degradation.

EXAMPLES
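The actual command invocations in this section were lost from this copy of the page; a hedged sketch of what they would typically look like with kctune(1M) follows (the exact value syntax is an assumption):

```shell
# Hedged sketches; exact kctune value syntax is an assumption.
# File cache minimum at 10% of physical memory:
kctune filecache_min=10%

# Fixed 1-gigabyte file cache (pin both limits to the same constant, in bytes):
kctune filecache_min=1073741824 filecache_max=1073741824

# Minimum 15%, maximum 65% of physical memory:
kctune filecache_min=15% filecache_max=65%
```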
Set the file cache minimum to 10% of physical memory.
Set a fixed-size file cache of 1 gigabyte.
Set the file cache minimum to 15% of physical memory, and the maximum to 65% of physical memory.

WARNINGS
All HP-UX kernel tunable parameters are release specific. These parameters may be removed, or have their meaning changed, in future releases of HP-UX.

Other tunable parameters related to sizing the buffer cache that existed in previous HP-UX releases are now obsolete. The tunables filecache_min and filecache_max should be used to set limits on the file cache. Note that, on any given system, the optimum values of these two new tunables are not necessarily equivalent to the optimum values of the obsolete tunables on older systems. You should first determine whether the new default values yield acceptable performance on your system before attempting to change the new file cache tunables.

Installation of optional kernel software, from HP or other vendors, may cause changes to tunable parameter values. After installation, some tunable parameters may no longer be at the default or recommended values. For information about the effects of installation on tunable values, consult the documentation for the kernel software being installed.

AUTHOR
filecache_max and filecache_min were developed by HP.

SEE ALSO
kctune(1M), sam(1M), gettune(2), settune(2).