03-29-2019
Quote:
Originally Posted by
Peasant
When processing extremely large files you might consider using split first.
Then in multicore environments spawn several awks or greps to process it in parallel from shell script.
There are also GNU tools which offer parallelism without shell logic.
This is a bit tougher to program, but processing time will drop significantly if you have spare cores and the disks are fast enough to keep up.
Memory also comes into play: split reads the files, and the operating system caches them in memory if enough is available.
That makes the awk or grep processes much faster on read operations.
Of course, the limits are the free memory on the system and the file system caching configuration in general.
In default configurations, file system caching can use a large portion of free memory on most Linux / Unix systems I've seen.
Hope that helps
Regards
Peasant.
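The "GNU tools" route mentioned in the quote could look roughly like the sketch below, which uses xargs -P to keep several grep processes running at once. It is only an illustration under assumptions: the sample data, the part. prefix, the pattern ERROR, and the limit of four parallel processes are all made up here, and -P, -0 and -print0 are GNU/BSD extensions rather than POSIX.

```shell
#!/bin/sh
# Sketch only: file names, the "ERROR" pattern and "-P 4" are hypothetical.

workdir=$(mktemp -d)

# Generate a small sample file so the sketch runs on its own:
# every 10th of 1000 lines contains the word ERROR.
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "line", i, (i % 10 == 0 ? "ERROR" : "ok") }' \
    > "$workdir/big.log"

# Cut the file into 250-line pieces: part.aa, part.ab, ...
split -l 250 "$workdir/big.log" "$workdir/part."

# xargs -P 4 keeps up to four grep processes running at once;
# -n 1 hands each grep exactly one piece, so grep -c prints a bare count.
xargs_total=$(find "$workdir" -name 'part.*' -print0 |
    xargs -0 -P 4 -n 1 grep -c 'ERROR' |
    awk '{ s += $1 } END { print s + 0 }')

echo "matches: $xargs_total"

rm -rf "$workdir"
```

Note that grep exits non-zero for a piece without any match, which makes xargs report a failure status even though the counts are still printed.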
This sounds very interesting, but there are two issues.
1. I have to split the files into smaller files (around 5k, I guess), which isn't a big deal but is a little annoying.
2. Since this runs in a script, I have no idea how to call multiple instances of awk at the same time. As far as I know, a script handles each part one after the other, not at the same time. If you have an idea how to accomplish that, please let me know, since it sounds interesting and promising.
CPU and memory aren't the issue, as both are sufficient. The only thing that can stall the script is the other scripts running at the same time. I tried spreading them out as much as possible, but some take quite long to run, which is why I want to slim them down so they don't overlap.
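For reference, one common way to run several awk instances at the same time from a single script is to start each one in the background with & and then call wait. The following is a minimal sketch, not a drop-in solution: the sample data, the chunk. prefix, the 250-line chunk size and the pattern ERROR are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch only: all file names, sizes and the "ERROR" pattern are hypothetical.

workdir=$(mktemp -d)
input="$workdir/big.log"

# Build a small sample input so the sketch runs on its own:
# every 10th of 1000 lines contains the word ERROR.
i=1
while [ "$i" -le 1000 ]; do
    if [ $((i % 10)) -eq 0 ]; then
        echo "line $i ERROR something failed"
    else
        echo "line $i ok"
    fi
    i=$((i + 1))
done > "$input"

# Split into chunks of 250 lines each: chunk.aa, chunk.ab, ...
split -l 250 "$input" "$workdir/chunk."

# Start one awk per chunk. The trailing '&' puts each awk in the
# background, so the loop does not wait and all instances run together.
for chunk in "$workdir"/chunk.??; do
    awk '/ERROR/ { n++ } END { print n + 0 }' "$chunk" > "$chunk.count" &
done

# Block here until every background job has finished.
wait

# Merge the per-chunk results into one number.
total=$(cat "$workdir"/chunk.??.count | awk '{ s += $1 } END { print s + 0 }')
echo "matches: $total"

rm -rf "$workdir"
```

Each awk writes to its own output file so the parallel jobs never interleave their results; the merge happens only after wait returns.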