Search and replace ---A huge number of files
Post 302817635 by jim mcnamara on Thursday 6th of June 2013 07:45:02 AM
The best tool for this is a database (MySQL, Oracle, etc.). Create an indexed table from your "big file" and update it once a month. You gain scalability: you can write one small db app and run it as many separate parallel processes, or threads.
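
A minimal sketch of the indexed-table idea, in case it helps. Assumptions: it uses Python's built-in sqlite3 instead of MySQL or Oracle so it runs without a server, and the table name `lookup`, the columns `k`/`v`, and the sample records are made up for illustration. The real "big file" layout is not known here.

```python
import sqlite3

def build_lookup(conn, records):
    """Load (key, value) pairs into a table; PRIMARY KEY gives us the index."""
    conn.execute("CREATE TABLE IF NOT EXISTS lookup (k TEXT PRIMARY KEY, v TEXT)")
    # INSERT OR REPLACE lets a monthly refresh overwrite keys that already exist.
    conn.executemany("INSERT OR REPLACE INTO lookup (k, v) VALUES (?, ?)", records)
    conn.commit()

def find(conn, key):
    """Return the stored value for key, or None if the key is not in the table."""
    row = conn.execute("SELECT v FROM lookup WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None

# Hypothetical data; a real run would read the big file and use a database file on disk.
conn = sqlite3.connect(":memory:")
build_lookup(conn, [("abc123", "new-abc"), ("def456", "new-def")])
print(find(conn, "abc123"))
print(find(conn, "missing"))
```

Each worker process would open its own connection and call something like `find()`, so the lookups do not depend on one process holding all the records in memory.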

Otherwise you would need a hash of 200 million records to do real-time lookups. It is possible, but it seems like an unstable, error-prone approach to me.
It also may not scale well as load increases.

So, with no database, you need major hash support in your app, and tons of free memory:
Code:
200 million * [big file record size]

probably way more than 4GB.
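
To put rough numbers on that estimate, here is a small sketch. Assumptions: the record sizes (20, 50, 100 bytes) are hypothetical since the actual record size is not given, and `overhead_per_entry` is a made-up knob for whatever the hash implementation adds per key.

```python
GIB = 1024 ** 3

def hash_memory_bytes(num_records, record_size, overhead_per_entry=0):
    """Lower-bound estimate: records * (payload + per-entry hash overhead)."""
    return num_records * (record_size + overhead_per_entry)

# Hypothetical record sizes; plug in the real one from the big file.
for size in (20, 50, 100):
    total = hash_memory_bytes(200_000_000, size)
    print(f"{size} bytes/record -> {total / GIB:.1f} GiB")
```

Real hash tables in Perl, Ruby, or Python add noticeable per-entry overhead, so the raw product should be read as a floor, not the actual footprint.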

Perl, Ruby, or C will work with or without a db. Shell/awk will not work well at all.
 
