Search and replace ---A huge number of files
Top Forums Shell Programming and Scripting Search and replace ---A huge number of files
# 1  
Old 06-06-2013

Hello Friends,

I have the below scenario in my current project. Please suggest which tool (perl, python, etc.) is best for this scenario, or whether I should go for a programming language (C/Java) instead.

(1) I will have a very big file (information about 200 million subscribers will be stored in it). This is static data and will change once a month. Fields in this file will be like: AAA, BBB, CCC.

(2) I have to process input data (around 100 million records per hour; fields: AAA, XXX, YYY, ZZZ, etc.).
For each input record, a lookup needs to be made against the file from step (1) to produce the output: AAA, XXX, YYY, ZZZ, CCC. (The lookup will be keyed on field "AAA".)

Any suggestion on how to process each and every input record against such a big static file?

Regards,
Ravi
# 2  
Old 06-06-2013
The very best tool for this is a database application: MySQL, Oracle, etc. Create an indexed table from your "big file" and update it once a month. You gain scalability, meaning you can write one small DB app and run many separate parallel processes. Or threads.

Otherwise you would need a hash of 200 million records to do real-time lookups. Not that this is impossible; it just seems like an unstable or error-prone approach to me.
Plus it may not scale well as load increases.

So, with no database you need major hash support in your app, and tons of free memory:
Code:
200 million * [big file record size]

probably way more than 4 GB; even at ~100 bytes per record that is already around 20 GB.

perl, ruby, or C will work either with or without a DB. Shell/awk will not work well at all.
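To illustrate the no-database approach described above, here is a minimal sketch in Python. It assumes pipe-delimited files and uses the field names from the thread (AAA, BBB, CCC and AAA, XXX, YYY, ZZZ); the delimiter and field layout are assumptions, not something stated in the thread.

```python
def build_lookup(static_lines):
    """Build an in-memory map AAA -> CCC from the static file.
    With 200 million records this dict alone can need tens of GB of RAM."""
    lookup = {}
    for line in static_lines:
        aaa, bbb, ccc = line.rstrip("\n").split("|")
        lookup[aaa] = ccc
    return lookup

def enrich(input_lines, lookup):
    """Append CCC to each input record, keyed on its first field (AAA)."""
    for line in input_lines:
        fields = line.rstrip("\n").split("|")
        ccc = lookup.get(fields[0], "")   # empty string if AAA not found
        yield "|".join(fields + [ccc])

# Tiny demonstration with made-up data:
static = ["A1|b1|c1", "A2|b2|c2"]
inputs = ["A1|x|y|z", "A3|x|y|z"]
print(list(enrich(inputs, build_lookup(static))))
# -> ['A1|x|y|z|c1', 'A3|x|y|z|']
```

The dict lookup itself is O(1) per record; the real constraint, as noted above, is holding the whole static file in memory at once.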
# 3  
Old 06-06-2013
Hi Jim,

Thanks for the suggestion.

I got your point and was thinking in the same way.

However, if I store the "static data" in a DB, then while processing each and every record (input dynamic data, which is around 100 million per hour) I have to do a DB lookup for each and every record...!

Isn't that expensive?
# 4  
Old 06-06-2013
I agree with Jim's point of view. In such cases I am lucky: I go see my friends one level lower, and since I am responsible for their architecture I have some favors when needed. I use SAS... but SAS costs $$$.
What is the expected file size?
# 5  
Old 06-06-2013
Hi Vbe ,

Not sure which file you are referring to here: expected file size of what?

The static file will have 200 million records, with each record around … characters. This file I can store in the DB, which is a one-time task (once a month, of course).

Now, my worry is that I have to do a DB lookup for each and every input record I receive, extract some value from the DB, apply the changes to the input record, and produce the output.

The input records will be around 200-250 bytes in length, and approximately 100 million records need to be processed per hour (roughly 28,000 lookups per second).

Any suggestions?

Regards,
Ravi
# 6  
Old 06-06-2013
Quote:
Originally Posted by panyam
While processing each and every record ( input dynamic data which is around 100 million / hour ) , I have to do a DB lookup for each and every record...!

Is not that expensive?
Certainly less expensive than looking up records in a flat file!

Or do you mean that you couldn't just "do x for all records" in a database? Because actually, you can.
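To make that set-based point concrete, here is a hedged sketch using SQLite from Python: instead of issuing one SELECT per input record, stage a batch of input records in a table and resolve all the lookups with a single indexed join. Table and column names (subscribers, batch, aaa, ccc, ...) are illustrative, not from the thread.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Static data: indexed on the lookup key (PRIMARY KEY implies an index).
con.execute("CREATE TABLE subscribers (aaa TEXT PRIMARY KEY, bbb TEXT, ccc TEXT)")
con.executemany("INSERT INTO subscribers VALUES (?,?,?)",
                [("A1", "b1", "c1"), ("A2", "b2", "c2")])

# Stage a batch of input records, then join once for the whole batch.
con.execute("CREATE TABLE batch (aaa TEXT, xxx TEXT, yyy TEXT, zzz TEXT)")
con.executemany("INSERT INTO batch VALUES (?,?,?,?)",
                [("A1", "x", "y", "z"), ("A3", "x", "y", "z")])

rows = con.execute("""
    SELECT b.aaa, b.xxx, b.yyy, b.zzz, s.ccc
    FROM batch b LEFT JOIN subscribers s ON s.aaa = b.aaa
    ORDER BY b.aaa
""").fetchall()
print(rows)   # unmatched AAA values come back with ccc = None
```

One join per batch amortises the per-query overhead that makes "one lookup per record" feel expensive; the database walks the index once per key, exactly as an in-memory hash would, but without needing the whole static file in RAM.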