Search for a number in a very large amount of data


 
# 1  
Old 12-02-2011
Question: Search for a number in a very large amount of data

Hi,
I have to search for a number in a very long list of files. The total size of the files I have to search is 10 terabytes.

How can I search for a number in such a huge amount of data effectively? I used fgrep, but it is taking many hours. Is there any other feasible solution for searching data of this size? Any way to search effectively?
# 2  
Old 12-02-2011
Hi,

If you only want the file name, use -l with grep; it prints the name of each matching file and stops scanning that file at its first match.

What exactly are you searching for? And does the number occur more than once in a file, or only once?

You can also try Perl or Python to process the big files.
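For example, a minimal sketch of the -l idea (the data path is a placeholder of mine; the number is the one searched for later in this thread):
Code:
# -F treats the number as a fixed string; -l prints only the names of
# matching files and stops reading each file at its first match
grep -Fl '999999999' /path/to/data/*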
# 3  
Old 12-02-2011
The file format is like this:
Code:
00.87.123.45|54999999999|98765|-|[15/NOV/2011:20:02:08 |"www.unix.com/"|GTR|1.1.1|UU|654|0012|0|[TTTTTTTT]|0432|text/html|SDEWRTRERERERERER==|1.4.5.6|fkjgkjfgfg|-|-|-|-|-|-|-|-|-|-|-|2|123456789|0|1||123,45,7654,76123|345|67|45654654645645645|54.67.4.345|323423423|4567|56098|-|

I used an awk command to search. I already know that the searched pattern occurs in the 2nd field, so I compare the string only with the 2nd field. But when I use the -v option to pass a variable to awk, it does not find the variable in the file, while it does if I hardcode the value. Is there any mistake in the code below?

script:
-----
Code:
variable=$1
for i in `ls *`
do
gunzip -c $i | awk -F"|" -v var="$variable" '$2~/variable/' >> output.txt &
done
 
 
 
Code:
./script 999999999


# 4  
Old 12-02-2011
Unless you've got a blindingly fast RAID setup, "hours" is to be expected for 10 terabytes no matter what you do.

That's a useless use of ls *.

You also don't want many background processes writing to the same file at the same time. They may interfere with each other, overwriting each other's lines, etc.

It's looking for the literal string 'variable' because you put it inside /.../; just give awk the variable itself. You didn't even name it 'variable' anyway, you named it 'var'. Try $2 ~ var
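A quick illustration of the difference on one made-up line:
Code:
echo 'a|999999999|b' | awk -F'|' -v var=999999999 '$2 ~ /variable/'   # prints nothing
echo 'a|999999999|b' | awk -F'|' -v var=999999999 '$2 ~ var'          # prints the line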

In summary, I'd do this:

Code:
zcat * | awk -F"|" -v var="$1" '$2 ~ var' > output.txt

Depending on whether your disk is faster than your processor or vice versa, there may be ways to speed this up by running multiple gunzips at once. I'm not sure of the best way to do that yet, but I'll think about it.
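One rough sketch of that idea, assuming GNU xargs with the -P option; the 4-way parallelism, the script name pscan, and the hits directory are placeholders of mine, and each worker writes to its own file so concurrent writers cannot garble one another's lines:
Code:
#!/bin/sh
# usage: ./pscan 999999999   (run in the directory holding the .gz files)
NUM="$1"; export NUM          # make the search value visible to the worker shells
mkdir -p hits                 # one result file per input file
printf '%s\0' *.gz |
  xargs -0 -n 1 -P 4 sh -c \
    'gunzip -c "$1" | awk -F"|" -v var="$NUM" "\$2 ~ var" > "hits/$(basename "$1" .gz).txt"' _
cat hits/*.txt > output.txt   # combine the per-file results at the end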