A faster way to read and search


 
# 1  
Old 10-11-2016

I have a simple script that reads data from fileA.txt and searches, line by line, for that data in multiple files (*multfiles.txt). It only prints a line when there is more than one instance of it. The problem is that it's really slow: the entire process takes 3+ hours to complete. There are nearly 1500 files to search through, and each file has 450,000 lines.

Is there a faster method that I could use instead of 'read'?

Code:
while read line
do
   ext=$(grep "$line" *multfiles.txt | wc -l)
   if [ $ext -gt 1 ] ; then
   echo  $line $ext
   fi
done < fileA.txt > output.txt


sample fileA.txt
Code:
25.2292,-80.8958,29.2
25.2292,-80.8542,29.1
25.2292,-80.7292,29.0

sample *multfiles.txt
Code:
25.2292,-80.8958,29.2
25.2292,-80.8542,29.1
25.2292,-80.7292,29.5

Code:
25.2292,-80.8958,29.2
25.2292,-80.8542,29.1
25.2292,-80.7292,29.5

Code:
25.2292,-80.8958,29.2
25.2292,-80.8542,29.5
25.2292,-80.7292,29.5

sample output.txt
Code:
25.2292,-80.8958,29.2 3
25.2292,-80.8542,29.1 2

# 2  
Old 10-11-2016
Hello ncwxpanther,

I haven't tested this, but could you please replace ext=$(grep "$line" *multfiles.txt | wc -l) with the following and let me know if it helps?
Code:
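# -exec needs a {} placeholder and a terminator; cat the files so grep -c returns one total
ext=$(find . -type f -name '*multfiles.txt' -exec cat {} + | grep -c "$line")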

Thanks,
R. Singh
# 3  
Old 10-11-2016
How about:
Code:
grep -hxFf fileA.txt *multfiles.txt | sort | uniq -c > output.txt

Code:
   2 25.2292,-80.8542,29.1
   3 25.2292,-80.8958,29.2
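
If the output should match the original loop (line first, count second, and only lines found more than once), a filter could be appended to the same pipeline. This is only a sketch: the three `Nmultfiles.txt` names and the temporary directory are made up for the demo, the data is the sample from post #1, and the `awk` step assumes the lines contain no whitespace.

```shell
#!/bin/sh
# Sketch: the grep pipeline plus a filter that keeps only lines seen more
# than once and prints them as "line count", like the original loop did.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data from post #1 (the three search file names are hypothetical).
printf '%s\n' 25.2292,-80.8958,29.2 25.2292,-80.8542,29.1 25.2292,-80.7292,29.0 > fileA.txt
printf '%s\n' 25.2292,-80.8958,29.2 25.2292,-80.8542,29.1 25.2292,-80.7292,29.5 > 1multfiles.txt
printf '%s\n' 25.2292,-80.8958,29.2 25.2292,-80.8542,29.1 25.2292,-80.7292,29.5 > 2multfiles.txt
printf '%s\n' 25.2292,-80.8958,29.2 25.2292,-80.8542,29.5 25.2292,-80.7292,29.5 > 3multfiles.txt

# -h: no file names, -x: whole-line match, -F: fixed strings, -f: patterns from file.
# uniq -c prints "count line"; awk swaps the columns and drops counts of 1.
grep -hxFf fileA.txt ./*multfiles.txt | sort | uniq -c |
    awk '$1 > 1 { print $2, $1 }' > output.txt

cat output.txt
```

The whole job is still a single pass over the data with four processes in total, instead of one grep and one wc per line of fileA.txt.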


--
or alternatively:
Code:
awk 'NR==FNR{A[$0]; next} $0 in A{C[$0]++} END{for(i in C) print i, C[i]}' fileA.txt *multfiles.txt > output.txt


Last edited by Scrutinizer; 10-11-2016 at 05:50 PM.. Reason: Added -x option to the grep approach
# 4  
Old 10-11-2016
Nice problem. Coordinates!

read, a shell builtin, is not the bottleneck, I think. You are launching a lot of processes (a grep and a wc for every line of fileA.txt), and the data is large.

Would you benefit from cutting down the data by first extracting ALL duplicates from it? Like this:
Code:
sort *multfiles.txt | uniq -d > tempdata
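
The reduced data could then be matched against fileA.txt in one more step. A minimal sketch, assuming it is acceptable to lose the counts (`uniq -d` prints each repeated line once, without a count); the function name `duplicated_candidates` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a pattern file that occur more
# than once across the remaining files. Counts are not reported.
duplicated_candidates() {
    patterns=$1
    shift
    # One sort over all the data; uniq -d keeps one copy of each repeated line.
    sort -- "$@" | uniq -d |
        # Fixed-string, whole-line match of the pattern file against that list.
        grep -xFf "$patterns"
}

# Usage (commented out so the function can be sourced on its own):
# duplicated_candidates fileA.txt *multfiles.txt > output.txt
```

Sorting 1500 files of 450,000 lines each is not free, so whether this beats a plain grep or awk pass would have to be measured on the real data.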

Juha
# 5  
Old 10-11-2016
Hi,
Try this:
Code:
$ awk 'FNR==NR{A[$0]=1;next};A[$0]>0{A[$0]++};END{for (i in A){if(A[i]>1)print i,A[i]-1}}' fileA.txt *multfiles.txt
25.2292,-80.8542,29.1 2
25.2292,-80.8958,29.2 3

Regards.

TOO LATE
# 6  
Old 10-11-2016
Or, a bit more straightforward:
Code:
awk 'FNR==NR {A[$0]=0;next} ($0 in A) {A[$0]++} END {for (i in A) {if(A[i]>0) print i,A[i]}}' fileA.txt *multfiles.txt
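
Note that the awk one-liners in this thread print every line found at least once, while the original loop only printed lines found more than once (`-gt 1`). A sketch of a variant with that threshold as a parameter; the function name `count_repeats` and the variable `min` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count exact whole-line matches of a pattern file
# across the remaining files, printing only lines seen at least min times.
count_repeats() {
    patterns=$1
    shift
    awk -v min=2 '
        FNR == NR     { count[$0] = 0; next }   # first file: register wanted lines
        ($0 in count) { count[$0]++ }           # other files: count exact matches
        END {
            for (line in count)
                if (count[line] >= min) print line, count[line]
        }
    ' "$patterns" "$@"
}

# Usage (commented out so the function can be sourced on its own):
# count_repeats fileA.txt *multfiles.txt | sort > output.txt
```

The `for (line in count)` loop visits keys in an unspecified order, hence the `sort` in the usage line. The `($0 in count)` test does not create new array elements, so memory use stays proportional to fileA.txt rather than to the searched data.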

# 7  
Old 10-11-2016
Thanks to all who replied!

I got successful test runs from a few of the replies. The timings are below.

Post #3 (using grep) - 17 secs
Post #3 (using awk) - 27 secs
Post #5 - 1.25 secs
Post #6 - 31 secs