Need help for faster file read and grep in big files


 
# 1  
Old 06-08-2018

I have a very big input file, inputFile1.txt, which contains a list of mobile numbers:

inputFile1.txt
Code:
3434343
3434323
0970978
85233

... around 1 million records

I have another big file, inputFile2.txt, which contains log details:
inputFile2.txt
Code:
afjhjdhfkjdhfkd df h8983 3434343 | 3483 | myout1 | 9uohksdf
afjhjdhfkjdhfkd df h8983 3434343 | 3483 | myout2 | 9uohksdf
afjhjdhfkjdhfkd df h8983 0970978| 3483 | myout3 | 9uohksdf


I have a third big file, inputFile3.txt, which also contains log details:
Code:
afjhjdhfkjdhfkd df h8983 myout1  | 3iroi2 | FinalOut1 | 3243
afjhjdhfkjdhfkd df h8983 myout2  | 3iroi2 | FinalOut2 | 3243
afjhjdhfkjdhfkd df h8983 myout2  | 3iroi2 | FinalOut3 | 3243

Basically, I need to take each number from inputFile1.txt (starting with the first), search for it in inputFile2.txt to extract the matching values (myout1 and myout2), and then look those values up in inputFile3.txt to get the FinalOut values (FinalOut1 / FinalOut2 / FinalOut3).

So basically the output should be:
Code:
3434343 myout1 FinalOut1 
3434343 myout2 FinalOut2 
3434343 myout2 FinalOut3

I was doing it in a shell script using the grep command, but it is taking forever (more than 10-20 hours).
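For reference, my current approach is essentially one grep per number, roughly like this (a simplified sketch of the approach, not the exact script):

Code:
#!/bin/sh
# One grep per mobile number, plus another grep per intermediate id:
# the two big log files are re-read for every one of the ~1 million
# numbers, which is what makes this take hours.
while read -r num
do
    for out in $(grep -w "$num" inputFile2.txt | awk -F'|' '{gsub(/ /, "", $3); print $3}')
    do
        grep -w "$out" inputFile3.txt |
            awk -F'|' -v n="$num" -v o="$out" '{gsub(/ /, "", $3); print n, o, $3}'
    done
done < inputFile1.txt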
Is there any better and faster way to handle it?

Thanks in advance

# 2  
Old 06-08-2018
If it's taking hours, I'm guessing you're running grep once per record. How about:

Code:
$ awk 'LFN != FILENAME { LFN = FILENAME ; FILENUM++ }
FILENUM==1 { A[$1] ; next }
FILENUM==2 { if($4 in A)        S1[$6] = $4 ; next }
FILENUM==3 { if($4 in S1) print S1[$4], $4, $6 }' \
        FS="[ |]+" inputFile1.txt inputFile2.txt inputFile3.txt

3434343 myout1 FinalOut1
3434343 myout2 FinalOut2
3434343 myout2 FinalOut3

$

One command.

If your real data's any different from what you posted, it may need fine tuning.
# 3  
Old 06-08-2018
Try also (tackling it from the other end)
Code:
awk '
FNR == 1        {FILE++
                }
FILE < 3        {FIN[FILE,$4] = FIN[FILE,$4] $8 FS
                }
FILE == 3       {n = split(FIN[2,$1], T1)
                 for (i=1; i<=n; i++)   {m = split(FIN[1,T1[i]], T2)
                                         for (j=1; j<=m; j++) print $1, T1[i], T2[j]
                                        }
                }
' file3 file2 file1
3434343 myout1 FinalOut1
3434343 myout2 FinalOut2
3434343 myout2 FinalOut3

# 4  
Old 06-09-2018
Another version, which uses "|" with its surrounding spaces as the field separator and takes into account potential variability in field 1 by using its last subfield:
Code:
awk '
  FNR==1{
    fn++
  }
  fn==1 {
    A[$1]
    next
  }
  {
    n=split($1, F, " ")
    i=F[n]
  } 
  fn==2 {
    if(i in A)
      B[$3]=i
  }
  fn==3 {
    if(i in B)
      print B[i], i, $3
  }
' file1 FS=' *[|] *' file2 file3

