Need help for faster file read and grep in big files


 
# 1  
06-09-2018
Another version, which uses the | character with any surrounding spaces as the field separator for the second and third files, and takes into account potential variability in field 1 by using its last space-separated subfield:
Code:
awk '
  FNR==1 {                # first record of each input file: bump the file counter
    fn++
  }
  fn==1 {                 # file1: remember every value of field 1
    A[$1]
    next
  }
  {                       # file2 and file3: split field 1 on spaces and keep
    n=split($1, F, " ")   # its last subfield as the lookup key
    i=F[n]
  }
  fn==2 {                 # file2: if the key was also seen in file1,
    if(i in A)            # remember field 3 -> key
      B[$3]=i
  }
  fn==3 {                 # file3: if the key matches a remembered file2 field 3,
    if(i in B)            # print that file2 key, this key and field 3
      print B[i], i, $3
  }
' file1 FS=' *[|] *' file2 file3
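For clarity: the FS=' *[|] *' operand between the file names is an awk command-line assignment, so it takes effect only after file1 has been read. file1 is therefore split on whitespace, while file2 and file3 are split on the | character with any surrounding spaces absorbed into the separator. A minimal sketch of the same mechanism, using hypothetical files a.txt and b.txt:
Code:
# a.txt is read with the default whitespace field separator;
# the assignment switches FS to "|" before b.txt is read
awk '{ print FILENAME, NF }' a.txt FS='|' b.txt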


Last edited by Scrutinizer; 06-09-2018 at 05:53 AM..

10 More Discussions You Might Find Interesting

1. Solaris

Split a big file system to several files

Gents, actually I have a question and I need your support. I have this NAS file system mounted as /coresys with a size of 7 TB. I need to split this file system into several file systems as mount points. I mean, how can I split it professionally to different NAS mount points, and how can I decide... (2 Replies)
Discussion started by: AbuAliiiiiiiiii
2 Replies

2. UNIX for Beginners Questions & Answers

Grep -f for big files

OK guys, this isn't homework or anything. I have been using grep -f all my life, but I am trying it on a huge file and it doesn't work. Can someone give me a replacement for grep -f pattern file for big files? Thanks. (6 Replies)
Discussion started by: ahfze
6 Replies
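For the grep -f question above, the usual first fix is to tell grep that the patterns are fixed strings rather than regular expressions; a single-pass awk hash lookup is a common alternative. A hedged sketch, assuming the patterns in patterns.txt should match whole lines of bigfile (both names hypothetical):
Code:
# -F: fixed strings, -x: whole-line matches; the C locale often speeds this up further
LC_ALL=C grep -F -x -f patterns.txt bigfile

# equivalent single-pass awk: load the patterns once, then test each line
awk 'NR==FNR{p[$0]; next} $0 in p' patterns.txt bigfile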

3. Shell Programming and Scripting

A faster way to read and search

I have a simple script that reads in data from fileA.txt and searches line by line for that data in multiple files (*multfiles.txt). It only prints the data when there is more than 1 instance of it. The problem is that it's really slow (3+ hours) to complete the entire process. There are nearly 1500... (10 Replies)
Discussion started by: ncwxpanther
10 Replies
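A single awk pass is the usual replacement for a shell loop that rescans every file once per search string. A sketch, assuming fileA.txt holds one string per line and a hit means an exact whole-line match somewhere in the *multfiles.txt set:
Code:
# load the search strings, count exact-line occurrences across all other files,
# and report only the strings found more than once
awk 'NR==FNR{want[$0]; next}
     $0 in want{count[$0]++}
     END{for (s in count) if (count[s] > 1) print s, count[s]}' fileA.txt *multfiles.txt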

4. UNIX for Dummies Questions & Answers

What is the faster way to grep from huge file?

Hi all, I am new to this forum and this is my first post. My requirement is to optimize the time taken to grep a file with 40000 lines. There are two files, FILEA (40000 lines) and FILEB (40000 lines). The requirement is like this: both files will be in the format below... (11 Replies)
Discussion started by: mad man
11 Replies
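Running grep once per line is what usually makes this slow; with two 40000-line files it is far cheaper to load FILEA into a hash in one pass and then stream FILEB against it. A sketch, assuming the first field is the key to be matched:
Code:
# print every line of FILEB whose first field also occurs as a first field in FILEA
awk 'NR==FNR{seen[$1]; next} $1 in seen' FILEA FILEB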

5. Shell Programming and Scripting

Grep -v -f and sort|diff which way is faster

Hi gurus, I have two big files and I need to compare the differences. Currently I am using sort file1 > file1_tmp; sort file2 > file2_tmp; diff file1_tmp file2_tmp. I could also use grep -v -f file1 file2. Just wondering which way is faster for comparing two big files. Thanks... (4 Replies)
Discussion started by: ken6503
4 Replies
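Between those two, grep -v -f treats every line of file1 as a regular expression and rescans file2 for each one, so it scales poorly; sorting once and comparing is normally much cheaper. comm on sorted input is a common third option; a sketch assuming a shell with process substitution (bash or ksh93):
Code:
# column 1: lines only in file1, column 2: only in file2, column 3: common
comm <(sort file1) <(sort file2)

# just the lines that are in file2 but not in file1
comm -13 <(sort file1) <(sort file2)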

6. Shell Programming and Scripting

Read a file with n records as one big string using linux

Hello! Is there a way I can read a file with n records as one big string using a Linux shell script? I have a file in the format below: REC1 REC2 REC3 . . . REC4. The record length is 3000 bytes per record, with a newline char at the end. What I need to do is read this file as one... (5 Replies)
Discussion started by: mailme0205
5 Replies
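Command substitution will slurp a file into one shell variable; note that it strips trailing newlines, and with 3000-byte records the variable can get large. A sketch, assuming the records are in a file named recfile (name hypothetical) and the embedded newlines should be removed:
Code:
# read the whole file as one string with the record-terminating newlines deleted
big=$(tr -d '\n' < recfile)
printf '%s\n' "${#big}"    # show the length of the combined string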

7. UNIX for Dummies Questions & Answers

Faster than nested while read loops?

Hi experts, I just want to know if there is a better solution to my nested while read loops below: while read line; do while read line2; do while read line3; do echo "$line $line2 $line3" done < file3.txt done < file2.txt done < file1.txt >... (4 Replies)
Discussion started by: chstr_14
4 Replies
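The nested loops above re-read file2.txt and file3.txt once for every line of the outer file, which is where the time goes. Loading the inner files into memory once and iterating over arrays avoids the rereads; a sketch in awk that prints the same three-column combinations:
Code:
# read file2.txt and file3.txt into arrays once, then emit every combination
# for each line of file1.txt
awk 'FILENAME=="file2.txt"{b[++nb]=$0; next}
     FILENAME=="file3.txt"{c[++nc]=$0; next}
     {for (i=1; i<=nb; i++) for (j=1; j<=nc; j++) print $0, b[i], c[j]}
    ' file2.txt file3.txt file1.txt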

8. UNIX for Advanced & Expert Users

Split a big file into two other files

Hello, I have a very big file of more than 80 MBytes (100 MBytes). With my CVS application I cannot commit this file (too big) because it must be < 80 MBytes. How can I split this file into two other files? I think the AIX Unix command split -b can do that, but how is the right... (2 Replies)
Discussion started by: steiner
2 Replies
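split -b does exactly this; pick a chunk size below the 80 MB limit and concatenate the pieces later to restore the original byte for byte. A sketch, assuming the file is called bigfile (name hypothetical); on AIX's split the m suffix means megabytes:
Code:
# cut bigfile into 75 MB pieces named bigfile.part.aa, bigfile.part.ab, ...
split -b 75m bigfile bigfile.part.

# reassemble later (the default suffixes sort correctly with a shell glob)
cat bigfile.part.* > bigfile.restored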

9. Shell Programming and Scripting

Big data file - sed/grep/awk?

Morning guys. Another day another question. :rolleyes: I am knocking up a script to pull some data from a file. The problem is the file is very big (up to 1 gig in size), so this solution: for results in `grep "^\ ... works, but takes ages (we're talking minutes) to run. The data is held... (8 Replies)
Discussion started by: dlam
8 Replies

10. UNIX for Dummies Questions & Answers

How to grep faster ?

Hi, I have to grep for 2000 strings in a file, one after the other. Say the file name is Snxx.out, which has these strings. I have to search for all the strings in the file Snxx.out one after the other. What is the fastest way to do it? Note: the current grep process is taking a lot of time per... (7 Replies)
Discussion started by: preethgideon
7 Replies
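Rather than 2000 separate grep runs over Snxx.out, the strings can be checked in one pass. A sketch, assuming the strings are listed one per line in strings.txt (name hypothetical) and a substring match anywhere on a line counts as a hit:
Code:
# one pass over Snxx.out: count how many lines contain each of the 2000 strings
awk 'NR==FNR{s[$0]; next}
     {for (p in s) if (index($0, p)) hit[p]++}
     END{for (p in s) print p, hit[p]+0}' strings.txt Snxx.out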