Posted by millan on 06-11-2013 at 10:31 AM
Performance issue in Grepping large files

I have around 300 large files (*.rdf, *.fmb, *.pll, *.ctl, *.sh, *.sql, *.prog).
Around 8000 keywords (listed in the file $keywordfile) need to be searched for inside those files.
If a keyword is found in a file, I have to insert the filename, extension, category, keyword and occurrence count into a database.
I have implemented the following code, but it takes around 10-12 hours to complete.
Could you please suggest how I can change it so that it runs faster?
I am using Solaris.

Code:
 
/usr/xpg4/bin/find "$tmpdir" -type f \( -name "*.rdf" -o -name "*.fmb" \
    -o -name "*.pll" -o -name "*.ctl" -o -name "*.sh" -o -name "*.sql" \
    -o -name "*.prog" \) | while read filename
do
    while read keyword
    do
        # Count exact, case-insensitive, whole-line matches of the keyword.
        matchCount=`/usr/xpg4/bin/grep -F -i -x -c "$keyword" "$filename"`
        if [ "$matchCount" -ne 0 ]; then

            bfilename=`basename "$filename"`
            out3=`echo "$bfilename" | awk -F. '{print $NF}'`   # file extension

            case $out3 in
                rdf) catagoery="REPORT";;
                fmb) catagoery="FORM";;
                sql) catagoery="SQL FILE";;
                pll) catagoery="Library File";;
                ctl) catagoery="Control File";;
                sh)  catagoery="Shell script";;
                *)   catagoery="OTHER";;
            esac

            echo "bfilename,keyword,matchCount,out3,catagoery are:- $bfilename,$keyword,$matchCount,$out3,$catagoery"

            # One sqlplus session per match; $out2 is assumed to be set elsewhere in the script.
            sqlplus -s $usrname/$password@$dbSID <<-SQL >> spot_fsearch.log
INSERT INTO AA_DETAIL (FILE_NAME,DEP_OBJECT_NAME,OCCURANCE,FILE_TYPE,PROGRAM_TYPE) VALUES ('$bfilename','$keyword',$matchCount,'$out3','$catagoery');
UPDATE BB_DETAIL SET (DEP_OBJECT_TYPE,MODULE_SHORT_NAME,APPLICATION,OBJECT_STATUS,OBJ_ADDN_INFO) = (SELECT OBJECT_TYPE,MODULE_SHORT_NAME,APPLICATION,OBJECT_STATUS,OBJ_ADDN_INFO FROM CG_COMPARATIVE_MATRIX_TAB WHERE upper(OBJECT_NAME)=upper('$keyword') AND ROWNUM<2) WHERE upper(DEP_OBJECT_NAME) = upper('$keyword');
UPDATE CC_CUSTOM_FILES_SUMMARY SET IMPACTED_BY_UPGRADE='$out2' WHERE FILE_NAME='$bfilename';
quit;
SQL
        fi
    done < "$keywordfile"
done
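
Most of the 10-12 hours goes on process creation: the nested loops above start a grep (and originally a wc) for every keyword in every file, roughly 300 x 8000 = 2,400,000 invocations, plus a separate sqlplus session for every match. Below is a minimal sketch, not a drop-in replacement, of one way to restructure it: one awk pass per file counts all 8000 keywords at once (the same whole-line, case-insensitive test as grep -F -i -x), and all generated SQL is collected into a single script that is run through one sqlplus session at the end. It assumes the same $tmpdir, $keywordfile, $usrname, $password and $dbSID variables as above; the temporary file name is illustrative, and only the INSERT is shown (the two UPDATE statements could be appended to the same SQL script in the same way).

Code:

#!/bin/sh
# Sketch only: variables $tmpdir, $keywordfile, $usrname, $password
# and $dbSID are assumed to be set exactly as in the script above.

sqlscript=/tmp/fsearch_$$.sql        # illustrative temp file for the batched SQL
: > "$sqlscript"

/usr/xpg4/bin/find "$tmpdir" -type f \( -name "*.rdf" -o -name "*.fmb" \
    -o -name "*.pll" -o -name "*.ctl" -o -name "*.sh" -o -name "*.sql" \
    -o -name "*.prog" \) | while read filename
do
    bfilename=`basename "$filename"`
    out3=`echo "$bfilename" | awk -F. '{print $NF}'`
    case $out3 in
        rdf) catagoery="REPORT";;
        fmb) catagoery="FORM";;
        sql) catagoery="SQL FILE";;
        pll) catagoery="Library File";;
        ctl) catagoery="Control File";;
        sh)  catagoery="Shell script";;
        *)   catagoery="OTHER";;
    esac

    # One awk pass per file: load every keyword into a hash, count the
    # lines of the data file that match a keyword exactly (ignoring case,
    # like grep -F -i -x), then emit one INSERT per keyword that was hit.
    /usr/xpg4/bin/awk -v fname="$bfilename" -v ext="$out3" -v catg="$catagoery" '
        NR == FNR { orig[tolower($0)] = $0; cnt[tolower($0)] = 0; next }
        { l = tolower($0); if (l in cnt) cnt[l]++ }
        END {
            q = "\047"                                  # single-quote character
            for (k in cnt)
                if (cnt[k] > 0) {
                    stmt = "INSERT INTO AA_DETAIL (FILE_NAME,DEP_OBJECT_NAME,OCCURANCE,FILE_TYPE,PROGRAM_TYPE) VALUES ("
                    stmt = stmt q fname q "," q orig[k] q "," cnt[k] "," q ext q "," q catg q ");"
                    print stmt
                }
        }' "$keywordfile" "$filename" >> "$sqlscript"
done

echo "COMMIT;" >> "$sqlscript"
echo "EXIT"    >> "$sqlscript"

# One sqlplus connection for the whole run instead of one per match.
sqlplus -s $usrname/$password@$dbSID @"$sqlscript" >> spot_fsearch.log

This keeps the shell work to one find plus one awk scan per file, and the database work to a single connection, which should remove most of the per-keyword process-creation overhead.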

 
