Severe performance issue while 'grep'ing on large volume of data
Post 302502126 by Souvik, Monday 7 March 2011, 01:38:24 AM

Background
-------------
The Unix flavor can be any of Solaris, AIX, HP-UX, or Linux. I have the following two flat files.

File-1
------
Contains 50,000 rows, each with two pipe-separated fields.
Each row has the structure Object_Id|Object_Name, as follows:

111|XXX
222|YYY
333|ZZZ

File-2
------
Contains 5,000 rows with a single field each.
Each row is a filename with its full path, as below:

/app00/applmgr/aprod/appl/au/11.5.0/resource/XXAIMG_CUSTOM_11I.pld
/app00/applmgr/aprod/appl/xbol/11.5.0/forms/US/XXARTLONG.fmt
/app00/applmgr/aprod/appl/au/11.5.0/resource/XXINVIVCSU.pld

Task
-----
I need to count the occurrences of each Object_Name (from each row of File-1) in each of the 5,000 distinct files whose names are stored in File-2, and store the results in a third file with the row structure below. A straightforward nested loop would therefore run 50,000 × 5,000 = 250,000,000 iterations.
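To make the 250,000,000 figure concrete, here is a minimal sketch of the straightforward nested loop, running one `grep` per (file, object) pair. The function name `naive_count` is hypothetical. It assumes a `grep` that supports `-o` (GNU or BSD grep; the native `grep` on some of the platforms listed above may lack it), treats an occurrence as a whole-word match (`-w`), and skips zero counts; none of these choices come from the original post.

```shell
#!/bin/sh
# Naive baseline: one grep pipeline per (file, object) pair.
# With 5,000 files and 50,000 objects this runs 250,000,000 pipelines
# and rereads every listed file 50,000 times.
# ASSUMPTIONS: grep supports -o (GNU/BSD), an occurrence is a whole-word
# match (-w), and rows with a zero count are not printed.
naive_count() {
    objects=$1    # File-1: Object_Id|Object_Name per line
    filelist=$2   # File-2: one full path per line
    while IFS= read -r path; do
        while IFS='|' read -r obj_id obj_name; do
            # -F: match the name as a fixed string, not as a regex.
            n=$(grep -o -w -F -- "$obj_name" "$path" | wc -l | tr -d ' ')
            if [ "$n" -gt 0 ]; then
                printf '%s|%s|%s\n' "$path" "$obj_id" "$n"
            fi
        done < "$objects"
    done < "$filelist"
}
```

Every inner iteration forks `grep`, `wc` and `tr`, so the run time grows with the product of the two file sizes rather than with the amount of data actually scanned.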

File_Name|Object_Id|Occurrence_Count
e.g.:
/app00/applmgr/aprod/appl/au/11.5.0/resource/XXINVIVCSU.pld|222|13
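One common way around the nested loop (a swapped-in technique, not the poster's method) is to load File-1 once into an awk associative array and then read each listed file exactly once, splitting every line into word tokens and looking each token up in the array. The function name `count_objects` is hypothetical. The sketch assumes an occurrence is a whole token of letters, digits and underscores, so `XXX` inside `XXXY` or `foo_XXX` is not counted. On Solaris it would typically need `nawk` or `/usr/xpg4/bin/awk` instead of the old `/usr/bin/awk`.

```shell
#!/bin/sh
# Sketch: count whole-token occurrences of every Object_Name (File-1) in
# every file listed in File-2 and print File_Name|Object_Id|count rows.
# ASSUMPTION: a token is a run of letters, digits and underscores; widen
# the separator regex below if object names contain other characters.
count_objects() {
    # $1 = File-1 (Object_Id|Object_Name), $2 = File-2 (one path per line)
    awk -F'|' '
        # First argument (File-1): map each Object_Name to its Object_Id.
        FILENAME == ARGV[1] { id[$2] = $1; next }

        # Second argument (File-2): each line is a path; scan it once.
        {
            path = $0
            split("", cnt)                  # portable way to empty an array
            # Unreadable paths make getline return -1, so they yield no rows.
            while ((getline line < path) > 0) {
                n = split(line, tok, "[^A-Za-z0-9_]+")
                for (i = 1; i <= n; i++)
                    if (tok[i] != "" && (tok[i] in id))
                        cnt[tok[i]]++
            }
            close(path)
            # One row per object found at least once (order is unspecified).
            for (name in cnt)
                print path "|" id[name] "|" cnt[name]
        }
    ' "$1" "$2"
}
```

Usage would be something like `count_objects File-1 File-2 | sort > File-3`. The work is roughly proportional to the total size of the 5,000 files plus one pass over File-1, instead of 250,000,000 separate searches. The trade-off is the whole-token assumption: plain substring counts (what `grep` without `-w` reports) cannot be obtained from a hash lookup on tokens.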

Request
---------
Please suggest a shell scripting approach that does this job in the shortest possible time.

Thanks,
Souvik.

Last edited by Souvik; 03-07-2011 at 02:45 AM.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

grep'ing and sed'ing chunks in bash... need help on speeding up a log parser.

I have a file that is 20 - 80+ MB in size that is a certain type of log file. It logs one of our processes and this process is multi-threaded. Therefore the log file is kind of a mess. Here's an example: The logfile looks like: "DATE TIME - THREAD ID - Details", and a new file is created... (4 Replies)
Discussion started by: elinenbe

2. Shell Programming and Scripting

Performance issue in UNIX while generating .dat file from large text file

Hello Gurus, We are facing a performance issue in UNIX. If anyone has faced this kind of issue in the past, please share your suggestions. Problem Definition: A few of the load processes of our Finance Application are facing issues in UNIX when they use a shell script having below... (19 Replies)
Discussion started by: KRAMA

3. UNIX for Advanced & Expert Users

Large volume file formatting

Hi, I have a file which is around 193 GB in size. This file has tonnes of spaces and I need to sanitize it. I tried to use an awk script to split this file but it gave me an error like line too long... As of now I am using a sed command to search and replace the spaces; however, it's too slow for such a... (2 Replies)
Discussion started by: darshanw

4. UNIX for Advanced & Expert Users

Gurus needed to diagnose severe performance degradation

Hi everyone, newbie forum poster here. I'm an Oracle DBA and I require some guidance from the Unix gurus here about how to pinpoint where a problem is within a Solaris 9 system running on an 8 CPU Fujitsu server that acts as our Oracle database server. Our sysadmins are trying their best to... (13 Replies)
Discussion started by: DBA_guy

5. HP-UX

Performance issue with 'grep' command for huge file size

I have 2 files; one file (say, details.txt) contains the details of employees and another file (say, emp.txt) has some selected employee names. I am extracting employee details from details.txt by using emp.txt and the corresponding code is: while read line do emp_name=`echo $line` grep -e... (7 Replies)
Discussion started by: arb_1984

6. UNIX for Dummies Questions & Answers

virtual memory and diff'ing very large files

(0 Replies)
Discussion started by: uiop44

7. Programming

Issue when fork()ing processes

Hi guys! I'll simplify my problem. I have the following code: #include <fcntl.h> #include <stdio.h> #include <string.h> #include <stdlib.h> #include <signal.h> #include <fcntl.h> #include <unistd.h> #include <sys/wait.h> #define max 25 #define buffdim 50 void p1(); void p2();... (2 Replies)
Discussion started by: pfpietro

8. UNIX for Dummies Questions & Answers

Large file data handling issue

I have a single-record large file, semicolon ';' and pipe '|' separated. I am opening the file in vi. It is throwing an error "File to long". I need to actually remove the last | symbol from this file. sed -e 's/\|*$//' filename is working fine for small files, but not working on this big... (13 Replies)
Discussion started by: Gurkamal83

9. Shell Programming and Scripting

Performance issue in Grepping large files

I have around 300 large files (*.rdf, *.fmb, *.pll, *.ctl, *.sh, *.sql, *.prog). Around 8000 keywords (which will be in the file $keywordfile) need to be searched inside those files. If a keyword is found in a file, I have to insert the filename, extension, category, keyword, occurrence... (8 Replies)
Discussion started by: millan

10. Shell Programming and Scripting

Output large volume of data to CSV file

I have a program that outputs the ownership and permissions of each directory and file on the server to a CSV file. I am getting an error message when I run the program. The program is not outputting to the CSV file. Error: the file access permissions do not allow the specified action cannot... (2 Replies)
Discussion started by: dellanicholson
MPI_Alloc_mem(3OpenMPI)

NAME
MPI_Alloc_mem - Allocates a specified memory segment.

SYNTAX
C Syntax
#include <mpi.h>
int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)

Fortran Syntax (see FORTRAN NOTES)
INCLUDE 'mpif.h'
MPI_ALLOC_MEM(SIZE, INFO, BASEPTR, IERROR)
    INTEGER INFO, IERROR
    INTEGER(KIND=MPI_ADDRESS_KIND) SIZE, BASEPTR

C++ Syntax
#include <mpi.h>
void* MPI::Alloc_mem(MPI::Aint size, const MPI::Info& info)

INPUT PARAMETERS
size      Size of memory segment in bytes (nonnegative integer).
info      Info argument (handle).

OUTPUT PARAMETERS
baseptr   Pointer to beginning of memory segment allocated.
IERROR    Fortran only: Error status (integer).

DESCRIPTION
MPI_Alloc_mem allocates size bytes of memory. The starting address of this memory is returned in the variable baseptr.

FORTRAN NOTES
There is no portable FORTRAN 77 syntax for using MPI_Alloc_mem. There is no portable Fortran syntax for using pointers returned from MPI_Alloc_mem. However, MPI_Alloc_mem can be used with Sun Fortran compilers.

From FORTRAN 77, you can use the following non-standard declarations for the SIZE and BASEPTR arguments:
    INCLUDE "mpif.h"
    INTEGER*MPI_ADDRESS_KIND SIZE, BASEPTR

From either FORTRAN 77 or Fortran 90, you can use "Cray pointers" for the BASEPTR argument. Cray pointers are described further in the Fortran User's Guide and are supported by many Fortran compilers. For example,
    INCLUDE "mpif.h"
    REAL*4 A(100,100)
    POINTER (BASEPTR, A)
    INTEGER*MPI_ADDRESS_KIND SIZE
    SIZE = 4 * 100 * 100
    CALL MPI_ALLOC_MEM(SIZE,MPI_INFO_NULL,BASEPTR,IERR)
    ! use A
    CALL MPI_FREE_MEM(A, IERR)

ERRORS
Almost all MPI routines return an error value; C routines as the value of the function and Fortran routines in the last argument. C++ functions do not return errors. If the default error handler is set to MPI::ERRORS_THROW_EXCEPTIONS, then on error the C++ exception mechanism will be used to throw an MPI::Exception object.

Before the error value is returned, the current MPI error handler is called. By default, this error handler aborts the MPI job, except for I/O function errors. The error handler may be changed with MPI_Comm_set_errhandler; the predefined error handler MPI_ERRORS_RETURN may be used to cause error values to be returned. Note that MPI does not guarantee that an MPI program can continue past an error.

SEE ALSO
MPI_Free_mem

Open MPI 1.2    September 2006    MPI_Alloc_mem(3OpenMPI)