03-07-2011
Severe performance issue while 'grep'ing a large volume of data
Background
-------------
The Unix flavor may be any of Solaris, AIX, HP-UX, or Linux. I have the two flat files described below.
File-1
------
Contains 50,000 rows with two pipe-separated fields in each row.
The row structure is Object_Id|Object_Name, as follows:
111|XXX
222|YYY
333|ZZZ
File-2
------
Contains 5,000 rows with a single field in each row.
Each row is a filename with its full path, for example:
/app00/applmgr/aprod/appl/au/11.5.0/resource/XXAIMG_CUSTOM_11I.pld
/app00/applmgr/aprod/appl/xbol/11.5.0/forms/US/XXARTLONG.fmt
/app00/applmgr/aprod/appl/au/11.5.0/resource/XXINVIVCSU.pld
Task
-----
I need to search for the occurrences of each Object_Name (from each row of File-1) in all 5,000 distinct files (whose names are stored in File-2) and store the search results in a third file with the row structure below. A naive nested loop would therefore need 50,000 x 5,000 = 250,000,000 iterations.
File_Name|Object_Id|Occurance_Count
e.g.,
/app00/applmgr/aprod/appl/au/11.5.0/resource/XXINVIVCSU.pld|222|13
Request
---------
Please suggest a shell scripting method that does this job in the fastest possible time.
Thanks,
Souvik.
Last edited by Souvik; 03-07-2011 at 02:45 AM..
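A minimal sketch of one possible approach to the task described above, not a definitive answer. Instead of running grep 250,000,000 times, it loads File-1 into an awk hash once and reads each of the 5,000 files a single time. It rests on several assumptions that are not stated in the post: an "occurrence" means a whole identifier-like token (letters, digits, underscore) rather than an arbitrary substring, matching is case-sensitive, paths in File-2 contain no whitespace or quotes, and name/file pairs with zero occurrences are simply left out of the output. The function name count_objects is hypothetical. On Solaris, the default awk may be too old for -v and user-defined functions, so nawk or /usr/xpg4/bin/awk would likely be needed in its place.

```sh
#!/bin/sh
# Sketch only: count Object_Name occurrences per file in one pass.
# Assumption: an occurrence is a whole token made of [A-Za-z0-9_].
# count_objects is a hypothetical helper name, not an existing tool.

count_objects() {
    objects=$1     # File-1: Object_Id|Object_Name
    filelist=$2    # File-2: one full path per line

    # xargs hands the listed paths to awk in as few invocations as
    # possible (assumes no whitespace or quotes in the paths).
    xargs awk -v objects="$objects" '
        BEGIN {
            # Load File-1 once: map Object_Name to Object_Id.
            while ((getline line < objects) > 0) {
                split(line, f, "|")
                id[f[2]] = f[1]
            }
            close(objects)
        }
        # First line of a later file: print counts for the previous file.
        FNR == 1 && NR > 1 { emit() }
        {
            current = FILENAME
            # Turn every non-identifier character into a space, then
            # look each remaining token up in the hash.
            gsub(/[^A-Za-z0-9_]/, " ")
            n = split($0, tok, " ")
            for (i = 1; i <= n; i++)
                if (tok[i] in id)
                    count[tok[i]]++
        }
        END { emit() }
        function emit(   name) {
            for (name in count)
                print current "|" id[name] "|" count[name]
            split("", count)   # portable way to empty the array
        }
    ' < "$filelist"
}
```

It might be called as count_objects File-1 File-2 > results.txt. The rows come out in no particular order, so pipe through sort if ordering matters. If Object_Name values must instead be matched as substrings inside longer tokens, this token lookup would not be sufficient and a different technique would be needed.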