03-07-2011
Severe performance issue when running grep on a large volume of data
Background
-------------
The Unix flavor can be any of Solaris, AIX, HP-UX, or Linux. I have the following two flat files.
File-1
------
Contains 50,000 rows with two pipe-separated fields in each row.
Each row has the structure Object_Id|Object_Name, for example:
111|XXX
222|YYY
333|ZZZ
File-2
------
Contains 5,000 rows with a single field in each row.
Each row is a filename with its full path, for example:
/app00/applmgr/aprod/appl/au/11.5.0/resource/XXAIMG_CUSTOM_11I.pld
/app00/applmgr/aprod/appl/xbol/11.5.0/forms/US/XXARTLONG.fmt
/app00/applmgr/aprod/appl/au/11.5.0/resource/XXINVIVCSU.pld
Task
-----
I need to search for and count the occurrences of each Object_Name (from each row of File-1) in each of the 5,000 distinct files whose names are stored in File-2, and store the results in a third file with the row structure below. A naive nested loop would therefore need 50,000 × 5,000 = 250,000,000 iterations.
File_Name|Object_Id|Occurrence_Count
For example:
/app00/applmgr/aprod/appl/au/11.5.0/resource/XXINVIVCSU.pld|222|13
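A minimal sketch of one way to avoid the 250,000,000-iteration nested loop, assuming a POSIX awk is available: load all 50,000 Object_Names into an awk associative array once, then read each of the 5,000 files a single time and look every word up in that array, all inside one awk process. The function name count_objects and the AWK override variable are hypothetical, introduced only for this illustration. The sketch also assumes that Object_Names consist of letters, digits, and underscores and that only whole-word matches should be counted (XXX is not counted inside XXXY); the task above does not say which matching rule applies.

```sh
# count_objects OBJECT_FILE PATH_LIST_FILE
# Hypothetical helper, not from the original post. Sketch assumptions:
#   - Object_Name values contain only letters, digits and underscores;
#   - only whole-word matches are counted (XXX is not counted inside XXXY);
#   - AWK may be set to nawk or /usr/xpg4/bin/awk on Solaris.
count_objects() {
  "${AWK:-awk}" -F'|' '
    # First file (File-1): map every non-empty Object_Name to its Object_Id.
    FILENAME == ARGV[1] { if ($2 != "") id[$2] = $1; next }

    # Second file (File-2): each line is the path of one file to scan.
    {
      path = $0
      split("", cnt)                  # portable way to empty the array
      while ((getline line < path) > 0) {
        # Split the line into word-like tokens; each token is one array lookup.
        n = split(line, tok, /[^A-Za-z0-9_]+/)
        for (i = 1; i <= n; i++)
          if (tok[i] in id) cnt[tok[i]]++
      }
      close(path)                     # release the descriptor before the next file
      # One output row per Object_Name found at least once in this file.
      for (name in cnt) print path "|" id[name] "|" cnt[name]
    }
  ' "$1" "$2"
}
```

Usage: count_objects File-1 File-2 > File-3. Each of the 5,000 files is read once, so run time should scale with the total size of the scanned files rather than with 250,000,000 separate searches. Unreadable paths are silently skipped. Output order follows awk's for-in order, which is unspecified; pipe through sort if a fixed order is needed. On Solaris the default /usr/bin/awk has traditionally been the old awk, so setting AWK=nawk (or /usr/xpg4/bin/awk) is likely needed. If substring matches should count as well, GNU grep's -o and -F -f options are one alternative, but -o is not part of POSIX grep and may be missing from the vendor greps on AIX, HP-UX, or Solaris.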
Request
---------
Please suggest a shell scripting method that does this job in the fastest possible time.
Thanks,
Souvik.
Last edited by Souvik; 03-07-2011 at 02:45 AM.