Performance issue in Grepping large files (Shell Programming and Scripting): Post 302819749 by RudiC on Tuesday 11th of June 2013 10:45:38 AM
Searching for 8000 keywords in 300 large files is quite something, but the program you show can be optimized for speed:
a) Don't open and reread the keyword file line by line for every file matching your pattern; read it once.
b) Don't run a grep process for every single keyword/file combination (300 x 8000 = 2.4 million times!).
c) Don't pipe each of those greps into wc -l (again 2.4 million extra processes); grep -c counts matching lines by itself.
d) Don't run the sql command, including the login, for every single keyword/file combination; collect the results into a file and do the insert & update afterwards.
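
Points a) to d) could be combined roughly as below. This is only a sketch, not the original poster's script: the function name count_keywords, the file names, and the "file|keyword|count" output layout are made up for illustration, keywords are treated as fixed strings rather than regular expressions, and the keyword file is assumed to be non-empty. A single awk process reads the keyword file once, scans every data file once, and counts the lines containing each keyword, which is what grep piped to wc -l would report.

```shell
#!/bin/sh
# count_keywords KEYWORD_FILE DATA_FILE...
# Hypothetical helper: prints one "file|keyword|count" line for every
# keyword/file pair with at least one matching line.
# Assumes KEYWORD_FILE is non-empty and holds one fixed-string keyword per line.
count_keywords() {
    kwfile=$1
    shift
    awk '
        # First input file is the keyword list: load it once into memory.
        NR == FNR { if ($0 != "") kw[$0] = 1; next }
        # Every later line: test each keyword as a plain substring.
        {
            for (k in kw)
                if (index($0, k))
                    hits[FILENAME "|" k]++
        }
        # Emit all counts at the end instead of one process per pair.
        END {
            for (key in hits)
                print key "|" hits[key]
        }
    ' "$kwfile" "$@" | LC_ALL=C sort
}
```

Called as count_keywords keywords.txt *.log > results.txt, this leaves one results file that can be loaded into the database in a single session afterwards; the load step is left out here because the sql command from the thread is not shown.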
 

MTREE(8)                  BSD System Manager's Manual                 MTREE(8)

NAME
     mtree -- map a directory hierarchy

SYNOPSIS
     mtree [-LPUcdeinqrux] [-f spec] [-K keywords] [-k keywords] [-p path]
           [-s seed] [-X exclude-list]

DESCRIPTION
     The utility mtree compares the file hierarchy rooted in the current
     directory against a specification read from the standard input.
     Messages are written to the standard output for any files whose
     characteristics do not match the specifications, or which are missing
     from either the file hierarchy or the specification.

     The options are as follows:

     -L    Follow all symbolic links in the file hierarchy.

     -P    Don't follow symbolic links in the file hierarchy, instead
           consider the symbolic link itself in any comparisons. This is
           the default.

     -U    Modify the owner, group and permissions of existing files to
           match the specification and create any missing directories or
           symbolic links. User, group and permissions must all be
           specified for missing directories to be created. Corrected
           mismatches are not considered errors.

     -c    Print a specification for the file hierarchy to the standard
           output.

     -d    Ignore everything except directory type files.

     -e    Don't complain about files that are in the file hierarchy, but
           not in the specification.

     -i    Indent the output 4 spaces each time a directory level is
           descended when creating a specification with the -c option.
           This does not affect either the /set statements or the comment
           before each directory. It does however affect the comment
           before the close of each directory.

     -n    Do not emit pathname comments when creating a specification.
           Normally a comment is emitted before each directory and before
           the close of that directory when using the -c option.

     -q    Quiet mode. Do not complain when a ``missing'' directory cannot
           be created because it already exists. This occurs when the
           directory is a symbolic link.

     -r    Remove any files in the file hierarchy that are not described
           in the specification.

     -u    Same as -U except a status of 2 is returned if the file
           hierarchy did not match the specification.

     -x    Don't descend below mount points in the file hierarchy.

     -f file
           Read the specification from file, instead of from the standard
           input.

     -K keywords
           Add the specified (whitespace or comma separated) keywords to
           the current set of keywords.

     -k keywords
           Use the ``type'' keyword plus the specified (whitespace or
           comma separated) keywords instead of the current set of
           keywords.

     -p path
           Use the file hierarchy rooted in path, instead of the current
           directory.

     -s seed
           Display a single checksum to the standard error output that
           represents all of the files for which the keyword cksum was
           specified. The checksum is seeded with the specified value.

     -X exclude-list
           The specified file contains fnmatch(3) patterns matching files
           to be excluded from the specification, one to a line. If the
           pattern contains a '/' character, it will be matched against
           entire pathnames (relative to the starting directory);
           otherwise, it will be matched against basenames only. No
           comments are allowed in the exclude-list file.

     Specifications are mostly composed of ``keywords'', i.e. strings that
     specify values relating to files. No keywords have default values,
     and if a keyword has no value set, no checks based on it are
     performed.

     Currently supported keywords are as follows:

     cksum     The checksum of the file using the default algorithm
               specified by the cksum(1) utility.
     flags     The file flags as a symbolic name. See chflags(1) for
               information on these names. If no flags are to be set the
               string ``none'' may be used to override the current
               default.
     ignore    Ignore any file hierarchy below this file.
     gid       The file group as a numeric value.
     gname     The file group as a symbolic name.
     mode      The current file's permissions as a numeric (octal) or
               symbolic value.
     nlink     The number of hard links the file is expected to have.
     nochange  Make sure this file or directory exists but otherwise
               ignore all attributes.
     uid       The file owner as a numeric value.
     uname     The file owner as a symbolic name.
     size      The size, in bytes, of the file.
     link      The file the symbolic link is expected to reference.
     time      The last modification time of the file.
     type      The type of the file; may be set to any one of the
               following:

               block   block special device
               char    character special device
               dir     directory
               fifo    fifo
               file    regular file
               link    symbolic link
               socket  socket

     The default set of keywords are flags, gid, mode, nlink, size, link,
     time, and uid.

     There are four types of lines in a specification.

     The first type of line sets a global value for a keyword, and
     consists of the string ``/set'' followed by whitespace, followed by
     sets of keyword/value pairs, separated by whitespace. Keyword/value
     pairs consist of a keyword, followed by an equals sign (``=''),
     followed by a value, without whitespace characters. Once a keyword
     has been set, its value remains unchanged until either reset or
     unset.

     The second type of line unsets keywords and consists of the string
     ``/unset'', followed by whitespace, followed by one or more keywords,
     separated by whitespace.

     The third type of line is a file specification and consists of a
     file name, followed by whitespace, followed by zero or more
     whitespace separated keyword/value pairs. The file name may be
     preceded by whitespace characters. The file name may contain any of
     the standard file name matching characters (``['', ``]'', ``?'' or
     ``*''), in which case files in the hierarchy will be associated with
     the first pattern that they match.

     Each of the keyword/value pairs consists of a keyword, followed by an
     equals sign (``=''), followed by the keyword's value, without
     whitespace characters. These values override, without changing, the
     global value of the corresponding keyword.

     All paths are relative. Specifying a directory will cause subsequent
     files to be searched for in that directory hierarchy. Which brings
     us to the last type of line in a specification: a line containing
     only the string ``..'' causes the current directory path to ascend
     one level.

     Empty lines and lines whose first non-whitespace character is a hash
     mark (``#'') are ignored.

     The mtree utility exits with a status of 0 on success, 1 if any error
     occurred, and 2 if the file hierarchy did not match the
     specification. A status of 2 is converted to a status of 0 if the -U
     option is used.

FILES
     /etc/mtree  system specification directory

DIAGNOSTICS
     The mtree utility exits 0 on success, and >0 if an error occurs.

SEE ALSO
     chflags(1), chgrp(1), chmod(1), cksum(1), stat(2), fts(3), chown(8)

HISTORY
     The mtree utility appeared in 4.3BSD-Reno. The MD5 digest capability
     was added in FreeBSD 2.1, in response to the widespread use of
     programs which can spoof cksum(1). The SHA-1 and RIPEMD160 digests
     were added in FreeBSD 4.0, as new attacks have demonstrated
     weaknesses in MD5. Support for file flags was added in FreeBSD 4.0,
     and mostly comes from NetBSD.

BSD                            February 26, 1999                           BSD
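
The four line types described in the manual page above can be seen together in a small specification. The sketch below only writes such a file; the function name write_sample_spec and every file name, owner and mode in it are hypothetical, chosen to show a /set line, a /unset line, file specifications and the ``..'' lines that step back up one directory level.

```shell
#!/bin/sh
# write_sample_spec FILE
# Hypothetical helper: writes a small illustrative mtree specification.
# All paths and attribute values are made up for this example.
write_sample_spec() {
    cat > "$1" <<'EOF'
# Lines whose first non-whitespace character is a hash mark are ignored.
/set type=file uid=0 gid=0 mode=0644
.               type=dir mode=0755
    README
/unset uid gid
    bin         type=dir mode=0755
        run.sh  mode=0755
    ..
..
EOF
}
```

With the options listed in the manual page, a file like this would be checked against a tree with something like mtree -p path -f file, where path and file are placeholders.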