Performance issue in Grepping large files Post: 302819749

LEARN ABOUT OPENDARWIN

mtree

MTREE(8)						    BSD System Manager's Manual 						  MTREE(8)

NAME

     mtree -- map a directory hierarchy

SYNOPSIS

     mtree [-LPUcdeinqrux] [-f spec] [-K keywords] [-k keywords] [-p path] [-s seed] [-X exclude-list]

DESCRIPTION

     The utility mtree compares the file hierarchy rooted in the current directory against a specification read from the standard input.  Messages
     are written to the standard output for any files whose characteristics do not match the specifications, or which are missing from either the
     file hierarchy or the specification.

     The options are as follows:

     -L    Follow all symbolic links in the file hierarchy.

     -P    Don't follow symbolic links in the file hierarchy; instead, consider the symbolic link itself in any comparisons. This is the default.

     -U    Modify the owner, group and permissions of existing files to match the specification and create any missing directories or symbolic
	   links.  User, group and permissions must all be specified for missing directories to be created.  Corrected mismatches are not consid-
	   ered errors.

     -c    Print a specification for the file hierarchy to the standard output.

     -d    Ignore everything except directory type files.

     -e    Don't complain about files that are in the file hierarchy, but not in the specification.

     -i    Indent the output 4 spaces each time a directory level is descended when creating a specification with the -c option.  This does not
	   affect either the /set statements or the comment before each directory.  It does however affect the comment before the close of each
	   directory.

     -n    Do not emit pathname comments when creating a specification.  Normally a comment is emitted before each directory and before the close
	   of that directory when using the -c option.

     -q    Quiet mode.  Do not complain when a ``missing'' directory cannot be created because it already exists.  This occurs when the
	   directory is a symbolic link.

     -r    Remove any files in the file hierarchy that are not described in the specification.

     -u    Same as -U except a status of 2 is returned if the file hierarchy did not match the specification.

     -x    Don't descend below mount points in the file hierarchy.

     -f file
	   Read the specification from file, instead of from the standard input.

     -K keywords
	   Add the specified (whitespace or comma separated) keywords to the current set of keywords.

     -k keywords
	   Use the ``type'' keyword plus the specified (whitespace or comma separated) keywords instead of the current set of keywords.

     -p path
	   Use the file hierarchy rooted in path, instead of the current directory.

     -s seed
	   Display a single checksum to the standard error output that represents all of the files for which the keyword cksum was specified.  The
	   checksum is seeded with the specified value.

     -X exclude-list
	   The specified file contains fnmatch(3) patterns matching files to be excluded from the specification, one to a line.  If the pattern
	   contains a '/' character, it will be matched against entire pathnames (relative to the starting directory); otherwise, it will be
	   matched against basenames only.  No comments are allowed in the exclude-list file.
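
     The matching rule for -X can be sketched in Python.  This is an illustration only, not mtree's implementation: the function name
     is_excluded is hypothetical, and Python's fnmatch module only approximates fnmatch(3) (for example, its ``*'' may also match a
     '/' character).

```python
import fnmatch
import posixpath


def is_excluded(rel_path, patterns):
    """Return True if rel_path matches any exclude pattern.

    rel_path is a pathname relative to the starting directory.
    patterns is a list of glob patterns, one per exclude-list line.
    """
    base = posixpath.basename(rel_path)
    for pattern in patterns:
        # A pattern containing '/' is compared with the whole relative
        # pathname; any other pattern is compared with the basename only.
        target = rel_path if "/" in pattern else base
        if fnmatch.fnmatchcase(target, pattern):
            return True
    return False
```

     With the patterns ``*.o'' and ``build/*'', the path src/a.o is excluded through its basename and build/out.txt through its
     full pathname, while src/a.c is kept.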

     Specifications are mostly composed of ``keywords'', i.e. strings that specify values relating to files.  No keywords have default
     values, and if a keyword has no value set, no checks based on it are performed.

     Currently supported keywords are as follows:

     cksum	 The checksum of the file using the default algorithm specified by the cksum(1) utility.

     flags	 The file flags as a symbolic name.  See chflags(1) for information on these names.  If no flags are to be set the string ``none''
		 may be used to override the current default.

     ignore	 Ignore any file hierarchy below this file.

     gid	 The file group as a numeric value.

     gname	 The file group as a symbolic name.

     mode	 The current file's permissions as a numeric (octal) or symbolic value.

     nlink	 The number of hard links the file is expected to have.

     nochange	 Make sure this file or directory exists but otherwise ignore all attributes.

     uid	 The file owner as a numeric value.

     uname	 The file owner as a symbolic name.

     size	 The size, in bytes, of the file.

     link	 The file the symbolic link is expected to reference.

     time	 The last modification time of the file.

     type	 The type of the file; may be set to any one of the following:

		 block	     block special device
		 char	     character special device
		 dir	     directory
		 fifo	     fifo
		 file	     regular file
		 link	     symbolic link
		 socket      socket

     The default set of keywords is flags, gid, mode, nlink, size, link, time, and uid.
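
     As a rough illustration of how a few of the numeric keywords relate to a file's metadata, the following Python sketch compares
     size, mode, uid, gid and nlink against the result of lstat.  The function name check_file and the message format are
     hypothetical; symbolic mode values and the remaining keywords are deliberately not handled.

```python
import os
import stat


def check_file(path, keywords):
    """Compare a file against numeric keyword values.

    keywords maps keyword names to string values as they would appear
    in a specification, e.g. {"size": "5", "mode": "0644"}.
    Returns a list of mismatch messages (empty if everything matches).
    """
    st = os.lstat(path)  # do not follow symbolic links
    actual = {
        "size": st.st_size,
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        "uid": st.st_uid,
        "gid": st.st_gid,
        "nlink": st.st_nlink,
    }
    problems = []
    for key, expected in keywords.items():
        if key not in actual:
            continue  # keyword not covered by this sketch
        # mode is written in octal; the other values are decimal
        want = int(expected, 8) if key == "mode" else int(expected)
        if actual[key] != want:
            problems.append(f"{key}: expected {want}, found {actual[key]}")
    return problems
```

     A keyword that is absent from the dictionary produces no check, mirroring the rule that a keyword with no value set is not
     checked.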

     There are four types of lines in a specification.

     The first type of line sets a global value for a keyword, and consists of the string ``/set'' followed by whitespace, followed by sets of
     keyword/value pairs, separated by whitespace.  Keyword/value pairs consist of a keyword, followed by an equals sign (``=''), followed by a
     value, without whitespace characters.  Once a keyword has been set, its value remains unchanged until either reset or unset.

     The second type of line unsets keywords and consists of the string ``/unset'', followed by whitespace, followed by one or more keywords, sep-
     arated by whitespace.

     The third type of line is a file specification and consists of a file name, followed by whitespace, followed by zero or more whitespace sepa-
     rated keyword/value pairs.  The file name may be preceded by whitespace characters.  The file name may contain any of the standard file name
     matching characters (``['', ``]'', ``?'' or ``*''), in which case files in the hierarchy will be associated with the first pattern that they
     match.

     Each of the keyword/value pairs consist of a keyword, followed by an equals sign (``=''), followed by the keyword's value, without whitespace
     characters.  These values override, without changing, the global value of the corresponding keyword.

     All paths are relative.  Specifying a directory will cause subsequent files to be searched for in that directory hierarchy.  Which brings us
     to the last type of line in a specification: a line containing only the string ``..'' causes the current directory path to ascend one level.

     Empty lines and lines whose first non-whitespace character is a hash mark (``#'') are ignored.
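
     The four line types and the comment rule can be sketched as a small Python parser.  This is a simplified reading of the
     description above, not mtree's own parser: the names parse_spec and SAMPLE are hypothetical, the sample specification is made
     up for illustration, and file name matching characters are not expanded.

```python
def parse_spec(text):
    """Return a list of (path, keywords) for each file line in text."""
    defaults = {}   # global values from /set, removed again by /unset
    dirs = []       # components of the current directory path
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # empty line or comment
        words = line.split()
        if words[0] == "/set":            # type 1: set global values
            for pair in words[1:]:
                key, _, value = pair.partition("=")
                defaults[key] = value
        elif words[0] == "/unset":        # type 2: unset keywords
            for key in words[1:]:
                defaults.pop(key, None)
        elif line == "..":                # type 4: ascend one level
            if dirs:
                dirs.pop()
        else:                             # type 3: file specification
            name = words[0]
            keywords = dict(defaults)     # copy: globals stay unchanged
            for pair in words[1:]:
                key, _, value = pair.partition("=")
                keywords[key] = value     # per-file value overrides
            entries.append(("/".join(dirs + [name]), keywords))
            if keywords.get("type") == "dir":
                dirs.append(name)         # later files are found here
    return entries


SAMPLE = """\
# hypothetical specification
/set type=file uid=0 gid=0 mode=0644
.               type=dir mode=0755
    etc         type=dir mode=0755
        passwd  size=1024
        motd    mode=0444
    ..
/unset mode
    README
..
"""

for path, keywords in parse_spec(SAMPLE):
    print(path, sorted(keywords.items()))
```

     In the sample, passwd inherits mode=0644 from the /set line, motd overrides it for that file only, and README carries no mode
     at all because the keyword was unset before its line.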

     The mtree utility exits with a status of 0 on success, 1 if any error occurred, and 2 if the file hierarchy did not match the specification.
     A status of 2 is converted to a status of 0 if the -U option is used.
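
     A calling script might map these exit statuses to messages as in the following Python sketch.  The names interpret_status and
     check_tree are hypothetical, the wrapper assumes an mtree binary is available on the PATH, and it uses only the -f and -p
     options described above.

```python
import subprocess

# Meanings of the documented exit statuses.
STATUS_MEANING = {
    0: "success",
    1: "an error occurred",
    2: "file hierarchy did not match the specification",
}


def interpret_status(code):
    """Translate an mtree exit status into a short message."""
    return STATUS_MEANING.get(code, f"unexpected status {code}")


def check_tree(spec_file, root):
    """Run mtree on root against spec_file; return (status, message, output).

    Assumes mtree is installed; raises FileNotFoundError otherwise.
    """
    proc = subprocess.run(
        ["mtree", "-f", spec_file, "-p", root],
        capture_output=True,
        text=True,
    )
    return proc.returncode, interpret_status(proc.returncode), proc.stdout
```

     Because a status of 2 becomes 0 under -U, a caller that needs to distinguish a corrected mismatch from a clean match would use
     -u instead.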

FILES

     /etc/mtree  system specification directory

DIAGNOSTICS

     The mtree utility exits 0 on success, and >0 if an error occurs.

SEE ALSO

     chflags(1), chgrp(1), chmod(1), cksum(1), stat(2), fts(3), chown(8)

HISTORY

     The mtree utility appeared in 4.3BSD-Reno.  The MD5 digest capability was added in FreeBSD 2.1, in response to the widespread use of programs
     which can spoof cksum(1).  The SHA-1 and RIPEMD160 digests were added in FreeBSD 4.0, as new attacks have demonstrated weaknesses in MD5.
     Support for file flags was added in FreeBSD 4.0, and mostly comes from NetBSD.

BSD
								 February 26, 1999							       BSD