finding all files that do not match a certain pattern Post: 302427400

Post 302427400 by drl on Saturday 5th of June 2010 07:32:53 AM

Hi.

The agrep program was written to help with indexing (search for "glimpse" for background). It does approximate matching, and you can control the number of "mistakes" it tolerates in a successful match. For example, using some of your data with up to 3 mistakes (insertions, deletions, substitutions) allowed per match:

Code:

#!/usr/bin/env bash

# @(#) s1	Demonstrate approximate matching, agrep.

# Infrastructure details, environment, commands for forum posts. 
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe ; pe "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
pe "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p printf specimen agrep
set -o nounset
pe

FILE=${1-data1}

# Display sample of data file, with head & tail as a last resort.
pe " || start [ first:middle:last ]"
specimen 10 $FILE \
|| { pe "(head/tail)"; head -n 5 $FILE; pe " ||"; tail -n 5 $FILE; }
pe " || end"

pl " Results:"
agrep -3 "Craigslist" "$FILE" |
grep -v "Craigslist"

exit 0

produces:

Code:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
printf - is a shell builtin [bash]
specimen (local) 1.17
agrep - ( /usr/bin/agrep Feb 7 2007 )

 || start [ first:middle:last ]
Whole: 10:0:10 of 20 lines in file "data1"
Craigsliist
grault
bar
garble
quux
Craigslitt
rCaigslitt
corge
foo
qux
plugh
baz
warg
thud
Craiglist
fred
raiglist
Craigslt
xyzzy
Craigslit
 || end

-----
 Results:
Craigsliist
Craigslitt
rCaigslitt
Craiglist
raiglist
Craigslt
Craigslit

This caught the variations, including a missing "C" and an inversion, "rC". The second, standard grep is there to get rid of the correctly spelled items.
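
In case it helps to see what counts as a "mistake": the count behaves like an edit distance. Here is a rough sketch, not agrep's actual algorithm, that computes the Levenshtein distance between the pattern and each whole line in awk, then keeps the lines that are within the limit but not exact. The function name near_misses is made up for illustration. Unlike agrep, it compares against the entire line instead of searching inside it, which is good enough for one-word-per-line data like data1.

```shell
#!/usr/bin/env bash
# Sketch only: approximate whole-line matching via Levenshtein distance.
# near_misses is a hypothetical helper, not part of agrep or grep.

# near_misses PATTERN MAX
# Reads lines on stdin; prints those whose edit distance to PATTERN
# is at least 1 and at most MAX (close, but not an exact match).
near_misses() {
  awk -v pat="$1" -v max="$2" '
    # Two-row dynamic programming; names after the gap are locals.
    function dist(a, b,    m, n, i, j, cost, best, prev, cur) {
      m = length(a); n = length(b)
      for (j = 0; j <= n; j++) prev[j] = j
      for (i = 1; i <= m; i++) {
        cur[0] = i
        for (j = 1; j <= n; j++) {
          cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
          best = prev[j - 1] + cost                          # substitution or match
          if (prev[j] + 1 < best) best = prev[j] + 1         # deletion
          if (cur[j - 1] + 1 < best) best = cur[j - 1] + 1   # insertion
          cur[j] = best
        }
        for (j = 0; j <= n; j++) prev[j] = cur[j]
      }
      return prev[n]
    }
    { d = dist(pat, $0); if (d > 0 && d <= max) print }
  '
}

# Usage with a few of the sample words:
printf '%s\n' Craigsliist grault rCaigslitt Craigslist raiglist |
  near_misses Craigslist 3
# prints:
# Craigsliist
# rCaigslitt
# raiglist
```

Note that the inversion "rC" costs 2 here (two substitutions), so "rCaigslitt" uses up all 3 allowed mistakes. This is much slower than agrep on large files; it is only meant to show the counting.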

The executable for agrep was in my Debian repository, but you can also obtain it from the agrep page at freshmeat.net.

Best wishes ... cheers, drl

( edit 1: better version, minor typo )

Last edited by drl; 06-05-2010 at 09:02 AM.
