finding all files that do not match a certain pattern Post: 302427400


Post 302427400 by drl on Saturday 5th of June 2010 07:32:53 AM


Hi.

The agrep program was written to help with indexing (Google "glimpse" for background). It allows approximate matching, and you can control the number of "mistakes" it tolerates in a successful match. For example, using some of your data with up to 3 mistakes (insertions, deletions, substitutions) allowed per match:

Code:

#!/usr/bin/env bash

# @(#) s1	Demonstrate approximate matching, agrep.

# Infrastructure details, environment, commands for forum posts. 
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe ; pe "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
pe "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p printf specimen agrep
set -o nounset
pe

FILE=${1-data1}

# Display sample of data file, with head & tail as a last resort.
pe " || start [ first:middle:last ]"
specimen 10 $FILE \
|| { pe "(head/tail)"; head -n 5 $FILE; pe " ||"; tail -n 5 $FILE; }
pe " || end"

pl " Results:"
agrep -3 "Craigslist" $FILE |
grep -v "Craigslist"

exit 0

produces:

Code:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
printf - is a shell builtin [bash]
specimen (local) 1.17
agrep - ( /usr/bin/agrep Feb 7 2007 )

 || start [ first:middle:last ]
Whole: 10:0:10 of 20 lines in file "data1"
Craigsliist
grault
bar
garble
quux
Craigslitt
rCaigslitt
corge
foo
qux
plugh
baz
warg
thud
Craiglist
fred
raiglist
Craigslt
xyzzy
Craigslit
 || end

-----
 Results:
Craigsliist
Craigslitt
rCaigslitt
Craiglist
raiglist
Craigslt
Craigslit

This caught the variations, including a missing "C" and a transposition, "rC". The second, standard grep gets rid of the correctly-named items.
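To make the "mistakes" count a bit more concrete, here is a rough bash sketch that counts insertions, deletions and substitutions between two whole words (the classic edit distance). This is not how agrep is implemented, and agrep also matches inside longer lines, which this does not; the function name levenshtein is just my own helper for illustration.

```shell
#!/usr/bin/env bash

# levenshtein A B : print the edit distance between words A and B,
# counting insertions, deletions and substitutions as 1 each.
# Illustration only; hypothetical helper, not part of agrep.
levenshtein() {
  local a=$1 b=$2
  local -i la=${#a} lb=${#b} i j cost del ins sub min
  local -a prev cur

  # Row 0: turning the empty prefix of A into b[0..j] takes j insertions.
  for ((j = 0; j <= lb; j++)); do prev[j]=$j; done

  for ((i = 1; i <= la; i++)); do
    cur[0]=$i                       # i deletions to reach the empty prefix of B
    for ((j = 1; j <= lb; j++)); do
      # cost is 0 when the two characters agree, 1 for a substitution
      [[ ${a:i-1:1} == "${b:j-1:1}" ]] && cost=0 || cost=1
      del=$(( prev[j] + 1 ))
      ins=$(( cur[j-1] + 1 ))
      sub=$(( prev[j-1] + cost ))
      min=$del
      (( ins < min )) && min=$ins
      (( sub < min )) && min=$sub
      cur[j]=$min
    done
    prev=("${cur[@]}")              # current row becomes the previous row
  done

  printf '%s\n' "${prev[lb]}"
}

# Show the distance from "Craigslist" to a few of the sample words.
for w in Craigslist Craiglist raiglist rCaigslitt grault; do
  printf '%s %s\n' "$(levenshtein Craigslist "$w")" "$w"
done
```

Note that under this measure a swapped pair such as "rC" costs two edits, not one, so a word with a swap plus one more typo already needs the full allowance of 3.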

The executable for agrep was in my Debian repository, but you can also obtain it from the agrep page at freshmeat.net.

Best wishes ... cheers, drl

( edit 1: better version, minor typo )

Last edited by drl; 06-05-2010 at 09:02 AM.



