Sponsored Content
Top Forums UNIX for Dummies Questions & Answers finding all files that do not match a certain pattern Post 302427400 by drl on Saturday 5th of June 2010 07:32:53 AM
Old 06-05-2010
Hi.

The agrep program was written to help with indexing ( Google glimpse for background). It allows approximate matching and you can control the number of "mistakes" it considers for a successful match. For example, using some of your data with 3 mistakes -- insertions, deletes, substitutions -- allowed per match:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate approximate matching, agrep.

# Infrastructure details, environment, commands for forum posts. 
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe ; pe "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
pe "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p printf specimen agrep
set -o nounset
pe

FILE=${1-data1}

# Display sample of data file, with head & tail as a last resort.
pe " || start [ first:middle:last ]"
specimen 10 $FILE \
|| { pe "(head/tail)"; head -n 5 $FILE; pe " ||"; tail -n 5 $FILE; }
pe " || end"

pl " Results:"
agrep -3 "Craigslist" $FILE
grep -v "Craigslist"

exit 0

produces:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
printf - is a shell builtin [bash]
specimen (local) 1.17
agrep - ( /usr/bin/agrep Feb 7 2007 )

 || start [ first:middle:last ]
Whole: 10:0:10 of 20 lines in file "data1"
Craigsliist
grault
bar
garble
quux
Craigslitt
rCaigslitt
corge
foo
qux
plugh
baz
warg
thud
Craiglist
fred
raiglist
Craigslt
xyzzy
Craigslit
 || end

-----
 Results:
Craigsliist
Craigslitt
rCaigslitt
Craiglist
raiglist
Craigslt
Craigslit

This caught the variations including a missing "C", and an inversion "rC". The second standard grep is get rid of the correctly-named items.

The executable for agrep was in my Debian repository, but you can obtain it from agrep | freshmeat.net

Best wishes ... cheers, drl

( edit 1: better version, minor typo )

Last edited by drl; 06-05-2010 at 09:02 AM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Finding a specific pattern from thousands of files ????

Hi All, I want to find a specific pattern from approximately 400000 files on solaris platform. Its very heavy for me to grep that pattern to each file individually. Can anybody suggest me some way to search for specific pattern (alpha numeric) from these forty thousand files. Please note that... (6 Replies)
Discussion started by: aarora_98
6 Replies

2. Shell Programming and Scripting

finding duplicate files by size and finding pattern matching and its count

Hi, I have a challenging task,in which i have to find the duplicate files by its name and size,then i need to take anyone of the file.Then i need to open the file and find for more than one pattern and count of that pattern. Note:These are the samples of two files,but i can have more... (2 Replies)
Discussion started by: jerome Sukumar
2 Replies

3. UNIX for Dummies Questions & Answers

Finding Unique strings which match pattern

I need to grep for a pattern in a file. Files are huge and have several repeated occurances of the strings which match pattern. I just need the strings which contain the pattern in the output. For eg. The contents of my file are as follows. The pattern I want to match by is ABCD ... (5 Replies)
Discussion started by: tektips
5 Replies

4. Shell Programming and Scripting

Finding conserved pattern in different files

Hi power user, For examples, I have three different files: file 1: file2: file 3: AAA CCC ZZZ BBB BBB CCC CCC DDD DDD DDD TTT AAA EEE AAA XXX I... (8 Replies)
Discussion started by: anjas
8 Replies

5. Shell Programming and Scripting

Finding 4 current files having specific File Name pattern

Hi All, I am trying to find 4 latest files inside one folder having following File Name pattern and store them into 4 different variables and then use for processing in my shell script. File name is fixed length. 1) Each file starts with = ABCJmdmfbsjop letters + 7 Digit Number... (6 Replies)
Discussion started by: lancesunny
6 Replies

6. Shell Programming and Scripting

Finding log files that match number pattern

I have logs files which are generated each day depending on how many processes are running. Some days it could spin up 30 processes. Other days it could spin up 50. The log files all have the same pattern with the number being the different factor. e.g. LOG_FILE_1.log LOG_FILE_2.log etc etc ... (2 Replies)
Discussion started by: atelford
2 Replies

7. UNIX for Dummies Questions & Answers

Finding the same pattern in three consecutive lines in several files in a directory

I know how to search for a pattern/regular expression in many files that I have in a directory. For example, by doing this: grep -Ril "News/U.S." . I can find which files contain the pattern "News/U.S." in a directory. I am unable to accomplish about how to extend this code so that it can... (1 Reply)
Discussion started by: shoaibjameel123
1 Replies

8. Shell Programming and Scripting

Pattern match using grep between two files

Hello Everyone , I have two files. I want to pick line from file-1 and match with the complete data in file-2 , if there is a match print all the match lines in file 3. Below is the file cat test1.txt vikas vikasjain j ain testt douknow hello@vik@ # 33 ||@@ vcpzxcmvhvdsh... (1 Reply)
Discussion started by: mailvkjain
1 Replies

9. Shell Programming and Scripting

Rearrange or replace only the second line after pattern match or pattern match

Im using the command below , but thats not the output that i want. it only prints the odd and even numbers. awk '{if(NR%2){print $0 > "1"}else{print $0 > "2"}}' Im hoping for something like this file1: Text hi this is just a test text1 text2 text3 text4 text5 text6 Text hi... (2 Replies)
Discussion started by: invinzin21
2 Replies

10. Shell Programming and Scripting

Finding all files based on pattern

Hi All, I need to find all files in a directory which are containing specific pattern. Thing is that file name should not consider if pattern is only in commented area. all contents which are under /* */ are commented all lines which are starting with -- or if -- is a part of some sentence... (13 Replies)
Discussion started by: Lakshman_Gupta
13 Replies
XML2PO(1)							  [FIXME: manual]							 XML2PO(1)

NAME
xml2po - program to create a PO-template file from a DocBook XML file and merge it back into a (translated) XML file SYNOPSIS
xml2po [OPTIONS] [XMLFILE] DESCRIPTION
This manual page documents briefly the xml2po command. xml2po is a simple Python program which extracts translatable content from free-form XML documents and outputs gettext compatible POT files. Translated PO files can be turned into XML output again. It can work it's magic with most "simple" tags, and for complicated tags one has to provide a list of all tags which are "final" (that will be put into one "message" in PO file), "ignored" (skipped over) and "space preserving". OPTIONS
The program follows the usual GNU command line syntax, with long options starting with two dashes (`-'). A summary of options is included below. -a, --automatic-tags Automatically decide if tags are to be considered "final" or not. -k, --keep-entities Don't expand entities (default). See also the -e option. -e, --expand-all-entities Expand all entities (including SYSTEM ones). -m, --mode=TYPE Treat tags as type TYPE (default: docbook). -o, --output=FILE Print resulting text (XML while merging translations with "-p" or "-t" options, POT template file while extracting strings, and translated PO file with "-r" option) to the given FILE. -p, --po-file=FILE Specify a PO FILE containing translation and output XML document with translations merged in. -r, --reuse=FILE Specify a translated XML document in FILE with the same structure to generate translated PO file for XML document given on command line. -t, --translation=FILE Specify a MO file containing translation and output XML document with translations merged in. -u, --update-translation=LANG.po Update a PO file using msgmerge. -l, --language=LANG Explicitly set language of the translation. -h, --help Show summary of options. -v, --version Show version of program. EXAMPLES
Creating POT template files To create a POT template book.pot from an input file book.xml, which consists of chapter1.xml and chapter2.xml (external entities), run: /usr/bin/xml2po -o book.pot book.xml chapter1.xml chapter2.xml To expand entities use the -e option: /usr/bin/xml2po -e -o book.pot book.xml Creating translated XML files (merging back PO files) After translating book.pot into LANG.po, merge the translations back by using -p option for each XML file: /usr/bin/xml2po -p LANG.po -o book.LANG.xml book.xml /usr/bin/xml2po -p LANG.po -o chapter1.LANG.xml chapter1.xml /usr/bin/xml2po -p LANG.po -o chapter2.LANG.xml chapter2.xml If you used the -e option to expand entities, you should use it again to merge back the translation into an XML file: /usr/bin/xml2po -e -p LANG.po -o book.LANG.xml book.xml Updating PO files When base XML file changes, the real advantages of PO files come to surface. There are 2 ways to merge the translation. The first is to produce a new POT template file (additionally use the -e if you decided earlier to expand entities). Afterwards run msgmerge to merge the translation with the new POT file: /usr/bin/msgmerge -o tmp.po LANG.po book.pot Now rename tmp.po to LANG.po and update your translation. Alternatively, xml2po provides the -u option, which does exactly these two steps for you. The advantage is, that it also runs msgfmt to give you a statistical output of translation status (count of translated, untranslated and fuzzy messages). Additionally use the -e if you decided earlier to expand entities: /usr/bin/xml2po -u LANG.po book.xml SEE ALSO
msgmerge (1), msgfmt (1) AUTHOR
This manual page was written by Daniel Leidert daniel.leidert@wgdd.de for the Debian system (but may be used by others). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 any later version published by the Free Software Foundation. COPYRIGHT
Copyright (C) 2005 Daniel Leidert [FIXME: source] 2005/02/10 XML2PO(1)
All times are GMT -4. The time now is 07:38 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy