UNIX for Dummies Questions & Answers: finding all files that do not match a certain pattern
Post 302427400 by drl on Saturday 5th of June 2010 07:32:53 AM
Hi.

The agrep program was written to help with indexing (Google "glimpse" for background). It allows approximate matching, and you can control the number of "mistakes" it tolerates in a successful match. For example, using some of your data with up to 3 mistakes (insertions, deletions, substitutions) allowed per match:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate approximate matching, agrep.

# Infrastructure details, environment, commands for forum posts. 
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe ; pe "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
pe "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p printf specimen agrep
set -o nounset
pe

FILE=${1-data1}

# Display sample of data file, with head & tail as a last resort.
pe " || start [ first:middle:last ]"
specimen 10 $FILE \
|| { pe "(head/tail)"; head -n 5 $FILE; pe " ||"; tail -n 5 $FILE; }
pe " || end"

pl " Results:"
agrep -3 "Craigslist" $FILE |
grep -v "Craigslist"

exit 0

produces:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
printf - is a shell builtin [bash]
specimen (local) 1.17
agrep - ( /usr/bin/agrep Feb 7 2007 )

 || start [ first:middle:last ]
Whole: 10:0:10 of 20 lines in file "data1"
Craigsliist
grault
bar
garble
quux
Craigslitt
rCaigslitt
corge
foo
qux
plugh
baz
warg
thud
Craiglist
fred
raiglist
Craigslt
xyzzy
Craigslit
 || end

-----
 Results:
Craigsliist
Craigslitt
rCaigslitt
Craiglist
raiglist
Craigslt
Craigslit

This caught the variations, including a missing "C" and a transposition ("rC"). The second command, a standard grep -v, is there to get rid of the correctly spelled items.
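If you want to see how the "mistakes" are counted, here is a rough sketch that computes a plain Levenshtein edit distance in awk. To be clear, this is not how agrep works internally, and it compares whole lines rather than searching inside them; the function names near_misses, lev, and min3 are my own, made up for illustration. It prints each line whose distance from the pattern is between 1 and the limit, so exact matches are dropped, much like the grep -v step.

```shell
#!/usr/bin/env bash

# Sketch only: whole-line Levenshtein distance in awk.
# Assumption: each input line holds one candidate word.
# This illustrates counting insertions, deletions, and substitutions;
# it is not agrep's algorithm.

# near_misses PATTERN MAX [FILE...]
# Print "distance line" for lines whose distance to PATTERN is 1..MAX.
near_misses() {
  local pattern=$1 max=$2
  shift 2
  awk -v pat="$pattern" -v max="$max" '
    function min3(a, b, c) { if (b < a) a = b; if (c < a) a = c; return a }

    # Classic dynamic-programming edit distance between strings s and t.
    # Names after the extra spaces are awk-style local variables.
    function lev(s, t,    m, n, i, j, cost, d) {
      m = length(s); n = length(t)
      for (i = 0; i <= m; i++) d[i, 0] = i
      for (j = 0; j <= n; j++) d[0, j] = j
      for (i = 1; i <= m; i++)
        for (j = 1; j <= n; j++) {
          cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
          d[i, j] = min3(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + cost)
        }
      return d[m, n]
    }

    { k = lev(pat, $0); if (k >= 1 && k <= max) print k, $0 }
  ' "$@"
}

# Reads standard input when no file is given.
printf '%s\n' Craigslist Craigsliist rCaigslitt raiglist grault |
  near_misses Craigslist 3
```

With this counting, "Craigsliist" is 1 mistake away (one insertion), "raiglist" is 2 (two deletions), and "rCaigslitt" is 3, because a swapped pair costs two substitutions and the "tt" ending costs a third. The exact "Craigslist" line and the unrelated "grault" are not printed.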

The executable for agrep was in my Debian repository, but you can also obtain it from the agrep page on freshmeat.net.

Best wishes ... cheers, drl

( edit 1: better version, minor typo )

Last edited by drl; 06-05-2010 at 09:02 AM..
 
