finding all files that do not match a certain pattern


 
# 1  
finding all files that do not match a certain pattern

I hope I'm asking this the right way --

I've been sending out a lot of resumes and some of them I saw on Craigslist -- so I named the file as 'Craigslist -- (filename)'. Well I noticed that at least one of the files was misspelled as 'Craigslit.'

I want to eventually try to write a shell script that will find all the Craigslist files that do NOT match the standard pattern and change it to the standard one. But I'm having trouble figuring out how to simply find any files that don't match the standard naming convention.

So, for example, how would I find all files that have 'Craigs' in it but not the 'list' part, such as Craigslit, Craigslt, Craigsliist, Craigslitt, etc -- all the possible combinations of letters that are NOT the desired 'list?'



I've tried to use the 'find' command with brackets and braces, but I think I'm not getting something. Or is there a better way/command to use?

Am I making sense?
# 2  
One possible solution to your problem

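From what you have asked, I think the following command would do the trick. The first grep keeps every path containing "craigs" in any letter case, and the second drops the correctly spelled ones:

Code:
find . -print | grep -i "craigs" | grep -v "Craigslist"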

# 3  
Hi.

The agrep program was written to help with indexing (Google "glimpse" for background). It allows approximate matching, and you can control the number of "mistakes" it tolerates for a successful match. For example, using some of your data with 3 mistakes (insertions, deletions, substitutions) allowed per match:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate approximate matching, agrep.

# Infrastructure details, environment, commands for forum posts. 
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe ; pe "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
pe "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p printf specimen agrep
set -o nounset
pe

FILE=${1-data1}

# Display sample of data file, with head & tail as a last resort.
pe " || start [ first:middle:last ]"
specimen 10 $FILE \
|| { pe "(head/tail)"; head -n 5 $FILE; pe " ||"; tail -n 5 $FILE; }
pe " || end"

pl " Results:"
agrep -3 "Craigslist" "$FILE" |
grep -v "Craigslist"

exit 0

produces:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
printf - is a shell builtin [bash]
specimen (local) 1.17
agrep - ( /usr/bin/agrep Feb 7 2007 )

 || start [ first:middle:last ]
Whole: 10:0:10 of 20 lines in file "data1"
Craigsliist
grault
bar
garble
quux
Craigslitt
rCaigslitt
corge
foo
qux
plugh
baz
warg
thud
Craiglist
fred
raiglist
Craigslt
xyzzy
Craigslit
 || end

-----
 Results:
Craigsliist
Craigslitt
rCaigslitt
Craiglist
raiglist
Craigslt
Craigslit

This caught the variations, including a missing "C" and a transposition ("rC"). The second, standard grep is there to get rid of the correctly named items.

The executable for agrep was in my Debian repository, but you can also obtain it from the agrep page at freshmeat.net.

Best wishes ... cheers, drl

( edit 1: better version, minor typo )

Last edited by drl; 06-05-2010 at 10:02 AM..
# 4  
is this what you wanted?

Code:
grep -v

     -v    Prints all lines except those that contain the pattern.
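
To make the effect concrete, here is a minimal sketch using a few made-up sample names based on the misspellings in this thread. The first grep keeps the names containing "Craigs", and grep -v then drops the correctly spelled ones. The variable names are only for illustration.

```shell
# Sample names are invented for illustration.
names='Craigslist -- resume1.doc
Craigslit -- resume2.doc
Craigslt -- resume3.doc
notes.txt'

# Keep lines containing "Craigs", then discard those containing "Craigslist".
misspelled=$(printf '%s\n' "$names" | grep 'Craigs' | grep -v 'Craigslist')
printf '%s\n' "$misspelled"
```

Note that a name missing the "s" altogether, such as Craiglist, never passes the first grep, so this only covers misspellings that still begin with "Craigs".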

# 5  
This is a very simple question, and everyone seems to have taken the complex path.

Here is the answer to the question ... finding all files that do not match a certain pattern ... ?

Use the ! (exclamation mark) operator.

Example:

This example will find all files that do not have a name ending with .pdf

Code:
find . -type f ! -iname "*.pdf"

You can place the "!" right before each expression you want to negate. So the next one will find every file system entry that is neither a directory nor named *.pdf. This second example therefore also matches symlinks and any other entry type supported by the OS.

Code:
find . ! -type d ! -iname "*.pdf"
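
Applied to the original question, the same negation can be combined with a positive -name test. This is a minimal sketch, assuming the files of interest start with "Craigs" and the correctly named ones start with "Craigslist". The temporary directory and the file names are made up for the demonstration.

```shell
# Build a throwaway directory with made-up sample files.
demo=$(mktemp -d)
touch "$demo/Craigslist -- a.doc" "$demo/Craigslit -- b.doc" \
      "$demo/Craigsliist -- c.doc" "$demo/other.txt"

# Names starting with "Craigs" but NOT starting with "Craigslist".
found=$(find "$demo" -type f -name 'Craigs*' ! -name 'Craigslist*' | sort)
printf '%s\n' "$found"

# Clean up the demonstration directory.
rm -rf "$demo"
```

Only the Craigslit and Craigsliist files should be listed. Use -iname instead of -name if the letter case may vary.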

Cheers

A. Siby a.k.a Maestro
# 6  
Code:
#!/bin/ksh
find . -name Craigs\* -print|while read filename
do
          # Remove directory name
          filename2=`basename "${filename}"`
          # Extract characters up to first space
          prefix=`echo "${filename2}"|awk '{print $1}'`
          # Is the filename prefix correct?
          if [ ! "${prefix}" = "Craigslist" ]
          then
                   echo "${filename}"
          fi
done
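
Once the misnamed files are listed, the renaming step the original poster was aiming for could look roughly like this. It is only a sketch under the same assumption as the script above: the prefix is everything before the first space, as in 'Craigslist -- (filename)'. The function name fix_prefix and the DRY_RUN variable are made up for illustration, and by default it only prints what it would do.

```shell
# fix_prefix DIR: rename files whose first word starts with "Craigs"
# but is not exactly "Craigslist". Hypothetical helper, not from the thread.
# File names containing newlines are not handled.
fix_prefix() {
    find "$1" -type f -name 'Craigs*' -print | while IFS= read -r path
    do
        dir=$(dirname "$path")
        name=$(basename "$path")
        # Only touch names that contain a space after the prefix.
        case $name in
            *' '*) ;;
            *) continue ;;
        esac
        prefix=${name%% *}          # characters up to the first space
        [ "$prefix" = "Craigslist" ] && continue
        rest=${name#"$prefix"}      # remainder, including the leading space
        target="$dir/Craigslist$rest"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "would rename: $path -> $target"
        elif [ -e "$target" ]; then
            echo "skipped (target exists): $path" >&2
        else
            mv -- "$path" "$target"
        fi
    done
}
```

Calling fix_prefix . prints the planned renames. Setting DRY_RUN=0 before the call performs them, and an existing target is never overwritten.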

 
