Pattern Matching Syntax


 
# 1  
07-13-2010

Hi,

I am trying to write a script to rename a batch of computer files.

The file names can appear in the following formats:

Title_Title2_Title3_ABCD0123_Title4.doc
Title title2 DEFG5678 Title3 Title4.doc
XYZA1234-Title.doc

The one constant I am interested in is the code in each name (ABCD0123, DEFG5678 and XYZA1234 above), which is always four letters followed by four digits. I want to be able to capture that code, rename the file, and move the code to the end of the file name.

So the output I want would be:

title_title2_title3_title4[ABCD0123].doc
title_title2_title3[DEFG5678].doc
title-[XYZA1234].doc

Thank you for any suggestions you can provide.
# 2  
07-13-2010
Try...
Code:
ls *.doc | awk 'BEGIN{OFS=FS=".";q="\047"}
     match($1,/[A-Z][A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][-_ ]?/) {
          cmd = "mv " q $0 q " " q \
                substr($1,1,RSTART-1) \
                substr($1,RSTART+RLENGTH) \
                "[" substr($1,RSTART,8) "]" OFS $2 q
          print cmd
          #system (cmd)
     }'

Uncomment the system command if it does what you want.

Result from samples gives...
Code:
mv 'Title_Title2_Title3_ABCD0123_Title4.doc' 'Title_Title2_Title3_Title4[ABCD0123].doc'
mv 'Title title2 DEFG5678 Title3 Title4.doc' 'Title title2 Title3 Title4[DEFG5678].doc'
mv 'XYZA1234-Title.doc' 'Title[XYZA1234].doc'
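For readers new to awk's match(): when it succeeds it sets RSTART to the 1-based position where the match begins and RLENGTH to the match length, and the script above cuts the name apart with substr() around those two values. The minimal sketch below just prints them. show_match is a hypothetical helper name, and the classes are spelled out as in Ygor's regex because some older awk versions do not support {4} intervals.

```shell
#!/bin/sh
# show_match (hypothetical helper): print RSTART, RLENGTH and the 8-char code.
# RLENGTH is 9 when the optional separator after the code is also matched.
show_match() {
  printf '%s\n' "$1" | awk '
    match($0, /[A-Z][A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][-_ ]?/) {
      print RSTART, RLENGTH, substr($0, RSTART, 8)
    }'
}

show_match 'XYZA1234-Title'                        # 1 9 XYZA1234
show_match 'Title_Title2_Title3_ABCD0123_Title4'   # 21 9 ABCD0123
```

So for the first sample, substr($1,1,RSTART-1) is "Title_Title2_Title3_" and substr($1,RSTART+RLENGTH) is "Title4", which is how the separator after the code disappears.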

# 3  
07-14-2010
Code:
tr '_' ' ' | grep -E -w -o '[A-Z]{4}[0-9]{4}'

Almost a single command, but I had to run tr first to turn the underscores into spaces, because grep -w treats an underscore as part of a word (which matters for the first line). Darn. It searches each line for a whole word made of four capital letters followed by four digits; -o prints only the match instead of the whole line.

test:
Code:
# echo 'Title_Title2_Title3_ABCD0123_Title4.doc
Title title2 DEFG5678 Title3 Title4.doc
XYZA1234-Title.doc
extra testing lines:
1234
abcd1234
abcd123
abc1234
1234abcd
1234ABCD' | tr '_' ' ' | grep -E -w -o '[A-Z]{4}[0-9]{4}'
ABCD0123
DEFG5678
XYZA1234
#

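It didn't match "abcd1234" because its letters are not capitals. If you need a case-insensitive match, change [A-Z] to [A-Za-z] or add grep's -i option. (Don't use [aA-zZ]: that means a, the range A-z, and Z, and in ASCII the range A-z also includes [ \ ] ^ _ and `.)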
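A minimal sketch of the case-insensitive variant using -i. extract_codes is a hypothetical helper name, and since -o and -w are not POSIX options, this assumes GNU grep or a grep with the same extensions.

```shell
#!/bin/sh
# extract_codes (hypothetical helper): print every 4-letter/4-digit code,
# ignoring case. Assumes a grep that supports -o and -w (e.g. GNU grep).
# tr turns "_" into a space because -w counts "_" as a word character.
extract_codes() {
  tr '_' ' ' | grep -E -w -o -i '[a-z]{4}[0-9]{4}'
}

printf '%s\n' 'Title_Title2_Title3_ABCD0123_Title4.doc' abcd1234 1234ABCD |
  extract_codes
# prints ABCD0123 and abcd1234, one per line
```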
# 4  
07-14-2010
Code:
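#!/bin/sh
# Print (do not run) an mv command for each .doc file, moving the
# 4-letter/4-digit code to the end of the name in square brackets.
for file in *.doc
do
  # [ _-] lists the three allowed separators; the hyphen must come last,
  # otherwise "[ -_]" is a character range from space to underscore.
  new=$(printf '%s\n' "$file" | sed 's/\(.*\)\([A-Z]\{4\}[0-9]\{4\}\)\([ _-]\)\(.*\)\(\..*\)/\1\4\3[\2]\5/')
  echo "mv \"$file\" \"$new\""
done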

Code:
mv "Title title2 DEFG5678 Title3 Title4.doc" "Title title2 Title3 Title4 [DEFG5678].doc"
mv "Title_Title2_Title3_ABCD0123_Title4.doc" "Title_Title2_Title3_Title4_[ABCD0123].doc"
mv "XYZA1234-Title.doc" "Title-[XYZA1234].doc"
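A note on the separator class in that sed expression: it should be written [ _-] rather than [ -_]. With the hyphen in the middle, [ -_] is a range from space to underscore, which in the C locale also covers digits and capital letters. The sketch below shows the difference and wraps the substitution in a dry-run function; new_name is a hypothetical helper, and LC_ALL=C is set so ranges are plain byte ranges.

```shell
#!/bin/sh
# In the C locale, "[ -_]" is the range space..underscore (0x20-0x5F),
# which includes digits and capital letters; "[ _-]" lists three characters.
printf 'A\n' | LC_ALL=C sed 's/[ -_]/X/'   # prints X: "A" is in the range
printf 'A\n' | LC_ALL=C sed 's/[ _-]/X/'   # prints A: no match

# Hypothetical dry-run helper: print the new name for one file name.
new_name() {
  printf '%s\n' "$1" |
    LC_ALL=C sed 's/\(.*\)\([A-Z]\{4\}[0-9]\{4\}\)\([ _-]\)\(.*\)\(\..*\)/\1\4\3[\2]\5/'
}

new_name 'XYZA1234-Title.doc'   # prints Title-[XYZA1234].doc
```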



---------- Post updated at 02:14 PM ---------- Previous update was at 01:59 PM ----------

Quote:
Originally Posted by Ygor
Try... [awk match() code and results as in post #2]

In one day, two posts using awk's match() function. Great.

The other one is:
https://www.unix.com/shell-programmin...ng-spaces.html
# 5  
07-14-2010
Thanks guys
I tried Ygor's code first and that worked a treat.

It only broke when it encountered a file name containing more than one period, but I hadn't included that case in my examples, and those files were easy to fix by hand.
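The multiple-periods breakage comes from FS="." in Ygor's script: $1 ends at the first period and $2 at the second, so anything after a second period is lost. Below is a sketch of the same approach that splits off only the last ".ext" with match($0, /\.[^.]*$/). print_renames and the sample name v1.2_ABCD0123_Report.doc are made up for illustration, and like the original it only prints the mv commands.

```shell
#!/bin/sh
# print_renames (hypothetical helper): read file names on stdin and print
# mv commands that move the 4-letter/4-digit code to the end in [brackets].
# Only the LAST ".ext" is treated as the extension. Nothing is renamed.
print_renames() {
  awk -v q="'" '
    match($0, /\.[^.]*$/) {                     # locate the final ".ext"
      base = substr($0, 1, RSTART - 1)
      ext  = substr($0, RSTART)
      if (match(base, /[A-Z][A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][-_ ]?/))
        print "mv " q $0 q " " q substr(base, 1, RSTART - 1) \
              substr(base, RSTART + RLENGTH) "[" substr(base, RSTART, 8) "]" ext q
    }'
}

printf '%s\n' 'v1.2_ABCD0123_Report.doc' | print_renames
# mv 'v1.2_ABCD0123_Report.doc' 'v1.2_Report[ABCD0123].doc'
```

Feeding it `ls *.doc` (or `printf '%s\n' *.doc`) gives the same output as the original for the three sample names.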
# 6  
07-14-2010
Code:
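#!/bin/bash
# bash 3.2+: keep the regex in a variable and expand it unquoted,
# because a quoted right-hand side of =~ is matched as a literal string.
shopt -s nocasematch nullglob
re='([a-z]{4}[0-9]{4})([-_ ]+)'
for i in *[0-9][0-9][0-9][0-9]*.doc
do
  [[ $i =~ $re ]] || continue
  code=${BASH_REMATCH[1]}               # just the code, e.g. ABCD0123
  newfile=${i/"${BASH_REMATCH[0]}"/}    # drop the code and its separator
  newfile="${newfile%.doc}[$code].doc"
  mv -- "$i" "$newfile"
done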


Last edited by kurumi; 07-14-2010 at 03:26 AM..
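Two details of the =~ test in this post are easy to trip over. In bash 3.2 and later, a quoted regex on the right of =~ is matched as a literal string, and BASH_REMATCH[0] holds the whole match (code plus separator) while BASH_REMATCH[1] holds only the first parenthesised group. A small sketch showing both, assuming bash 3.2+; show_groups is a hypothetical helper name.

```shell
#!/bin/bash
# Needs bash 3.2+: a quoted right-hand side of =~ is matched as a literal
# string, so keep the regex in a variable and expand it unquoted.
re='([A-Z]{4}[0-9]{4})([-_ ]+)'

# Hypothetical helper: print "<group 1>|<whole match>" for one file name.
show_groups() {
  [[ $1 =~ $re ]] || return 1
  printf '%s|%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[0]}"
}

s='Title_Title2_Title3_ABCD0123_Title4.doc'
[[ $s =~ '([A-Z]{4}[0-9]{4})([-_ ]+)' ]] || echo 'quoted regex: no match'
show_groups "$s"   # ABCD0123|ABCD0123_
```

Using BASH_REMATCH[0] as the code would therefore put the separator inside the brackets, e.g. [ABCD0123_].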