Quick script to rename files

11-15-2014

Quote:

Originally Posted by Aia

@jonesal2
...
...
The regex capabilities of Perl or even Awk are superior to anything the shell has to offer.

True.
They're limited in the shell, but they can work in some cases...like a 'one-off' situation maybe.

Code:

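#!/bin/sh
# Dry run: print the new name for each file whose name has six underscores.
for fname in *_*_*_*_*_*_*
do
  # n1: fields 2-4 plus trailing underscore; n2: the last two fields
  n1=$(expr "$fname" : '.*_\(.*_.*_.*_\).*_.*_.*')
  n2=$(expr "$fname" : '.*_.*_.*_.*_.*_\(.*_.*\)')
  echo "$n1$n2"
done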

Last edited by ongoto; 11-15-2014 at 08:16 PM.. Reason: mispelled fname

ongoto

11-15-2014

Quote:

Originally Posted by ongoto

True.
They're limited in the shell, but they can work in some cases...like a 'one-off' situation maybe.

Code:

#!/bin/sh
for fname in *_*_*_*_*_*_*
do
n1=$(expr "$fname" : '.*_\(.*_.*_.*_\).*_.*_.*')
n2=$(expr "$fname" : '.*_.*_.*_.*_.*_\(.*_.*\)')
echo "$n1$n2"
done

The only problem with this is that expr isn't a shell built-in. So, when processing 30,000 files, it creates an additional 60,000 processes.

If:

Code:

        rename $src, $dst unless $src eq $dst;

is a built-in in perl, this alone is a strong argument to use perl rather than awk or the shell script I suggested.

If there are other files in this directory with six or more underscores in their names, the OP needs to give us naming details so we can alter the EREs or filename matching patterns to correctly select the files to be processed.

Don Cragun

11-15-2014

@ Don Cragun

I never thought of it like that, but you are right. That is a ton of overhead. The OP asked for a quick script and didn't elaborate much, so I figured it was a small job.

I'll go along with Perl though. Its regex and string handling are the bomb. Some say Python is its replacement, but I dunno.

ongoto

11-16-2014

Quote:

Originally Posted by Don Cragun

[..]

Code:

#!/bin/ksh
for i in *_*_*_*_*_*_*
do	printf '%s\n' "$i" | { IFS="_" read x s f y x o z
		echo mv "$i" "${s}_${f}_${y}_${o}_$z"
	}
done

Quote:

Originally Posted by Don Cragun

The only problem with this is that expr isn't a shell built-in. So, when processing 30,000 files, it creates an additional 60,000 processes.
[..]

The approach above will still require an additional process for every file, because of the pipe.

A here-doc or here-string would not have that drawback:

Code:

for i in *_*_*_*_*_*_\(*\)
do
  IFS=_ read x s f y x o z << EOF
$i
EOF
  echo mv "$i" "${s}_${f}_${y}_${o}_$z"
done

or, in modern bash/ksh93/zsh:

Code:

for i in *_*_*_*_*_*_\(*\)
do
  IFS=_ read x s f y x o z <<< "$i"
  echo mv "$i" "${s}_${f}_${y}_${o}_$z"
done

or use variable expansion:

Code:

for i in *_*_*_*_*_*_\(*\)
do
  first=${i%_*_*_*}
  last=${i#"$first"_}
  echo mv "$i" "${first#*_}_${last#*_}"
done

But at any rate, every mv command will still require one process per file.

----

Quote:

Originally Posted by Aia

@jonesal2

Perl and Awk do have a place on it, especially, when that "easily be done" in the shell can make a file named

This_is_one_I_want_as_is into is_one_I_as_is unintentionally, when all you want is x_surname_firstname_y_20141115_OS_(z) into surname_firstname_y_OS_(z)

Suddenly you find yourself figuring out what kind of glob you can pass to the for loop to limit the range, or what kind of check you have to perform inside the body to deal with unwanted matches. The regex capabilities of Perl or even Awk are superior to anything the shell has to offer.

It really depends on how much precision is required. More precision means more complexity, so you would only use it if needed.

Globbing will do fine in the majority of situations and can be made more precise if need be; the pattern only has to be tightened when the situation demands it.
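As a rough sketch of what "made more precise" could look like with globbing alone, the POSIX sh function below checks a name against a stricter case pattern before computing the new name. The function name new_name and the assumption that the fifth field is always an eight-digit date are illustrative choices, not something taken from the thread. It only prints the result and never touches any file.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the new name for a file,
# or return 1 if the name does not have the expected shape.
new_name() {
  case $1 in
    *_*_*_*_*_*_*_*)
      return 1 ;;   # more than six underscores: ambiguous, skip
    *_*_*_*_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_*_\(*\))
      ;;            # the x_s_f_y_YYYYMMDD_o_(z) shape: carry on
    *)
      return 1 ;;   # anything else is left alone
  esac
  first=${1%_*_*_*}      # strip "_date_o_(z)" from the end
  last=${1#"$first"_}    # what was stripped, without its leading "_"
  printf '%s\n' "${first#*_}_${last#*_}"
}

new_name 'x_surname_firstname_y_20141115_OS_(z)'
new_name 'This_is_one_I_want_as_is' || echo 'skipped: This_is_one_I_want_as_is'
```

The second call is rejected because its fifth field is not eight digits, which is the kind of unwanted match Aia was worried about.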

In situations where more precision is required than globbing can handle, I agree that regexes give you tighter control. You could use Perl for that, but modern bash, ksh93 and zsh also provide the possibility of using regexes.

But you would still need to check, because

Code:

$dst =~  s/\d+_(\w+_\w+_\d+)_\d+(_\w+_\(\d+\))/$1$2/;

may also not be tight enough (you might need a front and back anchoring for example, or \w may match unwanted digits)

So you will find yourself figuring out what regex to use. In either case it is a good idea to print first and check.
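One way to make "print first and check" routine is a dry-run switch. This is only a sketch: the DRY_RUN variable and the do_rename helper are names made up for illustration, and the real branch assumes an mv that accepts -- to end option parsing.

```shell
#!/bin/sh
# Hypothetical helper: only rename when DRY_RUN is explicitly set to 0;
# otherwise just report what would be done.
do_rename() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    mv -- "$1" "$2"
  else
    printf 'would rename: %s -> %s\n' "$1" "$2"
  fi
}

# DRY_RUN defaults to 1, so by default nothing is changed on disk here.
do_rename 'x_surname_firstname_y_20141115_OS_(z)' 'surname_firstname_y_OS_(z)'
```

Once the printed list looks right, the same loop can be rerun with DRY_RUN=0.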

----

On the other hand, I do agree with Don that the performance advantage is a strong argument for using Perl in this case: since rename is a Perl builtin, the operation would not require an additional process for every mv command.

BTW, there appear to be some caveats to Perl's rename function, so extra testing is required:

Quote:

Changes the name of a file; an existing file NEWNAME will be clobbered. Returns true for success, false otherwise.

Behavior of this function varies wildly depending on your system implementation. For example, it will usually not work across file system boundaries, even though the system mv command sometimes compensates for this. Other restrictions include whether it works on directories, open files, or pre-existing files. Check perlport and either the rename(2) manpage or equivalent system documentation for details.

For a platform independent move function look at the File::Copy module.

rename - perldoc.perl.org

Last edited by Scrutinizer; 11-16-2014 at 04:52 AM..

Scrutinizer

11-16-2014

All of the above are fine alternatives to the script I suggested, but the script I suggested and all of the above alternatives (other than using perl) use the same number of processes.

The standards say that the elements of a pipeline may be executed in the current shell execution environment (as ksh does) or in a subshell environment (as bash does). Neither of these creates a new process as long as the commands in that subshell are shell built-ins. This can be seen using a slight modification of the script I suggested that prints the PID at the start of the script and in the last element of the pipeline (and adds the escaped parentheses to the pattern):

Code:

echo $$
for i in *_*_*_*_*_*_\(*\)
do	printf '%s\n' "$i" | { IFS="_" read x s f y x o z
		echo $$
		echo mv "$i" "${s}_${f}_${y}_${o}_$z"
	}
done

Running this with ksh in a directory containing two matching files produces:

Code:

$ ksh tester
66443
66443
mv 123_surname_firstname_y_20141115_OS_(456) surname_firstname_y_OS_(456)
66443
mv x_surname_firstname_y_20141115_OS_(z) surname_firstname_y_OS_(z)
$

and bash (in the same directory) produces:

Code:

bash-3.2$ bash tester
66452
66452
mv 123_surname_firstname_y_20141115_OS_(456) surname_firstname_y_OS_(456)
66452
mv x_surname_firstname_y_20141115_OS_(z) surname_firstname_y_OS_(z)
bash-3.2$

So, if rename in the version of perl on your system does what you need, use perl (which will only need two processes to rename all of the files). Otherwise, any of the shell scripts Scrutinizer and I suggested should get the job done using (n + 1) processes, where n is the number of files to be renamed.

Don Cragun

11-16-2014

Hi Don,

That pertains to the RHS of the pipeline, which can be executed in a subshell or in the foreground. The LHS is always executed in a subshell.

This can easily be checked:

Code:

for i in dash bash ksh zsh
do 
  shl=$i $i -c "{ u=1 ;} | { v=1;}; echo \"\$shl: \\\$u:\$u,\\\$v:\$v\""
done

Code:

dash: $u:,$v:
bash: $u:,$v:
ksh: $u:,$v:1
zsh: $u:,$v:1

This shows that ksh and zsh execute the RHS in the foreground (the current shell). In the other shells the RHS, and in all shells the LHS, is executed in a subshell. In bash 4.2 and later there is a separate option, lastpipe (off by default), that makes bash execute the RHS in the foreground as well when job control is not active.

The process id within a subshell cannot be tested with $$. A subshell is a child process that inherits the environment of the parent shell, including the variable $$. Therefore in a subshell of the parent shell, $$ will represent the parent's $$.

Quote:

A subshell environment shall be created as a duplicate of the shell environment, except that signal traps that are not being ignored shall be set to the default action. Changes made to the subshell environment shall not affect the shell environment. Command substitution, commands that are grouped with parentheses, and asynchronous lists shall be executed in a subshell environment. Additionally, each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment. All other commands shall be executed in the current shell environment.

Shell Command Language

The pid of a subshell can still be checked by starting a child process that is not a subshell and checking $PPID:

Code:

sh -c 'echo $PPID'
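To see $$ and the real process id side by side, something like the following could be used. It is a sketch that assumes a shell which forks for ( ... ) subshells and command substitutions, as bash and dash do. The function name show_pids is invented here.

```shell
#!/bin/sh
# Hypothetical demo: inside a subshell, print "$$" followed by the pid that a
# freshly started sh reports as its parent.
show_pids() {
  (
    # sh -c starts a new shell that is not a subshell, so its $PPID is the
    # pid of the process that launched it, not the inherited value of $$.
    real=$(sh -c 'echo $PPID')
    echo "$$ $real"
  )
}

echo "main shell: $$"
echo "in subshell (\$\$, then probed pid): $(show_pids)"
```

In a forking shell the first number matches the main shell's pid while the second does not, which is why $$ alone cannot reveal a subshell.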

I used the following script to test:

Code:

for i in dash bash ksh zsh
do
  shl=$i $i -c "{ echo \"\$shl:LHS:\$(sh -c 'echo \$PPID')\">>tst.out;} | { echo \"\$shl:RHS:\$(sh -c 'echo \$PPID')\">>tst.out;}; echo \$shl:parent:\$\$ >> tst.out"
done

Code:

$ cat tst.out
dash:RHS:57607
dash:LHS:57606
dash:parent:57605
bash:LHS:57613
bash:RHS:57614
bash:parent:57610
ksh:LHS:57618
ksh:RHS:57617
ksh:parent:57617
zsh:LHS:57623
zsh:RHS:57621
zsh:parent:57621

Again, this shows that the RHS of a pipeline is executed in the foreground in ksh and zsh, but not in the other shells, and that the LHS is never executed in the foreground.

Therefore, using a pipeline in the file-moving loop from earlier in the thread requires (2n+1) or (3n+1) processes, depending on the shell used and including the ones for the mv commands, while using a here-doc, here-string or parameter expansion leads to (n+1) processes.
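For a sense of scale, the counts can be worked out for the 30,000 files mentioned earlier in the thread. This is just shell arithmetic on the formulas above. Reading (2n+1) as the ksh/zsh case and (3n+1) as the bash/dash case is my interpretation of the test output, and the variable names are made up.

```shell
#!/bin/sh
# n = number of files; every total includes one process for the script itself.
n=30000
heredoc=$((n + 1))          # one mv per file
pipe_two=$((2 * n + 1))     # LHS subshell + mv per file
pipe_three=$((3 * n + 1))   # LHS subshell + RHS subshell + mv per file

printf 'here-doc, here-string or expansion: %d\n' "$heredoc"
printf 'pipe, RHS in the current shell:     %d\n' "$pipe_two"
printf 'pipe, RHS in a subshell:            %d\n' "$pipe_three"
```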

Last edited by Scrutinizer; 11-16-2014 at 11:12 AM..

Scrutinizer

11-16-2014

Thanks for all the replies, I wasn't expecting it to be that complex.

I sort of thought the script might simply search the filename string for the first underscore and delete it plus all the characters before it, then search for (what would then be) the third underscore delete it plus the next 8 characters (since the time stamp is always YYYYMMDD)?

Or am I being too simplistic?
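The steps described above can be sketched fairly directly with POSIX parameter expansion. This assumes every name really has the x_surname_firstname_y_YYYYMMDD_OS_(z) layout. There is no validation, and the function name simple_target is hypothetical. It prints the new name rather than renaming anything.

```shell
#!/bin/sh
# Hypothetical helper following the steps described above; prints the new name.
simple_target() {
  rest=${1#*_}              # drop everything up to and including the first "_"
  after=${rest#*_*_*_}      # text following the third "_" of what is left
  before=${rest%"_$after"}  # text preceding that third "_"
  # Skip the 8-character YYYYMMDD stamp at the start of $after.
  printf '%s\n' "$before${after#????????}"
}

simple_target 'x_surname_firstname_y_20141115_OS_(z)'
```

A name with a different number of fields would be mangled silently, which is where the extra checks discussed earlier in the thread come in.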

Last edited by jonesal2; 11-16-2014 at 04:33 PM.. Reason: typo

jonesal2
