Question/review my script: removing bad chars from filenames

UNIX for Dummies Questions & Answers


    #1  
Old 10-03-2010
uiop44

The task: remove undesirable characters from filenames.

Restrictions: Must use basic RE, base utilities (non-GNU) and /bin/sh (ash). No ksh, zsh, perl, etc.

Below is what I've come up with. It seems to work OK but I'm open to shorter, more efficient alternatives.

Inside the square brackets I've included an assortment of chars I'd like to remove. But I'm not able to get the following chars inside the square brackets so that they are also removed:

- square brackets
- single quotes (apostrophes)

One solution is to pipe into another sed. But I'd like to avoid that. My version of sed does not accept hex or octal.


Code:
# These are for debugging; they only print the mv command lines.

nsc(){
# IFS is a newline, so the unquoted $@ is split on newlines only.
IFS='
'
for j in $@; do printf 'mv -vi "%s" \134 \n "%s"\n' "$j" "$j"; done |
sed '/ \\ $/!s/[[:cntrl:][:blank:]\(\)\{\}:;,!?+*=<>#@\^|]//g;s/\]//g'
}

# Problem: does not remove square brackets and single quotes.

nsc(){
IFS='
'
for j in $@; do printf 'mv -vi "%s" \134 \n "%s"\n' "$j" "$j"; done |
sed '/ \\ $/!s/[[:cntrl:][:blank:]\(\)\{\}:;,!?+*=<>#@\^|]//g;s/\]//g;s/\[//g' |
sed "s/'//g;s/\[//g;s/\]//g"
}

# Removes single quotes and square brackets, but uses an extra sed.

Note: I have tried the "while read" approach but I found that the read builtin required filenames that do not contain illegal characters for variables, such as parentheses.
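For comparison, here is a minimal sketch of a read-based loop, assuming a POSIX sh whose read builtin supports -r. The failing loop is not shown in the post, so this is only a guess at a working form. The helper name show_names and the sample filenames are made up for illustration. With IFS= and -r, each line lands in the variable as plain data, so parentheses and braces in a name should not be interpreted by the shell.

```shell
# Hypothetical helper: read one filename per line and print it in <...>.
# IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
show_names() {
    while IFS= read -r name; do
        printf '<%s>\n' "$name"
    done
}

printf '%s\n' 'demo (1).txt' 'a{b}c' | show_names
# prints:
# <demo (1).txt>
# <a{b}c>
```

The variable must stay double-quoted wherever it is used later; an unquoted $name would be split and globbed again.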

Last edited by uiop44; 10-03-2010 at 10:02 PM..
    #2  
Old 10-03-2010
agama
This sed works for me to remove all of the 'special' characters, including both the opening and closing square brackets and the single quote, all in a single sed substitute command:


Code:
sed 's/[]['\''"!@#$%^&*()`~[:cntrl:][:space:]\t]//g'

Placing the closing square bracket immediately after the bracket that opens the character class makes sed treat it as a literal member of the class rather than as the end of the class.
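The bracket placement can be tried in isolation. This is a minimal sketch, assuming any POSIX sed; the helper name strip_brackets and the sample filename are invented for the example. The class holds only the two square brackets, with the closing one listed first.

```shell
# Hypothetical helper: delete "[" and "]" from each input line.
# "]" is the first character inside the brackets, so it is literal.
strip_brackets() {
    sed 's/[][]//g'
}

printf '%s\n' 'track[01](live).mp3' | strip_brackets
# prints: track01(live).mp3
```

Other characters can be appended after the two brackets, as long as the closing bracket stays first.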

Using the '\'' construct you can "insert" a single quote into the class.
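The quoting trick is done by the shell, not by sed. A small sketch, assuming a POSIX sh and sed (the helper name strip_squotes is hypothetical): the shell joins three adjacent pieces, a single-quoted string, a backslash-escaped quote, and another single-quoted string, so sed receives the script s/[']//g.

```shell
# Hypothetical helper: delete single quotes from each input line.
# The shell sees three pieces:  's/['   \'   ']//g'
strip_squotes() {
    sed 's/['\'']//g'
}

printf '%s\n' "don't stop.txt" | strip_squotes
# prints: dont stop.txt
```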

I don't know if I picked up all of the specials that you wish to remove, but you should be able to add what ever I missed.
    #3  
Old 10-03-2010
uiop44
I see an escaped t in your sed command. That tells me you're using a sed that supports more than mine does (e.g., symbols for tabs, newlines, etc. plus octal and hex, if it's the GNU version).

My shell with my sed won't let me escape a single quote if I'm also using single quotes to enclose my sed command sequence.
    #4  
Old 10-03-2010
agama
You can replace \t with the tab character if your sed doesn't support it. I use AT&T's sed from their AST tool set. Also, if [:space:] includes tab (too lazy to look it up tonight), then you don't need it at all.
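On the [:space:] question: in the POSIX locale that class does include tab (along with space, newline, carriage return, form feed and vertical tab), so a separate tab should not be needed. A quick check, assuming a POSIX sed; the names tab and strip_space are made up for this sketch, and printf is used only to produce a real tab for the test input.

```shell
# A literal tab character, produced portably; the variable name is an assumption.
tab=$(printf '\t')

# Hypothetical helper: delete every whitespace character from each line.
strip_space() {
    sed 's/[[:space:]]//g'
}

printf 'a%sb c\n' "$tab" | strip_space
# prints: abc
```

Where only spaces and tabs should go, [:blank:] is the narrower class.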

Does your shell allow this:

Code:
sed 's/[]['"'"'!@#$%^&*()`~[:cntrl:][:space:]        ]//g'

That's a single quote to close, then a single quote inside of double quotes and finally a single quote to reopen.
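Put together, the whole cleanup fits in one substitute command. This is a sketch under the same assumptions (POSIX sh and sed); the function name clean_name and the sample filename are hypothetical, and the list of characters is only the one from this thread, not a complete set. The tab is left out because [:space:] already covers it.

```shell
# Hypothetical helper: strip the thread's "special" characters in one pass.
# Shell pieces:  's/[]['   "'"   '"!@#$%^&*()`~[:cntrl:][:space:]]//g'
clean_name() {
    sed 's/[]['"'"'"!@#$%^&*()`~[:cntrl:][:space:]]//g'
}

printf '%s\n' "it's [a] (test)!.txt" | clean_name
# prints: itsatest.txt
```

Piping the debugging output of nsc through a function like this would replace the second sed.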

If this fails then I'm out of suggestions.
    #5  
Old 10-04-2010
uiop44
Many, many thanks agama!

It all works now.

(Your first solution for the single quote actually works too. I hastily overlooked your single quotes around the backslash.)

Great stuff. This forum is excellent.

Last edited by uiop44; 10-04-2010 at 02:03 AM..