Sed: Delete lines in files that contain characters other than 'a-z', '0-9', '.' and '-'


I'm looking for a shell command, or maybe a small PHP loop, to delete lines in files.txt (in the same directory) that contain characters other than 'a-z', '0-9', '.' and '-'.

Every line that contains any other character should be deleted. I don't want to see the output (the files are large, about 5 MB each, and there are about 100 of them).

It's probably a combination of sed and a regex, but I'm unable to find the right syntax to do it.

Any help would be appreciated.

You mentioned "files.txt" and "100 files". Can you be more specific about from which file(s) the text should be deleted?

(assuming all files in directory...)
bash code:
  for FILE in *; do
    # -n suppresses normal output; p prints only lines made up entirely of allowed characters
    sed -n '/^[a-z0-9.-]\+$/ p' "$FILE" > "$FILE.tmp.$$"
    cp -f "$FILE.tmp.$$" "$FILE" && rm "$FILE.tmp.$$"
  done

bash code:
  for FILE in *; do
    # same filter with grep: keep only lines made up entirely of allowed characters
    grep '^[a-z0-9.-]\+$' "$FILE" > "$FILE.tmp.$$"
    cp -f "$FILE.tmp.$$" "$FILE" && rm "$FILE.tmp.$$"
  done
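
For anyone who wants to try the filter on one file before running it over a whole directory, here is a minimal sketch of the same idea wrapped in a function. The name filter_file is made up for illustration, and the sketch assumes mktemp is available and that grep supports -E. LC_ALL=C is set so that a-z means only the ASCII lowercase letters.

```shell
# filter_file FILE: keep only lines made up entirely of a-z, 0-9, '.' and '-'.
# (filter_file is a hypothetical helper name, not something from this thread.)
filter_file() {
  file=$1
  tmp=$(mktemp "$file.XXXXXX") || return 1
  LC_ALL=C grep -E '^[a-z0-9.-]+$' "$file" > "$tmp"
  # grep exits with 1 when no line matches (an empty result is fine);
  # only exit codes above 1 are real errors.
  if [ $? -gt 1 ]; then
    rm -f "$tmp"
    return 1
  fi
  # Write back through the original file so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example use on every .txt file in the current directory:
#   for f in ./*.txt; do [ -f "$f" ] && filter_file "$f"; done
```

Nothing is printed, and empty lines are dropped too because the pattern requires at least one allowed character.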

Thanks for your reply Scottn

Your code gave me the biggest hint I've had in the last 3 days.

My files are in fact sitemaps, like sitemap.1.xml, sitemap.2.xml, sitemap.3.xml, ...
and I forgot to mention that I also need to allow '<', '>', ':', '/'

I tried to use this code, but I think the ':' is not set correctly in this line...

(not working correctly)
sed -n "/^[<>\:a-z0-9.-\/]\{1,\}$/ p" sitemap.1.xml > sitemap.1.xml.tmp;mv sitemap.1.xml.tmp sitemap.1.xml

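Sed does seem to be somewhat pedantic about where characters go inside the brackets! A '-' placed between two other characters defines a range, so put it last to match a literal hyphen.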
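
A small sketch of the position effect, using grep -E instead of sed so that no delimiter escaping gets in the way. The helper name check and the sample line are made up for illustration, and the second character class is a simplified stand-in for the failing attempt, not a copy of it.

```shell
# check CLASS LINE: print "kept" if LINE consists only of characters from
# the bracket expression CLASS, otherwise "deleted".
# (check is a hypothetical helper name used only for this illustration.)
check() {
  if printf '%s\n' "$2" | LC_ALL=C grep -qE "^[$1]+\$"; then
    echo kept
  else
    echo deleted
  fi
}

# Made-up sitemap-style line containing a literal hyphen.
line='<loc>http://www.example.com/my-page.html</loc>'

check 'a-z<>/0-9.:-' "$line"   # hyphen last, so it is literal: prints "kept"
check 'a-z<>0-9:.-/' "$line"   # '.-/' is a range of just '.' and '/': prints "deleted"
```

In the second call the hyphen is consumed as a range operator, so a line that contains a real '-' no longer matches and would be thrown away.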

bash code:
  for FILE in sitemap.*.xml; do
    # the '-' comes last inside the brackets so it is taken literally
    sed -n '/^[a-z<>/0-9.:-]\+$/ p' "$FILE" > "$FILE.tmp.$$"
    cp -f "$FILE.tmp.$$" "$FILE" && rm "$FILE.tmp.$$"
  done

Nice!!
Thanks

Didn't know about the position thing..!

Thanks for your great help