Split a capital-rich string

Hello all,

I am new and am jumping straight in with a question, sorry!

I am working on a new MediaWiki site and have 1500 HTML pages I want to add to the system.

I will mostly do them one by one, as they need some editing, but there is one thing I would like to do in one go.

I need to change

HelloInCapitals to Hello In Capitals

It could also be like this PharmaceuticalSocietyOfGreatBritainVBoots1952 needs to be Pharmaceutical Society Of Great Britain V Boots 1952

And another matter in the same files: due to a mistake there are also instances of

(that is legally binding]] to (that is legally binding)

I cannot simply change ]] as I do need ]] where the word starts with [[

Can someone help me?

 echo "HelloInCapitals" | sed 's/[A-Z]/ &/g'
 Hello In Capitals

Thank you so much!
How can I run that on all 1500 pages?

find . -name '*.html' -type f -print | while IFS= read -r file; do echo "$file"; sed 's/[A-Z]/ &/g' "$file" > "$file.tmp" && mv -f "$file.tmp" "$file"; done
Originally Posted by malcomex999
 echo "HelloInCapitals" | sed 's/[A-Z]/ &/g'
 Hello In Capitals

This gives a space at the start of the line; to be more precise:

sed 's/[A-Z]/ &/g; s/^ //'

sed 's/HelloInCapitals/Hello In Capitals/g' file

This will change the strings in the file. The 'g' (global) flag says to make the change to all occurrences on a line.
Redirect the output to write the result to a new version of the file (the default is standard output).
Put it in a loop to do it to multiple files:
for each in dir/*; do
   cp "$each" "${each}.orig"
   sed 's/HelloInCapitals/Hello In Capitals/g' "${each}.orig" > "$each"
done

Originally Posted by Franklin52
This gives a space at the start of the line; to be more precise:

sed 's/[A-Z]/ &/g; s/^ //'

Thanks for that but then it is better to do it with one sed process....

sed -e 's/[A-Z]/ &/g' -e 's/^ //g'
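
To run the one-process version over a whole directory tree, something like the following might do. It is only a sketch: the function name split_caps_in_tree is made up, LC_ALL=C is my own addition so that [A-Z] means plain ASCII capitals, and each file is copied to a .orig backup before being overwritten. Be aware that this expression puts a space before every capital letter in the file, including those in ordinary sentences and in HTML tags, so test on a copy first.

```shell
# split_caps_in_tree DIR  (hypothetical helper, not from the thread)
# Rewrites every *.html file under DIR, keeping a .orig backup of each.
split_caps_in_tree() {
    find "$1" -type f -name '*.html' -exec sh -c '
        for f in "$@"; do
            # Back up first; only rewrite the file if the copy succeeded.
            cp "$f" "$f.orig" &&
            LC_ALL=C sed -e "s/[A-Z]/ &/g" -e "s/^ //" "$f.orig" > "$f"
        done
    ' sh {} +
}
```

Called as split_caps_in_tree . from the directory holding the pages. Passing the names through find -exec keeps file names containing spaces intact.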


Originally Posted by externalaw
It could also be like this PharmaceuticalSocietyOfGreatBritainVBoots1952 needs to be Pharmaceutical Society Of Great Britain V Boots 1952

There should be a better way...
>cat infile
PharmaceuticalSocietyOfGreatBritainVBoots1952
>sed -e 's/[A-Z]/ &/g' -e 's/[0-9].../ &/g' -e 's/^ //g' infile
Pharmaceutical Society Of Great Britain V Boots 1952

And it is not clear what you mean below...

And another matter in the same files: due to a mistake there are also instances of

(that is legally binding]] to (that is legally binding)

I cannot simply change ]] as I do need ]] where the word starts with [[

Can someone help me?
sed -e ":h;s/\([A-Za-z]\)\([0-9A-Z]\)/\1 \2/g;th" infile
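
For anyone unfamiliar with the loop idiom: ":h" defines a label and "th" jumps back to it whenever the preceding substitution changed something, so the substitution repeats until no letter is directly followed by a capital or a digit. The repeat is needed because matches cannot overlap: in "nVB" the first pass uses up "nV", so "VB" is only split on the second pass. A small sketch of the same idea, with the caveats that the function name split_caps is made up, that the pieces are given as separate -e arguments because some sed versions want a label to end its chunk, and that it uses [0-9] so a leading digit 0 is also split off:

```shell
# split_caps STRING  (hypothetical helper, not from the thread)
# Inserts a space wherever a letter is directly followed by a capital or digit.
split_caps() {
    printf '%s\n' "$1" |
        sed -e ':h' -e 's/\([A-Za-z]\)\([0-9A-Z]\)/\1 \2/g' -e 'th'
}

split_caps PharmaceuticalSocietyOfGreatBritainVBoots1952
# prints: Pharmaceutical Society Of Great Britain V Boots 1952
```

Unlike the 's/[A-Z]/ &/g' approach, this leaves no leading space and does not touch capitals that already follow a space or punctuation.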

( ... ]] is trickier unless you want to change all [[ ... ]] to ( ... )

For a couple of reasons.

1. regular expressions are "greedy" - trying to fix this
( blah ]] de [[ blah ]]

might end up like this
( blah ]] de [[ blah )

2. the ( ... ]] might be split over multiple lines

This is hideous but should (might!) fix some of them:
sed "s/\(([A-Za-z ]*\)]]/\1)/;s/\(\[[A-Za-z ]*\))/\1]]/"
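
To see what that command does and does not catch, here is a small sketch. The function name fix_brackets and the sample lines are made up for illustration; the two expressions are the ones above, just split into separate -e arguments and single-quoted.

```shell
# fix_brackets  (hypothetical helper, not from the thread)
# Reads lines on stdin; turns "(words]]" into "(words)" and "[words)" into "[words]]".
fix_brackets() {
    sed -e 's/\(([A-Za-z ]*\)]]/\1)/' -e 's/\(\[[A-Za-z ]*\))/\1]]/'
}

printf '%s\n' '(that is legally binding]] and [[Some Page]]' | fix_brackets
# prints: (that is legally binding) and [[Some Page]]
```

Because the bracket expressions only allow letters and spaces, text containing digits or punctuation, such as (see section 2]], is left unchanged, and without a g flag only the first occurrence on each line is fixed.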

