Text Substitution Project


 
# 1  
Old 10-29-2010

History: this is a large open-source PHP project, a school management program comprising about 200 scripts. I had another developer for a while, and he wanted a version in German, so he edited all the scripts and replaced the text that shows up in the browser with variables (e.g. instead of "Click Here", we have _PAGE_TEXT_CLICK). There was one big file with 2400 variables defined (DEFINE _PAGE_TEXT_CLICK "Click Here"). He has since moved on, and the demand for a multilingual project never materialized. As I am now the sole code maintainer, I find it very cumbersome to make changes to the project, and I now want to go back to an English-only version.

I have been trying to create a script that will do a massive search-and-replace through all 200 scripts. Each script probably needs only about 10 replacements, so most of the attempted substitutions in any given script will match nothing, since every DEFINE statement is unique.

OK, that's the history! I created this massive 'run.sed' file that looks like this:

Code:
sed -i 's|_TEACHER_EDIT_STUDENT_1_NOTES|\"Notes\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHCITY|\"Birth City\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHSTATE|\"Birth State\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHCOUNTRY|\"Birth Country\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_PRVS_SCHOOLNAME|\"Prvs School Name\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_PRVS_SCHOOLADDRESS|\"Prvs School  Address\"|'
[chopped, 2400 lines]


Single attempts aimed at a single file work, so the syntax appears to be fine.

My bash file that should iterate through the scripts looks like this:

Code:
#!/bin/bash
FILES=/home/me/swift200/*.php
for f in $FILES
do
  echo "Processing $f file..."
  ./run.sed $f
done

The script is executable, I am doing this as root, none of my PHP files have spaces in the name, and I am executing this from the directory that contains the PHP files.

When I run it, many of these go scrolling by:

Quote:
sed: no input files
When I reduce run.sed to just one line, it successfully searches and replaces all occurrences of that string in all the PHP files. I am truly at my wits' end with this, and would appreciate any insight at all.

Thank you!
This User Gave Thanks to dougp23 For This Post:
# 2  
Old 10-29-2010
Code:
./run.sed "$f"

# 3  
Old 10-29-2010
I have tried quoting the $f, and still, it just keeps scrolling by:

sed: no input files
# 4  
Old 10-29-2010
At first glance there are three fundamental problems here.

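1) Using "for" over an open-ended list. A "while read" loop fed by "find" is more robust for very large directories, because the file list is never expanded onto a command line and so cannot exceed the maximum command length.
Code:
find /home/me/swift200 -maxdepth 1 -name '*.php' 2>/dev/null | while IFS= read -r f
do
  ./run.sed "$f"
done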

2) The 2400 line "run.sed" does not mention a filename anywhere and does not mention the value of "$f" (which would be $1 in this context).
This is why you are getting error messages.

3) If we could get "run.sed" to work, it would do 2,400 in-situ edits on each of 200 scripts (480,000 edits). Every one of those edits reads the whole script, so assuming a minimum script length of 2,400 lines we end up reading a minimum of 480,000 x 2,400 = 1,152,000,000 lines. This is somewhat inefficient.
Knowing the typical number of lines in each script would help the sizing.
Sometimes we can be inefficient and it really doesn't matter. With this task I think we need to look at efficiency.
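
The sizing above can be reproduced with shell arithmetic. This is a minimal sketch using the same assumed figures as the post (2,400 substitutions, 200 scripts, 2,400 lines per script); the variable names are illustrative only. For comparison it also works out the cost when every substitution is applied in a single sed pass per script.

```shell
#!/bin/bash
# Figures assumed in the post; adjust them to the real project.
edits=2400             # substitution commands in run.sed
scripts=200            # PHP files to process
lines_per_script=2400  # assumed script length

# One sed process per substitution per script.
sed_runs=$((edits * scripts))
lines_read=$((sed_runs * lines_per_script))

# One sed process per script, applying every substitution in a single pass.
single_pass_runs=$scripts
single_pass_lines=$((scripts * lines_per_script))

echo "separate runs: $sed_runs processes, $lines_read lines read"
echo "single pass:   $single_pass_runs processes, $single_pass_lines lines read"
```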






Now, unfortunately, this is where I come unstuck: I do not have your Operating System (whatever that is), and my "sed" has no "-i" switch, so I am unable to test a substantial edit list with your version of "sed". I have run "sed" with over 200 edit lines successfully.
From now on this is general, untested advice.

1) Don't use "sed -i" unless you have a backup copy of the scripts you are editing. One mistake and you will destroy the original.
There is no way that you will get this edit right the first time.
Write one script to back up the original scripts and another to do a quick restore, and be prepared to run the restore again and again.

2) Use "sed -i -f sedfile name_of_file_to_be_edited"
This runs sed ONCE per script to be edited. Try running "sed" with a sedfile containing the 2400 individual substitution commands, one per line. Because the sedfile is read by "sed" itself and not by the Shell, we do not need to protect characters from the Shell.
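
The two points above can be combined into one small script. This is only a sketch under stated assumptions: the directory names ("src", "backup") and the function names are made up for illustration, the demo runs on a throwaway temporary directory rather than the real project, and it writes through a temporary file instead of using "-i" so that it does not depend on GNU sed.

```shell
#!/bin/bash
# Demo on a throwaway directory; all paths and names here are hypothetical.
work=$(mktemp -d)
src="$work/src"; backup="$work/backup"
mkdir -p "$src"

# Two sample "PHP" files and a two-line sedfile (plain sed commands, no shell quoting).
# The g flag replaces every match on a line; the commands in the thread omit it.
printf 'echo _BROWSER_TITLE;\n' > "$src/a.php"
printf 'echo _BAD_ENTRY; echo _BAD_ENTRY;\n' > "$src/b.php"
cat > "$work/run.sed" <<'EOF'
s|_BROWSER_TITLE|"School Management System"|g
s|_BAD_ENTRY|"Your entry is invalid"|g
EOF

backup_scripts() {   # copy the originals aside before touching them
    rm -rf "$backup" && mkdir -p "$backup" && cp -p "$src"/*.php "$backup"/
}

restore_scripts() {  # put the originals back after a bad edit
    cp -p "$backup"/*.php "$src"/
}

apply_edits() {      # run sed ONCE per file with the whole sedfile
    local f
    for f in "$src"/*.php; do
        # The rewritten file gets default permissions; acceptable for a demo.
        sed -f "$work/run.sed" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

backup_scripts
apply_edits
edited_a=$(cat "$src/a.php")
edited_b=$(cat "$src/b.php")
restore_scripts
restored_a=$(cat "$src/a.php")

echo "$edited_a"
echo "$edited_b"
echo "$restored_a"
rm -rf "$work"
```

Keeping backup, edit and restore as separate functions makes it cheap to repeat the cycle while the sedfile is being corrected.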
This User Gave Thanks to methyl For This Post:
# 5  
Old 10-29-2010
Code:
$ cat run.sed 

sed -i 's|_TEACHER_EDIT_STUDENT_1_NOTES|\"Notes\"|' "$1"
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHCITY|\"Birth City\"|' "$1"
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHSTATE|\"Birth State\"|' "$1"
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHCOUNTRY|\"Birth Country\"|' "$1"
sed -i 's|_TEACHER_EDIT_STUDENT_1_PRVS_SCHOOLNAME|\"Prvs School Name\"|' "$1"
sed -i 's|_TEACHER_EDIT_STUDENT_1_PRVS_SCHOOLADDRESS|\"Prvs School  Address\"|' "$1"

This User Gave Thanks to rdcwayx For This Post:
# 6  
Old 10-29-2010
rdcwayx has spotted one of the fundamental problems and our posts have crossed.
If you have a seriously powerful computer then efficiency may not be an issue.
# 7  
Old 10-30-2010
Thanks for the hints and help!

Methyl, the number of edits is actually not that massive. Assume the run.sed file (which is really my dictionary file) has 2400 lines. Each PHP script has maybe 200 lines, and I have 200 scripts, but each script will only get about 10-15 of those replacements. BTW, my version of sed says GNU 4.2.1, and the -i flag is for "in-place edit".

But you are absolutely correct: I did something like this once with no backup, it wasn't right, and I had no way to go back to the old files and try something new! Several backups here.

Here's what worked (the whole thing took about 2 minutes on a 2GHz Pent, 1GB RAM, Linux):

run.sed (abbreviated):
Code:
#!/bin/sed -f
s/_BROWSER_TITLE/"School Management System"/
s/_MORE_TEXT/"You need more text"/
s/_BAD_ENTRY/"Your entry is invalid"/
....
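
One detail worth checking in a sedfile like this: an "s" command without the "g" flag replaces only the first match on each line. A minimal sketch of the difference, using one of the names above on a made-up input line (the sample text is an assumption, not taken from the project):

```shell
#!/bin/bash
# Made-up input line containing the same name twice.
line='echo _BAD_ENTRY . " / " . _BAD_ENTRY;'

# Without g: only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/_BAD_ENTRY/"Your entry is invalid"/')

# With g: every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/_BAD_ENTRY/"Your entry is invalid"/g')

echo "$first_only"
echo "$all"
```

If a name can appear more than once on a line in the PHP files, appending "g" to each command avoids leftovers.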

The script to iterate through all the files:
Code:
#!/bin/bash

# Glob directly in the loop so each filename stays a single word.
for f in /home/me/swift200/*.php
do
    echo "Processing $f file..."
    ./run.sed -i "$f"
done

I'm still pretty rusty with bash scripts, but it worked! A real oddity: the other night I tried what rdcwayx suggested, thinking that the script would use bash's positional parameters ($1, $2, $3, etc.). This did not work, and echoing the filename at each point would print something like

working on script .//home/me/swift200/admin_manage_attendance.php

which of course would not work. I'm not sure where the extra "./" came from. What WOULD work was using $@ instead of $1. Again, I have no idea why; maybe it's my version of bash/Linux/sed, but that was the first time I had ever seen $@ used.
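
For readers who have not met them before, here is a minimal sketch of how "$1" and "$@" differ. It uses shell functions, which receive positional parameters the same way a script does; the function names and file names are invented for illustration, and it does not attempt to explain the ".//" path reported above.

```shell
#!/bin/bash
# Hypothetical helpers: each prints the arguments it would hand on to sed.
first_arg() { printf '[%s]' "$1"; printf '\n'; }  # "$1": the first argument only
all_args()  { printf '[%s]' "$@"; printf '\n'; }  # "$@": every argument, each kept as one word

one=$(first_arg "a.php" "b c.php")
all=$(all_args "a.php" "b c.php")

echo "$one"
echo "$all"
```

When a script is called with a single filename, as in the loop above, "$1" and "$@" expand to the same thing.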

Again, thanks everyone! Your hints got me in the right direction, and I'm all set now!

Last edited by Scott; 10-30-2010 at 08:43 PM.. Reason: Code tags