Bash Scripting - sed (substitute)

 
# 1  
05-20-2011

1. The problem statement, all variables and given/known data:

I have been asked to create a bash script that deletes the comments from another file. The file contains an echo command with /* this is an echo */\ inside it, and obviously they want to keep that one in the file. I found this bit of code, but I'm having trouble reading it, starting with the first line, #!/bin/sed -f, since we have only been dealing with #!/bin/bash. All the :x, N, b etc. make it even harder for me to understand, so I want to rewrite this code starting with #!/bin/bash and using direct calls to the sed command. Any help would be great.

2. Relevant commands, code, scripts, algorithms:

Code:
#!/bin/sed -f

# Simple Sed Program to remove all comments from c program

/\/\*/!b

:x
/\*\//!{
N
bx
}
# delete /*...*/
s/\/\*.*\*\///

3. The attempts at a solution (include all code and scripts):

Code:
#!/bin/bash 
# Simple Sed Program to remove all comments from c program

/\/\*/!b

sed -x
/\*\//!{
sed -N
bx
}
s/\/\*.*\*\///

This is straight up not working.


4. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):

2309ENG: C & Unix Programming - Dr David Rowlands

# 2  
05-20-2011
What language is the file from which you are removing comments? The fact that you say it has "echo" statements makes me think "shell", but shells do not support "/*....*/" style comments.

Regardless of language, it's quite a leap to say that it's obvious that comments containing "echo" should be retained. If you were told to remove all comments, then that is what you should do. Maybe you should clarify your instructions.

I haven't tried it, but the first sed script you posted looks like it should work. It's pretty clever actually. But you can't split it up into separate sed statements. It's a unit and it needs to stay that way.
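What the question asks for, a script that starts with #!/bin/bash, does not require splitting anything. You can pass the whole sed program to a single sed call as one quoted argument. Here is a minimal sketch, assuming GNU or BSD sed. The function name strip_c_comments is hypothetical and does not come from the thread:

```bash
#!/bin/bash
# strip_c_comments (hypothetical name): runs the thread's sed program from
# bash. The whole program is ONE single-quoted argument to ONE sed call, so
# the :x label and the N/bx loop still belong to the same script.
# Usage: strip_c_comments file.c   (or feed the C source on stdin)
strip_c_comments() {
    sed '
# lines without "/*" are branched to the end and printed unchanged
/\/\*/!b
:x
# no "*/" yet: append the next line (N) and jump back to :x
/\*\//!{
N
bx
}
# delete from the first "/*" to the last "*/" in the pattern space
s/\/\*.*\*\///
' "$@"
}
```

Called as `strip_c_comments prog.c > clean.c`, it should behave like the `#!/bin/sed -f` version. Like the original, it cannot tell a real comment from `/* ... */` inside a string: `puts("/* x */");` comes out as `puts("");`.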

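/\/\*/!b  Any line without "/*" is branched ("b" with no label) to the end of the script, where sed prints it unchanged and reads the next line. In effect, we loop until we find a line with "/*" in it. Once we find such a line, we proceed to the rest of the sed script.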

Code:
:x
/\*\//!{
N
bx
}

This little paragraph asks whether we have a "*/". We might, if the earlier sed statement found a one-line comment. But if the comment extends over several lines, we will not immediately have the terminating "*/". So we append the next input line to the pattern space with the "N" command. The "bx" jumps back to the ":x"; that is how we loop in sed. Eventually we will read the "*/" and fall out of the loop to the final statement.

Code:
# delete /*...*/
s/\/\*.*\*\///

The comment is correct: it deletes the "/*...*/" comment. If all you have is one-line comments, this one command may be all you need.
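One caveat for the single-line case: the `.*` in that substitution is greedy. Two comments on one line therefore take the code between them along with them. A commonly used regular expression for a single C comment avoids this, because it never lets the comment body run past the first `*/`. Here is a sketch, assuming a sed that supports `-E` (GNU and BSD sed both do). `strip_line_comments` is a hypothetical name:

```bash
#!/bin/bash
# strip_line_comments (hypothetical name): deletes every /* ... */ comment
# that opens and closes on the same line. The ERE matches exactly one comment:
#   /\*                 the opening "/*"
#   [^*]*\*+            text up to and including a run of "*"
#   ([^/*][^*]*\*+)*    more text, as long as a "*" run is not followed by "/"
#   /                   the closing "/"
# ":" is used as the s delimiter so "/" needs no escaping.
strip_line_comments() {
    sed -E 's:/\*[^*]*\*+([^/*][^*]*\*+)*/::g' "$@"
}
```

With this, `a /* 1 */ b /* 2 */ c` becomes `a  b  c` rather than `a  c`. It still cannot tell a comment from the same characters inside a string literal.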

I have not tested it, but it all looks good. Each statement depends on the others, though: you are not going to be able to turn this sed script into several sed scripts. I hope this helps you.
# 3  
05-21-2011
Oh, sorry: I am creating this bash script to run on a C program file and take out all the comments. Yes, this script does work, and now I know why. Thanks a lot, you made it very clear. I just don't understand how it knows whether something is a comment, or whether it isn't actually a comment because it is inside the echo command, where the */ message /* appears with the same symbols you would use for a comment.

Thanks again.

# 4  
05-21-2011
This still leaves you with the problem of writing your script. Let us restart from the beginning:

When you say you want to "remove comments", you have to know what constitutes such a comment: there must be a sequence of characters that starts a comment and a sequence that ends it. Consider the following (fantasy) shell code:

Code:
#!/bin/myshell

some_command    # this is a comment

# this is a comment too
Is this still a comment?

other_command

Let's establish what starts a comment: the "#" sign, obviously. And what ends it? The line end. (This answers the question of whether line 6 is a comment: it is not, because it comes after the comment-end marker.)

Still there are some possible problems: What about comments in comments? Lines like this:

Code:
command  # comment # what status has this?

Or what about using the comment-start sequence quoted:

Code:
echo "something # is this a comment?"  # what about this?

or escaped:

Code:
echo "something" \# comment?
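To see why these cases matter, take the obvious one-liner that deletes everything from "#" to the end of the line. Here is a sketch. `naive_strip` is a hypothetical name:

```bash
#!/bin/bash
# naive_strip (hypothetical name): delete from the first "#" to end of line.
# It has no notion of quoting or escaping.
naive_strip() {
    sed 's/#.*//' "$@"
}
```

Given the quoted example `echo "something # is this a comment?"  # what about this?`, it prints `echo "something `, which cuts the string in half. It also blanks out the `#!/bin/myshell` line and truncates the escaped example to `echo "something" \`.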


There is a certain class of programs designed to deal with these problems. They are called "parsers". They form the first part of a compiler, where the program code is read, stripped of everything unnecessary (like comments, which the program doesn't need) and syntax-checked. It all sounds quite complicated, and "building compilers" sounds like high-level stuff, but the basic idea is astonishingly simple.

Let's take stock: we have a set of rules (what starts a comment, what ends a comment) and an action (filter the comments out). We saw above that we need some more rules for quoting and escaping, so let's go over the rules again:
  1. Do not look at characters inside "..." or '....' (quoting)
  2. If a character "\" is encountered, treat the one following it as a normal character, regardless of its usual meaning (escaping)
  3. When you encounter a comment-start marker, throw it away, along with the text you read until you encounter a comment-end marker
  4. Output what you have read

Parsers do their work the following way: read in one character after the other. After each character, decide whether you have found a sequence with a rule attached to it. Finally, after applying all the applicable rules, output the result (unless some rule forbids it).

In fact, parsers are just while loops that read one character after the other, plus long case constructs that try to apply one rule after the other to the character read. Because we only ever look at one character at a time, we have to maintain some status flags: we have to "remember" whether we are inside a comment, inside a quoted string, etc.

How about you trying to write a program (in pseudo-code only, just the logic) for your problem and posting it. Then we will go over your solution and implement that in real code.

I hope this helps.

bakunin
# 5  
05-21-2011
Yeah, thanks a lot. I'm slowly getting the hang of the concept; I just don't know enough programming-wise. But I have come up with this as my pseudo-code:

1. Search Given File
2. Loop looking for /* to */
2.1 check if /* is inside of "" or part of code if it is leave it
2.3 if not delete it
2.4 look for next /* */ combination
3. Print file with changes made.

# 6  
05-23-2011
Quote:
Originally Posted by syco__
1. Search Given File
2. Loop looking for /* to */
2.1 check if /* is inside of "" or part of code if it is leave it
2.3 if not delete it
2.4 look for next /* */ combination
3. Print file with changes made.
You are thinking about this "from the wrong side". I suggest you read again what I said about parsers reading one character at a time and applying rules to it. I'll take our fantasy-shell example and show you a solution for it; you will still have to come up with a solution for your original problem:

First, we need some "memory" to remember what we have read so far. Memory comes in the form of status flags (TRUE/FALSE), which we maintain while reading the code. We start with one for remembering the escape sequence. We need to do this first, because escaping is the most strongly binding construct of the language. It is like operator precedence: this is the "operator" with the highest precedence, so we have to take care of it first *):

Code:
next_character_is_escaped=false # we have to interpret the next character

WHILE not at the end of the input file
      read a character
      
      IF next_character_is_escaped != true
           [... we will have to work on the character further here ...]
      ELSE
           next_character_is_escaped=false
      END IF

      output the read character

END WHILE

You see, the loop does nothing more than to read one character at a time and maintain a status flag for escaped characters. This relates to the "rule 2" of my last posting.

I suggest you take a piece of paper and go through a real script with this pseudo-program "by hand", to see how the status flags are maintained and how the logic works (the same goes for the other pieces of code to follow; this is an invaluable exercise in grasping programming constructs, believe me! *) ). What the escaped status flag does is exclude the next character from being interpreted by the logic (the "further work") we are going to apply now. Let's start with quoting (I ignore single quotes here to keep the example short; this doesn't mean we could forget about them in the real world):


Code:
inside_a_quote=false          # we are not between "..." right now
next_character_is_escaped=false # we have to interpret the next character

WHILE not at the end of the input file
      read a character

      IF next_character_is_escaped != true
           CASE the read character is
                - the escape char
                     next_character_is_escaped=true

                - the quote char
                     IF inside_a_quote == true
                          inside_a_quote=false
                     ELSE
                          inside_a_quote=true
                     END IF 

           END CASE
      ELSE
           next_character_is_escaped=false
      END IF

      output the read character

END WHILE

The added logic just flips the inside-quote status flag when we encounter double quotes (more precisely, double quotes that aren't escaped). Now that we have covered what was "rule 1" in my last posting, we are ready to tackle the comments themselves:

Notice that applying a rule to the character read changes the way the following rules are applied: if rule 1 says "don't look at the character any further", then rule 3 won't be applied to it, because there is nothing left for it to be applied to, so to speak.

I have added some logic at the end to let you see whether the program correctly catches all instances of comments. Again, I suggest you "execute" it by hand with a piece of paper to see how it operates.

Code:
inside_a_quote=false          # we are not between "..." right now
inside_a_comment=false        # we are not between a comment-start and a comment-end
next_character_is_escaped=false # we have to interpret the next character

WHILE not at the end of the input file
      read a character

      IF next_character_is_escaped != true
           CASE the read character is
                - the escape char
                     next_character_is_escaped=true

                - the quote char
                     IF inside_a_comment == false
                          IF inside_a_quote == true
                               inside_a_quote=false
                          ELSE
                               inside_a_quote=true
                          END IF
                     END IF

                - the comment-start char
                     IF inside_a_quote == false
                          inside_a_comment=true
                     END IF

                - the comment-end char
                     IF inside_a_quote == false
                          inside_a_comment=false
                     END IF

           END CASE
      ELSE
           next_character_is_escaped=false
      END IF

      IF inside_a_comment == false
           output the read character in black
      ELSE
           output the read character in red
      END IF

END WHILE
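If you want to try the third pseudo-program on a machine rather than on paper, here is one possible bash rendering of it. It is a sketch under a few assumptions. `strip_hash_comments` is a hypothetical name. It drops comment characters instead of printing them in red, which is what rule 3 ultimately asks for. It also deliberately ignores quote characters inside a comment, so a stray quote in a comment cannot flip the quote flag:

```bash
#!/bin/bash
# strip_hash_comments (hypothetical name): the third pseudo-program for the
# fantasy shell, in bash. "#" starts a comment, newline ends it, "\" escapes
# the next character, "..." quotes. Comment characters are dropped instead
# of printed in red. Quote characters inside a comment are ignored.
strip_hash_comments() {
    local src c i out=""
    local inside_a_quote=false inside_a_comment=false
    local next_character_is_escaped=false
    # read the whole input at once (read fails at EOF, hence "|| true");
    # -d '' keeps embedded and trailing newlines
    IFS= read -r -d '' src || true
    for (( i = 0; i < ${#src}; i++ )); do
        c=${src:i:1}                         # "read a character"
        if ! $next_character_is_escaped; then
            case $c in
                '\')                         # the escape char
                    next_character_is_escaped=true ;;
                '"')                         # the quote char
                    if ! $inside_a_comment; then
                        if $inside_a_quote; then
                            inside_a_quote=false
                        else
                            inside_a_quote=true
                        fi
                    fi ;;
                '#')                         # the comment-start char
                    $inside_a_quote || inside_a_comment=true ;;
                $'\n')                       # the comment-end char
                    $inside_a_quote || inside_a_comment=false ;;
            esac
        else
            next_character_is_escaped=false
        fi
        $inside_a_comment || out+=$c         # output only non-comment chars
    done
    printf '%s' "$out"
}
```

For example, the quoted-"#" line from earlier keeps its string intact and loses only the trailing `# what about this?`, and the escaped `\#` line passes through unchanged.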

Some notes regarding your own program: you will notice that (unlike in my example) there is not a "comment-start character" but a "comment-start SEQUENCE". You will have to implement additional status flags to catch these sequences.
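For the C case, the two-character markers can be handled by peeking at the next character instead of keeping an extra "previous character was / or *" flag. Both approaches should work, and peeking is simply shorter in bash. This is a sketch, not a complete C lexer. `strip_c_comments_parser` is a hypothetical name. The sketch makes these assumptions:

- it handles '...' and "..." literals;
- backslash escapes are recognized only inside literals;
- it does not handle // comments or backslash-newline splicing;
- a comment is deleted rather than replaced by a space, as the sed script does.

```bash
#!/bin/bash
# strip_c_comments_parser (hypothetical name): character-at-a-time removal
# of C /* ... */ comments. Two-character markers are recognised by peeking
# at the next character. Assumptions: '...' and "..." literals, backslash
# escapes only inside literals, no // comments, no backslash-newline
# splicing; a comment is deleted rather than replaced by a space.
strip_c_comments_parser() {
    local src c next i out=""
    local in_comment=false in_dquote=false in_squote=false escaped=false
    IFS= read -r -d '' src || true           # whole input, newlines kept
    for (( i = 0; i < ${#src}; i++ )); do
        c=${src:i:1}
        next=${src:i+1:1}
        if $in_comment; then
            if [[ $c == '*' && $next == '/' ]]; then
                in_comment=false
                i=$(( i + 1 ))               # also consume the "/" of "*/"
            fi
            continue                         # comment text is never output
        fi
        if $escaped; then
            escaped=false                    # escaped char has no special meaning
        elif [[ $c == '\' ]] && { $in_dquote || $in_squote; }; then
            escaped=true
        elif [[ $c == '"' ]] && ! $in_squote; then
            if $in_dquote; then in_dquote=false; else in_dquote=true; fi
        elif [[ $c == "'" ]] && ! $in_dquote; then
            if $in_squote; then in_squote=false; else in_squote=true; fi
        elif [[ $c == '/' && $next == '*' ]] && ! $in_dquote && ! $in_squote; then
            in_comment=true
            i=$(( i + 1 ))                   # also consume the "*" of "/*"
            continue
        fi
        out+=$c
    done
    printf '%s' "$out"
}
```

For example, `puts("/* keep */"); /* gone */` comes out as `puts("/* keep */"); `. The first `/* */` sits inside a string literal and is kept, which is exactly the echo case from the original question.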

Post your efforts here, in pseudo-code like I did, and we will discuss your solution and finally implement the program itself.

I hope this helps.

bakunin
__________

*) This is easily shown by looking at this: "\"". If quoting took precedence over escaping, the escape would not be processed, because it sits inside a quote, and the output would be two characters (backslash and quote sign). There would be no way to have a double-quote character inside a quoted string. As it is, escaping takes place before quoting, so the string means "an escaped double-quote sign inside a quoted string", and this accomplishes the "double quote inside a quoted string" just fine.
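Bash follows the same precedence, which makes the footnote easy to check. A small sketch; `quoted_escape_demo` is a hypothetical name:

```bash
#!/bin/bash
# quoted_escape_demo (hypothetical name): the footnote's "\"" in bash.
# Inside "...", the backslash is handled first, so \" is a literal quote
# and does not end the string.
quoted_escape_demo() {
    printf '%s\n' "\""
    printf '%s\n' "say \"hi\""
}
```

It prints a lone `"` on the first line and `say "hi"` on the second.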

*) Alan Turing did exactly this with his first try at a chess program.

Last edited by bakunin; 05-23-2011 at 02:11 PM.. Reason: changed a logical mistake