Bash Scripting

05-20-2011

Bash Scripting - sed (substitute)

1. The problem statement, all variables and given/known data:

I have been asked to create a bash script that deletes comments from another file. The file also contains an echo command with this inside it: /* this is an echo */\ so obviously they want to keep that one in the file. I found the code below, but I am having trouble reading it, starting with the first line, #!/bin/sed -f, as we have only been dealing with #!/bin/bash. The :x, N, bx and so on make it harder for me to understand. In other words, I want to rewrite this code so that it starts with #!/bin/bash and uses straight calls to the sed command. Any help would be great.

2. Relevant commands, code, scripts, algorithms:

Code:

#!/bin/sed -f

# Simple Sed Program to remove all comments from c program

/\/\*/!b

:x
/\*\//!{
N
bx
}
# delete /*...*/
s/\/\*.*\*\///

3. The attempts at a solution (include all code and scripts):

Code:

#!/bin/bash 
# Simple Sed Program to remove all comments from c program

/\/\*/!b

sed -x
/\*\//!{
sed -N
bx
}
s/\/\*.*\*\///

This is straight up not working.

4. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):

2309ENG: C & Unix Programming - Dr David Rowlands

syco__

05-20-2011

What language is the file from which you are removing comments? The fact that you say it has "echo" statements makes me think "shell", but shells do not support "/*....*/" style comments.

Regardless of language, it's quite a leap to say that it's obvious that comments containing "echo" should be retained. If you were told to remove all comments, then that is what you should do. Maybe you should clarify your instructions.

I haven't tried it, but the first sed script you posted looks like it should work. It's pretty clever actually. But you can't split it up into separate sed statements. It's a unit and it needs to stay that way.

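/\/\*/!b For any line that does not contain "/*", branch to the end of the script: the line is printed unchanged and sed moves on to the next line. In effect we loop until we find a line with "/*" in it. Once we find such a line we can proceed to the rest of the sed script.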

Code:

:x
/\*\//!{
N
bx
}

This little paragraph asks if we have a "*/". We might, if the earlier sed statement found a one-line comment. But if the comment extends over several lines, we will not immediately have the terminating "*/". So we read another input line with the "N" command, which appends it to what we already have. The "bx" jumps back to the ":x". That is how we loop in sed. Eventually we will read the "*/" and fall out of the loop to the final statement.

Code:

# delete /*...*/
s/\/\*.*\*\///

The comment is correct. It deletes the "/*...*/" comment. If all you have is one line comments, this one command may be all you need.
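As a minimal sketch of that last remark, the substitution can be run on its own when every comment opens and closes on the same line. The function name strip_one_line_comments is made up for illustration, and "|" replaces "/" as the delimiter of the s command only to cut down on backslashes.

```shell
#!/bin/bash
# Hypothetical helper: delete a /*...*/ comment that starts and ends on one line.
# Reads stdin (or the files named as arguments) and writes to stdout.
strip_one_line_comments() {
    sed 's|/\*.*\*/||' "$@"
}

# Example: the comment is removed, the code in front of it stays.
printf '%s\n' 'int a = 1; /* counter */' | strip_one_line_comments
```

Note that ".*" is greedy: on a line holding two comments, everything from the first "/*" to the last "*/" is deleted, including any code between them.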

I have not tested it. But it all looks good. But each statement depends on the others. You are not going to be able to turn this sed script into several sed scripts. I hope this helps you.
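One way to start from #!/bin/bash and still keep the sed program in one piece is to hand the whole program to a single sed call as one quoted argument. This is only a sketch under that assumption; the function name strip_c_comments is invented here and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical wrapper: run the complete sed program from a bash script.
# The whole program is passed to sed as ONE single-quoted argument, because
# the label (:x), the N command and the branch (bx) only work together.
strip_c_comments() {
    sed '
/\/\*/!b
:x
/\*\//!{
N
bx
}
s/\/\*.*\*\///
' "$@"
}

# Example: a comment that spans two lines.
printf '%s\n' 'int b; /* start' 'end */ int c;' | strip_c_comments
```

In a script it could be called as strip_c_comments "$1" to process the file named on the command line. The sed program knows nothing about string literals, so a "/* ... */" inside a quoted string is deleted just like a real comment.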

Perderabo

05-21-2011

Oh sorry, I am creating this bash script to work on a C program file and take out all the comments. Yes, this script does work, and now I know why. Thanks a lot, you made it very clear. I just don't understand how it knows whether something is really a comment, or whether it isn't actually a comment but sits inside the echo command, where the /* message */ appears with the same symbols you would use for a comment.

Thanks again.

syco__

05-21-2011

This still leaves you with the problem of writing your script. Let us restart from the beginning:

When you say you want to "remove comments", you have to know what constitutes such a comment: there must be a sequence of characters that starts a comment and a sequence that ends it. Consider the following (fantasy) shell code:

Code:

#!/bin/myshell

some_command    # this is a comment

# this is a comment too
Is this still a comment?

other_command

Let's establish what starts a comment: the "#" sign, obviously. And what ends it? The line end. This also answers the question of whether line 6 is a comment: it is not, because it comes after the comment-end marker.

Still there are some possible problems: What about comments in comments? Lines like this:

Code:

command  # comment # what status has this?

Or what about using the comment-start sequence quoted:

Code:

echo "something # is this a comment?"  # what about this?

or escaped:

Code:

echo "something" \# comment?
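To see why these cases matter, here is a small sketch of a naive filter that simply treats every "#" as a comment start. The function name naive_strip is made up for this illustration.

```shell
#!/bin/bash
# Hypothetical naive filter: delete everything from the first "#" to the line end.
# It knows nothing about quoting or escaping.
naive_strip() {
    sed 's/#.*$//' "$@"
}

# The "#" inside the quoted string is wrongly taken as a comment start.
printf '%s\n' 'echo "something # is this a comment?"  # what about this?' | naive_strip
```

The quoted example is cut off right after echo "something and is left with an unterminated string, and the escaped example loses its "#" as well. A pattern-based filter cannot tell these cases apart without knowing the context of each character.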

There is a certain class of programs designed to deal with these problems. They are called "parsers". These programs constitute the first part of a compiler, where the program code gets read, stripped of everything unnecessary (like comments, which the program doesn't need) and is syntax-checked. It all sounds quite complicated and "building compilers" sounds quite like high-level stuff, but this is astonishingly easy.

Let's take stock: we have a set of rules (what starts a comment, what ends a comment) and we have an action (filter the comments out). We saw above that we need some more rules regarding quoting and escaping, so let's go over the rules again:

1. Do not look at characters inside "..." or '....' (quoting).
2. If a "\" character is encountered, treat the one following it as a normal character, regardless of its usual meaning (escaping).
3. When you encounter a comment-start marker, throw it away together with the text you read until you encounter a comment-end marker.
4. Output what you have read.

Parsers do their work the following way: read in one character after the other. After each character decide if you have found a sequence with a rule attached to it. Finally, after applying all the applicable rules, output the result (if some rule doesn't forbid it).

In fact, parsers are just while-loops that read one character after the other, combined with long case constructs that try to apply one rule after the other to the character read. Since we only look at one character at a time, it is clear that we will have to maintain some status flags. We will have to "remember" whether we are inside a comment, inside a quoted string, etc.

How about you trying to write a program (in pseudo-code only, just the logic) for your problem and posting it. Then we will go over your solution and implement that in real code.

I hope this helps.

bakunin

05-21-2011

Yeah, thanks a lot. I'm slowly getting the hang of the concept; I just don't know enough about programming yet. But I have come up with this as my pseudo-code:

1. Search Given File
2. Loop looking for /* to */
2.1 check if /* is inside of "" or part of code if it is leave it
2.3 if not delete it
2.4 look for next /* */ combination
3. Print file with changes made.

syco__

05-23-2011

Quote:

Originally Posted by syco__

1. Search Given File
2. Loop looking for /* to */
2.1 check if /* is inside of "" or part of code if it is leave it
2.3 if not delete it
2.4 look for next /* */ combination
3. Print file with changes made.

You are thinking about this "from the wrong side". I suggest you read again what I said about parsers reading one character at a time and applying rules to it. I'll take our fantasy-shell example and show you a solution for it; you will still have to come up with a solution for your original problem:

First, we need some "memory" to remember what we have read so far. Memory comes in the form of status flags (TRUE/FALSE), which we maintain while reading the code. We start with one for remembering the escape sequence. We need to do this first, because escaping is the strongest "coupling" construct of a language. It is like operator precedence: this is the "operator" with the highest precedence, so we have to take care of it first *):

Code:

next_character_is_escaped=false # we have to interpret the next character

WHILE not at the end of the input file
      read a character
      
      IF next_character_is_escaped != true
           [... we will have to work on the character further here ...]
      ELSE
           next_character_is_escaped=false
      END IF

      output the read character

END WHILE

You see, the loop does nothing more than to read one character at a time and maintain a status flag for escaped characters. This relates to the "rule 2" of my last posting.
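For readers who want to watch the flag at work, the loop could be sketched in shell as follows. This is only an illustration under some assumptions: "\" is taken as the escape character, the function name mark_escaped is invented, and escaped characters are wrapped in [ ] purely to make them visible. Input is read line by line and then walked one character at a time.

```shell
#!/bin/bash
# Sketch of the escape-flag loop. Assumption: "\" is the escape character.
# Escaped characters are wrapped in [ ] so the effect of the flag is visible.
# The body is a subshell ( ... ) so the variables stay private to the function.
mark_escaped() (
    next_character_is_escaped=false
    while IFS= read -r line || [ -n "$line" ]; do
        rest=$line
        while [ -n "$rest" ]; do
            ch=${rest%"${rest#?}"}   # first character of $rest
            rest=${rest#?}           # everything after it
            if [ "$next_character_is_escaped" != true ]; then
                # Stand-in for the "further work": only spot the escape character.
                if [ "$ch" = '\' ]; then
                    next_character_is_escaped=true
                fi
                printf '%s' "$ch"
            else
                next_character_is_escaped=false
                printf '[%s]' "$ch"
            fi
        done
        next_character_is_escaped=false   # a trailing "\" escapes the line end itself
        printf '\n'
    done
)
```

Feeding it the line echo "a \" b" prints echo "a \["] b", which shows that the second double quote was excluded from further interpretation.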

I suggest you take a piece of paper and go through a real script with this pseudo-program "by hand" to see how the status flags are maintained and how the logic works (the same goes for the other pieces of code to follow - this is an invaluable exercise in grasping programming constructs, believe me! *) ). You see that what the escaped-status flag does is to exclude the next character from being interpreted by the logic (the "further work") we are going to apply now. Let's start with quoting (I ignore single quotes here to keep the example short - this doesn't mean we could forget about them in the real world):

Code:

inside_a_quote=false          # we are not between "..." right now
next_character_is_escaped=false # we have to interpret the next character

WHILE not at the end of the input file
      read a character

      IF next_character_is_escaped != true
           CASE the read character is
                - the escape char
                     next_character_is_escaped=true

                - the quote char
                     IF inside_a_quote == true
                          inside_a_quote=false
                     ELSE
                          inside_a_quote=true
                     END IF 

           END CASE
      ELSE
           next_character_is_escaped=false
      END IF

      output the read character

END WHILE

The added logic just flips the inside-quote status flag when we encounter double quotes (actually: double quotes which aren't escaped). Now that we have covered what was "rule 1" in my last posting, we are ready to tackle the comments themselves.

Notice that applying a rule to the character read does indeed change the way we apply the following rules: if rule 1 says "don't look at the character any further", this means that rule 3 won't be applied to it, because there is "nothing any more that it could be applied to", so to say.

I have added some logic at the end to let you see if the program correctly catches all instances of comments. Again, I suggest you "execute" it by hand with a piece of paper to see its operation.

Code:

inside_a_quote=false          # we are not between "..." right now
inside_a_comment=false        # we are not between a comment-start and a comment-end
next_character_is_escaped=false # we have to interpret the next character

WHILE not at the end of the input file
      read a character

      IF next_character_is_escaped != true
           CASE the read character is
                - the escape char
                     next_character_is_escaped=true

                - the quote char
                     IF inside_a_quote == true
                          inside_a_quote=false
                     ELSE
                          inside_a_quote=true
                     END IF

                - the comment-start char
                     IF inside_a_quote == false
                          inside_a_comment=true
                     END IF

                - the comment-end char
                     IF inside_a_quote == false
                          inside_a_comment=false
                     END IF

           END CASE
      ELSE
           next_character_is_escaped=false
      END IF

      IF inside_a_comment == false
           output the read character in black
      ELSE
           output the read character in red
      END IF

END WHILE
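A shell rendering of this pseudo-code might look like the sketch below. It rests on several assumptions: "#" starts a comment, the line end finishes it, "\" escapes, and only double quotes are handled. It also departs from the pseudo-code in two ways: comment text is dropped instead of being printed in red, and quotes and escapes are ignored while inside a comment, so that a stray quote in a comment cannot flip the quote flag. The function name strip_hash_comments is invented for this example.

```shell
#!/bin/bash
# Sketch: drop "#" comments while honouring "..." quoting and "\" escaping.
# The body is a subshell ( ... ) so the flags stay private to the function.
strip_hash_comments() (
    inside_a_quote=false
    next_character_is_escaped=false
    while IFS= read -r line || [ -n "$line" ]; do
        inside_a_comment=false            # the previous line end closed any comment
        rest=$line
        out=
        while [ -n "$rest" ]; do
            ch=${rest%"${rest#?}"}        # first character of $rest
            rest=${rest#?}                # everything after it
            if [ "$inside_a_comment" = true ]; then
                continue                  # throw away comment text
            fi
            if [ "$next_character_is_escaped" = true ]; then
                next_character_is_escaped=false
            else
                case "$ch" in
                    '\')
                        next_character_is_escaped=true ;;
                    '"')
                        if [ "$inside_a_quote" = true ]; then
                            inside_a_quote=false
                        else
                            inside_a_quote=true
                        fi ;;
                    '#')
                        if [ "$inside_a_quote" = false ]; then
                            inside_a_comment=true
                            continue
                        fi ;;
                esac
            fi
            out=$out$ch
        done
        next_character_is_escaped=false   # a trailing "\" escapes only the line end
        printf '%s\n' "$out"
    done
)
```

Run against the earlier examples, the quoted "#" and the escaped "\#" both survive, while the trailing comments disappear. A first line such as #!/bin/myshell is removed too, since by these rules it is just another comment.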

Some notes regarding your own program: you will notice that (unlike in my example) there is not a "comment-start character" but a "comment-start SEQUENCE". You will have to implement additional status flags to catch these sequences.
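One possible shape for such additional flags is sketched below. It is an assumption-laden illustration, not a finished solution: the names seen_slash, seen_star, feed and strip_c_comments_flags are invented, only "/* ... */" comments and double-quoted strings are recognised, and single-quoted character constants and "//" comments are ignored.

```shell
#!/bin/bash
# Sketch: strip /* ... */ comments one character at a time, using two extra
# flags to recognise the two-character start and end sequences.
strip_c_comments_flags() (
    inside_a_quote=false
    inside_a_comment=false
    next_character_is_escaped=false
    seen_slash=false   # outside a comment: previous character was "/", held back
    seen_star=false    # inside a comment: previous character was "*"
    newline='
'
    feed() {           # apply the rules to one character, passed as $1
        if [ "$inside_a_comment" = true ]; then
            if [ "$seen_star" = true ] && [ "$1" = / ]; then
                inside_a_comment=false
                seen_star=false
            elif [ "$1" = '*' ]; then
                seen_star=true
            else
                seen_star=false
            fi
            return 0
        fi
        if [ "$seen_slash" = true ]; then
            seen_slash=false
            if [ "$1" = '*' ]; then
                inside_a_comment=true
                return 0
            fi
            printf '/'             # the held-back "/" was ordinary code
        fi
        if [ "$next_character_is_escaped" = true ]; then
            next_character_is_escaped=false
        else
            case "$1" in
                '\') next_character_is_escaped=true ;;
                '"') if [ "$inside_a_quote" = true ]; then
                         inside_a_quote=false
                     else
                         inside_a_quote=true
                     fi ;;
                '/') if [ "$inside_a_quote" = false ]; then
                         seen_slash=true
                         return 0
                     fi ;;
            esac
        fi
        printf '%s' "$1"
    }
    while IFS= read -r line || [ -n "$line" ]; do
        rest=$line
        while [ -n "$rest" ]; do
            ch=${rest%"${rest#?}"}   # first character of $rest
            rest=${rest#?}
            feed "$ch"
        done
        feed "$newline"              # the line end is a character too
    done
    if [ "$seen_slash" = true ]; then printf '/'; fi
)
```

With this sketch a line such as printf("/* this is an echo */\n"); /* real comment */ keeps the text inside the string and loses only the trailing comment, and a comment that spans several lines is removed together with the line ends inside it.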

Put your efforts here, in pseudo-code like i did, and we will discuss your solution and finally implement the program itself.

I hope this helps.

bakunin
__________

*) This is easily shown by looking at this: "\"" . If quoting were primary and escaping secondary, the escaping would not take place, because the backslash is inside a quote, and the output would be 2 characters (backslash and quote sign). There would be no way to have the double-quote character inside a quoted string. As it is, escaping takes place before quoting, and therefore the string means "an escaped double-quote sign inside a quoted string", which accomplishes the "double quote inside a quoted string" just fine.

*) Alan Turing did exactly this with his first try at a chess program.

Last edited by bakunin; 05-23-2011 at 02:11 PM.. Reason: changed a logical mistake
