sed pattern and hold space issues


 
# 1  
02-02-2011
sed pattern and hold space issues

Good day.

I'm trying to write a sed script that takes a text file in a certain format and turns it into mostly formatted HTML.
I'm 95% there, but this last bit is finally hurting my head.

Here's a portion of the text:
Code:
Budgeting and Debt:

 Consumer Credit Counseling of Western PA
CareerLink
112 Hollywood Drive, Suite 101
Butler, PA 16001
1-888-511-2227 (CCCS)
.Financial education and counseling services include
.Last education and counseling services include Credit

 Family Saving Account Program
Armstrong County Community Action Agency
705 Butler Road
Kittanning, PA 16201
724-548-3408
.Helps people with low or moderate incomes establi
.Applicants cannot have income exceeding 200% Federal
.Account Holders can save as little as $10 per week
.Matching funds are provided at 1:1, up to $1000 per
.Last attend financial management workshops

Here's the sed script:
Code:
/:$/,/^$/ {
  s/\(.*\):$/ <\/ul>\
\
 <div class=sections id=>\1:<\/div>\
 <ul>/
 /^$/d
}

/^ [a-zA-Z]/,/^$/ {
  s/^ \(.*$\)/   <li><div class=subsections>\1<\/div>/

  /^[0-9a-zA-Z]/ {
    s/^\(.*\)$/     \1<br>/
  }

  /^\./,/^$/ {
    /^$/! {
      s/^.\(.*\)/       <li>\1<\/li>/
      H
      D
    }
    /^$/ {
#      g
      x
      i\
     <ul>
      a\
     <\/ul>\
   <\/li>
#      G
    }
  }

}

And here is the output from sed -f sed-script3 < block.txt:
Code:
 </ul>

 <div class=sections id=>Budgeting and Debt:</div>
 <ul>
   <li><div class=subsections>Consumer Credit Counseling of Western PA</div>
     CareerLink<br>
     112 Hollywood Drive, Suite 101<br>
     Butler, PA 16001<br>
     1-888-511-2227 (CCCS)<br>
     www.cccspa.org<br>
     <ul>
       <li>Financial education and counseling services include </li>
       <li>Last education and counseling services include Credit </li>
     </ul>
   </li>
   <li><div class=subsections>Family Saving Account Program</div>
     Armstrong County Community Action Agency<br>
     705 Butler Road<br>
     Kittanning, PA 16201<br>
     724-548-3408<br>
     www.armstrongcap.com<br>
     <ul>

       <li>Helps people with low or moderate incomes establi</li>
       <li>Applicants cannot have income exceeding 200% Federal </li>
       <li>Account Holders can save as little as $10 per week </li>
       <li>Matching funds are provided at 1:1, up to $1000 per</li>
       <li>Last attend financial management workshops </li>
     </ul>
   </li>

The blank line between the last <ul> and its first <li> is not a huge issue, but I'd really like to understand why it's there and not in the block above it.
I've commented out the g. Interestingly, with the g in place the blank line in question is gone, BUT the <li> list then also contains the items from the block above.
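The g behaviour described above can be reproduced with a toy version of the same collect-then-flush logic. This is a sketch assuming a POSIX sed; the input and the names `items`, `swap_only` and `copy_then_swap` are made up for illustration, and embedded newlines are turned into commas so that each flushed list prints on one line.

```shell
#!/bin/sh
# Hypothetical toy input: lines starting with "." are list items,
# an empty line ends each list.
items() { printf '.a\n.b\n\n.c\n\n'; }

# x alone: the empty line is swapped INTO the hold space,
# so every list starts over from an (almost) empty hold space.
swap_only=$(items | sed '
/^\./{
  s/^\.//
  H
  d
}
/^$/{
  x
  s/^\n//
  s/\n/,/g
}')

# g then x: g first copies the hold space into the pattern space,
# so the swap leaves a copy of the old list behind and the next
# list is appended to it.
copy_then_swap=$(items | sed '
/^\./{
  s/^\.//
  H
  d
}
/^$/{
  g
  x
  s/^\n//
  s/\n/,/g
}')

printf '%s\n' "x only:" "$swap_only" "g then x:" "$copy_then_swap"
```

With x alone the second flush prints only `c`; with g before x it prints `a,b,c`, the same accumulation of the previous block's items seen in the script above.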

Guess I just don't understand the pattern and hold space well enough.

Any advice sure would be appreciated.

gunnar.

Moderator's comment: Please use code tags when posting data and code samples!

Last edited by Franklin52; 02-03-2011 at 03:38 AM.
# 2  
02-02-2011
Quote:
Any advice sure would be appreciated.
Seriously, why are you using sed in the first place? sed syntax is terse, and too much of it makes your code unreadable, at least at first glance. Use something fit for human consumption, like awk or a general-purpose programming language, and use sed only for simple string manipulation.
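To give an idea of what the awk route might look like, here is a hypothetical sketch covering only the bullet-list part of the job: it wraps each run of lines starting with "." in <ul> ... </ul> and turns each of those lines into an <li> item. The function name `to_list` is made up, and headings, addresses and the closing </li> are not handled, so this is not a drop-in replacement for the sed script.

```shell
#!/bin/sh
# Hypothetical helper: convert runs of "."-prefixed lines into an HTML list.
# All other lines are passed through unchanged.
to_list() {
  awk '
    /^\./ {
      # first item of a run: open the list
      if (!in_list) { print "     <ul>"; in_list = 1 }
      # drop the leading "." and wrap the rest in <li> ... </li>
      print "       <li>" substr($0, 2) "</li>"
      next
    }
    # first non-item line after a run: close the list
    in_list { print "     </ul>"; in_list = 0 }
    { print }
    # input ended while still inside a list
    END { if (in_list) print "     </ul>" }
  '
}
```

Usage would be something like `to_list < block.txt`. The open/close state lives in an ordinary variable (`in_list`) instead of being juggled between the pattern and hold spaces.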
# 3  
02-02-2011
Well, that's where I was headed, but then I got so close...
And now it's to do it because it can be done!

I wouldn't have thought it doable in sed either, but there it is. Almost done...
# 4  
02-02-2011
Quote:
Originally Posted by fiendracer
And now it's to do it because it can be done!
Sure, it can be done. It could even be done in assembly, but would anyone? I'm not saying it's wrong to use sed, just that you (or whoever maintains your code after you) might have trouble understanding and maintaining it, and it will be harder to debug or amend if your requirements change later.

Just some friendly advice.
# 5  
02-02-2011
You might want to research groff and the other micro utilities.
# 6  
02-03-2011
If anyone cares, I figured it out.
I just had to re-re-read the section on this in my sed & awk book.
Thanks to all who suggested that I shouldn't be doing it in sed; it was worth it just to figure it out.
After all, it really is just some simple text substitution...

Here's the line that I had to add:
Code:
s/^\n//

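See, H appends a "\n" and then the pattern space to the hold space, even when the hold space is empty. So the first item gets a "\n" in front of it, and after the x that "\n" sits at the start of the pattern space. I just need to delete it.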
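The leading newline is easy to see in isolation. A minimal sketch, assuming a POSIX sed (the sample input and the variable names are illustrative only): collect two lines with H, swap them back with x on the last line, and print the result once without and once with the s/^\n// cleanup.

```shell
#!/bin/sh
# Without the cleanup: the hold space starts empty, but H still
# appends "\n" + line, so the collected text begins with a newline.
without_fix=$(printf 'a\nb\n' | sed -n 'H;${x;p;}')

# With the cleanup: delete the leading newline right after the swap.
with_fix=$(printf 'a\nb\n' | sed -n 'H;${x;s/^\n//;p;}')

printf '%s\n' "--- without cleanup ---" "$without_fix"
printf '%s\n' "--- with cleanup ---" "$with_fix"
```

The first variable holds "\na\nb", whose leading newline is what showed up as the blank line after <ul>; the second holds just "a\nb".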

So here's the script in its new, more nicely commented form:
Code:
# this section processes the major heading that has a : after it
# it's followed by an empty line. Simple.
/:$/,/^$/ {
  s/\(.*\):$/ <\/ul>\
\
 <div class=sections id=>\1:<\/div>\
 <ul>/
 /^$/d
}

# this section processes a block, starting with the " subsection" line
# and running until an empty line
/^ [a-zA-Z]/,/^$/ {
  s/^ \(.*$\)/   <li><div class=subsections>\1<\/div>/

  # this little section puts a <br> at the end of the lines
  # until it gets to the line w/ the "." at the beginning
  /^[0-9a-zA-Z]/ {
    s/^\(.*\)$/     \1<br>/
  }

  # here's the tricky part
  # process from the "." lines to an empty line
  /^\./,/^$/ {
    # if it's non-empty then process
    # add the html to the line then append it to Hold space
    # then Delete it from the pattern space
    /^$/! {
      s/^.\(.*\)/       <li>\1<\/li>/
      H
      D
    }
    # if it's empty...
    # exchange the pattern and the hold space
    # H put a "\n" in front of the first item (the hold space started out empty),
    # so after the x that "\n" sits at the start of the pattern space (!)
    # delete that leading "\n" and we're golden
    # get the <ul> inserted and the </ul> appended and let it roll!
    /^$/ {
      x
      s/^\n//
      i\
     <ul>
      a\
     <\/ul>\
   <\/li>
      #  G
    }
  }

}

And here's the output for you non-believers:
Code:
 </ul>

 <div class=sections id=>Budgeting and Debt:</div>
 <ul>
   <li><div class=subsections>Consumer Credit Counseling of Western PA</div>
     CareerLink<br>
     112 Hollywood Drive, Suite 101<br>
     Butler, PA 16001<br>
     1-888-511-2227 (CCCS)<br>
     <ul>
       <li>Financial education and counseling services include </li>
       <li>Last education and counseling services include Credit </li>
     </ul>
   </li>
   <li><div class=subsections>Family Saving Account Program</div>
     Armstrong County Community Action Agency<br>
     705 Butler Road<br>
     Kittanning, PA 16201<br>
     724-548-3408<br>
     <ul>
       <li>Helps people with low or moderate incomes establi</li>
       <li>Applicants cannot have income exceeding 200% Federal </li>
       <li>Account Holders can save as little as $10 per week </li>
       <li>Matching funds are provided at 1:1, up to $1000 per</li>
       <li>Last attend financial management workshops </li>
     </ul>
   </li>

Ain't that a beaut?


Moderator's comment: Please use code tags when posting data and code samples!

Last edited by Franklin52; 02-03-2011 at 03:39 AM.