sed fails to apply substitute commands


 
# 15  
Old 08-07-2012
This thread could serve as input:
Code:
# "threaddl" is just what I like to call the script; use whatever name you saved it as
threaddl http://boards.4chan.org/tg/res/20218467 1

This way it will exit once everything is done.

The desired output would be really long, whether I explain it or paste in the HTML.
# 16  
Old 08-07-2012
If you can't actually explain what you need, we can't actually help you.
# 17  
Old 08-07-2012
I'll explain the regexes then. It'll take some time.

---------- Post updated at 10:30 PM ---------- Previous update was at 10:10 PM ----------

Reverse engineered:
Code:
# removes a script tag
-e '1 s/.\{1,31\}$//' -e '2,10 d'

# remove rss link tag and more script tags
-e '/^<meta / s_<link[^>]\{1,100\}xml"/>\(<title>[^>]\+</title>\).\+_\1_'

# make favicon source a relative link
# ST="static.4chan.org"
# LOC=the input thread's number
-e '/^<meta / s_//'$ST'/image/\(favicon-\?[a-z]\{0,10\}\.ico\)_'$LOC'/misc/\1_'

# remove alternative style sheets
-e '/^<meta / s_<link rel="alternate style.\+\(<link rel="apple-touch-icon" h\)_\1_'

# make default style sheet's source associated to the board a relative link
-e '/^<meta / s_//'$ST'/css/\([a-z0-9\.]\{1,25\}\.css\)_'$LOC'/misc/\1_'

# remove some more script tag(s)
-e '3,16 d'

# remove name, tripcode and post number nodes that are only for mobile view
-e '$ s_\(<div id="\)pim\([0-9]\{1,25\}\).\{1,1000\}\(\1\(pi\|f\)\2\)_\3_g'

# give thumbnails, full sized pictures, logo, spoiler images, country flags relative source link
-e '$ s_//.\.thumbs\.4chan\.org/[a-z0-9]\{1,10\}/thumb/\([0-9]\{1,25\}s\.jpg\)_'$LOC'/misc/\1_g'
-e '$ s_//images\.4chan\.org/[a-z0-9]\{1,10\}/src/\([0-9]\{1,25\}\.\)\(jpg\|gif\|png\)_'$LOC'/\1\2_g'
-e '$ s_//'$ST'/image/title/[a-z]\{1,10\}/[a-z0-9]\{1,100\}\.\(jpg\|gif\|png\)_'$LOC'/misc/logo.\1_g'
-e '$ s_//'$ST'/image/\(spoiler-\?[a-z0-9]\{0,10\}\....\)_'$LOC'/misc/\1_g'
-e '$ s_//'$ST'/image/country/\(\([a-z]\{0,25\}/\)\?[a-z0-9]\{1,25\}\....\)_'$LOC'/misc/\1_g'

# point quote links to relative target and mark the ones with a target of OP or a cross thread post
-e '$ s_\(<a href="\)'$LOC'\(#p\{1,100\}"\)_\1\2_g'
-e '$ s_<a href="#p'$LOC'" class="quotelink">&gt;&gt;'$LOC'_& (OP)_g'
-e '$ s_\(<a href="[0-9]\{1,100\}\)\(#p[0-9]\{1,100\}" class="quotelink">&gt;&gt;[0-9]\{1,100\}\)_\1.html\2 (Cross-thread)_g'

# remove board link list, report/delete form, theme selector from the bottom of the html page, correct the Return and Top links and inject script
# S1=a script tag
-e '$ s_\(</div></div></div><hr>\)<div class="mobile".\+</div><hr>\(<div class="navLinks navLinksBot">\[<a href="\)\.\./\(\./"[^>]\{0,100\}>Return</a>\] \[<a href="\)#top\(">Top</a>\]\).\+</body>_\1</form>\2\3javascript:scroll(0,0);\4<div id=bottom></div>'$S1'</body>_'

# remove board link list, settings, posting form, correct Return and Bottom links, preserve the logo image and text and announcement, inject a script and a style tag to add image expansion feature
# S2= a script and a style tag
-e '$ s_^\(.\{1,39\}\)<div id="boardNavDesktop" class="desktop">.\{0,7800\}\(<div class="boardBanner"\{0,250\}\)<hr class="abovePostForm"/\?>.\{0,400\}\(<div class="navLinks">.<a href="\)\.\./\(\./.\{0,100\}\)#bottom\(">Bottom</a>]\).\{0,4000\}alt=""/></a>\(</div><hr><a href="ja\)_'$S2'\1\2\3\4javascript:scroll(0,d.documentElement.scrollHeight)\5\6_'

# add the http protocol to links in a tags
-e '$ s_<a\(.\{1,1000\}\)href="//_<a\1href="http://_g'

# 18  
Old 08-07-2012
Some observations (I can't completely debug the sed code in a short time, but maybe some pointers will help):

Code:
# removes a script tag
-e '2,10 d'
...
# remove some more script tag(s)
-e '3,16 d'

Can't these be combined?
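As far as I can tell they can: sed addresses count input lines, not whatever is left after an earlier delete, so the two ranges should simply merge into "2,16 d". A minimal sketch to check this, using seq output as a stand-in for the HTML (the 20-line input is an assumption, not the real page):

```shell
# Stand-in input: 20 numbered lines instead of the real HTML page.
two_step=$(seq 1 20 | sed -e '2,10 d' -e '3,16 d')
combined=$(seq 1 20 | sed -e '2,16 d')

# Both variants should keep line 1 and lines 17-20 only.
printf '%s\n' "$two_step"
[ "$two_step" = "$combined" ] && echo "same result"
```

The substitutions sitting between the two deletes in your script don't change line numbering, so they shouldn't affect this.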

Another thing is this (or variations), which you use quite often. You use shell variable expansion inside a regexp:
Code:
s_'$ST'_'$LOC'_

This is OK in principle, but you have to make sure the variables' contents don't contain regexp metacharacters, which could lead to unintended matches. It is safer to escape the "." as "\.", for instance:
Code:
# ST="static\.4chan\.org"   (instead of "static.4chan.org")
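One way to get there without hand-editing the value is to let sed do the escaping first. A minimal sketch, assuming "_" stays the delimiter as in your script; ST_RE is a name I made up for the escaped copy, and the look-alike host is invented test data:

```shell
ST="static.4chan.org"

# Put a backslash in front of BRE metacharacters and the _ delimiter.
ST_RE=$(printf '%s\n' "$ST" | sed 's/[]\$*.^_[]/\\&/g')
printf '%s\n' "$ST_RE"

# Unescaped: each dot matches any character, so a look-alike host is rewritten too.
loose=$(echo '//staticX4chanYorg/css/a.css' | sed 's_//'"$ST"'/css/_misc/_')

# Escaped: only the literal host matches, so this line is left alone.
strict=$(echo '//staticX4chanYorg/css/a.css' | sed 's_//'"$ST_RE"'/css/_misc/_')

printf 'loose:  %s\nstrict: %s\n' "$loose" "$strict"
```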

Also, I am not sure whether interrupting the single quotes this way breaks the string. Maybe
Code:
s_'"$ST"'_'"$LOC"'_

would be a safer way to achieve what you want.
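My guess (not verified against your script) is that this matters most for S1 and S2: they hold whole tags, so they almost certainly contain spaces, and an unquoted expansion gets split there. sed then receives only half an expression. A minimal sketch, with a made-up script tag standing in for S1:

```shell
# Hypothetical stand-in for S1; the real tag is not shown in the thread.
S1='<script src="x.js"></script>'

# Unquoted: the shell splits $S1 at the space, so sed gets a truncated
# s command plus a stray extra argument, and should fail.
if unquoted=$(echo '<body></body>' | sed -e 's_</body>_'$S1'</body>_' 2>&1); then
    unquoted_ok=yes
else
    unquoted_ok=no
fi

# Quoted: the expansion stays inside one word and sed gets one expression.
quoted=$(echo '<body></body>' | sed -e 's_</body>_'"$S1"'</body>_')

printf 'unquoted accepted: %s\n' "$unquoted_ok"
printf 'quoted result: %s\n' "$quoted"
```

This assumes a shell that word-splits unquoted expansions, as sh and bash do.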

I would put back the "*" instead of the "\{1,1000\}" constructs you used. Actually, I can't believe someone went to the effort of writing a sed port and missed a proper implementation of something as basic as the metacharacter "*".
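One practical reason: a bounded interval silently stops matching once the text between the two anchors is longer than the bound. As far as I know POSIX also only guarantees repeat counts up to 255 (RE_DUP_MAX), so a bound of 1000 may not be accepted by every sed. A minimal sketch of the first effect, using a smaller bound of 100 and an invented link to example.org:

```shell
# 150 filler characters; the bound used below is 100.
filler=$(printf '%150s' '' | tr ' ' 'x')
line="<a ${filler}href=\"//example.org\">"

# Bounded: more than 100 characters sit between "<a" and "href", so no match.
bounded=$(printf '%s\n' "$line" | sed 's_<a\(.\{1,100\}\)href="//_<a\1href="http://_')

# Star: matches regardless of how long the gap is.
star=$(printf '%s\n' "$line" | sed 's_<a\(.*\)href="//_<a\1href="http://_')

[ "$bounded" = "$line" ] && echo "bounded version left the line untouched"
printf '%s\n' "$star" | grep -c 'href="http://example.org"'
```

Note that ".*" also accepts an empty gap, which "\{1,100\}" does not; that makes no difference here because a space always follows "<a".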

I hope this helps.

bakunin
# 19  
Old 08-07-2012
Thanks for pointing these out, bakunin.
I use the \{n,m\} form to reduce the amount of backtracking.
I actually ran into problems in the places where I use the S1 and S2 variables.
I'll experiment with what you suggested.

---------- Post updated at 12:55 AM ---------- Previous update was at 12:45 AM ----------

And now that the unmatched _ is solved, the command with the S2 variable still won't work.
# 20  
Old 08-07-2012
I haven't studied the whole thread in detail, but perhaps I see why your S2 variable doesn't work. Is it used in a regexp? The S2 variable has some "[" in it, e.g. a[exp], and that will be interpreted as a bracket expression.
# 21  
Old 08-07-2012
But S2 is in the replacement, not in the pattern, so square brackets there are literal.
Code:
# echo a | sed 's/a/[nope]/' 
[nope] 
#
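What is special on the replacement side is "&", the backslash, and the delimiter itself ("_" in this script). If S2 contains any of those (an underscore in a class name would be enough), the command breaks or gives odd output. A sketch under that assumption; the S2 value below is invented and S2_REPL is a made-up name:

```shell
# Invented stand-in for S2: contains both the _ delimiter and an &.
S2='<style>.img_big{}</style><a href="?a=1&b=2">'

# Escape &, the _ delimiter and backslash for use as replacement text.
S2_REPL=$(printf '%s\n' "$S2" | sed 's/[&_\]/\\&/g')

# Raw: the _ inside S2 ends the replacement early, so sed rejects the command.
if raw=$(echo 'X' | sed -e 's_X_'"$S2"'_' 2>&1); then raw_ok=yes; else raw_ok=no; fi

# Escaped: the text is inserted literally.
escaped=$(echo 'X' | sed -e 's_X_'"$S2_REPL"'_')

printf 'raw accepted: %s\n' "$raw_ok"
printf 'escaped: %s\n' "$escaped"
```

A value that spans several lines would need its newlines escaped as well, which this sketch does not handle.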
