Replacing text in between the string


 
# 8  
Old 07-06-2013
The command:
Code:
sed '/^<[Ii][Mm][Gg]/sXsrc="[^"]*[\\/]Xsrc="images/Xg' file

with file containing the data I listed before produces the output:
Code:
<IMG src="images/3251_8832613.jpg">
<IMG src="images/1303_4703925.jpg">
<IMG src="images/1948_4703925.jpg">
<img src="images/2721_7990660.jpg">
<IMG style="HEIGHT: 3.5in" src="images/90417_4302013100410.jpg">
<IMG style="HEIGHT: 3.5in" src="images/90417_4302013100411.jpg">
<IMG style="WIDTH: 434px; HEIGHT: 369px" height=433 src="images/2360_7990660.jpg" width=438>
<IMG style="WIDTH: 3.5in" src="images/5942_7990660.jpg">
<IMG src="images/2531_4703925.jpg"><BR><IMG src="images/2531_7990660.jpg"><BR></P>

when run on OS X using /usr/bin/sed. What is the output of the commands:
Code:
which sed; uname -a

on your system?
# 9  
Old 07-06-2013
Code:
$ which sed; uname -a
/bin/sed
Linux ubuntu 3.5.0-34-generic #55~precise1-Ubuntu SMP Fri Jun 7 16:25:50 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

sed --version
GNU sed version 4.2.1

---------- Post updated at 10:05 PM ---------- Previous update was at 10:02 PM ----------

Don, if I read it correctly, your script assumes < is at the beginning of the line:

Code:
sed '/^<

which works on the sample text. In my case, however, there is a lot of text before and after the image tags.

---------- Post updated at 10:08 PM ---------- Previous update was at 10:05 PM ----------

Looking through the output file, it looks good.

How can I print ONLY the text between <IMG and the >, rather than the entire line?

Code:
sed -nE '/IMG/,/jpg">/p' output

prints the entire line. I'd like to catch every variation and print ONLY what is between <IMG and the >.

---------- Post updated at 10:10 PM ---------- Previous update was at 10:08 PM ----------

Code:
sed -nE '/<[Ii][Mm][Gg]\ src/,/jpg">/p' output

does the same thing: it prints the entire line.

---------- Post updated at 10:15 PM ---------- Previous update was at 10:10 PM ----------

Code:
grep -o "\<[Ii][Mm][Gg].*.jpg\"" output

does match. However, when a line contains multiple <IMG tags it prints them all as a single match, and it always includes a lot of junk:

Code:
IMG src="https://www.unix.com/images/2359_4703925.jpg"><BR><IMG src="https://www.unix.com/images/2359_7990660.jpg"
IMG style="WIDTH: 448px; HEIGHT: 339px" height=437 src="https://www.unix.com/images/2360_4703925.jpg" width=512></SPAN></P><P><SPAN style="FONT-SIZE: 8pt"><IMG style="WIDTH: 434px; HEIGHT: 369px" height=433 src="https://www.unix.com/images/2360_7990660.jpg" width=438></SPAN></P><SPAN style="FONT-SIZE: 8pt; mso-bidi-font-size: 12.0pt"><o:p><IMG style="WIDTH: 559px; HEIGHT: 385px" height=364 src="https://www.unix.com/images/2360_8832613.jpg"
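The junk comes from the greedy `.*`, which runs from the first IMG to the last .jpg" on the line. A minimal sketch of one way around it, assuming a grep that supports `-o` (GNU and BSD grep do, POSIX does not require it) and that no `>` appears inside an attribute value. The sample line is made up for illustration:

```shell
# Made-up sample line with two IMG tags and surrounding markup.
line='<P>text<IMG src="images/2359_4703925.jpg"><BR><IMG style="WIDTH: 3.5in" src="images/2359_7990660.jpg"></P>'

# [^>]* cannot cross a closing >, so every tag is a separate match
# and grep -o prints each match on its own line.
tags=$(printf '%s\n' "$line" | grep -o '<[Ii][Mm][Gg][^>]*>')
printf '%s\n' "$tags"
```

With this sample, each of the two IMG tags is printed on its own line, without the `<BR>` or `</P>` around them.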


Last edited by Scrutinizer; 07-06-2013 at 04:36 AM.. Reason: Add many more code tags
# 10  
Old 07-06-2013
Quote:
Originally Posted by 2by4
$ which sed; uname -a
/bin/sed
Linux ubuntu 3.5.0-34-generic #55~precise1-Ubuntu SMP Fri Jun 7 16:25:50 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

sed --version
GNU sed version 4.2.1

---------- Post updated at 10:05 PM ---------- Previous update was at 10:02 PM ----------

Don, if I read it correctly, your script assumes < is at the beginning of the line:

Code:
sed '/^<

which works on the sample text. In my case, however, there is a lot of text before and after the image tags.
Yes. Every sample line you showed us (including the ones that you said did not work with my script) had <IMG or <img at the start of the line.

You can try changing the '^<... to '<..., but if you have complete lines longer than LINE_MAX bytes, then none of the UNIX text processing utilities are defined to work. The script I gave you will also modify any occurrence of src=".../ that it finds on matching lines; so if that occurs in tags other than <IMG.*> and <img.*>, it may make changes you don't want.
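As a rough sketch of that change, here is the same substitution with the ^ anchor dropped from the address. The sample line and its Windows-style path are made up for illustration, and the caveats above about LINE_MAX and about other tags containing src=" still apply:

```shell
# Made-up sample: an IMG tag in the middle of a line.
line='<P>notes <IMG style="WIDTH: 3.5in" src="C:\pics\5942_7990660.jpg"> more text</P>'

# Same s command as before; the address now matches <IMG anywhere on the line.
result=$(printf '%s\n' "$line" | sed '/<[Ii][Mm][Gg]/sXsrc="[^"]*[\\/]Xsrc="images/Xg')
printf '%s\n' "$result"
```

On this sample the src value becomes images/5942_7990660.jpg and the rest of the line is left alone.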

If you would actually give a full description of what your input looks like (instead of the unrepresentative samples you have shown us so far), we might be able to help you come up with a script that will work.
# 11  
Old 07-06-2013
Don,

Sorry about that.

I'm working with a multi-column, tab-separated file. All columns are small except one, which contains all the notes as HTML. The maximum length of that column is 8192 (a database limitation).

Here is a sample input; you already know the other possible forms of <IMG >.

Code:
<HEAD><META content="MSHTML 6.00.2800.1126" name=GENERATOR></HEAD><BODY bgColor=#fdf5e6><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" align=left><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"><SPAN style="FONT-SIZE: 13pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 12.0pt; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"><STRONG>&nbsp; Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididun</STRONG></SPAN></SPAN></P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" align=left><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"><SPAN style="FONT-SIZE: 13pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 12.0pt; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"><STRONG></STRONG></SPAN></SPAN>&nbsp;</P><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"><SPAN style="FONT-SIZE: 13pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 12.0pt; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"><STRONG><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" align=left><TABLE class=MsoNormalTable style="MARGIN: auto auto auto 2.9pt; BORDER-COLLAPSE: collapse; mso-table-layout-alt: fixed; mso-padding-alt: 0in 2.9pt 0in 2.9pt" cellSpacing=0 cellPadding=0 border=0><TBODY><TR style="mso-yfti-irow: 0"><TD style="BORDER-RIGHT: #ece9d8; PADDING-RIGHT: 2.9pt; BORDER-TOP: #ece9d8; PADDING-LEFT: 2.9pt; PADDING-BOTTOM: 0in; BORDER-LEFT: #ece9d8; WIDTH: 477pt; PADDING-TOP: 0in; BORDER-BOTTOM: #ece9d8; BACKGROUND-COLOR: transparent" vAlign=top width=795><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt">Lorem ipsum dolor sit amet, 
consectetur adipisicing elit, sed do eiusmod tempor incididun </SPAN></B><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt; mso-bidi-font-weight: bold">Lorem ipsum dolor sit amet, consectetur adipisicing elit,.</SPAN><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"><?xml:namespace prefix = o /><o:p></o:p></SPAN></B></P></TD></TR><TR style="mso-yfti-irow: 1"><TD style="BORDER-RIGHT: #ece9d8; PADDING-RIGHT: 2.9pt; BORDER-TOP: #ece9d8; PADDING-LEFT: 2.9pt; PADDING-BOTTOM: 0in; BORDER-LEFT: #ece9d8; WIDTH: 477pt; PADDING-TOP: 0in; BORDER-BOTTOM: #ece9d8; BACKGROUND-COLOR: transparent" vAlign=top width=795><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 8pt; COLOR: white; mso-bidi-font-size: 12.0pt"><o:p>&nbsp;</o:p></SPAN></B></P></TD></TR><TR style="mso-yfti-irow: 2"><TD style="BORDER-RIGHT: #ece9d8; PADDING-RIGHT: 2.9pt; BORDER-TOP: #ece9d8; PADDING-LEFT: 2.9pt; PADDING-BOTTOM: 0in; BORDER-LEFT: #ece9d8; WIDTH: 477pt; PADDING-TOP: 0in; BORDER-BOTTOM: #ece9d8; BACKGROUND-COLOR: transparent" vAlign=top width=795><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt">TECHNIQUE:</SPAN></B><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"> Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididun<B style="mso-bidi-font-weight: normal"><o:p></o:p></B></SPAN></P></TD></TR><TR style="mso-yfti-irow: 3"><TD style="BORDER-RIGHT: #ece9d8; PADDING-RIGHT: 2.9pt; BORDER-TOP: #ece9d8; PADDING-LEFT: 2.9pt; PADDING-BOTTOM: 0in; BORDER-LEFT: #ece9d8; WIDTH: 477pt; PADDING-TOP: 0in; BORDER-BOTTOM: #ece9d8; BACKGROUND-COLOR: 
transparent" vAlign=top width=795><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 8pt; COLOR: white; mso-bidi-font-size: 12.0pt"><o:p>&nbsp;</o:p></SPAN></B></P></TD></TR><TR style="mso-yfti-irow: 4; mso-yfti-lastrow: yes"><TD style="BORDER-RIGHT: #ece9d8; PADDING-RIGHT: 2.9pt; BORDER-TOP: #ece9d8; PADDING-LEFT: 2.9pt; PADDING-BOTTOM: 0in; BORDER-LEFT: #ece9d8; WIDTH: 477pt; PADDING-TOP: 0in; BORDER-BOTTOM: #ece9d8; BACKGROUND-COLOR: transparent" vAlign=top width=795><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt">FINDINGS:</SPAN></B><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"> Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididun<o:p></o:p></SPAN></P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"><o:p>&nbsp;</o:p></SPAN></P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididun Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididun<SPAN style="mso-spacerun: yes">&nbsp; </SPAN>Lorem ipsum dolor sit amet, consectetur 
adipisicing elit, sed do eiusmod tempor incididun.<o:p></o:p></SPAN></P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"><o:p>&nbsp;</o:p></SPAN></P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididun.</SPAN><B style="mso-bidi-font-weight: normal"><o:p></o:p></B></P></TD></TR></TBODY></TABLE></P></STRONG></SPAN></SPAN><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" align=left><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"></SPAN>&nbsp;</P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" align=left><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"> </P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" align=left><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt">&nbsp; Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididun:<o:p></o:p></SPAN></B></P></SPAN><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" align=left><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"></SPAN>&nbsp;</P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" align=left><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt">&nbsp;Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididun.<o:p></o:p></SPAN></P><P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-ALIGN: justify" 
align=left><SPAN style="FONT-SIZE: 13pt; mso-bidi-font-size: 12.0pt"><o:p>&nbsp;</o:p></SPAN></P><P align=left><SPAN

# 12  
Old 07-06-2013
For long lines you might get lucky with awk. In theory the standards do not require it to handle lines exceeding LINE_MAX, but in practice it typically does (I have not yet come across one that does not), provided you carve the lines up by specifying a record separator (RS) that creates smaller records. Try:
Code:
awk '/^[Ii][Mm][Gg]/{ sub(/src="[^"]*[\/\\]/, "src=" new "/", $1) }1' new="https://www.unix.com/images" RS=\< FS=\> ORS=\< OFS=\> file

--edit---

I found the following:
Quote:
Whether the variable RS is set to a value other than a <newline> or not, for these files, implementations shall support records terminated with the specified separator up to {LINE_MAX} bytes and may support longer records.
awk: Input Files


So this suggests that, in the case of awk, LINE_MAX limits the record length, not the line length, so it should work (note that it says bytes, not characters)...
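One way to check whether splitting at < really keeps the records short is to measure the longest record and compare it with LINE_MAX. This is only a sketch: the sample line is made up, and LC_ALL=C is set on the assumption that it makes length() count bytes rather than characters in awk implementations that honor the locale.

```shell
# Made-up sample line: a short column, a tab, then the HTML column.
longest=$(printf 'id1\t%s\n' '<P>some text</P><IMG src="a/b.jpg"><BR>' |
    LC_ALL=C awk '
        BEGIN { RS = "<" }                      # records end at "<", not at newline
        length($0) > max { max = length($0) }   # remember the longest record seen
        END { print max + 0 }
    ')
limit=$(getconf LINE_MAX)
printf 'longest record: %s bytes, LINE_MAX: %s\n' "$longest" "$limit"
```

If the longest record on the real file stays below LINE_MAX, the quoted text suggests awk is required to cope with it.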

Last edited by Scrutinizer; 07-06-2013 at 07:20 AM..
# 13  
Old 07-06-2013
Quote:
Originally Posted by 2by4
Don,

Sorry about that.

I'm working with a multi-column, tab-separated file. All columns are small except one, which contains all the notes as HTML. The maximum length of that column is 8192 (a database limitation).

Here is a sample input; you already know the other possible forms of <IMG >.

Code:
<HEAD><META content="MSHTML 6.00.2800.1126" name=GENERATOR></HEAD> ... ... ... <P align=left><SPAN

If you look at the 8K byte sample input in your posting (not the abbreviated sample in the quote above), you will note that <IMG never appears, <img never appears, and src= never appears. Also note that your 8K byte database field length limited data ends in the middle of a SPAN tag.

If you want us to "fix" fields that have been thrown away by truncation when you put your data into a database, how do you expect us to recreate them from the truncated data?

Showing us a sample of data that contains absolutely nothing that you say you want changed doesn't help. What we need is an accurate description of the input that is to be processed. In particular:
  1. What type of system are you using? (I.e., What is the output from the command uname -a.)
  2. What is the value of LINE_MAX on your system? (I.e., What is the output from the command: getconf LINE_MAX.)
  3. What is the maximum length of your HTML code? We know it is more than 8 KB, but how much more?
  4. Is any of the following ever greater than (LINE_MAX - 2) bytes: the total length of any text in your input file before the start of the first tag in your HTML (i.e., before the 1st < in a line in your TSV [Tab Separated Values] file); the total length of any data from the start of one tag in your HTML to the start of the next tag (e.g., a paragraph of text between tags); or the total length of any text after the start of the last tag in your HTML to the end of your TSV file?
  5. If the HTML field was removed from your TSV, would the remaining content still have lines longer than LINE_MAX bytes?
  6. Are there any tags other than <IMG.*> (with mixed case I, M, and G) that contain the string src=" in your HTML? If so, what are they?
  7. Can the strings <IMG (mixed case) or src= appear in any other fields in your TSV (i.e., not in the field that contains the HTML)?
  8. Which field (or fields) in your TSV contain the HTML code that you want processed?
This is a complex problem made much harder by your refusal to accurately describe your input.
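
A few of these questions can be answered from the command line. The sketch below covers questions 1, 2, and 6. The sample line is made up, and the awk part assumes that every record after the first starts inside a tag, so text before the first < on a line could produce a false match:

```shell
# Questions 1 and 2: system type and the LINE_MAX limit.
uname -a
getconf LINE_MAX

# Question 6: names of the tags that contain src=", one per line.
# Records are split at "<" and fields at ">", so $1 is the inside of a tag.
line='<P><IMG src="a/1.jpg"><BR><img style="x" src="b/2.jpg"><SCRIPT src="c.js"></SCRIPT></P>'
src_tags=$(printf '%s\n' "$line" |
    awk 'BEGIN { RS = "<"; FS = ">" }
         $1 ~ /src="/ { split($1, w, " "); print toupper(w[1]) }' |
    sort -u)
printf '%s\n' "$src_tags"
```

For the real data, the printf would be replaced by the TSV file as awk's input. Any name other than IMG in the result is a tag the substitution would also touch.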

Note that Scrutinizer gave an excellent suggestion for a way of approaching this problem. (It is missing a double quote in the output after src= and it adds an unwanted < at the end of file, but those are trivial details given the other problems that need to be addressed.) I have something based on Scrutinizer's suggestion that may come close to what you need, but until I get answers to the above questions, there is no need to guess at whether or not it might work.
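
For reference, the two trivial details could be handled roughly as follows. This is a sketch of one possible variant, not necessarily the script referred to above. The sample line is made up, and new is set to images on the assumption that the output shown in post #8 is the desired result:

```shell
# Made-up sample: text around two IMG tags, one with a Windows-style path.
line='note <IMG src="C:\pics\1.jpg"> and <img style="x" src="/old/dir/2.jpg"> end'

fixed=$(printf '%s\n' "$line" |
    awk -v new="images" '
        BEGIN { RS = "<"; FS = OFS = ">" }
        # $1 is the inside of a tag; replace the directory part of src="...
        # and keep the double quote after src= in the replacement.
        /^[Ii][Mm][Gg]/ { sub(/src="[^"]*[\/\\]/, "src=\"" new "/", $1) }
        # Put back the "<" that RS consumed, but only between records,
        # so nothing extra is written after the last record.
        { printf "%s%s", (NR > 1 ? "<" : ""), $0 }
    ')
printf '%s\n' "$fixed"
```

Using printf instead of ORS means the separator is written before every record except the first, which avoids the trailing <. The value of new must not contain & or a backslash, because sub() treats those specially in the replacement.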