Deleting words between tags

# 1  
Old 11-25-2009

Hi !!!

I need to write a ksh script that deletes every character in a file that lies outside the <start> and </start> tags.

For example:
Code:
$ cat file.txt
<start>
ad
bd
</start>
as</start>
<start>
d
e
f
mb<start>mu
g
h
i
</start>
a
b
c
<start>
a
f
g
</start>

The output should then be:
Code:
<start>
ad
bd
</start>
<start>mu
g
h
i
</start>
<start>
a
f
g
</start>

Thanks...
# 2  
Old 11-25-2009
Code:
awk '/<start>/,/<\/start>/' file.txt

# 3  
Old 11-25-2009
It won't work. It gives the following output:
Code:
<start>
ad
bd
</start>
<start>
d
e
f
mb<start>mu
g
h
i
</start>
<start>
a
f
g
</start>


I guess it's not so simple...
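
The reason appears to be that an awk range pattern opens at the first line matching <start> and prints whole lines until a line matches </start>, so it neither restarts at a later <start> nor trims text in front of a tag. A minimal sketch on a shortened, made-up sample (the function name range_demo is hypothetical):

```shell
# Hypothetical demo: an unfinished block followed by a complete one.
range_demo() {
    printf '%s\n' '<start>' 'd' 'mb<start>mu' 'g' '</start>' |
        # The range opens on line 1 and only closes on the last line,
        # so all five input lines are printed unchanged.
        awk '/<start>/,/<\/start>/'
}
```

Calling range_demo prints the input as-is, including the leftover "d" and the "mb" prefix.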
# 4  
Old 11-25-2009
Code:
sed '/<start>/,/<\/start>/ d'

# 5  
Old 11-25-2009
Code:
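use strict;
use warnings;

# Slurp the whole input at once.
local $/;
my $str = <DATA>;
# Split after every closing tag so each chunk holds at most one </start>.
my @tmp = split /(?<=<\/start>)/, $str;
# Print the innermost <start>...</start> block of a chunk, if it has one.
for (@tmp) {
    print "$1\n" if /(<start>(?!.*<start>).*<\/start>)/s;
}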

__DATA__
<start>
ad
bd
</start>
as</start>
<start>
d
e
f
mb<start>mu
g
h
i
</start>
a
b
c
<start>
a
f
g
</start>

# 6  
Old 11-26-2009
In the OP's desired output this should be left out:
Code:
as</start>
<start>
d
e
f
mb

But what is the criterion here? That <start> should be ignored because there was a previous </start> that did not belong to a <start> before that? I think that would have to be clarified or it is going to be a long thread.
# 7  
Old 11-26-2009
I apologize for the late reply...
The rule is: keep only the data that lies between a <start> tag and its </start> tag, and ignore all other data.

For example, consider the following data:
<start>
ad
bd
</start>
as</start>
<start>
d
e
f
mb
<start>mu
g
h
i
</start>
a
b
c

<start>
a
f
g
</start>


The script should discard the following parts of the data above (they appeared in red in the original post):

as</start> --> because this </start> has no matching <start>

<start>
d
e
f
mb --> these lines, because the closing </start> is not present.

a
b
c --> also these lines, because they do not lie between <start> and </start>
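
One way to read this rule: buffer lines from the most recent <start>, throw the buffer away whenever another <start> shows up first, and print it only when </start> arrives. The following is a rough awk sketch of that reading, not a tested production script. The function name keep_start_blocks is made up for illustration, and the sketch assumes that a line never has an opening tag after a closing tag, as in the sample data.

```shell
# Hypothetical helper: reads the named files, or stdin if none are given.
keep_start_blocks() {
    awk '
    # An opening tag starts a fresh block and discards any unfinished one.
    # The greedy .* makes RLENGTH end at the last <start> on the line.
    match($0, /.*<start>/) {
        buf = ""
        $0 = "<start>" substr($0, RLENGTH + 1)
        inblock = 1
    }
    inblock {
        if (match($0, /<\/start>/)) {
            # Closing tag: drop any text after it and print the block.
            print buf substr($0, 1, RSTART + RLENGTH - 1)
            buf = ""
            inblock = 0
        } else {
            buf = buf $0 "\n"
        }
    }
    ' "$@"
}
```

From a ksh or sh script it could be called as keep_start_blocks file.txt. Lines outside a block never reach the buffer, and an orphan </start> is skipped because no block is open at that point.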