Extract value inside <text> tag for a particular condition.


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Extract value inside <text> tag for a particular condition.
# 1  
Old 12-21-2008
Extract value inside <text> tag for a particular condition.

Hi All!

I have obtained following output from a tool "pdftohtml" ::

So, my input is as under:

<text top="246" left="160" width="84" height="16" font="3">Business purpose</text>
<text top="260" left="506" width="220" height="16" font="3">giving the right information and new insights </text>
<text top="296" left="160" width="67" height="16" font="3">Characteristic</text>
<text top="296" left="278" width="111" height="16" font="3">Operational processing</text>
<text top="296" left="506" width="120" height="16" font="3">Informational processing</text>
<text top="318" left="160" width="55" height="16" font="3">Orientation</text>
<text top="318" left="278" width="56" height="16" font="3">Transaction</text>
<text top="318" left="506" width="42" height="16" font="3">Analysis</text>
<text top="340" left="160" width="43" height="16" font="3">Function</text>
------
----

Now, i want to write a shell script that checks the value of "left" attribute in in each <text> tag and if this value is equal to 160, it saves the content enclosed inside a particular <text> tag in an arbitrary file inside <p> tag.

So, i want output as follows:

<p>Business purpose</p>
<p>Characteristic</p>
<p>Orientation</p>
<p>Function</p>
------
-----

Any help will be Truly Appreciated. Thanks in advance !!!
# 2  
Old 12-21-2008
Something like this?

Code:
sed '/left="160"/s/\(.*>\)\(.*\)\(<.*\)/\1<p>\2<\/p>\3/' file > outfile

Regards
# 3  
Old 12-21-2008
Thanks Franklin. You Rock!!!
# 4  
Old 12-22-2008
hi,

you may use perl ASX to process it.

input:
Code:
<?xml version="1.0"?>
<data>
<text top="246" left="160" width="84" height="16" font="3">Business purpose</text>
<text top="260" left="506" width="220" height="16" font="3">giving the right information and new insights </text>
<text top="296" left="160" width="67" height="16" font="3">Characteristic</text>
<text top="296" left="278" width="111" height="16" font="3">Operational processing</text>
<text top="296" left="506" width="120" height="16" font="3">Informational processing</text>
<text top="318" left="160" width="55" height="16" font="3">Orientation</text>
<text top="318" left="278" width="56" height="16" font="3">Transaction</text>
<text top="318" left="506" width="42" height="16" font="3">Analysis</text>
<text top="340" left="160" width="43" height="16" font="3">Function</text>
</data>

code:

Code:
package Leo;
use XML::SAX::Base;
@ISA=qw(XML::SAX::Base);
sub start_document{
	my $self=shift;
	my $doc=shift;
}
sub start_element{
	my $self=shift;
	my $element=shift;   
	foreach my $key (keys %{$element->{Attributes}}){
		my $attr=$element->{Attributes}->{$key};
		$flag=1 if ($attr->{Name} eq "left" && $attr->{Value}==160);
	}
}
sub characters{
	my $self=shift;
	my $char=shift;
	if ($flag==1){
		print "<P>",$char->{Data},"</P>\n";
		$flag=0;
	}	
}
1

Code:
use XML::SAX;
use Leo;
$parser=XML::SAX::ParserFactory->parser(Handler=>Leo->new);
$parser->parse_uri("a.txt");

result:
Code:
you expectation

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Replacing tag based on condition

Hi All, I am having a file like below. The file will having information about the records.If you see the file the file is header and data. For example it have 1 men tag and the tag id will be come after headers. The change is I want to convert All pets tag from P to X. I did a sed like below... (5 Replies)
Discussion started by: arunkumar_mca
5 Replies

2. Shell Programming and Scripting

Help with tag value extraction from xml file based on a matching condition

Hi , I have a situation where I need to search an xml file for the presence of a tag <FollowOnFrom> and also , presence of partial part of the following tag <ContractRequest _LoadId and if these 2 exist ,then extract the value from the following tag <_LocalId> which is "CW2094139". There... (2 Replies)
Discussion started by: paul1234
2 Replies

3. Shell Programming and Scripting

Help with XML tag value extraction based on condition

sample xml file part <?xml version="1.0" encoding="UTF-8"?><ContractWorkspace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" _LoadId="export_AJ6iAFmh+pQHq1" xsi:noNamespaceSchemaLocation="ContractWorkspace.xsd"> <_LocalId>CW2218471</_LocalId> <Active>true</Active> ... (3 Replies)
Discussion started by: paul1234
3 Replies

4. Shell Programming and Scripting

Help with XML tag value extraction based on matching condition

sample xml file part <DocumentMinorVersion>0</DocumentMinorVersion> <DocumentVersion>1</DocumentVersion> <EffectiveDate>2017-05-30T00:00:00Z</EffectiveDate> <FollowOnFrom> <ContractRequest _LoadId="export_AJ6iAFoh6g0rE9"> <_LocalId>CRW2218451</_LocalId> ... (4 Replies)
Discussion started by: paul1234
4 Replies

5. Shell Programming and Scripting

How can i find texts inside a html tag using sed?

How can i find texts inside a html tag using sed? Html texts: What i tried: cat infile | sed -e 's/\(<kbd*\)\(.*\)\(kbd>\)/\2/ Expected result like this: sed -i -e 's/@colophon/@@colophon/' \ -e 's/doc@cygnus.com/doc@@cygnus.com/' bfd/doc/bfd.texinfo (5 Replies)
Discussion started by: cola
5 Replies

6. Shell Programming and Scripting

How to remove string inside html tag <a>

Does anybody know how i can remove string from <a> tag? There are several hundred posts in a few forums that need to be cleaned up. The precise situation is ---------- <a href="http://mydomain.com/cgi-bin/anyboard.cgi?fvp=/family/sexuality_and_spirituality/&cmd=rA&cG=43"> ------------- my... (6 Replies)
Discussion started by: georgi58
6 Replies

7. Shell Programming and Scripting

Replace text inside XML file based on condition

Hi All, I want to change the name as SEQ_13 ie., <Property Name="Name">SEQ_13</Property> when the Stage Type is PxSequentialFile ie., <Property Name="StageType">PxSequentialFile</Property> :wall: Input.XML <Main> <Record Identifier="V0S13" Type="CustomStage" Readonly="0">... (3 Replies)
Discussion started by: kmsekhar
3 Replies

8. Shell Programming and Scripting

extract xml tag based on condition

Hi All, I have a large xml file of invoices. The file looks like below: <INVOICES> <INVOICE> <NAME>Customer A</NAME> <INVOICE_NO>1234</INVOICE_NO> </INVOICE> <INVOICE> <NAME>Customer A</NAME> <INVOICE_NO>2345</INVOICE_NO> </INVOICE> <INVOICE> <NAME>Customer A</NAME>... (9 Replies)
Discussion started by: angshuman
9 Replies

9. Shell Programming and Scripting

Finding a string inside A Tag

I have umpteen number of files containing HTML A tags in the below format or I want to find all the lines that contain the word Login= I used this command grep "Login=" * This gave me normal lines as well which contain the word Login= for example, it returned lines which... (2 Replies)
Discussion started by: dahlia84
2 Replies

10. UNIX for Dummies Questions & Answers

How do I extract text only from html file without HTML tag

I have a html file called myfile. If I simply put "cat myfile.html" in UNIX, it shows all the html tags like <a href=r/26><img src="http://www>. But I want to extract only text part. Same problem happens in "type" command in MS-DOS. I know you can do it by opening it in Internet Explorer,... (4 Replies)
Discussion started by: los111
4 Replies
Login or Register to Ask a Question