Extract selective block from XML file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Extract selective block from XML file
# 1  
Old 01-02-2011
Extract selective block from XML file

Hi,

There's an xml file produced from a front-end tool as shown below:

Code:
 
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="FACT_TABLE" OWNERNAME ="DIPS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="DIM_TABLE" OWNERNAME ="RAGS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="DEPT_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_DEPT_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_DEPT_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
 
.... so on


I want to extract only one of these blocks based on the NAME for e.g. If the input is "FACT_TABLE" then only the 1st block OR lines within this block should be selected that is...

Code:
 
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="FACT_TABLE" OWNERNAME ="DIPS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>

OR
Code:
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>

Note: I cannot use any XML parser tool as I don't permission to download it in my office. Also the code would be a part of Korn shell script. Please suggest.

-dips
# 2  
Old 01-02-2011
Like this?
Code:
awk '/NAME ="FACT_TABLE"/,/<\/INPUT>/' infile

Code:
v='"FACT_TABLE"'; sed -n "/NAME =$v/,/<\/INPUT>/p" infile


Last edited by Scrutinizer; 01-02-2011 at 04:52 PM..
# 3  
Old 01-02-2011
Because I like PERL:
Code:
use strict;
use warnings;
use Getopt::Long;
use File::Basename;

my $NAME = basename $0, '.pl';
my $name = '';

Getopt::Long::Configure(qw{ bundling no_ignore_case });

undef $/;

$\ = "\n";
$, = "\n";

unless (GetOptions('name|n=s' => \$name)) {
    print STDERR "USAGE: $NAME [--name=<NAME> | -n <name> ] files...";
    exit 1;
}

while (<>) {
    my @M = m{<INPUT[^>]*NAME\s*=\s*"$name"[^>]*>\s*(.*?)\s*</INPUT>}gs;
    print @M if 0 < @M;
}

Use:
Code:
./scriptname --name=DIM_TABLE inputfile
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="DEPT_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_DEPT_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_DEPT_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>

# 4  
Old 01-04-2011
Hi Scrutinizer,
I tried both of your commands and they work partially i.e. it selects other blocks with different tags and omits the same tag block

The whole file would look like NOTE: I have shown all the tags but shrunk the no. of lines (In the earlier post I showed only an excerpt from the XML file, so may be you didn't get the idea how the whole file would look......I thought it should suffice but I was mistaken!!)

Code:
 
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE POWERHOUSE SYSTEM >
<DATAMART NAME="DEVELOPMENT" VERSION="1" CODEPAGE="Latin1" DATABASETYPE="Oracle">
<FOLDER NAME="CUSTOMER_USAGE" GROUP="" OWNER="Administrator" SHARED="NOTSHARED" DESCRIPTION="" PERMISSIONS="rwx---r--" UUID="acb98070232-agaiojhawv9780">
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="FACT_TABLE" OWNERNAME ="DIPS">
        <INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
        <INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="DIM_TABLE" OWNERNAME ="RAGS">
        <INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="DEPT_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
        <INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_DEPT_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_DEPT_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
<OUTPUT DATABASE ="Oracle" DESCRIPTION ="TABLE STRUCTURE FOR ERROR TABLE" NAME ="ERROR_TABLE" OBJECTVERSION ="1" TABLEOPTIONS ="" VERSIONNUMBER ="1">
        <OUTPUTFIELD DATATYPE ="date" DESCRIPTION ="" FIELDNUMBER ="1" KEYTYPE ="NOT A KEY" NAME ="TIME_KEY" NULLABLE ="NULL" PICTURETEXT ="" PRECISION ="19" SCALE ="0"/>
        <OUTPUTFIELD DATATYPE ="number(p,s)" DESCRIPTION ="" FIELDNUMBER ="2" KEYTYPE ="NOT A KEY" NAME ="ERROR_KEY" NULLABLE ="NULL" PICTURETEXT ="" PRECISION ="15" SCALE ="0"/>
</OUTPUT>
<TRANSFER DESCRIPTION ="Calculates customer's spending in a store. This is a FACT_TABLE" NAME ="CAL_CUST_SPEND" OBJECTVERSION ="1" REUSABLE ="NO" TYPE ="Expression" VERSIONNUMBER ="9">
            <TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="CUSTOMER_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
            <TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</TRANSFER>
<TRANSFER DESCRIPTION ="Calculates store's inventory" NAME ="LKP_FACT_TABLE" OBJECTVERSION ="1" REUSABLE ="NO" TYPE ="Expression" VERSIONNUMBER ="9">
            <TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
            <TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</TRANSFER>
<TRANSFER DESCRIPTION ="Tags errors" NAME ="ERROR_TYPE" OBJECTVERSION ="1" REUSABLE ="NO" TYPE ="Expression" VERSIONNUMBER ="9">
            <TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="ERROR_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
            <TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="ERROR_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="ERROR_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</TRANSFER>
<INSTANCE DESCRIPTION ="This component pulls all the records from the FACT_TABLE" NAME ="FACT_TABLE" REUSABLE ="NO" INPUT_NAME ="FACT_TABLE" INPUT_TYPE ="Lookup Procedure" TYPE ="INPUT"/>
<INSTANCE DESCRIPTION ="This component pulls all the records from the DIM_TABLE" NAME ="DIM_TABLE" REUSABLE ="NO" INPUT_NAME ="DIM_TABLE" INPUT_TYPE ="Source Qualifier" TYPE ="INPUT">
<ASSOCIATED_INSTANCE NAME ="FACT_TABLE"/>
<ASSOCIATED_INSTANCE NAME ="ANOTHER_TABLE"/>
        </INSTANCE>
<CONNECTOR FROMFIELD ="FROM_FIELD_1" FROMINSTANCE ="COMP_INSERT" FROMINSTANCETYPE ="Router" TOFIELD ="TO_FIELD_1" TOINSTANCE ="FACT_TABLE" TOINSTANCETYPE ="Target Definition"/>
        <CONNECTOR FROMFIELD ="FROM_FIELD_2" FROMINSTANCE ="COMP_INSERT" FROMINSTANCETYPE ="Router" TOFIELD ="TO_FIELD_1" TOINSTANCE ="ANOTHER_TABLE" TOINSTANCETYPE ="Target Definition"/>
<TARGETLOADORDER ORDER ="1" TARGETINSTANCE ="MAIN_TABLE"/>
        <TARGETLOADORDER ORDER ="1" TARGETINSTANCE ="ERROR_TABLE"/>
        <ERPINFO/>
        <METADATAEXTENSION COMPONENTVERSION ="1111111" DATATYPE ="STRING" DESCRIPTION ="name" DOMAIN ="User Defined Metadata Domain" ISEDITABLE ="YES" ISVISIBLE ="YES" ISREUSABLE ="YES" ISSHAREREAD ="NO" ISSHAREWRITE ="NO" MAXLENGTH ="256" NAME ="kk" VALUE ="" VENDORNAME ="ABC"/>
    </MAP>
</FOLDER>
</DATAMART>
</POWERHOUSE>

Results into
Code:
 
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="FACT_TABLE" OWNERNAME ="DIPS">
        <INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
        <INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
<INSTANCE DESCRIPTION ="This component pulls all the records from the FACT_TABLE" NAME ="FACT_TABLE" REUSABLE ="NO" INPUT_NAME ="FACT_TABLE" INPUT_TYPE ="Lookup Procedure" TYPE ="INPUT"/>
<INSTANCE DESCRIPTION ="This component pulls all the records from the DIM_TABLE" NAME ="DIM_TABLE" REUSABLE ="NO" INPUT_NAME ="DIM_TABLE" INPUT_TYPE ="Source Qualifier" TYPE ="INPUT">
<ASSOCIATED_INSTANCE NAME ="FACT_TABLE"/>
<ASSOCIATED_INSTANCE NAME ="ANOTHER_TABLE"/>
        </INSTANCE>
<CONNECTOR FROMFIELD ="FROM_FIELD_1" FROMINSTANCE ="COMP_INSERT" FROMINSTANCETYPE ="Router" TOFIELD ="TO_FIELD_1" TOINSTANCE ="FACT_TABLE" TOINSTANCETYPE ="Target Definition"/>
        <CONNECTOR FROMFIELD ="FROM_FIELD_2" FROMINSTANCE ="COMP_INSERT" FROMINSTANCETYPE ="Router" TOFIELD ="TO_FIELD_1" TOINSTANCE ="ANOTHER_TABLE" TOINSTANCETYPE ="Target Definition"/>
<TARGETLOADORDER ORDER ="1" TARGETINSTANCE ="MAIN_TABLE"/>
        <TARGETLOADORDER ORDER ="1" TARGETINSTANCE ="ERROR_TABLE"/>
        <ERPINFO/>
        <METADATAEXTENSION COMPONENTVERSION ="1111111" DATATYPE ="STRING" DESCRIPTION ="name" DOMAIN ="User Defined Metadata Domain" ISEDITABLE ="YES" ISVISIBLE ="YES" ISREUSABLE ="YES" ISSHAREREAD ="NO" ISSHAREWRITE ="NO" MAXLENGTH ="256" NAME ="kk" VALUE ="" VENDORNAME ="ABC"/>
    </MAP>
</FOLDER>
</DATAMART>
</POWERHOUSE>

Please suggest how to select only the lines enclosed in tags <INPUT> & <\INPUT> and having NAME as "FACT_TABLE".
-dips
# 5  
Old 01-04-2011
Try /<INPUT.*NAME ="FACT_TABLE"/ instead of /NAME ="FACT_TABLE"/
This User Gave Thanks to Scrutinizer For This Post:
# 6  
Old 01-04-2011
Hi Scrutinizer,

Neat. It worked perfectly fine!! Thankyou.

This kind of "awk" construct means that the lines between FROM KEYWORD & TO KEYWORD be selected. Am I right?
Code:
 
awk '/FROM KEYWORD/,/TO KEYWORD/' infile

-dips
# 7  
Old 01-04-2011
Yes, it is called a "range pattern"
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Extract XML block when value is matched (Shell script)

Hi everyone, So i'm struggling with an xml (log file) where we get information about some devices, so the logfile is filled with multiple "blocks" like that. Based on the <devId> i want to extract this part of the xml file. If possible I want it to have an script for this, cause we'll use... (5 Replies)
Discussion started by: Pouky
5 Replies

2. Shell Programming and Scripting

Extract a value from an xml file

I have this XML file format and all in one line: Fri Dec 23 00:14:52 2016 Logged Message:689|<?xml version="1.0" encoding="UTF-8"?><PORT_RESPONSE><HEADER><ORIGINATOR>XMG</ORIGINATOR><DESTINAT... (16 Replies)
Discussion started by: mrn6430
16 Replies

3. Shell Programming and Scripting

How can I extract XML block around matching search string?

I want to extract XML block surrounding search string Ex: print XML block for string "myapp1-ear" surrounded by "<application> .. </application>" Input XML: <?xml version="1.0" encoding="UTF-8"?> <deployment-request> <requestor> <first-name>kchinnam</first-name> ... (16 Replies)
Discussion started by: kchinnam
16 Replies

4. UNIX for Dummies Questions & Answers

Extract Element from XML file

<?xml version = '1.0' encoding =... (8 Replies)
Discussion started by: Siva SQL
8 Replies

5. Shell Programming and Scripting

Extract a particular xml only from an xml jar file

Hi..need help on how to extract a particular xml file only from an xml jar file... thanks! (2 Replies)
Discussion started by: qwerty000
2 Replies

6. Shell Programming and Scripting

Get extract text from xml file

Hi Collegue, i have a file say a.xml. it has contents <bpelFault><faultType>1</faultType><genericSystemFault xmlns=""><part name="payload"><v2:Fault... (10 Replies)
Discussion started by: Jewel
10 Replies

7. Shell Programming and Scripting

Extract values from an XML File

Hi, I need to capture all the attributes with delete next to it. The source XML file is attached. The output should contain something like this below: Attributes = legacyExchangeDN Action = Delete Username = Hero Joker Loginid = joker09 OU =... (4 Replies)
Discussion started by: prvnrk
4 Replies

8. Shell Programming and Scripting

Extract details from XML file

Hi , I have one xml file contains more than 60 lines. I need to extract some details from the file and store it in new file.Not the whole file Please find the xml file below: <?xml version="1.0" encoding="UTF-8"?> <DeploymentDescriptors xmlns="http://www.tibco.com/xmlns/dd"> ... (6 Replies)
Discussion started by: ckchelladurai
6 Replies

9. Shell Programming and Scripting

extract a number within an xml file

Hi Everyone, I have an sh script that I am working on and I have run into a little snag that I am hoping someone here can assist me with. I am using wget to retrieve an xml file from thetvdb.com. This part works ok but what I need to be able to do is extract the series ID # from the xml and put... (10 Replies)
Discussion started by: tret
10 Replies

10. Shell Programming and Scripting

How to extract text from xml file

I have some xml files that got created by exporting a website from RedDot. I would like to extract the cost, course number, description, and meeting information. <?xml version="1.0" encoding="UTF-16" standalone="yes" ?> - <PAG PAG0="3AE6FCFD86D34896A82FCA3B7B76FF90" PAG3="525312"... (3 Replies)
Discussion started by: chrisf
3 Replies
Login or Register to Ask a Question