Post 302962415 by looney on Monday 14th of December 2015 10:51:45 AM
Extract Matched Records from XML

Hi All,

I need to extract sections of an XML file based on a list of parameters held in a second file.
I will then import the extracted sections into a scheduler tool.
file2
Code:
<FOLDER DATACENTER="ControlMserver" VERSION="800" PLATFORM="UNIX" FOLDER_NAME="SH_AP_INT_B01" MODIFIED="False" LAST_UPLOAD="20151202132638UTC" REAL_FOLDER_ID="193" TYPE="1" USED_BY_CODE="0">
        <JOB JOBISN="1" APPLICATION="SH_AP_INT_B01" SUB_APPLICATION="SH_AP_INT_B01" MEMNAME="ods_script_etl_wrapper.ksh" JOBNAME="JB_AP_ASSG_PRCE_PLAN_INT_B01" CREATED_BY="xyz" RUN_AS="abc" CRITICAL="0" TASKTYPE="Job" CYCLIC="0" NODEID="ser455" INTERVAL="00001M" MEMLIB="/c/bin/" CONFIRM="0" RETRO="0" MAXWAIT="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" DAYS="ALL" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="emuser" CREATION_DATE="20151120" CREATION_TIME="160829" CHANGE_USERID="user" CHANGE_DATE="20151202" CHANGE_TIME="185636" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" IS_CURRENT_VERSION="Y" VERSION_SERIAL="3" VERSION_HOST="PIGAUTAM02" CYCLIC_TOLERANCE="0" CYCLIC_TYPE="C" PARENT_FOLDER="SH_AP_INT_B01">
            <VARIABLE NAME="%%PARM1" VALUE="wf_AP_INT_ASSG_PRCE_PLAN_INT_INS" />
            <VARIABLE NAME="%%PARM2" VALUE="ETL" />
            <INCOND NAME="SH_CYCLE_AUDIT_START_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_PRD_PRDCTLGREL_REF_B01-OK" ODATE="ODAT" AND_OR="A" />
            <OUTCOND NAME="JB_AP_ASSG_PRCE_PLAN_INT_B01-OK" ODATE="ODAT" SIGN="+" />
        </JOB>
        <JOB JOBISN="2" APPLICATION="SH_AP_INT_B01" SUB_APPLICATION="SH_AP_INT_B01" MEMNAME="ods_script_etl_wrapper.ksh" JOBNAME="JB_AP_ASSG_PROD_INT_B01" CREATED_BY="xyz" RUN_AS="abc" CRITICAL="0" TASKTYPE="Job" CYCLIC="0" NODEID="ser455" INTERVAL="00001M" MEMLIB="/c/bin/" CONFIRM="0" RETRO="0" MAXWAIT="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" DAYS="ALL" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="emuser" CREATION_DATE="20151120" CREATION_TIME="160829" CHANGE_USERID="user" CHANGE_DATE="20151202" CHANGE_TIME="185636" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" IS_CURRENT_VERSION="Y" VERSION_SERIAL="3" VERSION_HOST="PIGAUTAM02" CYCLIC_TOLERANCE="0" CYCLIC_TYPE="C" PARENT_FOLDER="SH_AP_INT_B01">
            <VARIABLE NAME="%%PARM1" VALUE="merge_ap_assg_prod_int.sql" />
            <VARIABLE NAME="%%PARM2" VALUE="SQL" />
            <INCOND NAME="SH_CYCLE_AUDIT_START_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_PRODCTLG_SH_PRD_REF_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_AP_TBCMMT_TERM_LKP_B01-OK" ODATE="ODAT" AND_OR="A" />
            <OUTCOND NAME="JB_AP_ASSG_PROD_INT_B01-OK" ODATE="ODAT" SIGN="+" />
        </JOB>
        <JOB JOBISN="3" APPLICATION="SH_AP_INT_B01" SUB_APPLICATION="SH_AP_INT_B01" MEMNAME="ods_script_etl_wrapper.ksh" JOBNAME="SH_AP_INT_B01_END" CREATED_BY="xyz" RUN_AS="abc" CRITICAL="0" TASKTYPE="Dummy" CYCLIC="0" NODEID="ser455" INTERVAL="00001M" MEMLIB="/c/bin/" CONFIRM="0" RETRO="0" MAXWAIT="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" DAYS="ALL" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="emuser" CREATION_DATE="20151120" CREATION_TIME="160829" CHANGE_USERID="user" CHANGE_DATE="20151202" CHANGE_TIME="185636" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" IS_CURRENT_VERSION="Y" VERSION_SERIAL="3" VERSION_HOST="PIGAUTAM02" CYCLIC_TOLERANCE="0" CYCLIC_TYPE="C" PARENT_FOLDER="SH_AP_INT_B01">
            <VARIABLE NAME="%%PARM1" VALUE="merge_ap_assg_prod_int.sql" />
            <VARIABLE NAME="%%PARM2" VALUE="SQL" />
            <VARIABLE NAME="%%$BMCWAIORIGTYPE" VALUE="Job" />
            <INCOND NAME="SH_CYCLE_AUDIT_START_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_AP_ASSG_PRCE_PLAN_INT_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_AP_ASSG_PROD_INT_B01-OK" ODATE="ODAT" AND_OR="A" />
            <OUTCOND NAME="SH_AP_INT_B01_END-OK" ODATE="ODAT" SIGN="+" />
        </JOB>
    </FOLDER>
	<FOLDER DATACENTER="ControlMserver" VERSION="800" PLATFORM="UNIX" FOLDER_NAME="SH_AP_RI_B01" MODIFIED="False" LAST_UPLOAD="20151203033754UTC" REAL_FOLDER_ID="194" TYPE="1" USED_BY_CODE="0">
        <JOB JOBISN="1" APPLICATION="SH_AP_RI_B01" SUB_APPLICATION="SH_AP_RI_B01" MEMNAME="ods_script_etl_wrapper.ksh" JOBNAME="JB_AP_CUST_KEY_ASSG_PROD_RI_B01" CREATED_BY="xyz" RUN_AS="abc" CRITICAL="0" TASKTYPE="Job" CYCLIC="0" INTERVAL="00001M" MEMLIB="/c/bin/" CONFIRM="0" RETRO="0" MAXWAIT="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" DAYS="ALL" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="emuser" CREATION_DATE="20151120" CREATION_TIME="160829" CHANGE_USERID="user" CHANGE_DATE="20151203" CHANGE_TIME="090753" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" IS_CURRENT_VERSION="Y" VERSION_SERIAL="4" VERSION_HOST="PIGAUTAM02" CYCLIC_TOLERANCE="0" CYCLIC_TYPE="C" PARENT_FOLDER="SH_AP_RI_B01">
            <VARIABLE NAME="%%PARM1" VALUE="ri_ap_cust_key_assg_prod.sql" />
            <VARIABLE NAME="%%PARM2" VALUE="SQL" />
            <INCOND NAME="SH_CYCLE_AUDIT_START_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_AP_ASSG_PROD_INT_B01-OK" ODATE="ODAT" AND_OR="A" />
            <OUTCOND NAME="JB_AP_CUST_KEY_ASSG_PROD_RI_B01-OK" ODATE="ODAT" SIGN="+" />
        </JOB>
	</FOLDER>

In the XML above, each <FOLDER ...> element defines a folder whose name is given by the FOLDER_NAME attribute, e.g. FOLDER_NAME="SH_AP_INT_B01", and each folder contains many jobs. Each job starts with <JOB ...> and ends with </JOB>. I don't want every job in the output, so I compare the XML against a list in another file; only the jobs present in that list should remain in the XML.
Each job is named by its JOBNAME attribute, e.g. JOBNAME="JB_AP_ASSG_PROD_INT_B01". The list of job names, file1, looks like this:
Code:
<JOB JOBNAME="JB_AP_ASSG_PROD_INT_B01"
<JOB JOBNAME="JB_AP_ASSG_PRCE_PLAN_INT_B01"

I tried the following:
Code:
awk 'NR==FNR{a[$1];next} $5 in a { print "<JOB"$0}' RS="<JOB" file1 file2

Since I have to use RS="<JOB", I added <JOB in front of every job name in the list file.
The problem is that the output does not include the folder lines.
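One way to see where the folder lines go is to print the first line of each record that awk produces with RS="<JOB". The sketch below uses a tiny made-up stand-in for file2 (the folder name F1 and job names A and B are hypothetical), and it assumes an awk that treats a multi-character RS as a regular expression, as gawk and mawk do; POSIX leaves that case unspecified.

```shell
# Hypothetical miniature of file2, attributes shortened for readability.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<FOLDER FOLDER_NAME="F1">
    <JOB JOBISN="1" JOBNAME="A">
    </JOB>
    <JOB JOBISN="2" JOBNAME="B">
    </JOB>
</FOLDER>
EOF

# Split on "<JOB" and show the record number plus the first line of each record.
records=$(awk 'BEGIN { RS = "<JOB" }
               { split($0, l, "\n"); print NR ": " l[1] }' "$demo")
printf '%s\n' "$records"
rm -f "$demo"
```

Under that assumption the <FOLDER ...> line sits alone in record 1, ahead of the first job, and the closing </FOLDER> is glued to the end of the last job record. A test on a field of the job record can therefore never select the folder header.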

The final output I want looks like this:
Code:
<FOLDER DATACENTER="ControlMserver" VERSION="800" PLATFORM="UNIX" FOLDER_NAME="SH_AP_INT_B01" MODIFIED="False" LAST_UPLOAD="20151202132638UTC" REAL_FOLDER_ID="193" TYPE="1" USED_BY_CODE="0">
        <JOB JOBISN="1" APPLICATION="SH_AP_INT_B01" SUB_APPLICATION="SH_AP_INT_B01" MEMNAME="ods_script_etl_wrapper.ksh" JOBNAME="JB_AP_ASSG_PRCE_PLAN_INT_B01" CREATED_BY="xyz" RUN_AS="abc" CRITICAL="0" TASKTYPE="Job" CYCLIC="0" NODEID="ser455" INTERVAL="00001M" MEMLIB="/c/bin/" CONFIRM="0" RETRO="0" MAXWAIT="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" DAYS="ALL" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="emuser" CREATION_DATE="20151120" CREATION_TIME="160829" CHANGE_USERID="user" CHANGE_DATE="20151202" CHANGE_TIME="185636" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" IS_CURRENT_VERSION="Y" VERSION_SERIAL="3" VERSION_HOST="PIGAUTAM02" CYCLIC_TOLERANCE="0" CYCLIC_TYPE="C" PARENT_FOLDER="SH_AP_INT_B01">
            <VARIABLE NAME="%%PARM1" VALUE="wf_AP_INT_ASSG_PRCE_PLAN_INT_INS" />
            <VARIABLE NAME="%%PARM2" VALUE="ETL" />
            <INCOND NAME="SH_CYCLE_AUDIT_START_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_PRD_PRDCTLGREL_REF_B01-OK" ODATE="ODAT" AND_OR="A" />
            <OUTCOND NAME="JB_AP_ASSG_PRCE_PLAN_INT_B01-OK" ODATE="ODAT" SIGN="+" />
        </JOB>
        <JOB JOBISN="2" APPLICATION="SH_AP_INT_B01" SUB_APPLICATION="SH_AP_INT_B01" MEMNAME="ods_script_etl_wrapper.ksh" JOBNAME="JB_AP_ASSG_PROD_INT_B01" CREATED_BY="xyz" RUN_AS="abc" CRITICAL="0" TASKTYPE="Job" CYCLIC="0" NODEID="ser455" INTERVAL="00001M" MEMLIB="/c/bin/" CONFIRM="0" RETRO="0" MAXWAIT="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" DAYS="ALL" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="emuser" CREATION_DATE="20151120" CREATION_TIME="160829" CHANGE_USERID="user" CHANGE_DATE="20151202" CHANGE_TIME="185636" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" IS_CURRENT_VERSION="Y" VERSION_SERIAL="3" VERSION_HOST="PIGAUTAM02" CYCLIC_TOLERANCE="0" CYCLIC_TYPE="C" PARENT_FOLDER="SH_AP_INT_B01">
            <VARIABLE NAME="%%PARM1" VALUE="merge_ap_assg_prod_int.sql" />
            <VARIABLE NAME="%%PARM2" VALUE="SQL" />
            <INCOND NAME="SH_CYCLE_AUDIT_START_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_PRODCTLG_SH_PRD_REF_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_AP_TBCMMT_TERM_LKP_B01-OK" ODATE="ODAT" AND_OR="A" />
            <OUTCOND NAME="JB_AP_ASSG_PROD_INT_B01-OK" ODATE="ODAT" SIGN="+" />
        </JOB>
</FOLDER>


But I am getting the output below. Remember, I want each folder to start with <FOLDER and close with </FOLDER>. There are many folders, and every folder has many jobs.
Code:
<JOB JOBISN="1" APPLICATION="SH_AP_INT_B01" SUB_APPLICATION="SH_AP_INT_B01" MEMNAME="ods_script_etl_wrapper.ksh" JOBNAME="JB_AP_ASSG_PRCE_PLAN_INT_B01" CREATED_BY="xyz" RUN_AS="abc" CRITICAL="0" TASKTYPE="Job" CYCLIC="0" NODEID="ser455" INTERVAL="00001M" MEMLIB="/c/bin/" CONFIRM="0" RETRO="0" MAXWAIT="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" DAYS="ALL" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="emuser" CREATION_DATE="20151120" CREATION_TIME="160829" CHANGE_USERID="user" CHANGE_DATE="20151202" CHANGE_TIME="185636" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" IS_CURRENT_VERSION="Y" VERSION_SERIAL="3" VERSION_HOST="PIGAUTAM02" CYCLIC_TOLERANCE="0" CYCLIC_TYPE="C" PARENT_FOLDER="SH_AP_INT_B01">
            <VARIABLE NAME="%%PARM1" VALUE="wf_AP_INT_ASSG_PRCE_PLAN_INT_INS" />
            <VARIABLE NAME="%%PARM2" VALUE="ETL" />
            <INCOND NAME="SH_CYCLE_AUDIT_START_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_PRD_PRDCTLGREL_REF_B01-OK" ODATE="ODAT" AND_OR="A" />
            <OUTCOND NAME="JB_AP_ASSG_PRCE_PLAN_INT_B01-OK" ODATE="ODAT" SIGN="+" />
        </JOB>

<JOB JOBISN="2" APPLICATION="SH_AP_INT_B01" SUB_APPLICATION="SH_AP_INT_B01" MEMNAME="ods_script_etl_wrapper.ksh" JOBNAME="JB_AP_ASSG_PROD_INT_B01" CREATED_BY="xyz" RUN_AS="abc" CRITICAL="0" TASKTYPE="Job" CYCLIC="0" NODEID="ser455" INTERVAL="00001M" MEMLIB="/c/bin/" CONFIRM="0" RETRO="0" MAXWAIT="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" DAYS="ALL" JAN="1" FEB="1" MAR="1" APR="1" MAY="1" JUN="1" JUL="1" AUG="1" SEP="1" OCT="1" NOV="1" DEC="1" DAYS_AND_OR="O" SHIFT="Ignore Job" SHIFTNUM="+00" SYSDB="1" IND_CYCLIC="S" CREATION_USER="emuser" CREATION_DATE="20151120" CREATION_TIME="160829" CHANGE_USERID="user" CHANGE_DATE="20151202" CHANGE_TIME="185636" RULE_BASED_CALENDAR_RELATIONSHIP="O" APPL_TYPE="OS" MULTY_AGENT="N" USE_INSTREAM_JCL="N" VERSION_OPCODE="N" IS_CURRENT_VERSION="Y" VERSION_SERIAL="3" VERSION_HOST="PIGAUTAM02" CYCLIC_TOLERANCE="0" CYCLIC_TYPE="C" PARENT_FOLDER="SH_AP_INT_B01">
            <VARIABLE NAME="%%PARM1" VALUE="merge_ap_assg_prod_int.sql" />
            <VARIABLE NAME="%%PARM2" VALUE="SQL" />
            <INCOND NAME="SH_CYCLE_AUDIT_START_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_PRODCTLG_SH_PRD_REF_B01-OK" ODATE="ODAT" AND_OR="A" />
            <INCOND NAME="JB_AP_TBCMMT_TERM_LKP_B01-OK" ODATE="ODAT" AND_OR="A" />
            <OUTCOND NAME="JB_AP_ASSG_PROD_INT_B01-OK" ODATE="ODAT" SIGN="+" />
        </JOB>

Kindly help me on this.
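A possible line-by-line approach, given here as a minimal sketch rather than a tested solution for real Control-M exports: leave RS at its default, remember each <FOLDER ...> line, and print it only when the first wanted job of that folder appears. It assumes that every <FOLDER, <JOB, </JOB> and </FOLDER> tag sits on its own line, as in the sample, and that the list file keeps the JOBNAME="..." form shown above. The sample data, the temporary directory and the names jobname, want, header, opened and keep are illustrative only.

```shell
tmp=$(mktemp -d)

# Hypothetical list of wanted jobs, in the same shape as file1.
cat > "$tmp/file1" <<'EOF'
<JOB JOBNAME="JOB_B"
<JOB JOBNAME="JOB_A"
EOF

# Hypothetical miniature of file2: two folders, only one wanted job.
cat > "$tmp/file2" <<'EOF'
<FOLDER FOLDER_NAME="F1">
    <JOB JOBISN="1" JOBNAME="JOB_A" TASKTYPE="Job">
        <VARIABLE NAME="%%PARM1" VALUE="a" />
    </JOB>
    <JOB JOBISN="2" JOBNAME="JOB_X" TASKTYPE="Job">
        <VARIABLE NAME="%%PARM1" VALUE="x" />
    </JOB>
</FOLDER>
<FOLDER FOLDER_NAME="F2">
    <JOB JOBISN="1" JOBNAME="JOB_Y" TASKTYPE="Job">
    </JOB>
</FOLDER>
EOF

filtered=$(awk '
# Return the value of the JOBNAME="..." attribute, or "" if there is none.
function jobname(line) {
    if (match(line, / JOBNAME="[^"]*"/))
        return substr(line, RSTART + 10, RLENGTH - 11)
    return ""
}
# First file: collect the wanted job names.
NR == FNR     { n = jobname($0); if (n != "") want[n] = 1; next }
# Second file: hold the folder header back until a wanted job shows up.
/<FOLDER[ >]/ { header = $0; opened = 0; next }
/<\/FOLDER>/  { if (opened) print; opened = 0; next }
/<JOB[ >]/    { n = jobname($0); keep = (n in want)
                if (keep && !opened) { print header; opened = 1 } }
keep          { print }        # lines of a wanted job, including </JOB>
/<\/JOB>/     { keep = 0 }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$filtered"
rm -rf "$tmp"
```

On this sample only folder F1 with JOB_A is printed; F2 is dropped entirely because none of its jobs is in the list. Looking the name up by attribute rather than by field position ($5) also keeps the match working if JOBNAME is not the fifth attribute.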

Last edited by looney; 12-14-2015 at 12:07 PM.
 
