How to extract text from xml file

#1  chrisf, 08-31-2007

I have some XML files that were created by exporting a website from RedDot. I would like to extract the cost, course number, description, and meeting information.


<?xml version="1.0" encoding="UTF-16" standalone="yes" ?>
<PAG PAG0="3AE6FCFD86D34896A82FCA3B7B76FF90" PAG3="525312" PAG7="38574.3936342593" PAG8="48E1DBCD03594F0E8CE93D9736BD5698" PAG9="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG11="39160.5590162037" PAG12="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG13="39160.5937384259" PAG14="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG15="" PAG16="" PAG17="0" PAG18="1" PAG19="48E1DBCD03594F0E8CE93D9736BD5698" PAG20="" PAG21="79EA41233D5F4B36B0BAC07286866783" PAG22="0" PAG23="0" PAG29="39160.5937384259" PAG30="0" PAG31="38574.3936342593" PAG32="0" PAG33="0">
<IO_VAL>
<VAL VAL1="3AE6FCFD86D34896A82FCA3B7B76FF90" VAL2="2" VAL3="PAG" VAL4="Advanced HVAC Maintenance" VAL6="3AE6FCFD86D34896A82FCA3B7B76FF90" VAL7="0" VAL8="0" VAL9="38748.7126851852" VAL10="0" />
<VAL VAL1="B6FC365A81BA49F6B87D5F83A385FF50" VAL2="1" VAL3="PGE" VAL4="text" VAL6="B6FC365A81BA49F6B87D5F83A385FF50" VAL7="0" VAL8="0" VAL9="39160.5590046296" VAL10="0">$400<BR>$400</VAL>
<VAL VAL1="0DE7DBA40D9C4570AF7E1052369443CF" VAL2="1" VAL3="PGE" VAL4="text" VAL6="CE65E148437444F6BE216C8C6889B241" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
<VAL VAL1="6407D6626D1F448389C817DABD01C51F" VAL2="1" VAL3="PGE" VAL4="text" VAL6="6407D6626D1F448389C817DABD01C51F" VAL7="0" VAL8="0" VAL9="39160.3767361111" VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
<VAL VAL1="8B3B923981B346B499770E3DCA8230F0" VAL2="1" VAL3="PGE" VAL4="text" VAL6="D1E8B01771824275997556D439647E4E" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">S<BR>MW</VAL>
<VAL VAL1="BAA7472ACAD742E1A8BAED1FDABCE2E9" VAL2="1" VAL3="PGE" VAL4="text" VAL6="BAA7472ACAD742E1A8BAED1FDABCE2E9" VAL7="0" VAL8="0" VAL9="38755.6905902778" VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course.<EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
<VAL VAL1="D48131678F254EDF9D8ABDB2C13EDC6A" VAL2="1" VAL3="PGE" VAL4="text" VAL6="8B75B8517379488CBEBD4E55DBD76E7C" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">M<BR>M</VAL>
<VAL VAL1="E316E14FFDC94C4CBC856554ADF971C1" VAL2="1" VAL3="PGE" VAL4="text" VAL6="E316E14FFDC94C4CBC856554ADF971C1" VAL7="0" VAL8="0" VAL9="39160.3768287037" VAL10="0">*No class&nbsp;7/2-4</VAL>
<VAL VAL1="DF2EF049448F41A7AC18B4B71BA6F66D" VAL2="1" VAL3="PGE" VAL4="text" VAL6="467A8FEB25964EE2924BC3183C5FB424" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>
</IO_VAL>
</PAG>


The text I would like to extract is in this area:

VAL10="0">$400<BR>$400</VAL>
VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
VAL10="0">S<BR>MW</VAL>
VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course. Course is held in Bldg. <EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
VAL10="0">M<BR>M</VAL>
VAL10="0">*No class&nbsp;7/2-4</VAL>
VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>

I have AIX version 5. Any suggestions would be deeply appreciated.
#2  Neo (Administrator), 08-31-2007
Perl.

Try writing a program in Perl.
#3  ghostdog74, 09-01-2007

Code:
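awk '/VAL10="0">/ {
	  # locate the end of the opening tag and the start of the closing tag
	  match($0,"VAL10=\"0\">")
	  v1start=RSTART
	  match($0,"</VAL>")
	  v2start=RSTART
	  # substr takes a length, not an end position; RLENGTH (6) keeps </VAL>
	  print substr($0,v1start,v2start-v1start+RLENGTH)
	}
' "file"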

output:

Code:
# ./test.sh
VAL10="0">$400<BR>$400</VAL>
VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
VAL10="0">S<BR>MW</VAL>
VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course.<EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
VAL10="0">M<BR>M</VAL>
VAL10="0">*No class&nbsp;7/2-4</VAL>
VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>
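
If only the text itself is wanted (cost, course number, and so on) without the surrounding `VAL10="0">` and `</VAL>`, the same idea can be taken one step further. The following is a minimal sketch, not a general XML parser: it assumes each VAL element sits on one line, as in the sample, and the function name `extract_val_text` and the ` | ` separator are made up for illustration.

```shell
#!/bin/sh
# Sketch: print only the text inside each <VAL ... VAL10="0">...</VAL> element.
# Assumes one VAL element per line, as in the sample file.
# extract_val_text is a hypothetical helper name.
extract_val_text() {
    awk '
    /VAL10="0">/ && /<\/VAL>/ {
        line = $0
        sub(/.*VAL10="0">/, "", line)   # drop everything through the end of the opening tag
        sub(/<\/VAL>.*/, "", line)      # drop the closing tag and anything after it
        gsub(/<BR>/, " | ", line)       # <BR> separates the values for each section
        gsub(/<\/?EM>/, " ", line)      # remove inline emphasis tags
        gsub(/&nbsp;/, " ", line)       # replace non-breaking-space entities
        gsub(/ +/, " ", line)           # squeeze runs of spaces
        print line
    }' "$@"
}
```

It can be called with a file name (`extract_val_text course.xml`, where `course.xml` is a placeholder) or fed from a pipe. The XML declaration in the sample says `encoding="UTF-16"`; if the files really are stored that way, awk will probably not match anything until they are converted, for example with `iconv -f UTF-16 -t UTF-8 course.xml | extract_val_text`. The encoding names that iconv accepts vary by platform, so check what AIX offers.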

#4  chrisf, 09-01-2007
That does the trick. Thank you so much for your help.