How to extract text from xml file | Unix Linux Forums | Shell Programming and Scripting

    #1  
Old 08-31-2007
chrisf
How to extract text from xml file

I have some XML files that were created by exporting a website from RedDot. I would like to extract the cost,
course number, description, and meeting information.


<?xml version="1.0" encoding="UTF-16" standalone="yes" ?>
<PAG PAG0="3AE6FCFD86D34896A82FCA3B7B76FF90" PAG3="525312" PAG7="38574.3936342593" PAG8="48E1DBCD03594F0E8CE93D9736BD5698" PAG9="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG11="39160.5590162037" PAG12="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG13="39160.5937384259" PAG14="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG15="" PAG16="" PAG17="0" PAG18="1" PAG19="48E1DBCD03594F0E8CE93D9736BD5698" PAG20="" PAG21="79EA41233D5F4B36B0BAC07286866783" PAG22="0" PAG23="0" PAG29="39160.5937384259" PAG30="0" PAG31="38574.3936342593" PAG32="0" PAG33="0">
<IO_VAL>
<VAL VAL1="3AE6FCFD86D34896A82FCA3B7B76FF90" VAL2="2" VAL3="PAG" VAL4="Advanced HVAC Maintenance" VAL6="3AE6FCFD86D34896A82FCA3B7B76FF90" VAL7="0" VAL8="0" VAL9="38748.7126851852" VAL10="0" />
<VAL VAL1="B6FC365A81BA49F6B87D5F83A385FF50" VAL2="1" VAL3="PGE" VAL4="text" VAL6="B6FC365A81BA49F6B87D5F83A385FF50" VAL7="0" VAL8="0" VAL9="39160.5590046296" VAL10="0">$400<BR>$400</VAL>
<VAL VAL1="0DE7DBA40D9C4570AF7E1052369443CF" VAL2="1" VAL3="PGE" VAL4="text" VAL6="CE65E148437444F6BE216C8C6889B241" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
<VAL VAL1="6407D6626D1F448389C817DABD01C51F" VAL2="1" VAL3="PGE" VAL4="text" VAL6="6407D6626D1F448389C817DABD01C51F" VAL7="0" VAL8="0" VAL9="39160.3767361111" VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
<VAL VAL1="8B3B923981B346B499770E3DCA8230F0" VAL2="1" VAL3="PGE" VAL4="text" VAL6="D1E8B01771824275997556D439647E4E" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">S<BR>MW</VAL>
<VAL VAL1="BAA7472ACAD742E1A8BAED1FDABCE2E9" VAL2="1" VAL3="PGE" VAL4="text" VAL6="BAA7472ACAD742E1A8BAED1FDABCE2E9" VAL7="0" VAL8="0" VAL9="38755.6905902778" VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course.<EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
<VAL VAL1="D48131678F254EDF9D8ABDB2C13EDC6A" VAL2="1" VAL3="PGE" VAL4="text" VAL6="8B75B8517379488CBEBD4E55DBD76E7C" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">M<BR>M</VAL>
<VAL VAL1="E316E14FFDC94C4CBC856554ADF971C1" VAL2="1" VAL3="PGE" VAL4="text" VAL6="E316E14FFDC94C4CBC856554ADF971C1" VAL7="0" VAL8="0" VAL9="39160.3768287037" VAL10="0">*No class&nbsp;7/2-4</VAL>
<VAL VAL1="DF2EF049448F41A7AC18B4B71BA6F66D" VAL2="1" VAL3="PGE" VAL4="text" VAL6="467A8FEB25964EE2924BC3183C5FB424" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>
</IO_VAL>
</PAG>


The text I would like to extract is in this area:

VAL10="0">$400<BR>$400</VAL>
VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
VAL10="0">S<BR>MW</VAL>
VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course. Course is held in Bldg. <EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
VAL10="0">M<BR>M</VAL>
VAL10="0">*No class&nbsp;7/2-4</VAL>
VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>

I have AIX version 5. Any suggestions would be deeply appreciated.
    #2  
Old 08-31-2007
Neo
Perl.

Try writing a program in Perl.
    #3  
Old 09-01-2007
ghostdog74

Code:
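awk '/VAL10="0">/ {
	  # Find where the VAL10="0"> marker begins
	  match($0, "VAL10=\"0\">")
	  v1start = RSTART
	  # Find the closing tag and note the position just past it
	  match($0, "</VAL>")
	  v2end = RSTART + RLENGTH
	  # The third argument of substr is a length, not an end position
	  print substr($0, v1start, v2end - v1start)
	}
' "file"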

output:

Code:
# ./test.sh
VAL10="0">$400<BR>$400</VAL>
VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
VAL10="0">S<BR>MW</VAL>
VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course.<EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
VAL10="0">M<BR>M</VAL>
VAL10="0">*No class&nbsp;7/2-4</VAL>
VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>
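
If only the text itself is wanted, without the VAL10="0"> marker and the closing </VAL>, the same match can be followed by a few substitutions. This is only a sketch: the function name extract_val_text is made up for illustration, and the choice to turn <BR> into " | ", drop the <EM> tags, and turn &nbsp; into a space is an assumption about the desired output, not something requested in the thread.

```shell
# Hypothetical helper: print only the text between VAL10="0"> and </VAL>.
# Reads the named files, or standard input when no file is given.
extract_val_text() {
    awk '/VAL10="0">/ {
        line = $0
        sub(/.*VAL10="0">/, "", line)   # drop everything up to and including the marker
        sub(/<\/VAL>.*/, "", line)      # drop the closing tag and anything after it
        gsub(/<BR>/, " | ", line)       # assumed separator for multiple values
        gsub(/<\/?EM>/, "", line)       # remove emphasis tags, keep their text
        gsub(/&nbsp;/, " ", line)       # replace the non-breaking space entity
        print line
    }' "$@"
}
```

Called as extract_val_text file, it prints one line of plain text for each VAL element that carries text. Elements written as VAL10="0" /> are skipped because they do not contain the VAL10="0"> marker.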

    #4  
Old 09-01-2007
chrisf
That does the trick. Thank you so much for your help.
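
One caveat for anyone reusing this: the sample's XML declaration says encoding="UTF-16". If the exported files really are stored as UTF-16, awk will probably not find the pattern until the file is converted to a single-byte-compatible encoding. A rough sketch, assuming iconv is installed and accepts the codeset names UTF-16 and UTF-8 (codeset names differ between platforms, so this may need adjusting on AIX); the function name extract_from_utf16 is hypothetical.

```shell
# Hypothetical wrapper: convert a UTF-16 export to UTF-8, then apply the same match.
# Assumes iconv on this system accepts the codeset names UTF-16 and UTF-8.
extract_from_utf16() {
    iconv -f UTF-16 -t UTF-8 "$1" |
    awk '/VAL10="0">/ {
        match($0, "VAL10=\"0\">")
        start = RSTART
        match($0, "</VAL>")
        # print from the marker through the end of the closing tag
        print substr($0, start, RSTART + RLENGTH - start)
    }'
}
```

If the files are already plain ASCII or UTF-8 despite the declaration, the conversion step is unnecessary and the original command can be used as is.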