How to extract text from xml file
I have some xml files that got created by exporting a website from RedDot. I would like to extract the cost, course number, description, and meeting information.

<?xml version="1.0" encoding="UTF-16" standalone="yes" ?>
<PAG PAG0="3AE6FCFD86D34896A82FCA3B7B76FF90" PAG3="525312" PAG7="38574.3936342593" PAG8="48E1DBCD03594F0E8CE93D9736BD5698" PAG9="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG11="39160.5590162037" PAG12="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG13="39160.5937384259" PAG14="C8E8FB21EE5343FEBA77C040EF1C9BFC" PAG15="" PAG16="" PAG17="0" PAG18="1" PAG19="48E1DBCD03594F0E8CE93D9736BD5698" PAG20="" PAG21="79EA41233D5F4B36B0BAC07286866783" PAG22="0" PAG23="0" PAG29="39160.5937384259" PAG30="0" PAG31="38574.3936342593" PAG32="0" PAG33="0">
  <IO_VAL>
    <VAL VAL1="3AE6FCFD86D34896A82FCA3B7B76FF90" VAL2="2" VAL3="PAG" VAL4="Advanced HVAC Maintenance" VAL6="3AE6FCFD86D34896A82FCA3B7B76FF90" VAL7="0" VAL8="0" VAL9="38748.7126851852" VAL10="0" />
    <VAL VAL1="B6FC365A81BA49F6B87D5F83A385FF50" VAL2="1" VAL3="PGE" VAL4="text" VAL6="B6FC365A81BA49F6B87D5F83A385FF50" VAL7="0" VAL8="0" VAL9="39160.5590046296" VAL10="0">$400<BR>$400</VAL>
    <VAL VAL1="0DE7DBA40D9C4570AF7E1052369443CF" VAL2="1" VAL3="PGE" VAL4="text" VAL6="CE65E148437444F6BE216C8C6889B241" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
    <VAL VAL1="6407D6626D1F448389C817DABD01C51F" VAL2="1" VAL3="PGE" VAL4="text" VAL6="6407D6626D1F448389C817DABD01C51F" VAL7="0" VAL8="0" VAL9="39160.3767361111" VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
    <VAL VAL1="8B3B923981B346B499770E3DCA8230F0" VAL2="1" VAL3="PGE" VAL4="text" VAL6="D1E8B01771824275997556D439647E4E" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">S<BR>MW</VAL>
    <VAL VAL1="BAA7472ACAD742E1A8BAED1FDABCE2E9" VAL2="1" VAL3="PGE" VAL4="text" VAL6="BAA7472ACAD742E1A8BAED1FDABCE2E9" VAL7="0" VAL8="0" VAL9="38755.6905902778" VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course.<EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
    <VAL VAL1="D48131678F254EDF9D8ABDB2C13EDC6A" VAL2="1" VAL3="PGE" VAL4="text" VAL6="8B75B8517379488CBEBD4E55DBD76E7C" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">M<BR>M</VAL>
    <VAL VAL1="E316E14FFDC94C4CBC856554ADF971C1" VAL2="1" VAL3="PGE" VAL4="text" VAL6="E316E14FFDC94C4CBC856554ADF971C1" VAL7="0" VAL8="0" VAL9="39160.3768287037" VAL10="0">*No class 7/2-4</VAL>
    <VAL VAL1="DF2EF049448F41A7AC18B4B71BA6F66D" VAL2="1" VAL3="PGE" VAL4="text" VAL6="467A8FEB25964EE2924BC3183C5FB424" VAL7="0" VAL8="0" VAL9="38574.3936342593" VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>
  </IO_VAL>
</PAG>

The text I would like to extract is from this area

VAL10="0">$400<BR>$400</VAL>
VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
VAL10="0">S<BR>MW</VAL>
VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course. Course is held in Bldg. <EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
VAL10="0">M<BR>M</VAL>
VAL10="0">*No class 7/2-4</VAL>
VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>

I have AIX version 5. Any suggestions would be deeply appreciated.
Code:
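awk '/VAL10="0">/ {
    # Find where the VAL10="0"> marker starts on this line
    match($0, "VAL10=\"0\">")
    v1start = RSTART
    # Find the closing tag; RLENGTH holds its length
    match($0, "</VAL>")
    v2end = RSTART + RLENGTH
    # The third argument of substr is a length, not an end position
    print substr($0, v1start, v2end - v1start)
}
' "file"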
Code:
# ./test.sh
VAL10="0">$400<BR>$400</VAL>
VAL10="0">XPOB 556-501<BR>XPOB 556-502</VAL>
VAL10="0">6/2-8/4 <BR>6/4-7/11*</VAL>
VAL10="0">S<BR>MW</VAL>
VAL10="0">This 40-hour course expands upon the topics covered in the Basic HVAC Maintenance course.<EM>Prerequisite: Basic Heating and Air Conditioning Equipment Maintenance course or instructor approval required prior to registering.</EM> Books not included</VAL>
VAL10="0">M<BR>M</VAL>
VAL10="0">*No class 7/2-4</VAL>
VAL10="0">8 a.m.-noon<BR>8 a.m.-noon</VAL>
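If only the text itself is wanted, without the VAL10 marker and the tags, something along these lines might work. It is a rough sketch, not a tested AIX solution: the function name extract_val_text is made up for illustration, and it assumes each VAL element sits on a single line, as in the output above. The XML declaration says UTF-16, so if the files really are stored that way they would probably need converting first (for example with iconv; the accepted code set names vary by platform).

```shell
# Hypothetical helper: print only the text inside each VAL element.
# Assumption: one <VAL ...>...</VAL> element per input line.
extract_val_text() {
    awk '/VAL10="0">/ {
        line = $0
        sub(/^.*VAL10="0">/, "", line)   # drop everything up to and including the marker
        sub(/<\/VAL>.*$/, "", line)      # drop the closing tag and anything after it
        gsub(/<BR>/, " | ", line)        # show <BR>-separated values side by side
        gsub(/<[^>]*>/, "", line)        # strip remaining inline tags such as <EM>
        print line
    }' "$@"
}
```

Called as `extract_val_text file`, the cost line from the sample would come out as `$400 | $400`. Elements that are self-closed with `VAL10="0" />` are skipped, because the pattern requires the `>` directly after the attribute.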
That does the trick. Thank you so much for your help.