Using AWK to separate data from a large XML file into multiple files
I have a 500 MB XML file from a FileMaker database export. It's formatted horribly (no line breaks at all). The node structure is basically:
Code:
<FMPXMLRESULT>
<METADATA>
<FIELD att="............." id="..."/>
</METADATA>
<RESULTSET FOUND="1763457">
<ROW att="....." etc="....">
<COL>.....etc....</COL>
</ROW>
<ROW att="....." etc="....">
<COL>.....etc....</COL>
</ROW>
<ROW att="....." etc="....">
<COL>.....etc....</COL>
</ROW>
</RESULTSET>
</FMPXMLRESULT>
1. I'd like to generate an XML file that just contains everything within the <METADATA> node (the <FIELD> nodes), and I'll name it fields.xml.
2. Then I'd like to generate an XML file for each individual <ROW> node, and incrementally name each one row1.xml, row2.xml, etc.
I'm using AWK via Terminal in OS X Leopard. I'm not sure how to go about item #1, but for #2 I tried the following:
Code:
awk '/<ROW/{close("row"c".xml");c++}{print $0 > "row"c".xml"}' db.xml
Can anyone help me out with these issues? What am I doing wrong? Your help is very much appreciated.
Try this awk code.
Code:
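/<METADATA>/ {
    # Copy the lines between <METADATA> and </METADATA> into fields.xml.
    # Checking the getline result stops the loop at end of input.
    while ( (getline) > 0 && $0 !~ /<\/METADATA>/ ) {
        print > "fields.xml"
    }
    close("fields.xml")
    count=1
    next
}
/<ROW/ {
    rfile="row" count ".xml"
    # Copy the lines between <ROW ...> and </ROW> into rowN.xml.
    while ( (getline) > 0 && $0 !~ /<\/ROW/ ) {
        print > rfile
    }
    close(rfile)
    count++
    next
}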
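A side note on input with no line breaks, which is how the export is described in the first post: a getline loop reads line by line, so it only helps when each tag sits on its own line. One way around that is to make > the record separator, so awk sees one tag per record. Here is a minimal sketch of the idea, assuming any > inside the data itself is escaped as &gt; (worth checking on the real export). The file sample.xml and its contents are made up for the demo, and infields, inrow, count and rfile are just illustrative names.

```shell
# Made-up sample input with no line breaks (printf '%s' adds no trailing newline)
printf '%s' '<FMPXMLRESULT><METADATA><FIELD NAME="a"/><FIELD NAME="b"/></METADATA><RESULTSET FOUND="2"><ROW RECORDID="1"><COL>x</COL></ROW><ROW RECORDID="2"><COL>y</COL></ROW></RESULTSET></FMPXMLRESULT>' > sample.xml

awk '
BEGIN { RS = ">"; ORS = ">" }    # read up to each ">" and put it back on output
{
    # Everything between <METADATA> and </METADATA> goes to fields.xml
    if ($0 ~ /<METADATA$/)   { infields = 1; next }
    if ($0 ~ /<\/METADATA$/) { infields = 0; close("fields.xml"); next }
    if (infields)            { print > "fields.xml"; next }

    # Each <ROW ...> ... </ROW> element goes to its own numbered file
    if ($0 ~ /<ROW( |$)/)    { rfile = "row" (++count) ".xml"; inrow = 1 }
    if (inrow)               { print > rfile }
    if ($0 ~ /<\/ROW$/)      { close(rfile); inrow = 0 }
}' sample.xml
```

Unlike the script above, this sketch keeps the <ROW> and </ROW> tags in each row file, so every rowN.xml is a complete element, while fields.xml holds only the <FIELD> nodes. Closing each row file as soon as its </ROW> is seen matters here, because the RESULTSET in the question reports 1763457 rows and awk can only keep a limited number of files open at once.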
Thanks for the quick reply. When I try those:
Code:
awk '/<METADATA>/ {
    # copy the lines between <METADATA> and </METADATA> into fields.xml
    while ( (getline) > 0 && $0 !~ /<\/METADATA>/ ) {
        print > "fields.xml"
    }
    count=1
    next
}' db.xml
Code:
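awk 'BEGIN { count=1 }   # run on its own, this block needs its own starting number
/<ROW/ {
    rfile="row" count ".xml"
    # copy the lines between <ROW ...> and </ROW> into rowN.xml
    while ( (getline) > 0 && $0 !~ /<\/ROW/ ) {
        print > rfile
    }
    close(rfile)
    count++
    next
}' db.xml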
Sorry, I'm a complete AWK beginner. I've been programming for about 8 years, but only learned of AWK about an hour before I posted.
Let me make sure I understand everything completely. This is what I'm trying, step by step; please correct me where I'm wrong:
1. I have my working directory, and in it I have the db.xml file.
2. I create a file called split.awk inside my working directory, and in it I put the following contents:
Code:
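/<METADATA>/ {
    # Copy the lines between <METADATA> and </METADATA> into fields.xml.
    # Checking the getline result stops the loop at end of input.
    while ( (getline) > 0 && $0 !~ /<\/METADATA>/ ) {
        print > "fields.xml"
    }
    close("fields.xml")
    count=1
    next
}
/<ROW/ {
    rfile="row" count ".xml"
    # Copy the lines between <ROW ...> and </ROW> into rowN.xml.
    while ( (getline) > 0 && $0 !~ /<\/ROW/ ) {
        print > rfile
    }
    close(rfile)
    count++
    next
}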
3. I run:
Code:
awk -f split.awk db.xml
Again, sorry for being such a beginner. Now that I know AWK exists, I plan to purchase a few books on it and dive into how I can apply it in my day-to-day programming. Thank you!
What is the exact output of awk? This mostly seems to happen when invisible characters get into the .awk file while the text is being copied into it. Also make sure all the files are readable and the directory is writable by the account you use to open the Terminal window.
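To check for the invisible characters and permissions mentioned above, something like the following can help. It is only a sketch: demo.awk and demo_clean.awk are made-up names, and the carriage returns are planted deliberately so that there is something to find.

```shell
# Hypothetical demo file: an awk script saved with Windows (CRLF) line endings
printf '/<ROW/ { n++ }\r\nEND { print n }\r\n' > demo.awk

# cat -v makes control characters visible; a carriage return shows up as ^M
cat -v demo.awk

# Write a copy with the carriage returns stripped out
tr -d '\r' < demo.awk > demo_clean.awk

# The script must be readable and the current directory writable
ls -l demo_clean.awk
ls -ld .
if [ -r demo_clean.awk ] && [ -w . ]; then
    echo "permissions look fine"
fi
```

On a real script, run the same cat -v and ls commands against split.awk instead of the demo file.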
Code:
awk: can't open file split.awk
source line number 1 source file split.awk
context is
>>> <<<
Thanks again to both of you for your help so far.
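For what it's worth, that message reads as though awk could not open the program file named after -f, which would mean the script itself never ran. A rough sketch for checking this from the working directory follows. check_awk_script is a hypothetical helper name, not a standard command. One possible cause, which nothing in this thread confirms, is an editor saving the file under a slightly different name such as split.awk.txt or split.awk.rtf.

```shell
# check_awk_script is a made-up helper: it reports whether the file that
# would be passed to "awk -f" exists and is readable in the current directory.
check_awk_script() {
    if [ ! -e "$1" ]; then
        echo "missing: $1 is not in $(pwd)"
        # List near-miss names such as split.awk.txt, if there are any
        ls -1 "$1".* 2>/dev/null || echo "no similarly named files found"
    elif [ ! -r "$1" ]; then
        echo "unreadable: $1"
    else
        echo "ok: $1"
    fi
}

check_awk_script split.awk
```

If the helper reports a near-miss name, renaming that file to split.awk (or passing the actual name to awk -f) would be the next thing to try.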