Using AWK to separate data from a large XML file into multiple files


# 1  
Old 10-16-2009

I have a 500 MB XML file from a FileMaker database export, and it's formatted horribly (no line breaks at all). The node structure is basically:

Code:
<FMPXMLRESULT>
  <METADATA>
   <FIELD att="............." id="..."/>
  </METADATA>
  <RESULTSET FOUND="1763457">
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
  </RESULTSET>
</FMPXMLRESULT>

There are two things I need to get out of that file:

1. I'd like to generate an XML file that contains just the contents of the <METADATA> node (the <FIELD> nodes), which I'll name fields.xml.

2. Then I'd like to generate an XML file for each individual <ROW> node, named incrementally: row1.xml, row2.xml, etc.


I'm using AWK via Terminal in OS X Leopard. I'm not sure how to go about item #1, but for #2 I tried the following:
Code:
awk '/<ROW/{close("row"c".xml");c++}{print $0 > "row"c".xml"}' db.xml

This produces a syntax error at line 1 when executed.


Can anyone help me out with these issues? What am I doing wrong?

Your help is very much appreciated.
# 2  
Old 10-17-2009
Try this awk code.

Code:
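# Copy the lines between <METADATA> and </METADATA> into fields.xml.
/<METADATA>/ {
        # Stop at the closing tag, or at end of input if it never appears.
        while ( (getline) > 0 && $0 !~ /<\/METADATA>/ )
                print > "fields.xml"
        close("fields.xml")
        count=1
        next
}

# Copy the lines between each <ROW ...> and </ROW> into row1.xml, row2.xml, ...
/<ROW/ {
        rfile="row" count ".xml"
        while ( (getline) > 0 && $0 !~ /<\/ROW/ )
                print > rfile
        close(rfile)
        count++
        next
}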
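
One caveat about the script above: it works line by line, while the export described in post #1 has no line breaks at all, so awk would see the whole file as a single record. Here is a minimal sketch of one way around that, assuming a POSIX awk: set RS to ">" so that every record ends at a tag boundary, then write the pieces back out with the ">" restored. The sample data, the attribute names, and the variable names (piece, inmeta, inrow, rowfile) are made up for illustration and are not taken from the real export.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical one-line sample standing in for the real db.xml.
printf '%s' '<FMPXMLRESULT><METADATA><FIELD NAME="a"/><FIELD NAME="b"/></METADATA><RESULTSET FOUND="2"><ROW RECORDID="1"><COL>x</COL></ROW><ROW RECORDID="2"><COL>y</COL></ROW></RESULTSET></FMPXMLRESULT>' > db.xml

awk '
BEGIN { RS = ">" }                # each record is the text up to the next ">"
{ piece = $0 ">" }                # put back the ">" that RS consumed

# fields.xml receives everything between <METADATA> and </METADATA>.
/<\/METADATA/   { inmeta = 0; close("fields.xml") }
inmeta          { printf "%s", piece > "fields.xml" }
/<METADATA/     { inmeta = 1 }

# rowN.xml receives one complete <ROW ...>...</ROW> element.
/<ROW([ \t]|$)/ { n++; rowfile = "row" n ".xml"; inrow = 1 }
inrow           { printf "%s", piece > rowfile }
/<\/ROW$/       { close(rowfile); inrow = 0 }
' db.xml
```

Because printf "%s" adds no newline, the pieces are written back byte for byte. Note that fields.xml produced this way is a fragment with no single root element, and that this sketch keeps the <ROW> tags in each row file, whereas the script above drops them; adjust to taste.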

# 3  
Old 10-17-2009
Thanks for the quick reply. When I try those:

Code:
awk '/<METADATA>/ {
        getline
        while ( $0 !~ /<\/METADATA>/ ) {
                print > "fields.xml"
                getline
        }
        count=1
        nextline
}' db.xml

and

Code:
awk '/<ROW/ {
        rfile="row" count ".xml"
        getline
        while ($0 !~ "<\/ROW" ) {
                print > rfile
                getline
        }
        close(rfile)
        count++
        nextline
}' db.xml

I get an illegal statement error. Am I doing something wrong? Thank you so much for the help so far!
# 4  
Old 10-17-2009
It wasn't designed to be used as separate clauses. Put the whole thing in a file and use the -f switch.
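
For anyone unsure what that looks like in practice, here is a small sketch, assuming a POSIX shell and awk. The file names sample.xml and split.awk and the three-line sample are placeholders, and the awk program is a simplified stand-in rather than the exact script from post #2.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample with one tag per line; the real input would be db.xml.
printf '%s\n' '<ROW>' '<COL>x</COL>' '</ROW>' > sample.xml

# All pattern-action rules go into ONE program file, so they share variables.
cat > split.awk <<'EOF'
/<ROW/   { n++; inrow = 1; next }                    # opening tag: start row n
/<\/ROW/ { close("row" n ".xml"); inrow = 0; next }  # closing tag: finish the file
inrow    { print > ("row" n ".xml") }                # copy the lines in between
EOF

# -f reads the program from the file; the path is relative to the current directory.
awk -f split.awk sample.xml
```

Keeping the rules together matters whenever one rule sets a variable that another rule reads, as count is in the script from post #2.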
# 5  
Old 10-17-2009
Sorry, I'm a complete AWK beginner. I've been programming for about 8 years, but only learned of AWK about an hour before I posted.

Let me make sure I understand everything completely, this is what I'm trying step by step, please correct me where I'm wrong:

1. I have my working directory, which contains my db.xml file.
2. I create a file called split.awk inside my working directory, with the following contents:
Code:
 /<METADATA>/ {
        getline
        while ( $0 !~ /<\/METADATA>/ ) {
                print > "fields.xml"
                getline
        }
        count=1
        nextline
}

/<ROW/ {
        rfile="row" count ".xml"
        getline
        while ($0 !~ "<\/ROW" ) {
                print > rfile
                getline
        }
        close(rfile)
        count++
        nextline
}

3. I open up terminal, cd to my working directory and then execute:
Code:
awk -f split.awk db.xml

When I execute that, I just get an error saying awk can't find the file.

Again, sorry for being such a beginner. Now that I know AWK exists, I plan to purchase a few books on it and dive into how I can apply it in my day-to-day programming.

Thank you!
# 6  
Old 10-17-2009
What is the exact output of awk? This seems to happen mostly when invisible characters are introduced while copying the text into the .awk file. Also make sure all the files are readable and the directory is writable by the account you use to open the terminal window.
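
A quick way to check both things, sketched under the assumption of a POSIX shell. The damaged split.awk below is fabricated for the demonstration (Windows line endings plus a UTF-8 non-breaking space); on a real system you would run the checks against your own file.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical script file damaged by copy and paste:
# \r is a carriage return, \302\240 is a UTF-8 non-breaking space.
printf '/<ROW/ { n++ }\r\nEND {\302\240print n }\r\n' > split.awk

# cat -v makes control characters visible (a carriage return shows as ^M).
cat -v split.awk

# Count the bytes that are NOT tab, newline, or printable ASCII.
# Anything above 0 means the file contains characters awk may choke on.
bad=$(LC_ALL=C tr -d '\11\12\40-\176' < split.awk | wc -c | tr -d ' ')
echo "suspicious bytes: $bad"

# Confirm the script is readable and the current directory is writable.
[ -r split.awk ] && [ -w . ] && echo "permissions look fine"
```

If the count is not zero, retyping the script in a plain-text editor usually clears it up.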
# 7  
Old 10-17-2009
Quote:
Originally Posted by JRy
I'm using AWK via Terminal in OS X Leopard, I'm not sure how to go about item #1, but for #2 I tried the following:
Code:
awk '/<ROW/{close("row"c".xml");c++}{print $0 > "row"c".xml"}' db.xml

Which produces a syntax error at line 1 when executed

Code:
awk '/<ROW/{close("row"c".xml");c++}c{f="row"c".xml";print $0 > f}' file
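
A note on why this version parses: POSIX describes an unparenthesized concatenation after > in a print statement as ambiguous, and some awks reject it. Building the name in a variable first (f above) or wrapping it in parentheses avoids the problem. Below is a minimal sketch of the parenthesized form, assuming a POSIX awk; sample.xml and its contents are invented, and it assumes each tag sits on its own line.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample with one tag per line.
printf '%s\n' '<ROW id="1">' '<COL>x</COL>' '</ROW>' \
              '<ROW id="2">' '<COL>y</COL>' '</ROW>' > sample.xml

# Rule 1: on each opening <ROW, close the previous output file and bump the counter.
# Rule 2: once a row has started (c is non-zero), copy the line to that row's file.
# The parentheses make it unambiguous that the whole concatenation is the file name.
awk '/<ROW/ { close("row" c ".xml"); c++ }
     c      { print > ("row" c ".xml") }' sample.xml
```

The leading c pattern also keeps anything before the first <ROW line from being written to a file called row.xml.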
