The UNIX and Linux Forums  
#1 | 10-16-2009 | JRy
Using AWK to separate data from a large XML file into multiple files

I have a 500 MB XML file from a FileMaker database export. It's formatted horribly (no line breaks at all). The node structure is basically:

Code:
<FMPXMLRESULT>
  <METADATA>
   <FIELD att="............." id="..."/>
  </METADATA>
  <RESULTSET FOUND="1763457">
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
  </RESULTSET>
</FMPXMLRESULT>
There are two things I need to get out of that file:

1. I'd like to generate an XML file that contains just the <METADATA> node and everything inside it (the <FIELD> nodes), and name it fields.xml.

2. Then I'd like to generate an XML file for each individual <ROW> node, named incrementally: row1.xml, row2.xml, etc.


I'm using AWK via Terminal in OS X Leopard. I'm not sure how to go about item #1, but for #2 I tried the following:
Code:
awk '/<ROW/{close("row"c".xml");c++}{print $0 > "row"c".xml"}' db.xml
Which produces a syntax error at line 1 when executed.


Can anyone help me out with these issues? What am I doing wrong?

Your help is very much appreciated.
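
A note on the syntax error reported above: POSIX leaves an unparenthesized concatenation after `>` unspecified, and the BSD awk shipped with OS X is known to reject it, so this is a plausible cause rather than a confirmed one. Below is a minimal sketch with the redirection target parenthesized. The file sample.xml and its contents are hypothetical, and it has one tag per line, unlike the real export.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical line-oriented sample (the real db.xml has no line breaks).
printf '%s\n' '<ROW id="1">' '<COL>x</COL>' '</ROW>' \
              '<ROW id="2">' '<COL>y</COL>' '</ROW>' > sample.xml

# Each <ROW line closes the previous output file and advances the counter.
# The parentheses make the whole concatenation the redirection target.
awk '/<ROW/ { close("row" c ".xml"); c++ }
     c      { print > ("row" c ".xml") }' sample.xml
```

Closing each file before moving on matters here because the export reports 1763457 rows, far more than the number of files awk can keep open at once.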

#2 | 10-17-2009 | jp2542a
Try this awk code.

Code:
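 /<METADATA>/ {
        # Copy the lines between <METADATA> and </METADATA> into fields.xml.
        # Checking getline's return value stops the loop at end of input.
        while ((getline) > 0 && $0 !~ /<\/METADATA>/)
                print > "fields.xml"
        count = 1
        next
}

/<ROW/ {
        rfile = "row" count ".xml"
        # Copy the lines between <ROW ...> and </ROW> into this row's file.
        while ((getline) > 0 && $0 !~ /<\/ROW/)
                print > rfile
        close(rfile)
        count++
        next
}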
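
The script above works line by line, while the export described in post #1 has no line breaks, so awk would see the whole file as a single line. One possible preparation step, sketched here on a hypothetical one-line sample, is to write a newline after every `>` so that each tag ends a line. The output name db-lines.xml is an assumption, not something from the thread.

```shell
# Work in a scratch directory with a hypothetical one-line sample.
cd "$(mktemp -d)" || exit 1
printf '%s' '<RESULTSET><ROW id="1"><COL>x</COL></ROW></RESULTSET>' > db.xml

# RS=">" makes awk read up to each ">"; ORS=">\n" writes the ">" back
# followed by a newline. The pattern skips records that are only whitespace.
awk 'BEGIN { RS = ">"; ORS = ">\n" } /[^ \t\n]/ { print }' db.xml > db-lines.xml
cat db-lines.xml
```

A single-character RS is standard awk, so this should not depend on gawk. A literal `>` inside the data would also get a line break after it, which changes the text, so this is only safe if the data never contains one.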

#3 | 10-17-2009 | JRy
Thanks for the quick reply. When I try those:

Code:
awk '/<METADATA>/ {
        getline
        while ( $0 !~ /<\/METADATA>/ ) {
                print > "fields.xml"
                getline
        }
        count=1
        nextline
}' db.xml
and

Code:
awk '/<ROW/ {
        rfile="row" count ".xml"
        getline
        while ($0 !~ "<\/ROW" ) {
                print > rfile
                getline
        }
        close(rfile)
        count++
        nextline
}' db.xml
I get an illegal statement error. Am I doing something wrong? Thank you so much for the help so far!

#4 | 10-17-2009 | jp2542a
It wasn't designed to be run as separate clauses. Put the whole thing in one file and use the -f switch.

#5 | 10-17-2009 | JRy
Sorry, I'm a complete AWK beginner. I've been programming for about 8 years, but only learned of AWK about an hour before I posted.

Let me make sure I understand everything completely. This is what I'm trying, step by step; please correct me where I'm wrong:

1. I have my working directory, which contains the db.xml file.
2. I create a file called split.awk inside my working directory, with the following contents:
Code:
 /<METADATA>/ {
        getline
        while ( $0 !~ /<\/METADATA>/ ) {
                print > "fields.xml"
                getline
        }
        count=1
        nextline
}

/<ROW/ {
        rfile="row" count ".xml"
        getline
        while ($0 !~ "<\/ROW" ) {
                print > rfile
                getline
        }
        close(rfile)
        count++
        nextline
}
3. I open up terminal, cd to my working directory and then execute:
Code:
awk -f split.awk db.xml
When I execute that, I just get an error saying awk can't find the file.

Again, sorry for being such a beginner -- now that I know AWK exists, I plan to purchase a few books on it and dive into how I can apply it in my day-to-day programming.

Thank you!

#6 | 10-17-2009 | jp2542a
What is the exact output of awk? This seems to happen mostly when invisible characters are introduced while copying the text into the .awk file. Also make sure all the files are readable and the directory is writable by the account you use to open the terminal window.

#7 | 10-17-2009 | JRy
Quote:
Originally Posted by jp2542a View Post
What is the exact output of awk? This seems to happen mostly when invisible characters are introduced while copying the text into the .awk file. Also make sure all the files are readable and the directory is writable by the account you use to open the terminal window.
The terminal prints out the following:
Code:
awk: can't open file split.awk
 source line number 1 source file split.awk
 context is
     >>>  <<<
Do I need to put in a full file path? I've already navigated to the directory within Terminal, and split.awk is in the same directory as db.xml, which seems to get picked up fine.
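
The message "can't open file split.awk" refers to the program file rather than db.xml, so the name on disk may not be exactly split.awk. One possibility, which the thread does not confirm, is that the editor added an extension such as .txt or .rtf when saving. The sketch below shows how such a mismatch could be spotted. The scratch directory and the file split.awk.txt are hypothetical stand-ins.

```shell
# Recreate the suspected situation in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch db.xml split.awk.txt          # hypothetical: the editor silently added ".txt"

pwd                                 # confirm which directory the shell is in
ls -l | cat -v                      # cat -v makes non-printing characters in names visible
found=$(ls split.awk* 2>/dev/null)  # list every name that starts with split.awk
echo "$found"
```

If the listing shows a different name, renaming the file with `mv` or passing the actual name to `awk -f` should get past this particular error.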

Quote:
Originally Posted by danmero View Post
Code:
awk '/<ROW/{close("row"c".xml");c++}c{f="row"c".xml";print $0 > f}' file
When I execute this, I just get an exact copy of my original file with the number 1 appended to its name (e.g. db1.xml), and it's also a 500 MB file.


Thanks again to both of you for your help so far.
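
The single 500 MB output reported above is what awk would be expected to produce when the input contains no newlines: the whole file is one record, so `/<ROW/` matches once and everything lands in the first output file. A minimal sketch of one way around this, assuming the tags never appear inside the data itself, is to split records on `>` instead of on newlines. The sample db.xml below is hypothetical, and the variable names (piece, infields, inrow) are illustrative.

```shell
# Work in a scratch directory with a tiny one-line stand-in for db.xml.
cd "$(mktemp -d)" || exit 1
printf '%s' '<FMPXMLRESULT><METADATA><FIELD NAME="a"/></METADATA><RESULTSET FOUND="2"><ROW ID="1"><COL>x</COL></ROW><ROW ID="2"><COL>y</COL></ROW></RESULTSET></FMPXMLRESULT>' > db.xml

awk 'BEGIN { RS = ">" }                 # each record is the text up to the next ">"
{
    piece = $0 ">"                      # put back the ">" that RS consumed

    # Opening tags switch copying on before the piece is written.
    if (piece ~ /<METADATA/)  infields = 1
    if (piece ~ /<ROW[ >]/)   { inrow = 1; c++; f = "row" c ".xml" }

    if (infields) printf "%s", piece > "fields.xml"
    if (inrow)    printf "%s", piece > f

    # Closing tags switch copying off after the piece is written.
    if (piece ~ /<\/METADATA/) { infields = 0; close("fields.xml") }
    if (piece ~ /<\/ROW/)      { inrow = 0; close(f) }
}' db.xml
```

In this sketch fields.xml keeps the enclosing `<METADATA>` tags so that it stays well-formed, and each row file is closed as soon as its `</ROW>` is seen. It only uses a single-character RS, which standard awk supports. With 1763457 rows it would create that many files in one directory, which may be worth planning for.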