Using AWK to separate data from a large XML file into multiple files


# 1  

I have a 500 MB XML file from a FileMaker database export. It's formatted horribly (no line breaks at all). The node structure is basically:

Code:
<FMPXMLRESULT>
  <METADATA>
   <FIELD att="............." id="..."/>
  </METADATA>
  <RESULTSET FOUND="1763457">
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
  </RESULTSET>
</FMPXMLRESULT>

There are two things I need to get out of that file:

1. I'd like to generate an XML file that contains just everything within the <METADATA> node (the <FIELD> nodes), which I'll name fields.xml.

2. Then I'd like to generate an XML file for each individual <ROW> node, named incrementally: row1.xml, row2.xml, etc.


I'm using AWK via Terminal in OS X Leopard. I'm not sure how to go about item #1, but for #2 I tried the following:
Code:
awk '/<ROW/{close("row"c".xml");c++}{print $0 > "row"c".xml"}' db.xml

Which produces a syntax error at line 1 when executed.


Can anyone help me out with these issues? What am I doing wrong?

Your help is very much appreciated.
# 2  
Try this awk code.

Code:
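# Copy the lines between <METADATA> and </METADATA> into fields.xml.
/<METADATA>/ {
        # Stop at the closing tag, or at end of input if it never appears.
        while ( (getline) > 0 && $0 !~ /<\/METADATA>/ )
                print > "fields.xml"
        count=1
        next
}

# Copy the lines inside each <ROW> element into row1.xml, row2.xml, ...
/<ROW/ {
        rfile="row" count ".xml"
        while ( (getline) > 0 && $0 !~ /<\/ROW/ )
                print > rfile
        close(rfile)
        count++
        next
}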
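
Note that the script above works line by line, while the export described in the first post has no line breaks, so awk would treat the whole file as a single record. One possible way around that is to set the record separator to `>` so that every tag ends a record. This is a minimal sketch, not a tested solution for the real 500 MB file: it assumes no literal `<ROW` or `<METADATA` text appears inside field data, and the sample file name and attribute names are made up for illustration. Unlike the script above, it keeps the enclosing <METADATA> and <ROW> tags so each output file has a single root element.

```shell
# Tiny single-line stand-in for db.xml (hypothetical sample data).
printf '%s' '<FMPXMLRESULT><METADATA><FIELD NAME="a"/><FIELD NAME="b"/></METADATA><RESULTSET FOUND="2"><ROW ID="1"><COL>x</COL></ROW><ROW ID="2"><COL>y</COL></ROW></RESULTSET></FMPXMLRESULT>' > sample.xml

awk '
BEGIN { RS = ">"; ORS = ">" }        # each record ends at ">", and print puts the ">" back
/<METADATA/ { out = "fields.xml" }   # opening <METADATA> tag: start writing fields.xml
/<ROW/      { c++; out = "row" c ".xml" }   # opening <ROW> tag: start the next row file
out != ""   { print > out }          # copy the record while an output file is selected
/<\/METADATA/ || /<\/ROW/ {          # closing tag: end the file with a newline and close it
        printf "\n" > out
        close(out)
        out = ""
}
' sample.xml
```

Closing each file as soon as its element ends matters here, because with about 1.7 million rows awk would otherwise run out of open file descriptors. To try it on the real export, replace sample.xml with db.xml.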

# 3  
Thanks for the quick reply. When I try those:

Code:
awk '/<METADATA>/ {
        getline
        while ( $0 !~ /<\/METADATA>/ ) {
                print > "fields.xml"
                getline
        }
        count=1
        nextline
}' db.xml

and

Code:
awk '/<ROW/ {
        rfile="row" count ".xml"
        getline
        while ($0 !~ "<\/ROW" ) {
                print > rfile
                getline
        }
        close(rfile)
        count++
        nextline
}' db.xml

I get an illegal statement error. Am I doing something wrong? Thank you so much for the help so far!
# 4  
It wasn't designed to be used as separate clauses. Put the whole thing in a file and use the -f switch.
# 5  
Sorry, I'm a complete AWK beginner. I've been programming for about 8 years, but I only learned of AWK about an hour before I posted.

Let me make sure I understand everything completely. This is what I'm trying, step by step; please correct me where I'm wrong:

1. I have my working directory, which contains my db.xml file.
2. I create a file called split.awk inside my working directory, with these contents:
Code:
 /<METADATA>/ {
        getline
        while ( $0 !~ /<\/METADATA>/ ) {
                print > "fields.xml"
                getline
        }
        count=1
        nextline
}

/<ROW/ {
        rfile="row" count ".xml"
        getline
        while ($0 !~ "<\/ROW" ) {
                print > rfile
                getline
        }
        close(rfile)
        count++
        nextline
}

3. I open up terminal, cd to my working directory and then execute:
Code:
awk -f split.awk db.xml

When I execute that, I just get an error saying awk can't find the file.

Again, sorry for being such a beginner. Now that I know AWK exists, I plan to purchase a few books on it and dive into how I can apply it in my day-to-day programming.

Thank you!
# 6  
What is the exact output of awk? This mostly happens when invisible characters get into the .awk file while the text is being copied into it. Also make sure all the files are readable and the directory is writable by the account you use to open the terminal window.
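
One way to look for such invisible characters is to search the script for bytes outside printable ASCII. The following is only a sketch: `check_script` is a hypothetical helper name, and it will not flag carriage returns from Windows-style line endings.

```shell
# Hypothetical helper: report lines of a file that contain bytes
# outside printable ASCII and ordinary whitespace.
check_script() {
    # Make sure the file exists and is readable by the current account.
    [ -r "$1" ] || { echo "$1: not readable"; return 1; }
    # In the C locale, every byte that is neither printable ASCII nor whitespace matches.
    if LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"; then
        echo "$1: suspicious bytes on the lines listed above"
    else
        echo "$1: clean"
    fi
}
```

Running `check_script split.awk` in the working directory prints the numbered lines that contain unexpected bytes, for example a non-breaking space pasted from a web page. If any are listed, retyping those lines by hand usually clears them.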
# 7  
Quote:
Originally Posted by JRy
I'm using AWK via Terminal in OS X Leopard. I'm not sure how to go about item #1, but for #2 I tried the following:
Code:
awk '/<ROW/{close("row"c".xml");c++}{print $0 > "row"c".xml"}' db.xml

Which produces a syntax error at line 1 when executed.

Code:
awk '/<ROW/{close("row"c".xml");c++}c{f="row"c".xml";print $0 > f}' file
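
The original one-liner most likely fails because the target of `>` is an unparenthesized string concatenation, which some awk implementations reject or parse ambiguously. Building the name in a variable first, as above, avoids the problem. Wrapping the concatenation in parentheses should work as well, as in the sketch below. The sample file is hypothetical and has one tag per line, unlike the real export, so this only illustrates the redirection syntax.

```shell
# Hypothetical sample with one tag per line (the real export has no line breaks).
printf '%s\n' '<RESULTSET>' '<ROW ID="1">' '<COL>x</COL>' '</ROW>' \
              '<ROW ID="2">' '<COL>y</COL>' '</ROW>' > lines.xml

# Parenthesize the concatenated file name so the redirection target is unambiguous.
# The bare pattern "c" skips everything before the first <ROW> line.
awk '/<ROW/ { close("row" c ".xml"); c++ }
     c      { print > ("row" c ".xml") }' lines.xml
```

With either form, anything that follows the last </ROW> line in the input ends up in the last row file, because nothing switches the output off again.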
