Using AWK to separate data from a large XML file into multiple files


 
# 1  
Old 10-16-2009
I have a 500 MB XML file from a FileMaker database export. It's formatted horribly (no line breaks at all). The node structure is basically:

Code:
<FMPXMLRESULT>
  <METADATA>
   <FIELD att="............." id="..."/>
  </METADATA>
  <RESULTSET FOUND="1763457">
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
   <ROW att="....." etc="....">
     <COL>.....etc....</COL>
   </ROW>
  </RESULTSET>
</FMPXMLRESULT>

There are two things I need to get out of that file:

1. I'd like to generate an XML file that contains just what is inside the <METADATA> node (the <FIELD> nodes), and I'll name it fields.xml.

2. Then I'd like to generate an XML file for each individual <ROW> node, named incrementally: row1.xml, row2.xml, etc.


I'm using AWK via Terminal in OS X Leopard. I'm not sure how to go about item #1, but for #2 I tried the following:
Code:
awk '/<ROW/{close("row"c".xml");c++}{print $0 > "row"c".xml"}' db.xml

This produces a syntax error at line 1 when executed.


Can anyone help me out with these issues? What am I doing wrong?

Your help is very much appreciated.
# 2  
Old 10-17-2009
Try this awk code.

Code:
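/<METADATA>/ {
        # Copy the lines between <METADATA> and </METADATA> to fields.xml.
        # Testing getline's return value stops the loop at end of input.
        while ((getline) > 0 && $0 !~ /<\/METADATA>/)
                print > "fields.xml"
        count = 1
        next
}

/<ROW/ {
        # Write the lines between <ROW ...> and </ROW> to row1.xml, row2.xml, ...
        rfile = "row" count ".xml"
        while ((getline) > 0 && $0 !~ /<\/ROW/)
                print > rfile
        close(rfile)
        count++
        next
}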
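
The original post says the export has no line breaks at all, so a line-oriented script would see the whole file as a single record. One possible way around that, as a rough sketch, is to make ">" the record separator so that awk reads one tag at a time. The file name sample.xml and its contents are made up for illustration, and the sketch assumes ">" never appears raw inside attribute values or text.

```shell
#!/bin/sh
# Hypothetical stand-in for db.xml, written on a single line like the real export.
printf '%s' '<FMPXMLRESULT><METADATA><FIELD NAME="a"/><FIELD NAME="b"/></METADATA><RESULTSET FOUND="2"><ROW ID="1"><COL>x</COL></ROW><ROW ID="2"><COL>y</COL></ROW></RESULTSET></FMPXMLRESULT>' > sample.xml

# Assumption: every ">" ends a tag, so RS=">" yields one tag per record
# (preceded by any text that came before the tag).
awk '
BEGIN         { RS = ">" }
/<\/METADATA/ { meta = 0 }                              # stop before the closing tag
meta          { printf "%s>\n", $0 > "fields.xml" }     # one <FIELD .../> per line
/<METADATA/   { meta = 1 }                              # start after the opening tag
/<ROW/        { n++; out = "row" n ".xml" }             # a new row begins
out != ""     { printf "%s>", $0 > out }                # re-add the ">" that RS removed
/<\/ROW/      { close(out); out = "" }                  # row finished, release the file
' sample.xml
```

With the sample input this should leave the two <FIELD> nodes in fields.xml and one complete <ROW> node in each of row1.xml and row2.xml. Because awk only holds one tag in memory at a time, the same approach should cope with a large file, although it has not been tried on a 500 MB export here.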

# 3  
Old 10-17-2009
Thanks for the quick reply. When I try these:

Code:
awk '/<METADATA>/ {
        getline
        while ( $0 !~ /<\/METADATA>/ ) {
                print > "fields.xml"
                getline
        }
        count=1
        nextline
}' db.xml

and

Code:
awk '/<ROW/ {
        rfile="row" count ".xml"
        getline
        while ($0 !~ "<\/ROW" ) {
                print > rfile
                getline
        }
        close(rfile)
        count++
        nextline
}' db.xml

I get an illegal statement error. Am I doing something wrong? Thank you so much for the help so far!
# 4  
Old 10-17-2009
It wasn't designed to be used as separate clauses. Put the whole thing in a file and use the -f switch.
# 5  
Old 10-17-2009
Sorry, I'm a complete AWK beginner. I've been programming for about 8 years, but I only learned of AWK about an hour before I posted.

Let me make sure I understand everything completely. This is what I'm trying, step by step; please correct me where I'm wrong:

1. I have my working directory, and in it I have the db.xml file.
2. I create a file called split.awk inside my working directory, and in it I put the following contents:
Code:
 /<METADATA>/ {
        getline
        while ( $0 !~ /<\/METADATA>/ ) {
                print > "fields.xml"
                getline
        }
        count=1
        nextline
}

/<ROW/ {
        rfile="row" count ".xml"
        getline
        while ($0 !~ "<\/ROW" ) {
                print > rfile
                getline
        }
        close(rfile)
        count++
        nextline
}

3. I open Terminal, cd to my working directory, and then execute:
Code:
awk -f split.awk db.xml

When I execute that, I just get an error saying awk can't find the file.

Again, sorry for being such a beginner. Now that I know AWK exists, I plan to purchase a few books on it and dive into how I can apply it in my day-to-day programming.

Thank you!
# 6  
Old 10-17-2009
What is the exact output of awk? This seems to happen mostly when invisible characters are introduced into the awk file while the text is being copied into it. Also make sure all the files are readable and the directory is writable by the account you use to open the Terminal window.
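
One way to look for such invisible characters, as a minimal sketch: dump the script byte by byte, then compare it against a copy that keeps only printable ASCII, tabs, and newlines. The file pasted.awk is a made-up example whose first line starts with a UTF-8 non-breaking space, which is only a guess at what a browser copy might introduce.

```shell
#!/bin/sh
# Hypothetical pasted script: the line starts with a non-breaking space (bytes 0xC2 0xA0).
printf '\302\240/<ROW/ { n++ }\n' > pasted.awk

# od -c prints every byte, so stray characters show up as octal codes such as 302 240.
od -c pasted.awk

# Keep only tab (\11), newline (\12), and printable ASCII (\40-\176).
LC_ALL=C tr -cd '\11\12\40-\176' < pasted.awk > clean.awk

# If the two files differ, the pasted script contained something else.
cmp -s pasted.awk clean.awk || echo "pasted.awk contains non-ASCII or control bytes"
```

If the files differ, running awk with -f clean.awk instead is a quick way to check whether the stray bytes were the cause.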
# 7  
Old 10-17-2009
Quote:
Originally Posted by JRy
I'm using AWK via Terminal in OS X Leopard, I'm not sure how to go about item #1, but for #2 I tried the following:
Code:
awk '/<ROW/{close("row"c".xml");c++}{print $0 > "row"c".xml"}' db.xml

Which produces a syntax error at line 1 when executed.

Code:
awk '/<ROW/{close("row"c".xml");c++}c{f="row"c".xml";print $0 > f}' file
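
The likely reason this version works where the original did not: an unparenthesized concatenation after ">" is ambiguous, and some awk implementations reject it as a syntax error. Building the file name in a variable first, as above, avoids the problem, and so does wrapping the concatenation in parentheses. The sketch below shows the parenthesized form on a made-up input file (lines.xml) that has one tag per line, which the export described in the first post does not.

```shell
#!/bin/sh
# Hypothetical input with one tag per line.
printf '%s\n' '<HEAD/>' '<ROW id="1">' '<COL>x</COL>' '</ROW>' \
              '<ROW id="2">' '<COL>y</COL>' '</ROW>' > lines.xml

awk '
/<ROW/ { if (c) close("row" c ".xml"); c++ }   # close the previous row file, start a new one
c      { print > ("row" c ".xml") }            # parentheses make the target unambiguous
' lines.xml
```

The bare pattern c is false until the first <ROW line has been seen, so anything before it (here <HEAD/>) is skipped rather than written to a file named row.xml.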
