Using AWK to separate data from a large XML file into multiple files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Using AWK to separate data from a large XML file into multiple files
# 8  
Old 10-17-2009
Quote:
Originally Posted by jp2542a
what is the exact output of awk? This seems to happen mostly when there are invisible characters introduced to the awk file during the copy of the text to the .awk file. And make sure all the files are readable and the directory is writable by the account you use to open the terminal window...
The terminal prints out the following:
Code:
awk: can't open file split.awk
 source line number 1 source file split.awk
 context is
     >>>  <<<

Do I need to put in a full file path? I've already navigated to the directory within terminal, it's in the same directory as db.xml, which seems to get picked up fine.

Quote:
Originally Posted by danmero
Code:
awk '/<ROW/{close("row"c".xml");c++}c{f="row"c".xml";print $0 > f}' file

When I execute this, I just an exact copy of my original file with the number 1 appended to it, ex: db1.xml, but it's also a 500 MB file.


Thanks again to both of you for your help so far.
# 9  
Old 10-17-2009
Quote:
Originally Posted by JRy
When I execute this, I just an exact copy of my original file with the number 1 appended to it, ex: db1.xml, but it's also a 500 MB file.
I can't see where is the problem ... can you elaborate ?

Starting from your original data sample I get:
Code:
# awk '/<ROW/{close("row"c".xml");c++}c{f="row"c".xml";print $0 > f}' file
# ls
file            row1.xml        row2.xml        row3.xml

# 10  
Old 10-17-2009
what happens if you type the awk command in the window that you used to create the split.awk file? The error message suggests that you don't have read access to the split.awk file in the new terminal window....
# 11  
Old 10-17-2009
Quote:
Originally Posted by danmero
I can't see where is the problem ... can you elaborate ?

Starting from your original data sample I get:
Code:
# awk '/<ROW/{close("row"c".xml");c++}c{f="row"c".xml";print $0 > f}' file
# ls
file            row1.xml        row2.xml        row3.xml

Ah, I see the issue. If I test with the sample XML I provided in the original post, it works. But it's when I try to execute it on the actual XML export from FileMaker that it fails and just stops after creating one file, which contains all of the < ROW matches.
I think maybe it's because the FileMaker XML export is really crappy and has absolutely no formatting at all, there's no line breaks, no indenting, no nothing, all the instances of the < ROW > </ROW> node are all on the same line.

Would that be an issue and is there a way around it?


Quote:
Originally Posted by jp2542a
what happens if you type the awk command in the window that you used to create the split.awk file? The error message suggests that you don't have read access to the split.awk file in the new terminal window....
Same result, I also verified the standard OS X account (my account + the staff account + everyone) has both read/write access to split.awk.

** UPDATE ** so I tried using the touch command via terminal to create the file split.awk, then edited it's contents via CODA (still using the exact code you provided), and when I run the following command:
Code:
awk -f split.awk db.xml

I now get the following error:
Code:
awk: illegal statement
 input record number 1, file db.xml
 source line number 8


Last edited by JRy; 10-17-2009 at 03:36 AM..
# 12  
Old 10-17-2009
The only thing that I can think of is that split.awk contains more than just the script code. Invisible characters? I've tried this on both Centos and a windows (cygwin) system...
# 13  
Old 10-17-2009
Quote:
Originally Posted by jp2542a
The only thing that I can think of is that split.awk contains more than just the script code. Invisible characters? I've tried this on both Centos and a windows (cygwin) system...
Here's the exact contents of my file if you want to take a look:
share1t.com File Sharing | Download: split.awk
I still feel like I may be doing something wrong.

Thank you again for your help, I really do appreciate it. Smilie
# 14  
Old 10-17-2009
I downloaded the script onto both my my Centos (linux) system and my windows (cygwin) system. It worked fine on my Linux system. On the windows system, there were translation errors. There are hidden characters...
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Split large xml into mutiple files and with header and footer in file

Split large xml into mutiple files and with header and footer in file tried below it splits unevenly and also i need help in adding header and footer command : csplit -s -k -f my_XML_split.xml extrfile.xml "/<Document>/" {1} sample xml <?xml version="1.0" encoding="UTF-8"?><Recipient>... (36 Replies)
Discussion started by: karthik
36 Replies

2. Shell Programming and Scripting

awk - Multiple files - 1 file with multi-line data

Greetings experts, Have 2 input files, of which 1 file has 1 record per line; in 2nd file, multiple lines constitute 1 record; Hence declared the RS=";" Now in the first file which ends with ";" at each line of the line; But \nis also being considered as part of the data due to which I am... (1 Reply)
Discussion started by: chill3chee
1 Replies

3. Shell Programming and Scripting

Splitting a single xml file into multiple xml files

Hi, I'm having a xml file with multiple xml header. so i want to split the file into multiple files. Sample.xml consists multiple headers so how can we split these multiple headers into multiple files in unix. eg : <?xml version="1.0" encoding="UTF-8"?> <ml:individual... (3 Replies)
Discussion started by: Narendra921631
3 Replies

4. Shell Programming and Scripting

Process multiple large files with awk

Hi there, I'm camor and I'm trying to process huge files with bash scripting and awk. I've got a dataset folder with 10 files (16 millions of row each one - 600MB), and I've got a sorted file with all keys inside. For example: a sample_1 200 a.b sample_2 10 a sample_3 10 a sample_1 10 a... (4 Replies)
Discussion started by: camor
4 Replies

5. Shell Programming and Scripting

create separate files from one excel file with multiple sheets

Hi, I have one requirement, create separate files (".csv") from one excel file(xlsx) with multiple sheets. These ".csv" files are my source files. So anybody please suggest me the process. Thanks in Advance. Regards, Harris (3 Replies)
Discussion started by: harris
3 Replies

6. Shell Programming and Scripting

How to split a data file into separate files with the file names depending upon a column's value?

Hi, I have a data file xyz.dat similar to the one given below, 2345|98|809||x|969|0 2345|98|809||y|0|537 2345|97|809||x|544|0 2345|97|809||y|0|651 9685|98|809||x|321|0 9685|98|809||y|0|357 9685|98|709||x|687|0 9685|98|709||y|0|234 2315|98|809||x|564|0 2315|98|809||y|0|537... (2 Replies)
Discussion started by: nithins007
2 Replies

7. UNIX for Dummies Questions & Answers

awk to match multiple regex and create separate output files

Howdy Folks, I have a list that looks like this: (file2.txt) AAA BBB CCC DDD and there are 24 of these short words. I am matching these patterns to another file with 755795 lines (file1.txt). I have this code for matching: awk -v f2=file2.txt ' BEGIN { while(... (2 Replies)
Discussion started by: heecha
2 Replies

8. UNIX for Dummies Questions & Answers

Using AWK: Extract data from multiple files and output to multiple new files

Hi, I'd like to process multiple files. For example: file1.txt file2.txt file3.txt Each file contains several lines of data. I want to extract a piece of data and output it to a new file. file1.txt ----> newfile1.txt file2.txt ----> newfile2.txt file3.txt ----> newfile3.txt Here is... (3 Replies)
Discussion started by: Liverpaul09
3 Replies

9. Shell Programming and Scripting

handling multiple files using awk command and wants to get separate out file for each

hai all I am new to the world of shell scripting I wanted to extract two columns from multiple files say around 25 files and i wanted to get the separate outfile for each input file tired using the following command to extract two columns from 25 files awk... (2 Replies)
Discussion started by: hema dhevi
2 Replies

10. UNIX for Dummies Questions & Answers

multiple smaller files from one large file

I have a file with a simple list of ids. 750,000 rows. I have to break it down into multiple 50,000 row files to submit in a batch process.. Is there an easy script I could write to accomplish this task? (2 Replies)
Discussion started by: rtroscianecki
2 Replies
Login or Register to Ask a Question