Shell Programming and Scripting: Post 302907870 by Aviktheory11 on Wednesday 2nd of July 2014 07:16:40 AM
Help with Splitting a Large XML file based on size AND tags

Hi All,

This is my first post here. I'm hoping to share and gain knowledge on this great forum!

I've scanned this forum before posting my problem here, but I'm afraid I couldn't find any thread that addresses this exact problem.

I'm trying to split a large XML file (with multiple tag sets) into smaller files of a fixed maximum size, without splitting in the middle of a tag set, i.e. every smaller file should contain only complete tag sets. The size limit of the smaller files is specified in a parameter file. For example, if the size limit is 100 KB and the large file is 440 KB, I should get five smaller files of roughly 100 KB, 100 KB, 100 KB, 100 KB and 40 KB.

My initial approach was to create the large file with each complete tag set on a single line, and then to use the split command with the size limit. However, the complete tag sets do not fit on single lines, since the XML records are themselves huge. So I was thinking of splitting the large file at tag boundaries while also staying within the size limit.

Below is what I have tried so far:

Code:
#!/bin/bash
export ORACLE_HOME=.........
export ORACLE_SID=...........
export PATH=........
. ./params       # contains the parameter sizelimit
FILE="datafile.txt"
sqlplus -s userid/password@DB <<EOF
SET HEADING OFF
SET PAGESIZE 0
SET LINESIZE 32000
SET LONG 32000
SET NEWPAGE NONE
SET FEEDBACK OFF
SET TRIMSPOOL ON
SET DEFINE ON     
SET VERIFY OFF
SET SERVEROUTPUT OFF
SPOOL $FILE
[....query to create the master file...]
SPOOL OFF
EXIT
EOF
# Size of the spooled file in bytes
filesize=$(wc -c < "$FILE")
#echo "$filesize"
#echo "$sizelimit"
# Assumes sizelimit is a plain byte count, e.g. 102400 for 100 KB
if [ "$filesize" -gt "$sizelimit" ]
then split -b "$sizelimit" "$FILE" part
else echo "less than the limit"
fi

This was my first attempt at using the split command. However, I don't think it can be used given my criterion, because split -b cuts at byte offsets regardless of tags. Assuming the tag sets look like <URL>...</URL>, can anyone suggest another way to do this?

Thanks a lot,

- Avik

Last edited by Scrutinizer; 07-02-2014 at 08:53 AM.. Reason: Changed ICODE to CODE tags
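
One possible way to combine both criteria from the post above is to buffer each tag set and start a new output file only when the next complete tag set would push the current file over the limit. The following is a minimal sketch, not a tested solution for the poster's data. It assumes that sizelimit is a plain byte count, that every closing </URL> tag is the last thing on its line, and that the output files are named part_001, part_002, and so on. The function name split_on_tags and its arguments are hypothetical.

```shell
#!/bin/bash
# Sketch: split a file into pieces of at most `limit` bytes,
# breaking only after a line that contains the closing </URL> tag.
# Assumptions (not from the original post): limit is a byte count,
# </URL> ends its line, and output files are named <prefix>NNN.
split_on_tags() {
    local file=$1 limit=$2 prefix=${3:-part_}
    # LC_ALL=C makes length() count bytes rather than characters
    LC_ALL=C awk -v limit="$limit" -v prefix="$prefix" '
        function new_file() {
            if (out != "") close(out)
            out = sprintf("%s%03d", prefix, ++n)
            used = 0
        }
        function flush() {
            # Open the next file if this tag set does not fit any more
            if (used > 0 && used + length(rec) > limit) new_file()
            printf "%s", rec > out
            used += length(rec)
            rec = ""
        }
        BEGIN { new_file() }
        {
            rec = rec $0 "\n"                 # buffer the current tag set
            if (index($0, "</URL>")) flush()  # tag set is complete
        }
        END { if (rec != "") flush() }        # text after the last closing tag
    ' "$file"
}
```

With the variables from the script in the post, a call could look like: split_on_tags "$FILE" "$sizelimit" part_. Note that the pieces will usually be somewhat smaller than the limit, because a file is closed before the tag set that would overflow it, and a single tag set larger than the limit still ends up in one file that exceeds it.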
 
