Help with Splitting a Large XML file based on size AND tags
Post 302907973 by Chubler_XL on Wednesday 2nd of July 2014 08:17:12 PM
How about this:

Code:
#!/bin/bash
export ORACLE_HOME=.........
export ORACLE_SID=...........
export PATH=........
. ./params        # contains the parameter sizelimit
...

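if [ "$(stat -c%s "$FILE")" -gt "$sizelimit" ]
then
    # A multi-character RS needs gawk or mawk; POSIX awk only guarantees a single character
    awk -v limit="$sizelimit" '
        BEGIN { num=1 }
        {
          size=length($0)+length(RS)       # length($0) excludes the separator
          if ((bytes+=size)>limit) {
             close(FILENAME "." num)
             num++
             bytes=size                    # restart the count for the new file
          }
          printf "%s%s",$0,RS > (FILENAME "." num)
        } ' RS="</URL>" "$FILE"
else
   echo "$FILE: already less than the limit of $sizelimit"
fi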
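
To see how the rollover behaves, here is a small sketch on throwaway data. The file name sample.xml, its contents and the limit of 30 are made up for illustration, and it assumes an awk that accepts a multi-character RS (gawk or mawk). The byte count is reset whenever a new piece is started, so each piece is filled up to the limit again.

```shell
#!/bin/sh
# Work in a scratch directory so the demo leaves nothing behind in the current one.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Three made-up records of 12 bytes each, all on one line, no trailing newline.
printf '<URL>a</URL><URL>b</URL><URL>c</URL>' > sample.xml

# Same splitting logic with a tiny limit (30 bytes) so the rollover is easy to follow.
awk -v limit=30 '
    BEGIN { num = 1 }
    {
        size = length($0) + length(RS)      # record plus its separator
        if ((bytes += size) > limit) {      # this record would overflow the current piece
            close(FILENAME "." num)
            num++
            bytes = size                    # the new piece starts with this record
        }
        printf "%s%s", $0, RS > (FILENAME "." num)
    }' RS="</URL>" sample.xml

# Show the size of each piece.
wc -c sample.xml.*
```

The first two records fit under the limit and land in sample.xml.1, and the third starts sample.xml.2. Note that if the input has anything after the last </URL> (a closing root tag, or just a newline), that tail is written with an extra </URL> appended, so check the last piece by hand.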

Just be careful: awk and many other Unix utilities have limits on the length of a single line, so you may be better off putting a newline character after each </URL>.
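
One way to add those newlines is sketched below. The helper name add_url_newlines and the sample contents are made up for illustration. It assumes your sed copes with the original long line (GNU sed should; other implementations may have their own line-length limits, so test on a copy first).

```shell
#!/bin/sh
# Hypothetical helper: write the file given as $1 to stdout with a newline after every </URL>.
add_url_newlines() {
    # A backslash followed by a real newline is the portable way to put a newline
    # into a sed replacement; & stands for the matched </URL>.
    sed 's#</URL>#&\
#g' "$1"
}

# Demo on a throwaway one-line sample.
sample=$(mktemp)
printf '<URL>a</URL><URL>b</URL><URL>c</URL>\n' > "$sample"
add_url_newlines "$sample" > "$sample.nl"
cat "$sample.nl"
```

Once every record ends in a newline, the file can also be handled by ordinary line-based tools.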

---------- Post updated at 10:17 AM ---------- Previous update was at 10:06 AM ----------


Depending on your OS, the stat command I used above may not be available. A much more portable (but possibly less efficient) version would be:

Code:
if [ $(wc -c < "$FILE") -gt "$sizelimit" ]
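
If you want both, a small wrapper can try stat first and fall back to wc. This is only a sketch: the function name file_size is made up, and it assumes that a stat without the GNU -c option exits with an error rather than printing something else.

```shell
#!/bin/sh
# Hypothetical helper: print the size in bytes of the file given as $1.
file_size() {
    # Try GNU-style stat; if that fails, fall back to wc and strip the
    # leading blanks that some wc implementations print.
    stat -c %s "$1" 2>/dev/null || wc -c < "$1" | tr -d ' '
}

# Demo on a throwaway 10-byte file with a made-up limit.
tmp=$(mktemp)
printf '0123456789' > "$tmp"
sizelimit=5

file_size "$tmp"      # prints 10
if [ "$(file_size "$tmp")" -gt "$sizelimit" ]
then
    echo "$tmp: needs splitting"
fi
```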


Last edited by Chubler_XL; 07-02-2014 at 09:19 PM. Reason: close previous file to ensure awk open-file limit is not exceeded
 
