Help with Splitting a Large XML file based on size AND tags
Post 302907996 by Aviktheory11 on Thursday 3rd of July 2014 01:11:35 AM
Quote:
Originally Posted by Chubler_XL
Sorry, I should have tried my code on more than one large URL; I had forgotten to reset the bytes variable. Please accept this updated version:

Code:
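#!/bin/bash
export ORACLE_HOME=.........
export ORACLE_SID=...........
export PATH=........
. ./params        # contains the parameter sizelimit
...

# Split only when the file is larger than the limit (stat -c%s is GNU stat).
if [ "$(stat -c%s "$FILE")" -gt "$sizelimit" ]
then
    # RS="</URL>" makes every closing tag a record separator; a
    # multi-character RS needs an awk that supports it (gawk, for example).
    awk -v limit="$sizelimit" '
        BEGIN { num=1 }
        {
          # Start the next output file once the running total passes the limit.
          if ((bytes+=length)>limit) {
             close(FILENAME "." num)
             bytes=length
             num++
          }
          # Write the record followed by the separator that awk stripped.
          printf "%s%s",$0,RS > FILENAME "." num
        } ' RS="</URL>" "$FILE"
else
   echo "$FILE: already less than the limit of $sizelimit"
fi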

Hi Chubler_XL,

I tried out your code snippet and have a couple of observations.

Firstly, split files are being generated, but the size limit applied does not seem to be the one from the parameter file; the output ends up the size of the initial file itself. For example, suppose the initial file ("output") was created with a size of 324010 bytes, while the parameter file specifies a size limit of 10240 bytes. After the script runs there are two files: "output" (the initial file) at 324010 bytes, and "output.2" at 324017 bytes, both with the same data.

I guess there might be something amiss with the bytes variable assignment, but I'm not sure.
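
One possible explanation for the first observation, offered as a guess rather than a diagnosis: if awk never finds the string "</URL>" in the input (for example because the tags are spelled differently), the whole file is a single record. That record is already over the limit, so the script bumps num to 2 before writing anything, "output.1" is never created, and "output.2" receives all the data plus one appended separator. The sketch below reproduces that with made-up data. The temporary directory, the lowercase tags and the 10-byte limit are assumptions for illustration, and it needs an awk that accepts a multi-character RS (gawk and mawk do).

```shell
#!/bin/bash
# Reproduction sketch with made-up data: the sample input deliberately
# contains no "</URL>", so awk treats the whole file as one record.
tmp=$(mktemp -d)
FILE="$tmp/output"
sizelimit=10          # hypothetical limit, far smaller than the sample file
printf '<url>first entry</url>\n<url>second entry</url>\n' > "$FILE"

# Same awk program as in the quoted post.
awk -v limit="$sizelimit" '
    BEGIN { num=1 }
    {
      if ((bytes+=length)>limit) {
         close(FILENAME "." num)
         bytes=length
         num++
      }
      printf "%s%s",$0,RS > FILENAME "." num
    } ' RS="</URL>" "$FILE"

# List what was produced and compare the sizes.
ls "$tmp"
wc -c "$FILE" "$FILE.2"
```

If the listing shows only "output" and "output.2", with "output.2" slightly larger than the original, the separator is probably not matching the data, and the spelling of the closing tag in the real file would be the first thing to check.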

Secondly, just for my own understanding, is your script supposed to append </URL> to the end of every file that gets generated?
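
Reading the printf line, the separator is written after every record, including whatever follows the last </URL> in the input, so the final piece would end with an extra </URL>. A minimal sketch of one way to avoid that, and to avoid skipping ".1" when the first record is larger than the limit, is below. It is a variation on the quoted code, not Chubler_XL's method: the separator is written only once a further record shows that the previous one really was terminated. The sample data, the temporary directory and the 40-byte limit are made up for illustration.

```shell
#!/bin/bash
# Sketch only: sample data and the 40-byte limit are made up for illustration.
tmp=$(mktemp -d)
FILE="$tmp/output"
sizelimit=40
printf '<URL>one</URL>\n<URL>two</URL>\n<URL>three</URL>\n<URL>four</URL>\n' > "$FILE"

awk -v limit="$sizelimit" '
    BEGIN { num = 1 }
    # A further record exists, so the previous one really ended with RS:
    # write that separator into the file the previous record went to.
    NR > 1 { printf "%s", RS > (FILENAME "." num); bytes += length(RS) }
    {
      # Rotate only if the current file already holds data, so ".1" is
      # never skipped when a single record is larger than the limit.
      if (bytes > 0 && bytes + length($0) + length(RS) > limit) {
         close(FILENAME "." num)
         bytes = 0
         num++
      }
      printf "%s", $0 > (FILENAME "." num)
      bytes += length($0)
    } ' RS="</URL>" "$FILE"

# The pieces, concatenated in order, should equal the original file.
# (The glob sorts correctly only while there are fewer than ten pieces.)
cat "$FILE".[0-9]* | cmp - "$FILE" && echo "pieces reassemble to the original"
```

Because nothing is added or dropped, every piece stays within the limit unless a single record is itself larger than the limit, and no piece gains a trailing </URL> that was not in the source. Each piece after the first starts with the newline that followed the previous closing tag, which is cosmetic.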



Thanks,

- Avik
 
