How to split a file so that each output file has N blocks?
I'm using Linux and trying to come up with a shell script to automate the task below, but I haven't been able to.
I have an input XML file (XML.txt) with over 200,00 XML blocks. I need to inject this XML file into an application queue for processing, but due to resource constraints I need to split it up so that each file contains only 50 XML blocks.
EVERY XML block begins with the text [MESSAGE BEGIN] as its first line and ends with the text [MESSAGE END] as its last line; the number of lines in each block can vary.
Basically, I want to split XML.txt into N files (XML1.txt, XML2.txt, XML3.txt, ... XMLn.txt), where each file contains at most 50 XML blocks (i.e. from [MESSAGE BEGIN] to [MESSAGE END]).
As a side note: if the total number of blocks does not divide evenly by 50, the last file will hold fewer blocks, namely the remainder of (total blocks) / 50.
Hi.
I like awk, but I don't like to continually create one-off scripts. We have enough of this kind of data at our shop that we looked for a general approach to collecting (grouping, bundling) lines so that we could use the standard *nix utilities to manipulate the groups.
However, such utilities are not easily found. We did find one that is mentioned below, but we wanted a few extra features, so we wrote our own.
Using either one of those commands, we pipe the result into the standard utility split to obtain 2 groups per file, like so:
producing:
In this demo, our masuli (make-super-lines) utility replaces all newlines with a "@", then tacks a newline at the end of a group. Thus split will capture 2 groups (of a variable number of lines in each group) to individual files.
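The super-line idea described above can be sketched with standard tools only. This is a rough illustration, not the masuli utility itself: awk stands in for it here, the file names sample.txt and sampleN.txt and the sample data are made up, and the sketch assumes that "@" never occurs in the data.

```shell
#!/bin/sh
# Sketch only: awk stands in for masuli, which is not shown in this post.
set -eu

# Hypothetical sample input: 5 blocks with a varying number of lines.
cat > sample.txt <<'EOF'
[MESSAGE BEGIN]
<a>1</a>
[MESSAGE END]
[MESSAGE BEGIN]
<b>2</b>
<b>2b</b>
[MESSAGE END]
[MESSAGE BEGIN]
<c>3</c>
[MESSAGE END]
[MESSAGE BEGIN]
<d>4</d>
<d>4b</d>
<d>4c</d>
[MESSAGE END]
[MESSAGE BEGIN]
<e>5</e>
[MESSAGE END]
EOF

# 1. Join the lines of each block with "@" so one block becomes one line.
# 2. Let split put 2 of those super-lines into each piece.
awk '
  { line = (line == "" ? $0 : line "@" $0) }
  $0 == "[MESSAGE END]" { print line; line = "" }
' sample.txt | split -l 2 - part.

# 3. Turn every "@" back into a newline and give the pieces numbered names.
n=0
for f in part.*; do
  n=$((n + 1))
  tr '@' '\n' < "$f" > "sample$n.txt"
  rm "$f"
done

# Report how many blocks ended up in each piece.
for f in sample[0-9]*.txt; do
  printf '%s: %s blocks\n' "$f" "$(grep -cxF '[MESSAGE END]' "$f")"
done
```

For the original question, the same pipeline would presumably use `split -l 50`, and with thousands of output files a longer suffix (for example `-a 4`) may be needed so that split does not run out of names.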
Both utilities can place a NULL at the end of a group. This is generally ignored, but may be useful for the growing number of utilities that can process such "Z"-like records (e.g. xargs, GNU sort). This is a two-edged sword, the downside being that, in the case of split, each file needs to be post-processed, a time-consuming task. This could probably be addressed by modifications to the utility.
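For the specific format in the original question, a one-off awk script that skips the intermediate super-line step is also possible. The following is a minimal sketch, assuming the marker lines contain exactly [MESSAGE BEGIN] and [MESSAGE END] with no surrounding whitespace and that the file starts with a [MESSAGE BEGIN] line. The 120-block input is generated test data, and the variable name `per` is chosen here for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical test data standing in for the real XML.txt: 120 small blocks.
: > XML.txt
i=1
while [ "$i" -le 120 ]; do
  printf '[MESSAGE BEGIN]\n<id>%s</id>\n[MESSAGE END]\n' "$i" >> XML.txt
  i=$((i + 1))
done

awk -v per=50 '
  # Start a new output file at the first line of every group of "per" blocks.
  $0 == "[MESSAGE BEGIN]" && blocks % per == 0 {
    if (out != "") close(out)      # avoid running out of open file handles
    out = "XML" (++n) ".txt"
  }
  { print > out }                  # every line goes to the current output file
  $0 == "[MESSAGE END]" { blocks++ }
' XML.txt

# Report how many blocks ended up in each output file.
for f in XML[0-9]*.txt; do
  printf '%s: %s blocks\n' "$f" "$(grep -cxF '[MESSAGE BEGIN]' "$f")"
done
```

Closing each finished file keeps the number of open files at one, which matters when the input produces thousands of outputs. The last file simply receives whatever blocks remain.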