How to Split a file -- so that each file has N number of Blocks?

Tags

awk, shell scripts, solved

10-07-2015

I'm using Linux and trying to come up with a shell script to automate the task below, but haven't managed to.

I have an input XML file (XML.txt) with over 200,000 XML blocks. I need to inject this file into an application queue for processing, but due to resource constraints I have to split it up so that each file contains only 50 XML blocks.

Every XML block begins with the text [MESSAGE BEGIN] as its first line and ends with the text [MESSAGE END] as its last line; the number of lines in each block varies.

Basically, I want to split XML.txt into N files, XML1.txt, XML2.txt, XML3.txt ... XMLn.txt, where each file contains at most 50 XML blocks (i.e. from [MESSAGE BEGIN] to [MESSAGE END]).

Example of an XML block:

Code:

[MESSAGE BEGIN]
  <Tag1>....
  <Tag2>.....
   ......
  <Tagn>
[MESSAGE END]
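
Since every answer depends on this structure, it may be worth checking the input before splitting it. A minimal sketch, not from the thread: the function name check_blocks is hypothetical, and it assumes each marker sits alone on its own line exactly as shown above.

Code:

```shell
# check_blocks (hypothetical): verify that INPUT consists only of
# [MESSAGE BEGIN] ... [MESSAGE END] blocks, and count them.
# Usage: check_blocks INPUT  -> prints "N blocks found"; non-zero exit on errors.
# Assumes each marker sits alone on its own line, as in the example.
check_blocks() {
        awk '
        $0 == "[MESSAGE BEGIN]" {
                if (open) { printf "line %d: BEGIN inside an open block\n", NR; bad = 1 }
                open = 1; blocks++
                next
        }
        $0 == "[MESSAGE END]" {
                if (!open) { printf "line %d: END without a BEGIN\n", NR; bad = 1 }
                open = 0
                next
        }
        # non-blank text between blocks does not belong to any message
        !open && NF { printf "line %d: text outside any block\n", NR; bad = 1 }
        END {
                if (open) { print "last block has no [MESSAGE END]"; bad = 1 }
                printf "%d blocks found\n", blocks
                exit bad
        }
        ' "$1"
}
```

Running check_blocks XML.txt prints the total block count, preceded by the line numbers of any structural problems.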

alldbest

10-07-2015

Any attempts from your side?

Anyhow, try:

Code:

awk '/MESSAGE BEGIN/ {if (!(LC++%BLOCKS)) {if (OF) close (OF); OF="XML" ++FC ".txt"}} {print $0 > OF}' BLOCKS=50 file
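
The same logic, spread over several lines with comments and wrapped in a shell function. This is a sketch rather than RudiC's exact code: split_blocks is a hypothetical name, the block size is passed with -v instead of a trailing BLOCKS=50 assignment, and, like the original, it assumes the input starts with a [MESSAGE BEGIN] line (otherwise the first lines would be printed to an empty file name).

Code:

```shell
# split_blocks (hypothetical): commented version of the one-liner above.
# Usage: split_blocks INPUT [BLOCKS]   (BLOCKS defaults to 50)
# Assumes INPUT starts with a [MESSAGE BEGIN] line, as in the question.
split_blocks() {
        awk -v BLOCKS="${2:-50}" '
        /MESSAGE BEGIN/ {
                # LC counts blocks seen so far; blocks 1, BLOCKS+1,
                # 2*BLOCKS+1, ... each start a new output file
                if (LC++ % BLOCKS == 0) {
                        # close the previous file so thousands of outputs
                        # do not exhaust the awk open-file limit
                        if (OF) close(OF)
                        OF = "XML" (++FC) ".txt"
                }
        }
        # every line, markers included, goes to the current output file
        { print > OF }
        ' "$1"
}
```

split_blocks XML.txt 50 writes XML1.txt, XML2.txt, ... into the current directory. The close() call matters here: with 200,000 blocks the split produces 4000 files, and many awk implementations limit how many files can be open at once.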

RudiC

10-07-2015

As a side note: if the total number of blocks is not evenly divisible by 50, the last file will contain fewer blocks, namely the remainder of (total blocks) / 50.
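
The file count and the size of the last file can be worked out before running the split. A minimal sketch, assuming one [MESSAGE BEGIN] line per block; block_stats is a hypothetical helper, not part of any answer in this thread.

Code:

```shell
# block_stats (hypothetical): how many files a split into groups of N
# blocks produces, and how many blocks end up in the last one.
# Usage: block_stats INPUT [N]   (N defaults to 50)
block_stats() {
        n=${2:-50}
        # one [MESSAGE BEGIN] line per block; grep -c prints 0 on no match
        total=$(grep -cxF '[MESSAGE BEGIN]' "$1" || true)
        # ceiling division: a partial group still needs its own file
        files=$(( (total + n - 1) / n ))
        # remainder of total / n; a remainder of 0 means the last file is full
        last=$(( total % n ))
        [ "$last" -eq 0 ] && [ "$total" -gt 0 ] && last=$n
        echo "$total blocks -> $files files, $last blocks in the last file"
}
```

For the 200,000 blocks in the question, 200000 / 50 = 4000 with remainder 0, so all 4000 files would hold exactly 50 blocks.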

jim mcnamara

10-08-2015

Or if you prefer a script:

Code:

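#!/bin/sh
# Usage: sh split.sh < XML.txt
# Writes XML1.txt, XML2.txt, ... with at most 50 blocks each.
# Output is appended, so remove old XML[0-9]*.txt files before re-running.
count=0
file=1
# IFS= and -r keep leading blanks and backslashes intact; the -n test
# keeps a final line that has no trailing newline
while IFS= read -r line || [ -n "$line" ]
do
        printf '%s\n' "$line" >> "XML$file.txt"
        if [ "$line" = "[MESSAGE END]" ]
        then
                count=$((count + 1))
                if [ "$count" -eq 50 ]
                then
                        # 50 blocks written: switch to the next output file
                        file=$((file + 1))
                        count=0
                fi
        fi
done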
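
Whichever answer is used, the result can be checked by counting the [MESSAGE BEGIN] lines in each output file. A small sketch, assuming the outputs are named XML1.txt, XML2.txt, ... as in the question; check_split is a hypothetical name.

Code:

```shell
# check_split (hypothetical): flag any XMLn.txt holding more than MAX
# blocks (default 50). Prints one line per offending file and returns 1
# if there is any, 0 otherwise.
check_split() {
        max=${1:-50}
        status=0
        for f in XML[0-9]*.txt
        do
                [ -e "$f" ] || continue   # glob matched nothing
                # grep -c prints 0 (and exits 1) when there is no match
                n=$(grep -cxF '[MESSAGE BEGIN]' "$f" || true)
                if [ "$n" -gt "$max" ]
                then
                        echo "$f: $n blocks (more than $max)"
                        status=1
                fi
        done
        return "$status"
}
```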

jgt

10-13-2015

The solution from RudiC worked, thank you everyone.

alldbest

10-19-2015

Hi.

I like awk, but I don't like to continually create one-off scripts. We have enough of this kind of data at our shop that we looked for a general approach to collecting (grouping, bundling) lines so that we could use the standard *nix utilities to manipulate the groups.

However, such utilities are not easily found. We did find one that is mentioned below, but we wanted a few extra features, so we wrote our own.

Using either one of those commands, we pipe the result into the standard utility split to obtain 2 groups per file, like so:

Code:

#!/usr/bin/env bash

# @(#) s3	Demonstrate collection of blocks into separate files, cat0par, masuli.
# For cat0par, see:
# https://github.com/jakobi/script-archive/blob/master/cli.list.grep/cat0par
# Verified: Fri Oct  9 13:26:13 CDT 2015

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C cat0par masuli tr

# Remove debris from previous runs.
rm -f x*

FILE=${1-data1}
N=50
N=2
if [ $# -gt 1 ]
then
  shift
  N=${1}
fi

pl " Input data file $FILE:"
cat $FILE

pl " Results, splitting into groups of $N:"
# 200000/50 -> 4000
# cat0par -nonl='@' -start '^\[MESSAGE BEGIN\]' $FILE |
masuli -m=',^\[MESSAGE BEGIN,' -r='@' -g='\n' $FILE |
tee f1 |
split --lines="$N"
pe
pe " Files created by split:"
ls x*

pl " Sample of split files, content = $N:"
head xaa

pl " Transformation back into separate lines, xaa:"
rm -f t1
tr -d '\n' < xaa |
tr '@' '\n' > t1
mv t1 xaa
cat xaa

exit 0

producing:

Code:

$ ./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
cat0par (local) 1.3
masuli (local) 1.18
tr (GNU coreutils) 6.10

-----
 Input data file data1:
[MESSAGE BEGIN]
  <First>....
  <Tag2>.....
   ......
  <Tagn>
[MESSAGE END]
[MESSAGE BEGIN]
  <second>....
   ......
  <Tagn>
[MESSAGE END]
[MESSAGE BEGIN]
  <third>....
  <Tag2>.....
  <Tag3>.....
  <Tagn>
[MESSAGE END]
[MESSAGE BEGIN]
  <fourth>....
  <Tag2>.....
  <Tag3>.....
  <Tag4>.....
  <Tagn>
[MESSAGE END]

-----
 Results, splitting into groups of 2:

 Files created by split:
xaa  xab

-----
 Sample of split files, content = 2:
[MESSAGE BEGIN]@  <First>....@  <Tag2>.....@   ......@  <Tagn>@[MESSAGE END]@
[MESSAGE BEGIN]@  <second>....@   ......@  <Tagn>@[MESSAGE END]@

-----
 Transformation back into separate lines, xaa:
[MESSAGE BEGIN]
  <First>....
  <Tag2>.....
   ......
  <Tagn>
[MESSAGE END]
[MESSAGE BEGIN]
  <second>....
   ......
  <Tagn>
[MESSAGE END]

In this demo, our masuli (make-super-lines) utility replaces every newline within a group with "@", then appends a real newline at the end of each group. Because split counts lines, each output file receives 2 groups, however many original lines each group contains.

Both utilities can also place a NUL at the end of a group. This is generally ignored, but may be useful for the growing number of utilities that can process such NUL-terminated records (e.g. xargs -0, GNU sort -z). This approach is a two-edged sword: the downside is that, in the case of split, each output file needs to be post-processed, which is time-consuming. This could probably be addressed by modifications to the utility.
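
For anyone without masuli or cat0par, the same super-line idea can be sketched with standard awk, split and tr. This is an illustration under stated assumptions, not drl's utility: split_superlines and the part_ prefix are hypothetical, "@" is assumed never to occur in the data (as in the demo), and a line that is exactly [MESSAGE END] is taken to close each group.

Code:

```shell
# split_superlines (hypothetical): make-super-lines with standard tools.
# Usage: split_superlines INPUT [N]   (N defaults to 50)
# Writes part_aaaa.txt, part_aaab.txt, ... with at most N blocks each.
# Assumes "@" never occurs in the data and each block ends with a line
# that is exactly [MESSAGE END].
split_superlines() {
        n=${2:-50}
        # Step 1: awk emits one super-line per block (every line is followed
        # by "@", and a real newline follows each [MESSAGE END]).
        # Step 2: split counts lines, so N super-lines = N blocks per chunk;
        # -a 4 allows up to 26^4 chunk names.
        awk '
        { printf "%s@", $0 }
        $0 == "[MESSAGE END]" { printf "\n" }
        ' "$1" | split -a 4 -l "$n" - part_
        # Step 3: turn every chunk back into ordinary lines
        for f in part_????
        do
                [ -e "$f" ] || continue
                tr -d '\n' < "$f" | tr '@' '\n' > "$f.txt"
                rm -f "$f"
        done
}
```

Unlike the demo, this converts every chunk back rather than only xaa, and -a 4 gives split enough suffixes for the 4000 chunks a 200,000-block file would produce.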

Best wishes ... cheers, drl

drl
