Full Discussion: Splitting files
Post 302665921 by drl in Shell Programming and Scripting, Tuesday 3rd of July 2012 11:01:24 AM
Hi.

Here is a shell script that calls a perl script to split a file. The parts do not all contain the same number of lines, but they are reasonably balanced. It works on small datasets like the sample; whether it would work on a very large dataset is unknown (it may, for example, be limited by memory). This is a quickly written solution, so for details you'd need to read the documentation and code of the perl module Text::Parts, and / or contact its author:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate perl Text::Parts for splitting file.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
edges() { head -n 3 "$1"; pe "---"; tail -n 3 "$1"; pe ; }
C="$HOME/bin/context" && [ -f "$C" ] && "$C"

FILE=${1-data1}
pl " Edges of $(wc -l <"$FILE") lines in input data file $FILE:"
# head -5 "$FILE" ; pe "---" ; tail -5 "$FILE"
edges "$FILE"

pl " Sample perl script to split file:"
cat p1

# Remove debris from previous runs. To try various PARTS, reorder the lines below; the last assignment wins.
rm -f file*.txt
PARTS=6
PARTS=3
PARTS=10
pl " Splitting $FILE into $PARTS parts:"
./p1 "$FILE" "$PARTS"

pl " Files created:"
wc -l file*.txt

pl " Sample result, edges of part 1:"
edges file1.txt
pl " Sample result, edges of part $PARTS:"
edges "file$PARTS.txt"

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39

-----
 Edges of 80 lines in input data file data1:
1
2
3
---
78
79
80


-----
 Sample perl script to split file:
#!/usr/bin/env perl

# @(#) p1	Split file into n parts, Text::Parts.
# See:
# http://search.cpan.org/~ktat/Text-Parts-0.15/lib/Text/Parts.pm

use strict;
use warnings;
use Text::Parts;

my $file  = shift || die " Need a filename to split.\n";
my $parts = shift || "4";

my $splitter = Text::Parts->new( file => $file );

$splitter->write_files(
  'file%d.txt',
  num  => $parts,
  code => \&do_after_split
);

sub do_after_split {
  my ( $filename, $f );
  $filename = shift;    # e.g. 'path/to/name1.txt'
  open( $f, ">>", $filename ) || die "Cannot open $filename for append.\n";
  print $f "\n";
  close $f;
}

exit(0);

-----
 Splitting data1 into 10 parts:

-----
 Files created:
 11 file1.txt
  7 file10.txt
  8 file2.txt
  8 file3.txt
  8 file4.txt
  8 file5.txt
  8 file6.txt
  7 file7.txt
  8 file8.txt
  7 file9.txt
 80 total

-----
 Sample result, edges of part 1:
1
2
3
---
9
10
11


-----
 Sample result, edges of part 10:
74
75
76
---
78
79
80
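The uneven counts above (11 lines in file1.txt, 7 or 8 elsewhere) fit the idea that the split is made by byte size and each cut is then extended to the end of a line; that is an assumption about Text::Parts, not something this output proves. Assuming data1 holds the numbers 1 to 80 (as its edges suggest), lines 1 to 9 take 2 bytes each and lines 10 to 80 take 3, 231 bytes in all, so a tenth is about 23 bytes and the short lines let part 1 hold 11 of them. A small bash/awk sketch of that arithmetic:

```shell
#!/usr/bin/env bash
# Sketch: reproduces the byte arithmetic behind the uneven parts.
# Assumption: data1 holds the integers 1..80, one per line.
set -eu
cd "$(mktemp -d)"                   # scratch directory; leaves real files alone
seq 1 80 > data1

parts=10
size=$(( $(wc -c < data1) ))        # 9 lines * 2 bytes + 71 lines * 3 bytes
target=$(( size / parts ))          # nominal bytes per part (integer division)

# Accumulate bytes line by line (text plus newline) and report the line
# that contains byte offset $target: if each cut is extended to the end
# of its line, that line is the last one in part 1.
first_cut=$(awk -v target="$target" '
  { bytes += length($0) + 1 }
  bytes > target { print NR; exit }
' data1)

echo "size=$size target=$target part 1 ends at line $first_cut"
```

For this data it reports 231 bytes, a target of 23, and line 11, matching the 11 lines in file1.txt.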

Best wishes ... cheers, drl
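If a Perl module is not required, GNU split can produce both kinds of split directly. This is a sketch, assuming a GNU coreutils split that supports -n/--number (older releases lack it); the output prefixes bysize. and bylines. are arbitrary names chosen here. GNU split reads its input in buffered blocks, so its memory use should not grow with the size of the file, which bears on the large-dataset question raised at the top of the post.

```shell
#!/usr/bin/env bash
# Sketch of two coreutils alternatives to Text::Parts.
# Assumes GNU split with -n/--number support.
set -eu
cd "$(mktemp -d)"                   # scratch directory
seq 1 80 > data1                    # stand-in for the poster's data1
parts=10

# 1) N parts balanced by byte size, never breaking a line.
split -n "l/$parts" -d data1 bysize.

# 2) N parts balanced by line count: ceil(lines / parts) lines each.
lines=$(( $(wc -l < data1) ))
per_part=$(( (lines + parts - 1) / parts ))
split -l "$per_part" -d data1 bylines.

# Show the line count of every piece produced.
wc -l bysize.* bylines.*
```

The first form balances by bytes, so on this data it should give uneven line counts similar to those above. The second balances by lines and gives ten 8-line files here, but ceil(lines / parts) can leave fewer than parts files when the division is uneven (10 lines in 6 parts gives 5 files of 2 lines).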
 

split(n)                  Tcl Built-In Commands                  split(n)

NAME
    split - Split a string into a proper Tcl list

SYNOPSIS
    split string ?splitChars?

DESCRIPTION
    Returns a list created by splitting string at each character that is in the splitChars argument. Each element of the result list will consist of the characters from string that lie between instances of the characters in splitChars. Empty list elements will be generated if string contains adjacent characters in splitChars, or if the first or last character of string is in splitChars. If splitChars is an empty string then each character of string becomes a separate element of the result list. SplitChars defaults to the standard white-space characters.

EXAMPLES
    Divide up a USENET group name into its hierarchical components:

        split "comp.lang.tcl.announce" .
            -> comp lang tcl announce

    See how the split command splits on every character in splitChars, which can result in information loss if you are not careful:

        split "alpha beta gamma" "temp"
            -> al {ha b} {} {a ga} {} a

    Extract the list words from a string that is not a well-formed list:

        split "Example with {unbalanced brace character"
            -> Example with {unbalanced brace character

    Split a string into its constituent characters:

        split "Hello world" {}
            -> H e l l o { } w o r l d

PARSING RECORD-ORIENTED FILES
    Parse a Unix /etc/passwd file, which consists of one entry per line, with each line consisting of a colon-separated list of fields:

        ## Read the file
        set fid [open /etc/passwd]
        set content [read $fid]
        close $fid

        ## Split into records on newlines
        set records [split $content "\n"]

        ## Iterate over the records
        foreach rec $records {
            ## Split into fields on colons
            set fields [split $rec ":"]

            ## Assign fields to variables and print some out...
            lassign $fields userName password uid grp longName homeDir shell
            puts "$longName uses [file tail $shell] for a login shell"
        }

SEE ALSO
    join(n), list(n), string(n)

KEYWORDS
    list, split, string
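For comparison with the shell tools used in the thread, the Tcl examples above have rough bash analogues: setting IFS to a single character for the read builtin plays the role of splitChars. This is a sketch, not a drop-in replacement; the variable names in the passwd loop simply mirror the Tcl lassign call.

```shell
#!/usr/bin/env bash
# Rough bash analogues of the Tcl split examples (a sketch).

# Like: split "comp.lang.tcl.announce" .
IFS=. read -r -a parts <<< "comp.lang.tcl.announce"
printf '%s\n' "${parts[@]}"          # comp, lang, tcl, announce, one per line

# Adjacent separators yield an empty field, as in Tcl.
IFS=, read -r -a fields <<< "a,,b"
echo "${#fields[@]} fields"          # a, "", b

# Like the /etc/passwd example: one record per line, colon-separated fields.
while IFS=: read -r userName password uid grp longName homeDir shell; do
  echo "$longName uses ${shell##*/} for a login shell"
done < /etc/passwd
```

One difference to keep in mind: bash field splitting does not produce a final empty field when the string ends with a separator, whereas Tcl's split does.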
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.