Splitting files


 
# 1  
Old 07-02-2012

Hello all

I have a file with around 80 million records, and I want to split it into 12 equal files. I tried the split command, but it only lets me split by number of lines or by size. Is there a way I can split the file into 12 files without worrying about the number of lines or the size?
# 2  
Old 07-02-2012
Quote:
Originally Posted by Sri3001
Hello all

I want to split it to 12 equal files
Equal in what sense? Have you already ruled out splitting by equal line count or by equal size?
# 3  
Old 07-02-2012
To be clear: How can you be sure you get 12 files and not 13 or 11 if you do not provide parameters to split? split works on number of bytes or number of lines, csplit works on context (strings in the file, for example). csplit and split have defaults. These WILL NOT produce 12 files for you.
# 4  
Old 07-02-2012
My apologies for the ambiguity in the query.

1. I need 12 files, whatever the size of the input file.
2. I don't think I can use split, because I have to give it either the number of lines per file or the size of each file.

I just want to know if there is any code that splits a file into n files, 'n' being any number.

I hope I am clear now.
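
For what it's worth, a minimal sketch of one way to get n files without computing any line counts: deal the lines out round-robin with awk. The sample data, the rr_part_ prefix, and n=12 are assumptions made up for the demo. Note that this interleaves lines rather than keeping contiguous blocks, so it only suits cases where it does not matter which records land in which file.

```shell
#!/usr/bin/env bash
# Round-robin split: line k goes to part ((k - 1) mod n) + 1.
# File name, prefix, and part count are illustrative, not from the thread.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
seq 1 80 > sample.txt          # 80-line stand-in for the real file

n=12
awk -v n="$n" '{
    out = sprintf("rr_part_%02d.txt", (NR - 1) % n + 1)
    print > out                # awk keeps each output file open between lines
}' sample.txt

wc -l rr_part_*.txt
```

With 80 lines and n=12, the first 8 parts get 7 lines each and the last 4 get 6, so the parts never differ by more than one line.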
# 5  
Old 07-02-2012
So, you could do some math to get the number of lines per file needed to split the file into n parts:

Code:
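required_parts=12
total_lines=$(wc -l < your_file)
# Ceiling division: round up so that at most $required_parts files are produced
lines_per_file=$(( (total_lines + required_parts - 1) / required_parts ))

split -l "$lines_per_file" your_file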

Is that what you wanted?
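
One caveat with the ceiling-division approach is that it guarantees at most n files, not exactly n. The sketch below tries it on two small inputs generated with seq. The count_parts helper and the chunk_ prefix are hypothetical names used only for this demonstration.

```shell
#!/usr/bin/env bash
# Ceiling-division split on small samples, counting the files that result.
# Helper name, prefix, and sample sizes are illustrative only.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

count_parts() {   # usage: count_parts <total_lines> <wanted_parts>
    local total=$1 parts=$2 per_file
    per_file=$(( (total + parts - 1) / parts ))   # ceil(total / parts)
    rm -f chunk_*
    seq 1 "$total" | split -l "$per_file" - chunk_
    ls chunk_* | wc -l | tr -d ' '
}

echo "80 lines, 12 wanted: $(count_parts 80 12) files"
echo "9 lines, 4 wanted: $(count_parts 9 4) files"
```

The first case gives 12 files (eleven of 7 lines and one of 3). The second gives only 3 files, because ceil(9/4) is 3 and 9 lines fill exactly three 3-line files. For an input of 80 million lines and 12 parts the shortfall does not occur, but it is worth checking the file count for small inputs.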
# 6  
Old 07-03-2012
Doing the math would be my last option. Is there any piece of code that does this? If there is none, I will be left with no option other than what you have suggested.
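
If the split on your system is a reasonably recent GNU coreutils version, it may already do this through its -n (--number) option, with no arithmetic needed. This is a sketch under that assumption. Other split implementations may not accept -n or -d, and the sample file and the part_ prefix are made up for the demo.

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils split with the -n (--number) option available.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
seq 1 80 > sample.txt          # 80-line stand-in for the real file

# l/12 asks for 12 output files without breaking any line across files.
# -d uses numeric suffixes (part_00 ... part_11) instead of aa, ab, ...
split -n l/12 -d sample.txt part_

wc -l part_*
```

In this mode the parts are balanced by size in bytes rather than by line count, so the number of lines per file can vary when line lengths differ. Concatenating the parts in suffix order gives back the original file.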
# 7  
Old 07-03-2012
Hi.

Here is a shell script that calls a perl script to split a file. The resulting parts do not all contain the same number of lines, but they seem reasonably balanced. It works on small datasets like the sample, but whether it would work on a very large dataset is unknown; for example, it may be limited by memory. This is a quickly written solution, and if you are curious, you'd need to look over the documentation and code, and / or contact the author of the perl module Text::Parts:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate perl Text::Parts for splitting file.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
edges() { head -3 $1;pe "---";tail -3 $1; pe ; }
C=$HOME/bin/context && [ -f $C ] && $C

FILE=${1-data1}
pl " Edges of $(wc -l <$FILE) lines in input data file $FILE:"
# head -5 $FILE ; pe "---" ; tail -5 $FILE
edges $FILE

pl " Sample perl script to split file:"
cat p1

# Remove debris from previous runs, try various PARTS.
rm -f file*.txt
PARTS=6
PARTS=3
PARTS=10
pl " Splitting $FILE into $PARTS parts:"
./p1 $FILE $PARTS

pl " Files created:"
wc -l file*.txt

pl " Sample result, edges of part 1:"
edges file1.txt
pl " Sample result, edges of part $PARTS:"
edges file$PARTS.txt

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39

-----
 Edges of 80 lines in input data file data1:
1
2
3
---
78
79
80


-----
 Sample perl script to split file:
#!/usr/bin/env perl

# @(#) p1	Split file into n parts, Text::Parts.
# See:
# http://search.cpan.org/~ktat/Text-Parts-0.15/lib/Text/Parts.pm

use strict;
use warnings;
use Text::Parts;

my $file  = shift || die " Need a filename to split.\n";
my $parts = shift || "4";

my $splitter = Text::Parts->new( file => $file );

$splitter->write_files(
  'file%d.txt',
  num  => $parts,
  code => \&do_after_split
);

sub do_after_split {
  my ( $filename, $f );
  $filename = shift;    # e.g. 'path/to/name1.txt'
  open( $f, ">>", $filename ) || die "Cannot open $filename for append.\n";
  print $f "\n";
  close $f;
}

exit(0);

-----
 Splitting data1 into 10 parts:

-----
 Files created:
 11 file1.txt
  7 file10.txt
  8 file2.txt
  8 file3.txt
  8 file4.txt
  8 file5.txt
  8 file6.txt
  7 file7.txt
  8 file8.txt
  7 file9.txt
 80 total

-----
 Sample result, edges of part 1:
1
2
3
---
9
10
11


-----
 Sample result, edges of part 10:
74
75
76
---
78
79
80

Best wishes ... cheers, drl