Sponsored Content
Top Forums Shell Programming and Scripting Splitting a file into several smaller files using perl Post 302615521 by ramky79 on Thursday 29th of March 2012 03:06:28 PM
Old 03-29-2012
Splitting a file into several smaller files using perl

Hi,
I'm trying to split a large file into several smaller files
the script will have two input arguments argument1=filename and argument2=no of files to be split.

In my large input file I have a header followed by 100009 records
The first line is a header; I want this header in all my splitted files

Here is what I have done so far
Code:
#!/usr/bin/perl
use File::Basename;

$inputfile=@ARGV[0];
$nof=@ARGV[1];                              # nof - no of files to split
($filename,$dir,$ext) = fileparse($inputfile,'\..*');
$header=`cat $inputfile | head -1`;
$NOLIF=`cat $inputfile | wc -l`;         # NOLIF - no of lines in file
$NOARIF=$NOLIF-1;                         # NOARIF - no of actual records in file
$NORPF=$NOARIF/$not;                    # NORPF - no of records per file
$NNORPF=`printf "%1.f\n" $NORPF`;   # NNORPF - new no of records per file

$count=0;
$filenum=0;

while (<$inputfile>) {
if ( $count == 0 ) {
     $nfilename = $filename._.$filenum.$ext;
     open( FILE, ">> $nfilename" );
     print( FILE "$header\n" );
     print( FILE "$_" );
     $count++;
  #} elsif ( $count == $NUM_LINES ) {
  } elsif ( $count == $NNORPF ) {
     close( FILE );
     $count = 0;
     $file_num++;
  } else {
     # just write the line!
     print( FILE "$_" );
     $count++;
  }
}

Here is my challenge:
Say I'm splitting my large input file into 10 files
so the first 9 files should have 10001 records and last should have 10000 records.

how do i get this working.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Splitting a Larger File Into Mutiple Smaller ones.

Hello.. Iam in need to urgent help with the below. Have data-file with 40,567 and need to split them into multiple files with smaller line-count. Iam aware of "split" command with -l option which allows you to specify the no of lines in smaller files ,with the target file-name pattern... (1 Reply)
Discussion started by: madhubt_1982
1 Replies

2. UNIX for Dummies Questions & Answers

splitting the large file into smaller files

hi all im new to this forum..excuse me if anythng wrong. I have a file containing 600 MB data in that. when i do parse the data in perl program im getting out of memory error. so iam planning to split the file into smaller files and process one by one. can any one tell me what is the code... (1 Reply)
Discussion started by: vsnreddy
1 Replies

3. Shell Programming and Scripting

splitting text file into smaller ones

Hello We have a text file with 400,000 lines and need to split into multiple files each with 5000 lines ( will result in 80 files) Got an idea of using head and tail commands to do that with a loop but looked not efficient. Please advise the simple and yet effective way to do it. TIA... (3 Replies)
Discussion started by: prvnrk
3 Replies

4. Shell Programming and Scripting

perl help to split big verilog file into smaller ones for each module

Hi I have a big verilog file with multiple modules. Each module begin with the code word 'module <module-name>(ports,...)' and end with the 'endmodule' keyword. Could you please suggest the best way to split each of these modules into multiple files? Thank you for the help. Example of... (7 Replies)
Discussion started by: return_user
7 Replies

5. Shell Programming and Scripting

Help with splitting a large text file into smaller ones

Hi Everyone, I am using a centos 5.2 server as an sflow log collector on my network. Currently I am using inmons free sflowtool to collect the packets sent by my switches. I have a bash script running on an infinate loop to stop and start the log collection at set intervals - currently one... (2 Replies)
Discussion started by: lord_butler
2 Replies

6. Shell Programming and Scripting

How to split a file into smaller files

Hi, I have a big text file with m columns and n rows. The format is like: STF123450001000200030004STF123450005000600070008STF123450009001000110012 STF234560345002208330154STF234590705620600070080STF234567804094562357688 STF356780001000200030004STF356780005000600070080STF356780800094562657687... (2 Replies)
Discussion started by: wintersnow2011
2 Replies

7. Shell Programming and Scripting

Sed: Splitting A large File into smaller files based on recursive Regular Expression match

I will simplify the explaination a bit, I need to parse through a 87m file - I have a single text file in the form of : <NAME>house........ SOMETEXT SOMETEXT SOMETEXT . . . . </script> MORETEXT MORETEXT . . . (6 Replies)
Discussion started by: sumguy
6 Replies

8. Shell Programming and Scripting

Splitting a file and creating new files using Perl script

Hi All, I am new to Scripting language. I want to split a file and create several subfiles using Perl script. Example : File format : Sourcename ID Date Nbr SU IMYFDJ 9/17/2012 5552159976555 SU BWZMIG 9/14/2012 1952257857887 AR PEHQDF 11/26/2012 ... (13 Replies)
Discussion started by: Deepak9870
13 Replies

9. Shell Programming and Scripting

Splitting xml file into several xml files using perl

Hi Everyone, I'm new here and I was checking this old post: /shell-programming-and-scripting/180669-splitting-file-into-several-smaller-files-using-perl.html (cannot paste link because of lack of points) I need to do something like this but understand very little of perl. I also check... (4 Replies)
Discussion started by: mcosta
4 Replies

10. Shell Programming and Scripting

Splitting a text file into smaller files with awk, how to create a different name for each new file

Hello, I have some large text files that look like, putrescine Mrv1583 01041713302D 6 5 0 0 0 0 999 V2000 2.0928 -0.2063 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 5.6650 0.2063 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 3.5217 ... (3 Replies)
Discussion started by: LMHmedchem
3 Replies
rl(1)								   User Commands							     rl(1)

NAME
rl - Randomize Lines. SYNOPSIS
rl [OPTION]... [FILE]... DESCRIPTION
rl reads lines from a input file or stdin, randomizes the lines and outputs a specified number of lines. It does this with only a single pass over the input while trying to use as little memory as possible. -c, --count=N Select the number of lines to be returned in the output. If this argument is omitted all the lines in the file will be returned in random order. If the input contains less lines than specified and the --reselect option below is not specified a warning is printed and all lines are returned in random order. -r, --reselect When using this option a single line may be selected multiple times. The default behaviour is that any input line will only be selected once. This option makes it possible to specify a --count option with more lines than the file actually holds. -o, --output=FILE Send randomized lines to FILE instead of stdout. -d, --delimiter=DELIM Use specified character as a "line" delimiter instead of the newline character. -0, --null Input lines are terminated by a null character. This option is useful to process the output of the GNU find -print0 option. -n, --line-number Output lines are numbered with the line number from the input file. -q, --quiet, --silent Be quiet about any errors or warnings. -h, --help Show short summary of options. -v, --version Show version of program. EXAMPLES
Some simple demonstrations of how rl can help you do everyday tasks. Play a random sound after 4 minutes (perfect for toast): sleep 240 ; play `find /sounds -name '*.au' -print | rl --count=1` Play the 15 most recent .mp3 files in random order. ls -c *.mp3 | head -n 15 | rl | xargs --delimiter=' ' play Roll a dice: seq 6 | rl --count 2 Roll a dice 1000 times and see which number comes up more often: seq 6 | rl --reselect --count 1000 | sort | uniq -c | sort -n Shuffle the words of a sentence: echo -n "The rain in Spain stays mainly in the plain." | rl --delimiter=' ';echo Find all movies and play them in random order. find . -name '*.avi' -print0 | rl -0 | xargs -n 1 -0 mplayer Because -0 is used filenames with spaces (even newlines and other unusual characters) in them work. BUGS
The program currently does not have very smart memory management. If you feed it huge files and expect it to fully randomize all lines it will completely read the file in memory. If you specify the --count option it will only use the memory required for storing the specified number of lines. Improvements on this area are on the TODO list. The program uses the rand() system random function. This function returns a number between 0 and RAND_MAX, which may not be very large on some systems. This will result in non-random results for files containing more lines than RAND_MAX. Note that if you specify multiple input files they are randomized per file. This is a different result from when you cat all the files and pipe the result into rl. COPYRIGHT
Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Arthur de Jong. This is free software; see the license for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Version 0.2.7 Jul 2008 rl(1)
All times are GMT -4. The time now is 01:38 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy