Splitting a file into several smaller files using Perl
Hi,
I'm trying to split a large file into several smaller files.
The script will take two input arguments: argument1 = filename and argument2 = number of files to split into.
My large input file has a header followed by 100009 records.
The first line is a header; I want this header in all of the split files.
Here is what I have done so far.
Here is my challenge:
Say I'm splitting my large input file into 10 files,
so the first 9 files should have 10001 records each and the last should have 10000 records.
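The poster asked for Perl, but the same logic can be sketched with standard shell tools: ceiling division gives the first files one extra record each, which matches the 10001/10000 split described above (ceil(100009/10) = 10001; nine files of 10001 leave 10000 for the last). The file name `big.txt` and the `part_` prefix below are illustrative, and a small sample input stands in for the real file.

```shell
#!/bin/sh
# Sketch: split FILE into N parts, repeating the header line in each part.
FILE=big.txt
N=3

# Stand-in input: 1 header line + 10 records.
{ echo "HEADER"; seq 1 10; } > "$FILE"

header=$(head -n 1 "$FILE")
total=$(( $(wc -l < "$FILE") - 1 ))      # record count, excluding the header
per=$(( (total + N - 1) / N ))           # ceiling division: earlier files get one extra

# Split the body (everything after the header), then prepend the header to each piece.
tail -n +2 "$FILE" | split -l "$per" - part_
for f in part_*; do
    { echo "$header"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
done

wc -l part_*
```

With 10 records and N=3 this produces `part_aa` and `part_ab` with 4 records each and `part_ac` with 2, every piece starting with the header.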
Hello..
I am in urgent need of help with the below.
I have a data file with 40,567 lines
and need to split it into multiple files with a smaller line count.
I am aware of the "split" command with the -l option, which lets you specify the number of lines in the smaller files, with a target file-name pattern... (1 Reply)
hi all
I'm new to this forum; excuse me if anything is wrong.
I have a file containing 600 MB of data. When I parse the data in a Perl program I get an out-of-memory error,
so I am planning to split the file into smaller files and process them one by one.
Can anyone tell me what the code is... (1 Reply)
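One way to sketch the split-then-process approach in shell (the names `big.dat` and `piece_` are illustrative, and a small generated file stands in for the 600 MB one):

```shell
#!/bin/sh
# Sketch: split a large file into fixed-size pieces, then process each piece
# separately so no single step has to hold the whole file in memory.
seq 1000 > big.dat                # stand-in for the large input file
split -l 250 big.dat piece_       # 4 pieces of 250 lines each

for f in piece_*; do
    # Replace this with the real per-piece processing; wc -l is a placeholder.
    wc -l < "$f"
done
```

Inside the loop you would invoke the Perl parser on `"$f"` instead of `wc`; since each piece is bounded in size, the parser's memory use is bounded too.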
Hello
We have a text file with 400,000 lines and need to split it into multiple files of 5000 lines each (which will result in 80 files).
I had the idea of using the head and tail commands in a loop, but that looked inefficient.
Please advise a simple and yet effective way to do it.
TIA... (3 Replies)
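This is exactly what `split -l` does in one pass, with no head/tail loop. A scaled-down demonstration (`data.txt` and the `chunk_` prefix are illustrative; 400 lines with `-l 5` produces the same 80-file count as 400,000 lines with `-l 5000`):

```shell
#!/bin/sh
# Sketch: split a file into fixed-size chunks in a single pass.
# For the real file the command would be:  split -l 5000 bigfile chunk_
seq 400 > data.txt
split -l 5 -a 3 data.txt chunk_   # -a 3: three-letter suffixes (chunk_aaa, ...)
ls chunk_* | wc -l                # 80 chunks
```

Unlike a head/tail loop, which rereads the file from the top for every chunk, `split` reads the input once.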
Hi
I have a big Verilog file with multiple modules. Each module begins with the keyword 'module <module-name>(ports,...)'
and ends with the
'endmodule' keyword.
Could you please suggest the best way to split each of these modules into its own file?
Thank you for the help.
Example of... (7 Replies)
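Since the excerpt's example is truncated, here is a hedged awk sketch under the stated assumptions: a module starts at a line beginning with `module <name>(...)` and ends at `endmodule`, and each module is written to `<name>.v`. The sample `design.v` content is invented for illustration.

```shell
#!/bin/sh
# Sketch: write each Verilog module to its own <module-name>.v file.
cat > design.v <<'EOF'
module alu(a, b, out);
endmodule
module fifo(clk, din, dout);
endmodule
EOF

awk '/^[[:space:]]*module[[:space:]]/ {
         name = $2; sub(/\(.*/, "", name)    # strip the port list from the name
         out = name ".v"
     }
     out != "" { print > out }               # copy lines of the current module
     /^[[:space:]]*endmodule/ { close(out); out = "" }' design.v
```

Lines between modules (comments, timescale directives) are dropped by this sketch; a real script might collect them into a common header file instead.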
Hi Everyone,
I am using a CentOS 5.2 server as an sFlow log collector on my network. Currently I am using InMon's free sflowtool to collect the packets sent by my switches. I have a bash script running in an infinite loop to stop and start the log collection at set intervals - currently one... (2 Replies)
Hi,
I have a big text file with m columns and n rows. The format is like:
STF123450001000200030004STF123450005000600070008STF123450009001000110012
STF234560345002208330154STF234590705620600070080STF234567804094562357688
STF356780001000200030004STF356780005000600070080STF356780800094562657687... (2 Replies)
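The sample lines above look like fixed-width records concatenated together: each record appears to be 24 characters ("STF" + a 5-digit id + four 4-digit fields). If that assumption holds, `fold -w` breaks each long line into one record per line:

```shell
#!/bin/sh
# Sketch: split concatenated fixed-width records (assumed 24 chars each)
# onto separate lines with fold(1).
printf '%s\n' 'STF123450001000200030004STF123450005000600070008' | fold -w 24
```

This prints two 24-character records, one per line; the real record width should be confirmed against the file's specification before relying on it.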
I will simplify the explanation a bit: I need to parse through an 87m file -
I have a single text file in the form of:
<NAME>house........
SOMETEXT
SOMETEXT
SOMETEXT
.
.
.
.
</script>
MORETEXT
MORETEXT
.
.
. (6 Replies)
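If each record in a file like the one sketched above begins with a line containing `<NAME>`, `csplit` can cut the file at every occurrence of that pattern. The sample `records.txt` below is invented to mirror the truncated excerpt; `{*}` (repeat until no more matches) is a GNU csplit feature:

```shell
#!/bin/sh
# Sketch: cut a file into pieces xx00, xx01, ... at every line matching <NAME>.
cat > records.txt <<'EOF'
<NAME>house
SOMETEXT
</script>
<NAME>barn
MORETEXT
</script>
EOF

csplit -s records.txt '/<NAME>/' '{*}'
```

Because the file starts with a match, the leading piece `xx00` is empty here; `xx01` holds the first record and `xx02` the second.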
Hi All,
I am new to scripting languages.
I want to split a file and create several subfiles using a Perl script.
Example :
File format :
Sourcename ID Date Nbr
SU IMYFDJ 9/17/2012 5552159976555
SU BWZMIG 9/14/2012 1952257857887
AR PEHQDF 11/26/2012 ... (13 Replies)
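The poster asked for Perl, but splitting by the value of the first column (Sourcename), with the header repeated in each output file, can be sketched in awk. The sample data below mirrors the excerpt; the elided fourth row's Nbr value is invented for illustration, and the output names (`SU.txt`, `AR.txt`) are derived from the column values:

```shell
#!/bin/sh
# Sketch: one output file per Sourcename value, each starting with the header.
cat > input.txt <<'EOF'
Sourcename ID Date Nbr
SU IMYFDJ 9/17/2012 5552159976555
SU BWZMIG 9/14/2012 1952257857887
AR PEHQDF 11/26/2012 1234567890123
EOF

awk 'NR == 1 { header = $0; next }          # remember the header line
     {
         out = $1 ".txt"
         if (!(out in seen)) { seen[out]; print header > out }
         print > out                        # append the record to its file
     }' input.txt
```

This makes a single pass and opens each output file once, so it scales to many rows as long as the number of distinct Sourcename values stays modest.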
Hi Everyone,
I'm new here and I was checking this old post:
/shell-programming-and-scripting/180669-splitting-file-into-several-smaller-files-using-perl.html
(cannot paste link because of lack of points)
I need to do something like this but understand very little of Perl.
I also checked... (4 Replies)
Hello,
I have some large text files that look like,
putrescine
Mrv1583 01041713302D
6 5 0 0 0 0 999 V2000
2.0928 -0.2063 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
5.6650 0.2063 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
3.5217 ... (3 Replies)
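The excerpt above looks like an MDL molfile (the `V2000` counts line). Multi-record SD files conventionally end each record with a `$$$$` line; that delimiter is not visible in the truncated excerpt, so this awk sketch assumes it is present. The sample records and the second molecule name are invented for illustration:

```shell
#!/bin/sh
# Sketch: write each SD-file record (ending in "$$$$") to its own file.
cat > mols.sdf <<'EOF'
putrescine
  Mrv1583 01041713302D
M  END
$$$$
cadaverine
  Mrv1583 01041713302D
M  END
$$$$
EOF

awk 'BEGIN { n = 1 }
     { print > sprintf("mol_%03d.sdf", n) }                 # copy into current piece
     /^\$\$\$\$$/ { close(sprintf("mol_%03d.sdf", n)); n++ }' mols.sdf
```

Each output file keeps its trailing `$$$$` line, so the pieces remain valid single-record SD files.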
Discussion started by: LMHmedchem
rl(1) User Commands rl(1)

NAME
rl - Randomize Lines.
SYNOPSIS
rl [OPTION]... [FILE]...
DESCRIPTION
rl reads lines from an input file or stdin, randomizes the lines and outputs a specified number of lines. It does this with only a single
pass over the input while trying to use as little memory as possible.
-c, --count=N
Select the number of lines to be returned in the output. If this argument is omitted all the lines in the file will be returned in
random order. If the input contains fewer lines than specified and the --reselect option below is not specified, a warning is printed
and all lines are returned in random order.
-r, --reselect
When using this option a single line may be selected multiple times. The default behaviour is that any input line will only be
selected once. This option makes it possible to specify a --count option with more lines than the file actually holds.
-o, --output=FILE
Send randomized lines to FILE instead of stdout.
-d, --delimiter=DELIM
Use specified character as a "line" delimiter instead of the newline character.
-0, --null
Input lines are terminated by a null character. This option is useful to process the output of the GNU find -print0 option.
-n, --line-number
Output lines are numbered with the line number from the input file.
-q, --quiet, --silent
Be quiet about any errors or warnings.
-h, --help
Show short summary of options.
-v, --version
Show version of program.
EXAMPLES
Some simple demonstrations of how rl can help you do everyday tasks.
Play a random sound after 4 minutes (perfect for toast):
sleep 240 ; play `find /sounds -name '*.au' -print | rl --count=1`
Play the 15 most recent .mp3 files in random order.
ls -c *.mp3 | head -n 15 | rl | xargs --delimiter='\n' play
Roll a die:
seq 6 | rl --count 1
Roll a die 1000 times and see which number comes up most often:
seq 6 | rl --reselect --count 1000 | sort | uniq -c | sort -n
Shuffle the words of a sentence:
echo -n "The rain in Spain stays mainly in the plain." | rl --delimiter=' '; echo
Find all movies and play them in random order.
find . -name '*.avi' -print0 | rl -0 | xargs -n 1 -0 mplayer
Because -0 is used, filenames with spaces (even newlines and other unusual characters) in them work.
BUGS
The program currently does not have very smart memory management. If you feed it huge files and expect it to fully randomize all lines it
will completely read the file in memory. If you specify the --count option it will only use the memory required for storing the specified
number of lines. Improvements in this area are on the TODO list.
The program uses the rand() system random function. This function returns a number between 0 and RAND_MAX, which may not be very large on
some systems. This will result in non-random results for files containing more lines than RAND_MAX.
Note that if you specify multiple input files they are randomized per file. This is a different result from when you cat all the files and
pipe the result into rl.
COPYRIGHT
Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Arthur de Jong.
This is free software; see the license for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Version 0.2.7 Jul 2008 rl(1)