Modification of Perl script to split a large file into chunks of 5000 characters
I have a Perl script which splits a large file into chunks. The script is given below.
Code:
use strict;
use warnings;
open (FH, "<monolingual.txt") or die "Could not open source file. $!";
my $i = 0;
while (1) {
    my $chunk;
    print "process part $i\n";
    open(OUT, ">part$i.log") or die "Could not open destination file";
    $i++;
    if (!eof(FH)) {
        read(FH, $chunk, 5000);
        print OUT $chunk;
    }
    if (!eof(FH)) {
        $chunk = <FH>;
        print OUT $chunk;
    }
    close(OUT);
    last if eof(FH);
}
I want the script to create chunks of 5000 characters or a bit less but not more than that.
How do I modify the script to ensure that no chunk is larger than 5000 characters? When I run it, some chunks are more than 5000 characters.
Many thanks for your kind help
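The chunks overshoot because the script first reads 5000 characters with read() and then appends the rest of the current line with <FH>, so each part ends up as 5000 characters plus whatever remained of that line. One possible fix, sketched below, is to build each chunk from whole lines and start a new chunk before the limit would be exceeded. This is only a minimal sketch under a few assumptions: the sub name split_by_lines and its callback interface are made up for illustration, chunks are meant to end on line boundaries where possible, and a single line longer than the limit is cut mid-line. The demo runs on an in-memory string with a limit of 10 so the behaviour is easy to see.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: reads $in line by line and
# calls $emit->($chunk) for each chunk of at most $limit characters.
sub split_by_lines {
    my ($in, $limit, $emit) = @_;
    die "limit must be positive\n" if $limit < 1;
    my $chunk = '';
    while (defined(my $line = <$in>)) {
        # A single line longer than the limit has to be cut mid-line.
        while (length($line) > $limit) {
            if (length $chunk) { $emit->($chunk); $chunk = ''; }
            $emit->(substr($line, 0, $limit, ''));
        }
        # Flush first if appending this line would exceed the limit.
        if (length($chunk) + length($line) > $limit) {
            $emit->($chunk);
            $chunk = '';
        }
        $chunk .= $line;
    }
    # Write out whatever is left after the last line.
    $emit->($chunk) if length $chunk;
    return;
}

# Demo on an in-memory file handle with a tiny limit of 10 characters.
my $text = "aaaa\nbbbb\ncccccccccccc\ndd\n";
open(my $fh, '<', \$text) or die "Could not open in-memory handle: $!";
my @parts;
split_by_lines($fh, 10, sub { push @parts, $_[0] });
close($fh);
printf "part %d: %d characters\n", $_, length($parts[$_]) for 0 .. $#parts;
```

To use it on the real file, open monolingual.txt for reading, pass 5000 as the limit, and give a callback that opens part$i.log, prints the chunk, closes the file and increments $i. Note that length() counts characters only if the input handle decodes the text (for example opened with '<:encoding(UTF-8)'); on a raw handle it counts bytes.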
Hi,
I have a large file(csv format) that I need to split into 2 files. The file looks something like
Original_file.txt
first name, family name, address
a, b, c,
d, e, f,
and so on for over 100,00 lines
I need to create two files from this one file. The condition is i need to ensure... (4 Replies)
Greetings all:
I am still new to Unix environment and I need help with the following requirement.
I have a large sequential file sorted on a field (say store#) that is being split into several smaller files, one for each store. That means if there are 500 stores, there will be 500 files. This... (1 Reply)
HI,
I have to split a large file whose input looks like:
Input file name_file.txt
00001|AAAA|MAIL|DATEOFBIRTHT|.......
00001|AAAA|MAIL|DATEOFBIRTHT|.......
00002|BBBB|MAIL|DATEOFBIRTHT|.......
00002|BBBB|MAIL|DATEOFBIRTHT|.......
00003|CCCC|MAIL|DATEOFBIRTHT|.......... (1 Reply)
Hi,
I have file: data.log.1
### s1
main.build.3495
main.build.199
main.build.3408
###s2
main.build.3495
main.build.3408
main.build.199
I want to read this file and store in two arrays in Perl.
I have following command, which is working fine on command prompt.
perl -n -e... (1 Reply)
Hi guys,
I have a question about splitting a binary file into 2 chunks.
First chunk with all high bytes and the second one with all low bytes.
What unix tools can i use? And how can this be performed?
I looked in manpages of split and dd but this does not help.
Thanks (2 Replies)
I have a 3 GB text file that I would like to split. How can I do this?
It's a giant comma-separated list of numbers. I would like to make it into about 20 files of ~100 MB each, with a custom header and footer. The file can only be split on commas, but they're plentiful.
Something like... (3 Replies)
I had a text file (comma-separated values) which contains the following:
196237,ram,25-May-06,ram.kiran@xyz.com,204183,Pavan,4-Jun-07,Pavan.Desai@xyz.com,237107,ram Chandra,15-Mar-10,ram.krishna@xyz.com ... (3 Replies)
Hi,
I need to split a large array "@sharedArray" into 10 small arrays.
The arrays should be like @sharedArray1,@sharedArray2,@sharedArray3...so on..
Can anyone help me with the logic to do so? (6 Replies)
Dears,
Need you help with the below file manipulation. I want to split the file into 8 smaller files but without cutting/disturbing the entries (meaning every small file should start with a entry and end with an empty line). It will be helpful if you can provide a one liner command for this... (12 Replies)
Trying to split a 35gb file into 1000mb parts. My research shows I should use this: split -b 1000m file.txt and my return is "split: cannot open 'crunch1.txt' for reading: No such file or directory" so I tried split -b 1000m Documents/Wordlists/file.txt and I get nothing other than the cursor just... (3 Replies)
Discussion started by: sub terra
fcopy
fcopy(n) Tcl Built-In Commands fcopy(n)
NAME
fcopy - Copy data from one channel to another.
SYNOPSIS
fcopy inchan outchan ?-size size? ?-command callback?
DESCRIPTION
The fcopy command copies data from one I/O channel, inchan to another I/O channel, outchan. The fcopy command leverages the buffering in
the Tcl I/O system to avoid extra copies and to avoid buffering too much data in main memory when copying large files to slow destinations
like network sockets.
The fcopy command transfers data from inchan until end of file or size bytes have been transferred. If no -size argument is given, then the
copy goes until end of file. All the data read from inchan is copied to outchan. Without the -command option, fcopy blocks until the copy
is complete and returns the number of bytes written to outchan.
The -command argument makes fcopy work in the background. In this case it returns immediately and the callback is invoked later when the
copy completes. The callback is called with one or two additional arguments that indicate how many bytes were written to outchan. If an
error occurred during the background copy, the second argument is the error string associated with the error. With a background copy, it
is not necessary to put inchan or outchan into non-blocking mode; the fcopy command takes care of that automatically. However, it is
necessary to enter the event loop by using the vwait command or by using Tk.
You are not allowed to do other I/O operations with inchan or outchan during a background fcopy. If either inchan or outchan get closed
while the copy is in progress, the current copy is stopped and the command callback is not made. If inchan is closed, then all data
already queued for outchan is written out.
Note that inchan can become readable during a background copy. You should turn off any fileevent handlers during a background copy so
those handlers do not interfere with the copy. Any I/O attempted by a fileevent handler will get a "channel busy" error.
Fcopy translates end-of-line sequences in inchan and outchan according to the -translation option for these channels. See the manual entry
for fconfigure for details on the -translation option. The translations mean that the number of bytes read from inchan can be different
than the number of bytes written to outchan. Only the number of bytes written to outchan is reported, either as the return value of a
synchronous fcopy or as the argument to the callback for an asynchronous fcopy.
EXAMPLE
This first example shows how the callback gets passed the number of bytes transferred. It also uses vwait to put the application into the
event loop. Of course, this simplified example could be done without the command callback.
    proc Cleanup {in out bytes {error {}}} {
        global total
        set total $bytes
        close $in
        close $out
        if {[string length $error] != 0} {
            # error occurred during the copy
        }
    }
    set in [open $file1]
    set out [socket $server $port]
    fcopy $in $out -command [list Cleanup $in $out]
    vwait total
The second example copies in chunks and tests for end of file in the command callback.
    proc CopyMore {in out chunk bytes {error {}}} {
        global total done
        incr total $bytes
        if {([string length $error] != 0) || [eof $in]} {
            set done $total
            close $in
            close $out
        } else {
            fcopy $in $out -command [list CopyMore $in $out $chunk] -size $chunk
        }
    }
    set in [open $file1]
    set out [socket $server $port]
    set chunk 1024
    set total 0
    fcopy $in $out -command [list CopyMore $in $out $chunk] -size $chunk
    vwait done
SEE ALSO
eof(n), fblocked(n), fconfigure(n)
KEYWORDS
blocking, channel, end of line, end of file, nonblocking, read, translation
Tcl 8.0 fcopy(n)