Splitting a file into several smaller files using perl Post: 302616033

Post 302616033 by Corona688 on Friday 30th of March 2012 11:23:16 AM

Code:

$ cat fsplit.sh

#!/bin/sh

if [ "$#" -lt 2 ] || [ ! -f "$1" ]
then
        echo "usage:  $0 inputfile numfiles" >&2
        exit 1
fi

awk -v NFILES="$2" -v FNAME="file_%d.dat" '
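        # First read through the file:  print nothing, just count lines
        NR==FNR { next }

        # First line of the second read through the file.  NR-1 lines were
        # counted on the first read and one of them is the header, so there
        # are NR-2 data lines.  Round up so no more than NFILES files are
        # made.  int() keeps MAXLINES numeric; sprintf() returns a string,
        # which would make the LINES >= MAXLINES test a string comparison.
        FNR==1 {        HEADER=$0
                        MAXLINES=int((NR - 2 + NFILES - 1) / NFILES)
                        LINES=MAXLINES
                        next    }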

        # Move on to the next file and print the header once maxlines is reached
        (LINES >= MAXLINES) {
                        if (OUT != "") close(OUT)  # do not run out of open files
                        LINES=0;        FILE++
                        OUT=sprintf(FNAME, FILE)
                        print HEADER > OUT      }

        # Print all lines into the current file
        { print > OUT; LINES++ }

# Yes, we give awk the same file twice.  On the first read, it just counts
# lines.  On the second, it decides which lines go into what file.
' "$1" "$1"

$ cat data
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
1 jhj jgu gv 36
2 dut jhg hg 54
3 gkl jkl hv 67
4 fjh gfh hg 45
5 hgl hgk hg 73
6 hkj hg yg 79
1 jhj jgu gv 36
2 dut jhg hg 54
3 gkl jkl hv 67
4 fjh gfh hg 45
5 hgl hgk hg 73
6 hkj hg yg 79
1 jhj jgu gv 36
2 dut jhg hg 54
3 gkl jkl hv 67
4 fjh gfh hg 45
5 hgl hgk hg 73
6 hkj hg yg 79
5 hgl hgk hg 73

$ tail *.dat
==> file_1.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
1 jhj jgu gv 36
2 dut jhg hg 54

==> file_10.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
5 hgl hgk hg 73

==> file_2.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
3 gkl jkl hv 67
4 fjh gfh hg 45

==> file_3.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
5 hgl hgk hg 73
6 hkj hg yg 79

==> file_4.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
1 jhj jgu gv 36
2 dut jhg hg 54

==> file_5.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
3 gkl jkl hv 67
4 fjh gfh hg 45

==> file_6.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
5 hgl hgk hg 73
6 hkj hg yg 79

==> file_7.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
1 jhj jgu gv 36
2 dut jhg hg 54

==> file_8.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
3 gkl jkl hv 67
4 fjh gfh hg 45

==> file_9.dat <==
BASENAME STREETTYPE PREFIX SUFFIX HOUSENUMBER
5 hgl hgk hg 73
6 hkj hg yg 79

$

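Note that tail lists the files out of order: the shell expands *.dat alphabetically, so file_10.dat sorts right after file_1.dat instead of after file_9.dat. Use %02d instead of %d in FNAME to get numbers with leading zeroes that are always 2 digits, which sort correctly.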
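
A quick way to see the difference, as a rough sketch: the function below builds a few file names with a given printf format and sorts them in byte order, roughly the way the glob does. The name `names` and the sample indexes 1, 2, 9 and 10 are made up for this illustration.

```shell
#!/bin/sh
# names FORMAT: print file names for a few sample indexes using the
# given printf format, sorted in byte order (C locale).
# Hypothetical helper for illustration only; not part of fsplit.sh.
names() {
        for i in 1 2 9 10
        do
                printf "$1\n" "$i"
        done | LC_ALL=C sort
}

names 'file_%d.dat'     # file_10.dat lands between file_1.dat and file_2.dat
names 'file_%02d.dat'   # file_01, file_02, file_09, file_10: numeric order
```

With more than 99 files the same problem comes back, so pick a width that fits the largest file number you expect (%03d and so on).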

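If the NR==FNR trick looks odd, here is a stripped-down sketch of the same two-pass idea. It writes no files; it only prints how many data lines each output file would get. The function name `chunk_size` and the temporary demo file are assumptions for this example, it assumes the first line of the input is a header, and rounding up is a choice made in this sketch so that no more than NFILES files are needed.

```shell
#!/bin/sh
# chunk_size FILE NFILES: print the number of data lines per output file.
# Hypothetical helper for illustration only; not part of fsplit.sh.
chunk_size() {
        awk -v NFILES="$2" '
                # First pass: NR and FNR are equal only while awk reads the
                # first copy of the file, so this just counts lines.
                NR==FNR { TOTAL=NR; next }

                # Second pass, first line: TOTAL includes the header, so
                # TOTAL-1 data lines are shared out, rounding up.
                FNR==1  { print int((TOTAL - 1 + NFILES - 1) / NFILES); exit }
        ' "$1" "$1"
}

# Example: a header plus 19 data lines, split ten ways
demo="${TMPDIR:-/tmp}/chunk_demo.$$"
awk 'BEGIN { print "HEADER"; for (i = 1; i <= 19; i++) print i }' > "$demo"
chunk_size "$demo" 10   # prints 2
rm -f "$demo"
```

Reading the file twice costs an extra pass, but it means awk never has to hold the whole file in memory just to learn the line count.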

trace-cmd-split

TRACE-CMD-SPLIT(1)														TRACE-CMD-SPLIT(1)

NAME

       trace-cmd-split - split a trace.dat file into smaller files

SYNOPSIS

       trace-cmd split [OPTIONS] [start-time [end-time]]

DESCRIPTION

       The trace-cmd(1) split command breaks up a trace.dat file into smaller files. The start-time specifies where the new file will start. A
       time stamp copied from a particular event in the output of trace-cmd-report(1) can be used as input for either start-time or end-time. The
       split will stop creating files when it reaches an event after end-time. If only the end-time is needed, use 0.0 as the start-time.

       If start-time is left out, then the split will start at the beginning of the file. If end-time is left out, then split will continue to the
       end unless it meets one of the requirements specified by the options.

OPTIONS

       -i file
	   If this option is not specified, then the split command will look for the file named trace.dat. This option allows reading a file
	   other than trace.dat.

       -o file
	   By default, the split command will use the input file name as a basis of where to write the split files. The output file will be the
	   input file with an attached '.#' to the end: trace.dat.1, trace.dat.2, etc.

	       This option will change the name of the base file used.

	       -o file	will create file.1, file.2, etc.

       -s seconds
	   This specifies how many seconds should be recorded before the new file should stop.

       -m milliseconds
	   This specifies how many milliseconds should be recorded before the new file should stop.

       -u microseconds
	   This specifies how many microseconds should be recorded before the new file should stop.

       -e events
	   This specifies how many events should be recorded before the new file should stop.

       -p pages
	   This specifies the number of pages that should be recorded before the new file should stop.

	       Note: only one of -p, -e, -u, -m, -s may be specified at a time.

	       If -p is specified, then -c is automatically set.

       -r
	   This option causes the break up to repeat until end-time is reached (or end of the input if end-time is not specified).

	       trace-cmd split -r -e 10000

	       This will break up trace.dat into several smaller files, each with at most
	       10,000 events in it.

       -c
	   This option causes the above break up to be per CPU.

	       trace-cmd split -c -p 10

	       This will create a file that has 10 pages for each CPU from the input.

SEE ALSO

       trace-cmd(1), trace-cmd-record(1), trace-cmd-report(1), trace-cmd-start(1), trace-cmd-stop(1), trace-cmd-extract(1), trace-cmd-reset(1),
       trace-cmd-list(1), trace-cmd-listen(1)

AUTHOR

       Written by Steven Rostedt, <rostedt@goodmis.org[1]>

RESOURCES

       git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git

COPYING

       Copyright (C) 2010 Red Hat, Inc. Free use of this software is granted under the terms of the GNU Public License (GPL).

NOTES

	1. rostedt@goodmis.org
	   mailto:rostedt@goodmis.org

								    06/11/2014							TRACE-CMD-SPLIT(1)
