Split and name files as 15 minute periods


 
# 1  
Old 06-22-2011

Hi,

I have a number of large files (up to 4GB) that I wish to split into 96 parts, i.e. one for each 15 minutes of the day. The split can be random so I am using

split -b 42M filename.csv

I want to name each of the resulting files after a distinct 15 minute period of the day, e.g.
A20110619.0000-20110619.0015_transaction.csv
A20110619.0015-20110619.0030_transaction.csv

etc.

I am completely stuck and any help would be greatly appreciated.

Thanks
Kieran
# 2  
Old 06-22-2011
What's your system? What's your shell? If you have GNU date, which usually comes with Linux, this may be possible in the shell, otherwise you'll likely need a language like Perl.

Code:
NOW_STR=$(date +'%Y-%m-%d') # Or manually set NOW_STR to 2011-01-11 for Jan 11
# -d needs GNU date.
T=$(date -d "$NOW_STR" +%s) # Start of the day, in epoch seconds

for FILE in x[a-z][a-z] # Matches xaa, xab, etc. that split makes in alphabetical order.
do
        L1="$(date -d "@$T" "+%Y%m%d.%H%M")"
        L2="$(date -d "@$((T+(60*15)))" "+%Y%m%d.%H%M")"
        # Dry run: prints each mv command. Remove "echo" to actually rename.
        echo mv "$FILE" A"${L1}-${L2}_transaction.csv"
        ((T += (60*15) )) # Add 15 minutes
done

I am confused by your use of split though. You say it's random, but you split at a fixed size?

Last edited by Corona688; 06-22-2011 at 01:25 PM..
# 3  
Old 06-22-2011
Thanks for the reply.

I am working on AIX 5.3 with ksh. I will try that code.

With regard to split. There is a timestamp in each row of the file but I don't necessarily need to split the file so that the row with timestamp 00:05 goes into the 00:00-00:15 file. I just want to split the file up into 96 even sized files for each 15 minute period of the day. Hence the fixed split size.

Thanks again
Kieran
# 4  
Old 06-22-2011
Quote:
Originally Posted by ksexton
I am working on AIX 5.3 with ksh. I will try that code.

It won't work. You don't have GNU date.

I'll try to write something in perl.
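
For a system without GNU date, the period names could also be computed with plain shell arithmetic, since each name depends only on the chunk's position in the sequence. A minimal sketch, assuming a POSIX-style shell that supports `$(( ))` and `printf`; the function names `period_name` and `rename_chunks` are made up for illustration, and the next day's date is passed in by hand because the last period ends at midnight of the following day:

```shell
# period_name DAY NEXT_DAY INDEX
# DAY and NEXT_DAY are YYYYMMDD strings; INDEX is 0..95.
# Prints e.g. A20110619.0000-20110619.0015_transaction.csv
period_name() {
    day=$1
    next=$2
    s=$(( $3 * 15 ))             # start of period, minutes after midnight
    e=$(( s + 15 ))              # end of period
    eday=$day
    if [ "$e" -ge 1440 ]; then   # last period ends at 00:00 of the next day
        e=0
        eday=$next
    fi
    printf 'A%s.%02d%02d-%s.%02d%02d_transaction.csv\n' \
        "$day" $(( s / 60 )) $(( s % 60 )) "$eday" $(( e / 60 )) $(( e % 60 ))
}

# rename_chunks DAY NEXT_DAY
# Dry run over split's default output names (xaa, xab, ...).
rename_chunks() {
    n=0
    for f in x[a-z][a-z]; do
        [ -e "$f" ] || continue
        echo mv "$f" "$(period_name "$1" "$2" "$n")"
        n=$(( n + 1 ))
    done
}
```

For example, `rename_chunks 20110619 20110620` prints one `mv` command per chunk; remove the `echo` once the output looks right.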

---------- Post updated at 10:47 AM ---------- Previous update was at 10:42 AM ----------

Quote:
Originally Posted by ksexton
With regard to split. There is a timestamp in each row of the file but I don't necessarily need to split the file so that the row with timestamp 00:05 goes into the 00:00-00:15 file. I just want to split the file up into 96 even sized files for each 15 minute period of the day. Hence the fixed split size.

But that will split strictly on 42 megabyte boundaries, even if that happens to be in the middle of a line.
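
If whole rows matter more than exact byte sizes, one option is to split by line count instead of by bytes. A rough sketch, assuming a POSIX shell with `wc` and `split -l`; the function names `lines_per_chunk` and `split_by_lines` are hypothetical. It rounds the line count up, so it produces at most 96 chunks (possibly fewer for small files), and the chunks have equal line counts rather than equal byte sizes:

```shell
# lines_per_chunk FILE PARTS
# Smallest number of lines per chunk so FILE fits in at most PARTS chunks.
lines_per_chunk() {
    total=$(( $(wc -l < "$1") ))      # arithmetic strips any padding from wc
    echo $(( (total + $2 - 1) / $2 )) # integer ceiling of total / PARTS
}

# split_by_lines FILE [PARTS]
# Splits FILE on line boundaries into xaa, xab, ... (PARTS defaults to 96).
split_by_lines() {
    parts=${2:-96}
    per=$(lines_per_chunk "$1" "$parts")
    if [ "$per" -le 0 ]; then
        echo "nothing to split: $1" >&2
        return 1
    fi
    split -l "$per" "$1"
}
```

Calling `split_by_lines filename.csv` would then take the place of `split -b 42M filename.csv`, and no chunk starts or ends in the middle of a row.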
# 5  
Old 06-22-2011
I just saw that with the split, which means I have to fix up each file and will probably lose a row.
Is there a better way to split so I get full rows? How difficult is it to split the file based on the timestamp in the row which is of the form:
05/01/2011 00:02:30

I don't have GNU date as you say.

Thanks again for all your help.
# 6  
Old 06-22-2011
Code:
#!/usr/bin/perl
# Use:  script.pl YYYY-MM-DD file1 file2 ...
use Time::Local;
use POSIX qw(mktime strftime);

# arg 1 must be YYYY-MM-DD
my ($y, $mon, $d)=split("-", shift);
$mon--; # perl counts months 0-11

my $start=timelocal(0,0,0,$d,$mon,$y);

print "y=$y, mon=$mon, d=$d\n";

while($arg=shift)
{
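        # Build the name from the start and end of this 15-minute period.
        $fname=sprintf("A%s-%s_transaction.csv",
                strftime("%Y%m%d.%H%M",localtime($start)),
                strftime("%Y%m%d.%H%M", localtime($start+(15*60))));

        # Dry run: prints the mv command. Remove "echo" to actually rename.
        system("echo", "mv", $arg, $fname);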

        $start += ((15*60));
}

exit 0;

---------- Post updated at 11:50 AM ---------- Previous update was at 11:15 AM ----------

This may do it. It reads the file by itself and prints rows into different files as appropriate.

Code:
#!/usr/bin/perl
use Time::Local;
use POSIX qw(mktime strftime);

my ($start, $end, $FMT)=(0, 0, "%Y%m%d.%H%M");

while($line=<STDIN>)
{
        ($mdy, $hms, $rest)=split(" ", $line);
        ($m, $d, $y)=split("/", $mdy);
        ($hour, $min, $sec)=split(":", $hms);

        $ldate=timelocal($sec, $min, $hour, $d, $m-1,$y);
        $min=$min-($min%15);    # round down to the start of the 15-minute chunk

        if($ldate >= $end)
        {
                # Seconds are zeroed so each period starts exactly on the boundary.
                $start=timelocal(0,$min,$hour,$d,$m-1,$y);
                $end=$start + (15*60);
                $fname=sprintf("%s-%s.csv",
                        strftime($FMT, localtime($start)),
                        strftime($FMT, localtime($end)));

                close(FILE);    # harmless if nothing is open yet
                open(FILE, ">", $fname) or die "Cannot open $fname: $!";
        }

        print FILE $line;
}

close(FILE);
exit 0;

Assumes the row begins with M/D/Y H:M:S ...

It'll take the file name from the row data and nothing else.

Use it like ./script.pl < hugefile

It may not be the fastest thing in the universe.
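
Because the script assumes every row starts with the timestamp, a header line, a blank line, or a row with the timestamp in a later field would leave the month and day empty and make timelocal fail. A small sketch for checking the input first, assuming a `grep` that supports `-E` with interval expressions; the function name `bad_rows` is made up for illustration:

```shell
# bad_rows FILE
# Prints "lineno:line" for every row that does NOT begin with
# M/D/YYYY H:MM:SS (leading whitespace allowed).
bad_rows() {
    grep -n -v -E \
        '^[[:space:]]*[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2}' \
        "$1"
}
```

Running `bad_rows hugefile` before the Perl script shows which rows, if any, would need to be removed or handled separately.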
# 7  
Old 06-23-2011
Hi,

This looks great. I do get an error when I run it:
> ./script.pl < test.csv
Month '-1' out of range 0..11 at ./script.pl line 13

If I change $m-1 to just $m on line 13, I get an error with the day instead:
> ./script.pl < test.csv
Day '' out of range 1..31 at ./script.pl line 13

---------- Post updated at 08:07 AM ---------- Previous update was at 04:37 AM ----------

Hi,

I am actually ok with the first piece of code that uses the fixed size split. I can delete the first 'broken' row from each file. The effect will be minimal for what I need to do.

Thanks for all your help,

Kieran