Splitting a file into chunks of 1TB


 
# 1  
Old 08-09-2013

Hi


I have a file listing different filesystems with their sizes. I need to split them into chunks of 1TB.

The file looks like

Code:
vf_MTLHQNASF07_Wkgp2 187428400  10601AW1
vf_MTLHQNASF07_Wkgp2 479504596  10604AW1
vf_MTLHQNASF07_Wkgp2 19940      10605AID
vf_MTLHQNASF07_Wkgp2 1242622044 10605AW1
vf_MTLHQNASF07_Wkgp2 412696     10605AWP
vf_MTLHQNASF07_Wkgp2 87813204   10607AW1
vf_MTLHQNASF07_Wkgp2 31712      10607AW2
vf_MTLHQNASF07_Wkgp2 305420     ADMIN
vf_MTLHQNASF07_Wkgp2 8396       ANNUAL
vf_MTLHQNASF07_Wkgp2 94668      BO_Reports
vf_MTLHQNASF07_Wkgp2 4896484    Board Material
vf_MTLHQNASF07_Wkgp2 4992       CMONGEAU
vf_MTLHQNASF07_Wkgp2 64944      CORR
vf_MTLHQNASF07_Wkgp2 185218932  CS4AW1
vf_MTLHQNASF07_Wkgp2 15423520   CS4AW2
vf_MTLHQNASF07_Wkgp2 63368560   CS4AW3
vf_MTLHQNASF07_Wkgp2 2735968    CS4AW4

What I have is

Code:
cat /tmp/file |awk '{print; sum+=$2;if (sum>=1024000000){t=int(sum/1024000000)*1024000000; printf RS}}'

However, it does not split them correctly:

Code:
vf_MTLHQNASF07_Wkgp2 187428400  10601AW1
vf_MTLHQNASF07_Wkgp2 479504596  10604AW1
vf_MTLHQNASF07_Wkgp2 19940      10605AID
vf_MTLHQNASF07_Wkgp2 1242622044 10605AW1

vf_MTLHQNASF07_Wkgp2 412696     10605AWP
vf_MTLHQNASF07_Wkgp2 87813204   10607AW1
vf_MTLHQNASF07_Wkgp2 31712      10607AW2
vf_MTLHQNASF07_Wkgp2 305420     ADMIN
vf_MTLHQNASF07_Wkgp2 8396       ANNUAL
vf_MTLHQNASF07_Wkgp2 94668      BO_Reports
vf_MTLHQNASF07_Wkgp2 4896484    Board Material
vf_MTLHQNASF07_Wkgp2 4992       CMONGEAU
vf_MTLHQNASF07_Wkgp2 64944      CORR
vf_MTLHQNASF07_Wkgp2 185218932  CS4AW1

vf_MTLHQNASF07_Wkgp2 15423520   CS4AW2
vf_MTLHQNASF07_Wkgp2 63368560   CS4AW3
vf_MTLHQNASF07_Wkgp2 2735968    CS4AW4

Any help please
# 2  
Old 08-09-2013
Something like this?
Code:
awk '{sum+=$2; if(sum>=1024000000) {sum=$2; print RS}}1' myFile

# 3  
Old 08-12-2013
This works, but why the two blank lines in between? One would suffice.
# 4  
Old 08-12-2013
Drop the RS, i.e. use print "" instead of print RS.
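
The reason for the double spacing is that print RS writes RS and then ORS, which is two newlines, while print "" writes a single empty line. Below is a minimal sketch of the one-liner from post #2 with that change, wrapped in a shell function. The name split_groups and the limit parameter are illustrative only, and the NR > 1 guard is an addition (not in the original one-liner) that avoids a leading blank line when the very first entry already reaches the limit.

```shell
# split_groups LIMIT [FILE...]: hypothetical wrapper, not part of the original posts.
# Prints the input with one blank line wherever the running total of
# column 2 reaches LIMIT.
split_groups() {
    limit=$1; shift
    awk -v limit="$limit" '
        { sum += $2 }              # running total of column 2 for the current group
        NR > 1 && sum >= limit {   # the current line pushes the group to the limit
            print ""               # a single blank line separates groups
            sum = $2               # the current line opens the next group
        }
        1                          # print every input line
    ' "$@"
}
```

It would be called as split_groups 1024000000 /tmp/file. Like the one-liner, it keeps the input order and only decides where to cut, so groups are not necessarily filled as tightly as possible.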
# 5  
Old 08-12-2013
Here is a solution to this problem that uses the Metropolis algorithm to maximize the utilization of each group (i.e. get each group as close to the 1TB limit as possible).

Code:
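awk -v s=1024000000 '
# s is the group size limit; 1024000000 is the value used earlier in this thread.
# Walk S[1..n] in order and close a group whenever the next line would overflow s.
# Returns the average utilisation of the closed groups (the last, open group is ignored).
function score_solution(show_result) {
    st=ut=gc=0
    for (i=1;i<=n;i++) {
       if(st+F[S[i]] > s) {
           ut+=st; st=0; gc++
           if (show_result) print ""
       }
       if (show_result) print S[i]
       st+=F[S[i]]
    }
    u = gc ? ut/(gc*s) : 0    # no closed group yet: avoid dividing by zero
    if (show_result) printf "\nGroups: %d Utilisation: %f\n", gc+(st?1:0), u > "/dev/stderr"
    return u
}
# F: size of each line, capped at s so an oversized entry fills a group by itself
# S: current ordering of the lines; ts: total size
{ F[$0]=$2>s?s:$2; S[++n]=$0; ts+=$2}
END {
    srand()
    # Only search when the total exceeds one group and there are lines to swap
    if(ts > s && n > 1) {
       t = 10000;
       sc = oscore = -1;
       while (t > 0.003 && sc < 1.0) {
          # Keep swapping until the score changes
          while (oscore == sc) {
              a = int(rand() * n) + 1
              b = int(rand() * n) + 1
              # Swap two random lines
              tn=S[a]
              S[a]=S[b]
              S[b]=tn
              sc = score_solution(0)
           }
           t *= 0.9995;
           diff = sc - oscore;

           # Note: for diff < 0, exp(diff/-t) is above 1 and rand() is below 1,
           # so in practice a worse ordering is never kept by the last test
           if (oscore == -1 || diff > 0 ||
            (rand() > exp(diff/-t))) {
                # Yes, have this one
                oscore = sc;
            } else {
                 # No thanks, swap them back
                 sc = oscore
                 S[b]=S[a]
                 S[a]=tn
            }
        }
    }
    score_solution(1)
}' infile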

One output for the given datafile:

Code:
vf_MTLHQNASF07_Wkgp2 1242622044 10605AW1

vf_MTLHQNASF07_Wkgp2 479504596  10604AW1
vf_MTLHQNASF07_Wkgp2 87813204   10607AW1
vf_MTLHQNASF07_Wkgp2 15423520   CS4AW2
vf_MTLHQNASF07_Wkgp2 8396       ANNUAL
vf_MTLHQNASF07_Wkgp2 305420     ADMIN
vf_MTLHQNASF07_Wkgp2 185218932  CS4AW1
vf_MTLHQNASF07_Wkgp2 4896484    Board Material
vf_MTLHQNASF07_Wkgp2 31712      10607AW2
vf_MTLHQNASF07_Wkgp2 187428400  10601AW1
vf_MTLHQNASF07_Wkgp2 63368560   CS4AW3

vf_MTLHQNASF07_Wkgp2 19940      10605AID
vf_MTLHQNASF07_Wkgp2 2735968    CS4AW4
vf_MTLHQNASF07_Wkgp2 412696     10605AWP
vf_MTLHQNASF07_Wkgp2 4992       CMONGEAU
vf_MTLHQNASF07_Wkgp2 94668      BO_Reports
vf_MTLHQNASF07_Wkgp2 64944      CORR

Groups: 3 Utilisation: 1.000000
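
For reference, the textbook Metropolis acceptance rule for a maximisation problem always keeps an improving move and keeps a worsening move with probability exp(diff/t), where diff is the new score minus the old score and t is the current temperature. A minimal sketch of just that rule follows; the names metropolis_accept, accept, d and temp are hypothetical and do not come from the script above.

```shell
# metropolis_accept DIFF T: hypothetical helper, prints 1 if the move is kept, else 0.
# DIFF = new score - old score (higher is better), T = current temperature (> 0).
metropolis_accept() {
    awk -v d="$1" -v temp="$2" '
        function accept(diff, t) {
            # Improvements are always kept; a worse move is kept
            # with probability exp(diff / t), which lies between 0 and 1
            return (diff > 0 || rand() < exp(diff / t)) ? 1 : 0
        }
        BEGIN { srand(); print accept(d, temp) }
    '
}
```

For example, metropolis_accept 0.05 2 always prints 1, because the move is an improvement. With a negative DIFF the result is random, and acceptance becomes rarer as T shrinks.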
