Need help on merging


 
# 1  
Old 12-26-2014

I have a total of 100 files of varying sizes, 328950 bytes altogether. I want to merge those 100 files into 4 files of roughly equal size (328950/4 ~= 82238 bytes each), but I do not want to break any file. Any Unix shell script help would be much appreciated.
# 2  
Old 12-26-2014
Try:
Code:
size=$(cat file* | wc -l)
# round up, so that leftover lines do not spill into a fifth file
cat file* | split -l "$(( (size + 3) / 4 ))" -

That will create 4 files with roughly the same number of lines.

--
You can split into equal byte sizes instead, using the wc -c option and split -b, but that cuts at arbitrary byte positions, so it may break lines and multibyte characters.
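
A minimal sketch of that byte-based variant, run on made-up sample files in a scratch directory (the file names and their contents are assumptions for illustration only):

```shell
# Work in a scratch directory with a few made-up sample files (hypothetical data)
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '12|12345|09876\n12|12345|78901\n' > file1
printf '12|22222|11111\n' > file2
printf '12|33333|44444\n12|33333|55555\n12|33333|66666\n' > file3

size=$(cat file* | wc -c)             # total number of bytes across all inputs
chunk=$(( (size + 3) / 4 ))           # round up so that at most 4 pieces are produced
cat file* | split -b "$chunk" -       # pieces are written as xaa, xab, ...
```

Because the cut points are plain byte offsets, a piece can start or end in the middle of a record, which is why this variant only suits data where that does not matter.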

Last edited by Scrutinizer; 12-26-2014 at 07:33 AM..
# 3  
Old 12-26-2014
Thanks for the reply. I tried that, but the problem is that it breaks the files. I have 100 files that were split by customer number. When I merge those 100 files into 4 files, I don't want a customer number to end up in more than one file. For example:

In file1 the data looks like this, where 12345 is my customer number:
Code:
12|12345|09876
.
.
12|12345|78901

When merged, this customer number should not appear in more than one merged file, say both xaa and xab. It may only be present in one of them.
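
One way to verify this requirement after merging is sketched below. It assumes the customer number is always the second "|"-separated field, as in the sample above; the function name check_customers and the demo data are made up for illustration.

```shell
# Report every customer number (field 2, "|"-delimited) that shows up in
# more than one of the given files. Exit status: 0 if clean, 1 otherwise.
check_customers() {
  awk -F'|' '
    ($2 in seen) && seen[$2] != FILENAME {
      print "customer " $2 " is in both " seen[$2] " and " FILENAME
      bad = 1
    }
    { seen[$2] = FILENAME }        # remember the file this customer was last seen in
    END { exit bad + 0 }
  ' "$@"
}

# Made-up demo data: customer 12345 is spread over two merged files
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '12|12345|09876\n' > xaa
printf '12|12345|78901\n12|67890|11111\n' > xab
check_customers xaa xab || echo "at least one customer is split across files"
```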

Last edited by Scrutinizer; 12-26-2014 at 08:16 AM.. Reason: code tags
# 4  
Old 12-26-2014
OK, I see, that is what you mean by "breaking a file" ...

If there are not too many files and your file names do not contain spaces, you could try this crude approach, which may be good enough for your application:

Code:
awk '
  BEGIN {
    n=4
    m=1
  }

  FNR==1 {
    if(NR>1) {
      A[m]+=sz
      f="outfile" m
      printf "%s",s>f
      m=1
      for(i=2; i<=n; i++)
        if(A[m]>A[i])
          m=i
      s=x
      sz=0
    }
  }

  {
    s=s $0 ORS
    sz+=length
  }

  END {
    if(NR) {
      f="outfile" m
      printf "%s",s>f
    }
  }
' $(ls -drS file*)

It first sorts the files by size in reverse order (ls -rS lists the smallest file first) and then puts each file, one after the other, into the emptiest bucket.
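
The same greedy idea can also be sketched at the level of whole files, using byte sizes from wc -c and handling the largest files first. That ordering is a swapped-in choice and not what the ls -drS call above does. The helper name assign_buckets, the demo file names and the use of 2 buckets are assumptions for illustration.

```shell
# Print "bucket filename" for each file: largest file first, each one going
# to whichever of the n buckets currently holds the fewest bytes.
assign_buckets() {
  n=$1; shift
  for f in "$@"; do
    printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done |
  sort -rn |
  awk -v n="$n" '{
    m = 1
    for (i = 2; i <= n; i++) if (A[i] < A[m]) m = i   # find the emptiest bucket
    A[m] += $1                                         # account for this file
    sub(/^[0-9]+ /, "")                                # drop the size, keep the name
    print m, $0
  }'
}

# Demo on made-up files of 50, 40, 30, 20 and 10 bytes, merged into 2 buckets
tmp=$(mktemp -d) && cd "$tmp" || exit 1
for s in 50 40 30 20 10; do printf "%${s}s" '' > "file$s"; done
assign_buckets 2 file* | while read -r b f; do cat "$f" >> "outfile$b"; done
wc -c outfile*
```

In this demo outfile1 receives the 50, 20 and 10 byte files (80 bytes) and outfile2 the 40 and 30 byte files (70 bytes). Since whole files are appended with cat, no input file is ever broken up.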

Last edited by Scrutinizer; 12-26-2014 at 09:44 AM..
# 5  
Old 12-28-2014
Can you please explain the script you posted, so that I can adapt it to my logic?
# 6  
Old 12-28-2014
Sure:

Code:
awk '
  BEGIN {
    n=4                      # set number of buckets
    m=1                      # initialize emptiest bucket
  }

  FNR==1 {                   # a new file is starting (FNR is the line number within the current file)
    if(NR>1) {               # if it is not the very first file (NR is the total line number across all files)
      A[m]+=sz               # add the size of the previous file to the emptiest bucket
      f="outfile" m          # build the output file name of that bucket
      printf "%s",s>f        # write the previous file from memory to that bucket file
      m=1                    # start the search for the emptiest bucket at bucket 1
      for(i=2; i<=n; i++)    # for every other bucket
        if(A[m]>A[i])        # if the current minimum bucket is fuller than that bucket
          m=i                # make that bucket the new minimum
      s=x                    # clear the file memory (x is an unset variable, i.e. empty)
      sz=0                   # clear its size
    }
  }

  {
    s=s $0 ORS               # add the next line of the file to memory
    sz+=length               # add the number of characters on that line to the size
  }

  END {
    if(NR) {                 # if any input was read
      f="outfile" m          # the last file has not been assigned yet: use the emptiest bucket
      printf "%s",s>f        # write the last file from memory to that bucket file
    }
  }
' $(ls -drS file*)           # read the files in reverse size order (smallest first)

Hope this helps
# 7  
Old 12-29-2014
I am getting the error below:

Code:
awk: Cannot find or open file 16.
 The source line number is 4

I have taken a sample of 5 files named test1 to test5. test1 looks like this:

Code:
12|09876|12345 78907|111.46|A|1234567
12|09876|12345 12345|111.46|A|1234567

Can you please tell me what causes this error and how to fix it? The file permissions are fine.

Last edited by Scrutinizer; 12-29-2014 at 02:24 AM.. Reason: code tags