Splitting help required


 
# 1  
02-20-2015

I need help splitting a big file into 4 files based on the 2nd field. I need to make sure that the same 2nd-field value does not appear in more than one of the split files.
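The requirement can be checked after any split. For each 2nd-field value, remember the first file it appeared in, and report the value if it turns up in a different file. This is a minimal sketch that assumes '|'-separated files. The names check_split, part1, part2 and report.txt are hypothetical and used only for this illustration:

```shell
#!/bin/sh
# check_split (hypothetical helper): report every 2nd-field value that occurs
# in more than one of the given '|'-separated files; exit status 1 if any do.
check_split() {
    awk -F'|' '
        # first sighting of a value: remember which file it came from
        !($2 in home) { home[$2] = FILENAME; next }
        # value already seen in another file: report each extra file once
        home[$2] != FILENAME && !(($2, FILENAME) in dup) {
            dup[$2, FILENAME] = 1
            printf "%s is in %s and %s\n", $2, home[$2], FILENAME
            bad = 1
        }
        END { exit bad }
    ' "$@"
}

# Demo data: 00010287 is split across two files, which breaks the rule.
printf '01|00010103|00025.00\n01|00010287|00045.00\n' > part1
printf '01|00010287|00080.00\n01|00010299|00025.00\n' > part2

if check_split part1 part2 > report.txt; then
    echo "OK: no 2nd-field value spans two files"
else
    cat report.txt    # prints: 00010287 is in part1 and part2
fi
```

Running it on the real output files (for example, check_split F1 F2 F3 F4) should print nothing and exit with status 0 if the split is correct.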
# 2  
02-20-2015
Hi, please show a representative sample of the input, the desired output and your attempts at a solution, and specify which OS and version you are using.
# 3  
02-20-2015
The OS is HP-UX, version 11.

Sample Input:
Code:
01|00010103|00025.00
01|00010103|00045.00
01|00010103|00080.00
01|00010103|00067.00
01|00010103|00067.95
01|00010287|00025.00
01|00010287|00045.00
01|00010287|00080.00
01|00010287|00067.00
01|00010299|00025.00
01|00010299|00045.00
01|00010299|00080.00
01|00010299|00067.00
01|00010299|00067.95
01|00010500|00025.00
01|00010500|00067.95
01|00010724|00025.00
01|00010724|00045.00

Sample outputs:
file1:
Code:
01|00010103|00025.00
01|00010103|00045.00
01|00010103|00080.00
01|00010103|00067.00
01|00010103|00067.95

file2:
Code:
01|00010287|00025.00
01|00010287|00045.00
01|00010287|00080.00
01|00010287|00067.00

file3:
Code:
01|00010299|00025.00
01|00010299|00045.00
01|00010299|00080.00
01|00010299|00067.00
01|00010299|00067.95

file4:
Code:
01|00010500|00025.00
01|00010500|00067.95
01|00010724|00025.00
01|00010724|00045.00

The logic will be:
1. Each of the first 3 split files should get 1000 lines, and the rest go to the 4th. However, if the lines around the 1000-line boundary share the same 2nd-field value (for example, lines 999 and 1001), the current file keeps taking lines until a new 2nd-field value appears.
2. The total number of files is 4.
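The rules above only work if all lines with the same 2nd field are next to each other, as they are in the sample. If that is not guaranteed, one option is to check the input first and sort it on the 2nd field when needed. This is a sketch under that assumption. The names is_grouped and mixed.txt are hypothetical. Note that sorting also reorders the groups by key.

```shell
#!/bin/sh
# is_grouped (hypothetical helper): succeed only if each 2nd-field value
# occupies one contiguous run of lines in the file.
is_grouped() {
    awk -F'|' '
        NR == 1 || $2 != last {
            if ($2 in finished) { bad = 1; exit }   # value came back: not grouped
            if (NR > 1) finished[last] = 1          # previous run is over
            last = $2
        }
        END { exit bad }
    ' "$1"
}

# Demo data: 00010103 appears, then 00010287, then 00010103 again.
printf '01|00010103|00025.00\n01|00010287|00045.00\n01|00010103|00080.00\n' > mixed.txt

if is_grouped mixed.txt; then
    echo "mixed.txt is already grouped by the 2nd field"
else
    # Sort on the 2nd field only (-t and -k are standard sort options).
    sort -t'|' -k2,2 mixed.txt > mixed.sorted && mv mixed.sorted mixed.txt
    echo "mixed.txt re-sorted by the 2nd field"
fi
```

The check is a single pass that can stop at the first problem, so on a big file that is already grouped it avoids an unnecessary sort.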

---------- Post updated at 08:37 AM ---------- Previous update was at 01:21 AM ----------

I searched and found the code below, which may help, but it doesn't restrict the number of files to 4. Can someone please modify it to do that?
Code:
awk 'BEGIN {
        FS = OFS = "|"
        fn = "F" ++fc
}
c >= 1000 && $2 != last {
        close fn
        fn = "F" ++fc
        c = 0
}
{       print > fn
        last = $2
        c++
  
}' file.txt

# 4  
02-20-2015
Limiting the file count to four isn't that difficult, but I doubt you'll be too happy, as not every aspect has been taken into account. What if the input has 10000 lines? Three files with 1000 lines each and one with 7000?
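One way to address that concern is to stop treating 1000 as fixed and derive the per-file target from the input size. You count the lines first, then aim for ceil(total / 4) lines per file. This two-pass variant is an alternative sketch, not something proposed in the thread. It assumes the input is grouped by $2:

```shell
#!/bin/sh
# Sample input from the thread.
cat > file.txt <<'EOF'
01|00010103|00025.00
01|00010103|00045.00
01|00010103|00080.00
01|00010103|00067.00
01|00010103|00067.95
01|00010287|00025.00
01|00010287|00045.00
01|00010287|00080.00
01|00010287|00067.00
01|00010299|00025.00
01|00010299|00045.00
01|00010299|00080.00
01|00010299|00067.00
01|00010299|00067.95
01|00010500|00025.00
01|00010500|00067.95
01|00010724|00025.00
01|00010724|00045.00
EOF
rm -f F1 F2 F3 F4

# Pass 1: count the lines and aim for ceil(total / 4) lines per file.
total=$(wc -l < file.txt)
limit=$(( ($total + 3) / 4 ))

# Pass 2: switch to the next file only at a new 2nd-field value once the
# current file has reached the target, and never beyond F4.
awk -v limit="$limit" '
    BEGIN { FS = OFS = "|"; fn = "F" (++fc) }
    c >= limit && $2 != last && fc < 4 { close(fn); fn = "F" (++fc); c = 0 }
    { print > fn; last = $2; c++ }
' file.txt

wc -l F*
```

On the 18-line sample, limit is 5. F1, F2 and F3 get 5, 9 and 4 lines, and no F4 is created. Group 00010299 still joins F2, because F2 holds only 4 lines when that group starts. Whole groups can always push a file past the target, so the split is only approximately even.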
# 5  
02-20-2015
Hi, try this approach, which distributes the records over n files (so you do not need to specify a maximum number of records per file) while keeping records with the same $2 together. Each new $2 group goes to the file that holds the fewest records so far.

Code:
awk -v  n=4 -v fname=outfile '
  BEGIN {
    FS=OFS="|"
    m=1
  }
  p!=$2 {
    if(NR>1) {
      A[m]+=sz
      f=fname m
      printf "%s",s>f
    }
    m=1
    for(i=2; i<=n; i++)
      if(A[m]>A[i])
        m=i
    s=x
    sz=0
    p=$2
  }
  {
    s=s $0 ORS 
    sz++
  }
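  END {
    if(NR) {          # flush the last group (also when the input has one line)
      f=fname m       # to the file chosen for it, not to the file of the previous group
      printf "%s",s>f
    }
  }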
' file

Code:
$ for i in outfile{1..4}; do echo "$i"; cat "$i"; done
outfile1
01|00010103|00025.00
01|00010103|00045.00
01|00010103|00080.00
01|00010103|00067.00
01|00010103|00067.95
outfile2
01|00010287|00025.00
01|00010287|00045.00
01|00010287|00080.00
01|00010287|00067.00
outfile3
01|00010299|00025.00
01|00010299|00045.00
01|00010299|00080.00
01|00010299|00067.00
01|00010299|00067.95
outfile4
01|00010500|00025.00
01|00010500|00067.95
01|00010724|00025.00
01|00010724|00045.00


Last edited by Scrutinizer; 02-21-2015 at 09:06 AM..
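To see the assignment rule on its own, the following dry run makes the same choice as the script above. Each new 2nd-field group goes to the output file that currently holds the fewest lines, and the lowest-numbered file wins ties. Instead of writing the files, it just prints where each group would go. This is a sketch: it reads the whole input before deciding, and plan.txt is a hypothetical file name. For input grouped by $2, as in the sample, it should give the same assignment as the script.

```shell
#!/bin/sh
# Sample input from the thread.
cat > file.txt <<'EOF'
01|00010103|00025.00
01|00010103|00045.00
01|00010103|00080.00
01|00010103|00067.00
01|00010103|00067.95
01|00010287|00025.00
01|00010287|00045.00
01|00010287|00080.00
01|00010287|00067.00
01|00010299|00025.00
01|00010299|00045.00
01|00010299|00080.00
01|00010299|00067.00
01|00010299|00067.95
01|00010500|00025.00
01|00010500|00067.95
01|00010724|00025.00
01|00010724|00045.00
EOF

# Dry run of the "least-filled file" rule for n output files.
awk -F'|' -v n=4 '
    BEGIN { for (i = 1; i <= n; i++) load[i] = 0 }
    # remember each group in first-seen order and count its lines
    !($2 in size) { order[++g] = $2 }
    { size[$2]++ }
    END {
        for (k = 1; k <= g; k++) {
            # choose the file holding the fewest lines so far (lowest number on ties)
            m = 1
            for (i = 2; i <= n; i++)
                if (load[i] < load[m]) m = i
            load[m] += size[order[k]]
            printf "%s (%d lines) -> outfile%d\n", order[k], size[order[k]], m
        }
    }
' file.txt > plan.txt
cat plan.txt
```

On the sample, it sends 00010103 to outfile1, 00010287 to outfile2 and 00010299 to outfile3. Both 00010500 and 00010724 go to outfile4. This matches the listing above.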
# 6  
02-20-2015
Rudi, this is exactly what I want: 3 files with 1000 lines each and 1 file with the rest of the lines. Can you please restrict the file count to 4?
# 7  
02-21-2015
Your snippet needed just a small modification (the added fc < 4 condition) and a syntax correction (close(fn) instead of close fn) to achieve exactly that:
Code:
awk     'BEGIN          {FS = OFS = "|"
                         fn = "F" ++fc}

         c >= 1000 &&
           $2 != last &&
           fc < 4       {close(fn)
                         fn = "F" ++fc
                         c = 0}

                        {print > fn
                         last = $2 
                         c++}
        ' file
ls -la F*
-rw-rw-r-- 1 usr grp  21042 Feb 21 13:36 F1
-rw-rw-r-- 1 usr grp  21021 Feb 21 13:36 F2
-rw-rw-r-- 1 usr grp  21042 Feb 21 13:36 F3
-rw-rw-r-- 1 usr grp 126966 Feb 21 13:36 F4
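To try this logic on the small sample without 1000-line files, you can pass in the limit and the file count with -v. In this sketch, limit and nfiles are illustrative variable names, not part of the snippet above. With limit=4, it should reproduce the four sample output files from post #3 (F1 to F4 correspond to file1 to file4):

```shell
#!/bin/sh
# Sample input from the thread.
cat > file.txt <<'EOF'
01|00010103|00025.00
01|00010103|00045.00
01|00010103|00080.00
01|00010103|00067.00
01|00010103|00067.95
01|00010287|00025.00
01|00010287|00045.00
01|00010287|00080.00
01|00010287|00067.00
01|00010299|00025.00
01|00010299|00045.00
01|00010299|00080.00
01|00010299|00067.00
01|00010299|00067.95
01|00010500|00025.00
01|00010500|00067.95
01|00010724|00025.00
01|00010724|00045.00
EOF
rm -f F1 F2 F3 F4

# Same logic as the fc < 4 version above, with the limit and file count as variables.
awk -v limit=4 -v nfiles=4 '
    BEGIN { FS = OFS = "|"; fn = "F" (++fc) }
    # start the next file only after the limit is reached, only at a new
    # 2nd-field value, and never beyond the last file
    c >= limit && $2 != last && fc < nfiles { close(fn); fn = "F" (++fc); c = 0 }
    { print > fn; last = $2; c++ }
' file.txt

for f in F1 F2 F3 F4; do echo "$f"; cat "$f"; done
```

F1 keeps all 5 lines of 00010103 even though the limit is 4, because the 5th line still has the same 2nd field. F4 collects the two small groups that remain once the other three files are used.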
