Split file into n parts.


 
# 1  
11-04-2013

Hi all:

I have a 5-column tab-separated file.
The only thing that I want to do with it is to split it.
However, I want to split it in an 80/20 proportion, randomized if possible.
I know that something like:

Code:
awk '{ print > ("file" NR) }' RS='' input-file

will work, but it just writes each blank-line-separated block of the file to its own file (file1, file2, ...), in order. It cannot produce an 80/20 split, and it does not randomize.

Does anyone know if that is possible using awk/grep?
# 2  
11-04-2013
Look into awk's srand() and rand() functions.
# 3  
11-04-2013
Thanks.
Something like this picks a random subset of the lines:

Code:
$ awk -v N="$(wc -l < FILE)" -v numberoflines=100 'BEGIN { srand() } rand() < numberoflines/N' FILE


But it only prints roughly a given number of lines, in their original order.
Is it possible to split the file into two files instead, one with 80% of the content and the other with the remaining 20%?
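One way to get an exact 80/20 split in a single pass, still using only awk's rand() and srand(), is selection sampling (Knuth's Algorithm S): count the lines first, then take each line with probability (lines still needed)/(lines still unread). The sketch below wraps this in a shell function; the name split_exact and its argument order are hypothetical, and it assumes the input ends with a newline so that wc -l counts every line. Unlike a full shuffle, it keeps the lines in their original order.

```shell
# split_exact FILE PCT OUT1 OUT2  (hypothetical helper name)
# Sends exactly int(N*PCT/100) randomly chosen lines of FILE to OUT1 and
# the rest to OUT2, where N is the line count; original order is kept.
split_exact() {
  awk -v n="$(wc -l < "$1")" -v pct="$2" -v out1="$3" -v out2="$4" '
    BEGIN {
      srand()
      need = int(n * pct / 100)   # lines still owed to out1
      left = n                    # lines not read yet
    }
    {
      # Selection sampling: take this line with probability need/left.
      # When need == left every remaining line is taken, and when need == 0
      # none is, so out1 ends up with exactly int(n*pct/100) lines.
      if (rand() * left < need) { print > out1; need-- }
      else                      { print > out2 }
      left--
    }' "$1"
}
```

For the file in the question this would be called as, for example, split_exact input.tsv 80 part80.tsv part20.tsv (file names are placeholders). Tabs and columns are irrelevant here because whole lines are copied.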
# 4  
11-04-2013
A bit verbose, but...
Output will be in files p1 and p2.
Code:
awk -f ow.awk myFile
or
awk -v p1=60 -v p2=40 -f ow.awk myFile

ow.awk:
Code:
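function genrand(n)
{
  # random integer in 1..n
  return int(n*rand())+1
}

BEGIN {
  srand()
  if (!p1) p1=80   # percentage of lines written to file "p1"; the rest go to "p2"
}
{ a[FNR]=$0; fnr=FNR }
END {
  # Draw random line numbers until every line has been printed once;
  # an already-used number is rejected and the step is repeated (i--).
  for(i=1;i<=fnr;i++){
    g=genrand(fnr)
    if (!(g in a))
      i--
    else {
      # the first p1% of output positions go to p1, the rest to p2
      out=((i/fnr)*100 <= p1) ? "p1" : "p2"
      print a[g] > out   # ">" truncates old p1/p2 once, then keeps the file open
      delete a[g]
    }
  }
}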
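The rejection loop in ow.awk keeps redrawing until it hits a line number it has not used yet, which takes on the order of n·ln(n) calls to rand() for n lines. A Fisher-Yates shuffle does the same job with exactly n-1 swaps. Below is a sketch of that variant, wrapped in a hypothetical shell function shuffle_split; like ow.awk, it holds the whole file in memory.

```shell
# shuffle_split FILE PCT OUT1 OUT2  (hypothetical helper name)
# Shuffles all lines of FILE, then writes the first int(N*PCT/100)
# shuffled lines to OUT1 and the remaining lines to OUT2.
shuffle_split() {
  awk -v pct="$2" -v out1="$3" -v out2="$4" '
    BEGIN { srand() }
    { a[NR] = $0 }
    END {
      n = NR
      # Fisher-Yates: swap each slot i with a random slot j in 1..i
      for (i = n; i > 1; i--) {
        j = int(rand() * i) + 1
        t = a[i]; a[i] = a[j]; a[j] = t
      }
      k = int(n * pct / 100)   # size of the first part
      for (i = 1; i <= n; i++)
        print a[i] > (i <= k ? out1 : out2)
    }' "$1"
}
```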

# 5  
11-04-2013
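Try also:
Code:
 sort -R file | split -l $(( $(wc -l < file) * 8 / 10 ))

The shuffled lines go to split's default output files: xaa gets the first 80% and xab the rest.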
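GNU sort -R sorts by a random hash of each line, so duplicate lines always end up next to each other; GNU shuf produces a true random permutation instead. A sketch using shuf, with head and tail writing the two parts to named files rather than split's xaa/xab. The function name split_shuf is hypothetical, and shuf (GNU coreutils) is assumed to be installed.

```shell
# split_shuf FILE PCT OUT1 OUT2  (hypothetical helper name; needs GNU shuf)
split_shuf() {
  n=$(wc -l < "$1")
  k=$(( n * $2 / 100 ))                    # line count of the first part
  shuffled=$(mktemp) || return 1
  shuf "$1" > "$shuffled"                  # random permutation of the lines
  head -n "$k" "$shuffled" > "$3"          # first k shuffled lines
  tail -n +"$(( k + 1 ))" "$shuffled" > "$4"   # everything after line k
  rm -f "$shuffled"
}
```

A temporary file is used on purpose: reading a pipe with head and then cat can lose lines, because head may consume more input than it prints.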

# 6  
11-04-2013
If you just want to split, you can also try this:

Code:
split -l $(( $(wc -l < file) * 70 / 100 )) file output_prefix

See man split for details. Note that this is not random.
# 7  
11-04-2013
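If an approximate 80/20 split is sufficient (each line independently goes to f1 with probability 0.8, otherwise to f2, and the original order is kept):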
Code:
awk 'BEGIN {srand()} {print > (rand() < .8 ? "f1" : "f2")}' file
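srand() with no argument typically seeds from the time of day in seconds, so the split changes from run to run, while two runs started within the same second get the same split. If you need a repeatable split, pass an explicit seed. A sketch follows; the function name coin_split and its arguments are hypothetical, and the same seed only reproduces the same split with the same awk implementation.

```shell
# coin_split FILE SEED OUT1 OUT2  (hypothetical helper name)
# Sends each line of FILE to OUT1 with probability 0.8, otherwise to OUT2.
# srand(seed) makes the result repeatable for a given seed and awk version.
coin_split() {
  awk -v seed="$2" -v out1="$3" -v out2="$4" '
    BEGIN { srand(seed) }
    { print > (rand() < 0.8 ? out1 : out2) }' "$1"
}
```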

Regards,
Alister