Split file into n parts.

11-04-2013

Hi all:

I have a 5-column tab-separated file.
The only thing that I want to do with it is to split it.
However, I want to split it in an 80/20 proportion -- randomized, if possible.
I know that something like:

awk '{print $0 ""> "file" NR}' RS='' input-file

will work, but it only splits into equally sized files -- and does not randomize.

Does anyone know if that is possible using awk/grep?
11-04-2013
look into awk's 'srand' and 'rand' functions....
11-04-2013
Something like this would pick lines at random --

$ awk -v N="$(wc -l < FILE)" -v numberoflines=100 'BEGIN {srand()} rand() < numberoflines/N' FILE

But it only prints approximately the requested number of lines.
Is it possible to split this instead into two files -- one that is 80% of the content and the second which is 20% of the content of the file?
11-04-2013
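A bit verbose, but...
Output will be in files p1 and p2. The first call uses the default 80/20 split, the second asks for 60/40.
awk -f ow.awk myFile
awk -v p1=60 -v p2=40 -f ow.awk myFile

ow.awk:
function genrand(n) { return int(rand() * n) + 1 }   # random integer in 1..n
BEGIN {
  srand()
  if (!p1) p1=80
  if (!p2) p2=20
  perc[p1]="p1"; perc[p2]="p2"   # percentage -> output file (p1 and p2 must differ)
}
{ a[FNR]=$0;fnr=FNR }            # keep every line in memory
END {
  while (i < fnr) {
    g=genrand(fnr)
    if (!(g in a))
      continue                   # this line was already written
    else {
      i++
      out=((i/fnr)*100 <= p1)?perc[p1]:perc[p2]
      print a[g] >> out
      delete a[g]
    }
  }
}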

11-04-2013
Try also
 sort -R file | split -l $(($(wc -l <file) *8/10))
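
One caveat: GNU sort -R sorts by a random hash of the keys, so identical lines end up next to each other instead of being shuffled independently. If that matters, something along these lines might do, assuming GNU shuf is available. The file names (input.tsv, shuffled.tmp, part80.tsv, part20.tsv) are made up for the example, and the seq line only creates demo data.

```shell
#!/bin/sh
# Demo input: 100 numbered lines (hypothetical stand-in for the real file).
seq 1 100 > input.tsv

total=$(wc -l < input.tsv)
cut=$(( total * 8 / 10 ))          # number of lines for the 80% part

shuf input.tsv > shuffled.tmp      # uniform random permutation (GNU coreutils)
head -n "$cut" shuffled.tmp > part80.tsv
tail -n +"$(( cut + 1 ))" shuffled.tmp > part20.tsv
rm -f shuffled.tmp
```

The file is shuffled once into a temporary file so that head and tail see the same permutation; the two parts together always contain every input line exactly once.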

11-04-2013
If you just want to split, you can also try this:

split -l $(( $(wc -l < file) * 70 / 100 )) file output_prefix

See man split for details.
Note that this is not random.
11-04-2013
If approximately 80/20 is sufficient:
awk 'BEGIN {srand()} {print > (rand() < .8 ? "f1" : "f2")}' file
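
If the split has to be exactly 80/20, a possible variant is selection sampling: each line goes to f1 with probability (lines still needed for f1) / (lines still unassigned), which always ends with the exact count. This is only a sketch; the variables total, pct, need and left and the file input.tsv are names invented for the example, and the seq line just creates demo data.

```shell
#!/bin/sh
# Demo input: 100 numbered lines (hypothetical stand-in for the real file).
seq 1 100 > input.tsv

total=$(wc -l < input.tsv)

awk -v total="$total" -v pct=80 '
BEGIN {
  srand()
  need = int(total * pct / 100)   # lines still owed to f1
}
{
  left = total - NR + 1           # lines not yet assigned, including this one
  if (rand() * left < need) {     # pick with probability need/left
    print > "f1"
    need--
  } else {
    print > "f2"
  }
}' input.tsv
```

Unlike a shuffle, this keeps the lines in their original order inside f1 and f2. Since srand() seeds from the clock, two runs within the same second may produce the same split.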

