how to choose random columns


 
# 8  
Old 04-24-2011
Hi, mira.

Providing an example would help.

Given this simple model of data in "row,column" notation:
Code:
1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9 1,10 
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9 2,10 
3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3,9 3,10 
4,1 4,2 4,3 4,4 4,5 4,6 4,7 4,8 4,9 4,10

What output would you expect from, say, two runs: one choosing 2 random columns and one choosing 3? That will help us understand what you intend.

Best wishes ... cheers, drl
# 9  
Old 04-24-2011
OK, let's say I need 2 random columns. They can be any two, e.g.:

Code:
 
1,6  1,10
2,6  2,10
3,6  3,10
4,6  4,10

And if I say I need 3 random columns, they can be any three. What matters is that the choice is random, perhaps made by some function that picks columns randomly.

E.g. the output would be:

Code:
 
1,1  1,4  1,7
2,1  2,4  2,7
3,1  3,4  3,7
4,1  4,4  4,7

Thanks, drl!
# 10  
Old 04-24-2011
What languages do you have available? I'm pondering a solution in C.

---------- Post updated at 01:27 PM ---------- Previous update was at 12:20 PM ----------

Code:
/**
 *	rc.c	picks random columns from whitespace-separated input from stdin.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

const char ofs=' ';		// output field separator
const char *fs=" \r\n\t";	// input field separators

int main(int argc, char *argv[])
{
	size_t size;	// buf[] size in bytes
	char *buf=malloc(size=65536L);
	int pos=0;	// How much has been read into buf[]

	int coln=0, colmax;	// How many columns read, max columns in cols[]
	char **cols=malloc(sizeof(char *) * (colmax=512L));

	int choose=1;	// How many columns to choose

	if(argc != 2)
	{
		fprintf(stderr, "Usage:  %s columns < datafile\n",argv[0]);
		return(1);
	}

	if((sscanf(argv[1], "%d", &choose) != 1) || (choose <= 0))
	{
		fprintf(stderr, "Bad count '%s'\n", argv[1]);
		return(1);
	}

	srand(time(NULL)^getpid());	// Make results random

	// Read until end of file
	while(fgets(buf+pos, size-pos, stdin) != NULL)
	{
		int c;
		char *tok;

		pos += strlen(buf+pos);// Find the end of line
		if(pos <= 0) continue; // Don't bother checking empty line
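		// Check the end of line for \n.  At end of file the last
		// line may lack one, so only grow the buffer if more
		// input can follow.
		if((buf[pos-1] != '\n') && !feof(stdin))
		{	// Didn't get entire line, make buffer bigger
			// then get the rest
			buf=realloc(buf, size += size>>1);
			continue;
		}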

		// Break into columns across whitespace.
		// A blank line yields no tokens, so test before storing.
		tok=strtok(buf, fs);
		while(tok != NULL)
		{
			// Check if we have enough room for columns.
			// Add more if necessary.
			if(colmax <= coln)
				cols=realloc(cols, sizeof(char *)*
					(colmax+=(colmax>>1)));

			cols[coln++]=tok;
			tok=strtok(NULL, fs);
		}

		for(c=0; (c<choose)&&(coln>0); c++)
		{
			int m=rand()%coln;
			char *pick=cols[m];
			cols[m]=cols[--coln];	// Remove from list

			if(c != 0)	putc(ofs, stdout);
			fputs(pick, stdout);
		}

		putc('\n', stdout);

		// Reset everything for next line
		coln=0;		pos=0;
	}

	return(0);
}

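This should handle very long lines and thousands of columns without problems. Note that it draws a fresh random set of columns for every input line and prints them in the order drawn, so different rows can show different columns.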
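
For the case asked about earlier in the thread, where every row should show the same columns, here is a minimal awk sketch rather than a finished tool. It chooses the columns once, on the first line, using the same "overwrite the pick with the last candidate" removal that rc.c applies to cols[], and then prints those columns for every row in their original left-to-right order. The function name pick_cols is hypothetical, the seed simply comes from bash's $RANDOM, and the input is assumed to be whitespace-separated with the first line defining the column count.

```shell
#!/usr/bin/env bash

# pick_cols N [FILE...]: print the same N randomly chosen columns of every row.
# Hypothetical helper for illustration; it is not part of rc.c.
pick_cols() {
    local n=$1
    shift
    awk -v n="$n" -v seed="$RANDOM" '
        BEGIN { srand(seed) }
        NR == 1 {
            # Candidate list 1..NF; draw without replacement by moving
            # the last remaining candidate into the slot just picked.
            left = NF
            for (i = 1; i <= NF; i++) cand[i] = i
            for (k = 0; k < n && left > 0; k++) {
                m = int(rand() * left) + 1
                want[cand[m]] = 1
                cand[m] = cand[left]
                left--
            }
        }
        {
            # Emit the chosen fields in their original order.
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if (i in want) { out = out sep $i; sep = " " }
            print out
        }
    ' "$@"
}
```

Usage would be something like `pick_cols 3 data1` or `pick_cols 3 < data1`. If N exceeds the number of columns, all columns are printed.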
# 11  
Old 04-24-2011
Hi.

With mostly standard utilities (rl comes from the randomize-lines package):
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate extraction of N randomly chosen columns.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for i;do printf "%s" "$i";done; printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && . $C seq rl tr sed cut

N=${1-3}
FILE=data1

# Generate N numbers in random sequence, range 1 - number-of-columns
# (hard-coded as 10 here to match the sample data)
cols=$( seq 1 10 | rl -c $N | tr '\n' ',' | sed 's/.$//' )

pl " Random columns: $cols"

cut -d" " -f"$cols" $FILE

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.7 (lenny) 
GNU bash 3.2.39
seq (GNU coreutils) 6.10
rl 0.2.7
tr (GNU coreutils) 6.10
GNU sed version 4.1.5
cut (GNU coreutils) 6.10

-----
 Random columns: 7,9,2
1,2 1,7 1,9
2,2 2,7 2,9
3,2 3,7 3,9
4,2 4,7 4,9

and choosing a different N:
Code:
% ./s1 5

... omitted

-----
 Random columns: 8,2,9,1,5
1,1 1,2 1,5 1,8 1,9
2,1 2,2 2,5 2,8 2,9
3,1 3,2 3,5 3,8 3,9
4,1 4,2 4,5 4,8 4,9

Best wishes ... cheers, drl
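
A possible variant of s1, offered as a sketch under a few assumptions: it reads the column count from the first line of the file instead of hard-coding 10, and it uses GNU shuf in place of rl. The function name random_cols is hypothetical, and the file is assumed to use exactly one space between fields, since that is what cut -d' ' expects.

```shell
#!/usr/bin/env bash

# random_cols N FILE: print N randomly chosen columns of FILE.
# Hypothetical helper for illustration; assumes GNU shuf is installed.
random_cols() {
    local n=$1 file=$2 ncols cols
    # Count the columns on the first line rather than assuming 10.
    ncols=$(awk 'NR == 1 { print NF; exit }' "$file")
    [ "${ncols:-0}" -gt 0 ] || return 1
    # Draw n distinct column numbers and join them with commas for cut.
    cols=$(shuf -i "1-$ncols" -n "$n" | paste -sd, -)
    cut -d' ' -f"$cols" "$file"
}
```

Called as `random_cols 3 data1`. As in the s1 output above, cut prints the selected fields in file order no matter what order the numbers were drawn in.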