08-06-2009
How to Pick Random Records from a Large File
Hi,
I have a huge file with, say, 2,000,000 records. Each record has 42 fields. I would like to pick 1000 records at random from this file. Can anyone help me with how to do this?
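One way to do this without loading all 2,000,000 records into memory is reservoir sampling (Algorithm R). Read the file once, keep the first 1000 lines, and then replace a kept line with decreasing probability as more lines arrive, so that every record ends up equally likely to be chosen. Below is a minimal Python sketch, assuming one record per line; the function name reservoir_sample and the script name are only illustrative.

```python
import random
import sys
from typing import Iterable, List, Optional


def reservoir_sample(records: Iterable[str], k: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Pick k records uniformly at random in a single pass (Algorithm R).

    Only k records are held in memory, however large the input is.
    If the input has fewer than k records, all of them are returned.
    """
    rng = rng or random.Random()
    reservoir: List[str] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # randint is inclusive, so j is uniform over 0..i and the
            # i-th record (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file names are examples): python sample.py bigfile.txt > sample.txt
    with open(sys.argv[1]) as f:
        sys.stdout.writelines(reservoir_sample(f, 1000))
```

On systems with GNU coreutils, `shuf -n 1000 bigfile.txt` also prints 1000 randomly chosen lines directly from the shell.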
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
How can I pick distinct records from an ASCII file that contains duplicate data, using UNIX commands? (3 Replies)
Discussion started by: ss4u
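If "distinct" means keeping one copy of each repeated record, `sort -u file` does this in the shell but reorders the output. The Python sketch below instead keeps the first occurrence of each record in its original order; distinct_records is a hypothetical helper name, and the sketch assumes one record per line.

```python
import sys
from typing import Iterable, Iterator


def distinct_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance."""
    seen = set()  # every distinct record seen so far is kept in memory
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file names are examples): python distinct.py file.txt > unique.txt
    with open(sys.argv[1]) as f:
        sys.stdout.writelines(distinct_records(f))
```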
2. Shell Programming and Scripting
Hello,
I have a file with more than 120 million records (35 GB in size). I have to extract some relevant data from the file based on some parameters and generate another output file.
What would be the best and fastest way to extract the new file?
Sample file format:... (2 Replies)
Discussion started by: learner16s
3. Shell Programming and Scripting
Let's say I want to pick a random file when I run an "ls" command. I don't have a set number of files in each directory.
ls | head -1
This gives me the first one in each directory. Is there a way to do the same but pick a random one? (3 Replies)
Discussion started by: elbombillo
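To pick a random entry instead of the first one, list the directory and choose one name uniformly at random. In the Python sketch below, random_file is a hypothetical helper name; like plain ls it skips names starting with a dot, and it also skips subdirectories so that only regular files are returned.

```python
import os
import random
from typing import Optional


def random_file(directory: str = ".",
                rng: Optional[random.Random] = None) -> Optional[str]:
    """Return one regular file name from `directory`, chosen at random.

    Like plain `ls`, names starting with '.' are skipped; subdirectories are
    skipped too. Returns None when there is nothing to choose from.
    """
    rng = rng or random.Random()
    files = sorted(
        name for name in os.listdir(directory)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(directory, name))
    )
    return rng.choice(files) if files else None


if __name__ == "__main__":
    print(random_file())
```

With GNU coreutils, `ls | shuf -n 1` is a rough shell equivalent, although it does not filter out directories.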
4. Shell Programming and Scripting
Hello gurus,
I am new to "awk" and am trying to break a large file of 4 million records into several output files of half a million records each. At the same time, I want to keep records with the same key in the same output file, so that no key is spread across files.
For example, my data looks like this:
Row_Num,... (6 Replies)
Discussion started by: kam66
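A common approach is to count lines and, once the current output file is full, start a new file only at the next key change. The Python sketch below assumes (hypothetically, since the sample is cut off) comma-separated records whose key is in column key_field, and that records with the same key are adjacent, for example because the file was sorted on the key first. The helper names and the `.partN` output naming are illustrative.

```python
from typing import Iterable, Iterator, Tuple


def assign_chunks(lines: Iterable[str], chunk_size: int, key_field: int = 0,
                  sep: str = ",") -> Iterator[Tuple[int, str]]:
    """Yield (chunk_number, line), starting a new chunk only at a key change.

    Assumes records with the same key are adjacent (e.g. the file is sorted on
    the key column), so a key group is never split across chunks. A chunk can
    therefore exceed chunk_size by the size of its last key group.
    """
    chunk, count, prev_key = 1, 0, None
    for line in lines:
        key = line.rstrip("\n").split(sep)[key_field]
        # The current chunk is full and this line starts a new key: move on.
        if count >= chunk_size and key != prev_key:
            chunk, count = chunk + 1, 0
        count += 1
        prev_key = key
        yield chunk, line


def split_file(path: str, chunk_size: int = 500_000, key_field: int = 0) -> None:
    """Write path.part1, path.part2, ... (illustrative naming scheme)."""
    out = None
    current = 0
    with open(path) as src:
        for chunk, line in assign_chunks(src, chunk_size, key_field):
            if chunk != current:
                # A new chunk begins: close the previous output file.
                if out is not None:
                    out.close()
                out = open(f"{path}.part{chunk}", "w")
                current = chunk
            out.write(line)
    if out is not None:
        out.close()
```

If the file is not grouped by key, it would need to be sorted on the key column before splitting, otherwise the same key can reappear in a later chunk.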
5. Shell Programming and Scripting
I have a file that needs to be split into multiple files every time a line contains the number 1. The problem I face is that the lines are random and the file size is random. For example, lines 4, 65, 187, 202 and 209 contain 1s, so there have to be file breaks between all of those to create 4... (6 Replies)
Discussion started by: darbs121
6. Shell Programming and Scripting
I need to use dd to generate a large file from a sample file of random data, because I don't have /dev/urandom.
I create a named pipe, then run:
dd if=mynamed.fifo of=myfile.fifo bs=1024 count=1024
but when I cat a file of 1024 random bytes to the FIFO:
cat randomfile.txt >... (7 Replies)
Discussion started by: Devyn
7. Shell Programming and Scripting
Hello All,
I have a large file of more than 50,000 lines, and I want to split it into even chunks of 5000 records, which I can do using:
sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'
Now I need to add one more condition: not to break the file at the 5000th record if the 5000th record... (20 Replies)
Discussion started by: ibmtech
8. Shell Programming and Scripting
Hi,
Does anybody know how to use awk, or any other command, to print out 1000 random numbers from the range 1 to 150000?
I know that "rand" in awk can do a similar random selection,
but I have no idea how to write code that randomly picks 1000 numbers from the range 1 to 150000.
... (1 Reply)
Discussion started by: perl_beginner
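If the 1000 numbers should all be different, this is sampling without replacement, which Python's random.sample does directly over a range. A short sketch; pick_numbers and its default arguments are illustrative.

```python
import random
from typing import List, Optional


def pick_numbers(count: int = 1000, low: int = 1, high: int = 150000,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Return `count` distinct integers drawn uniformly from low..high inclusive."""
    rng = rng or random.Random()
    # random.sample draws without replacement; range() excludes its end, hence + 1.
    return rng.sample(range(low, high + 1), count)
```

To print them one per line, something like `print("\n".join(map(str, sorted(pick_numbers()))))` works. If repeated numbers are acceptable, calling `rng.randint(1, 150000)` 1000 times is enough.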
9. Shell Programming and Scripting
I have a file, named records.txt, containing a large number of records (around 0.5 million) in the format below:
28433005 1 1 3 2 2 2 2 2 2 2 2 2 2 2
28433004 0 2 3 2 2 2 2 2 2 1 2 2 2 2
...
Another file is a key file, named key.txt, which is the list of some numbers in the first column of... (5 Replies)
Discussion started by: zenongz
10. Shell Programming and Scripting
Dear folks
I have a large data set which contains 400K columns. I have decided to select 50K specific columns out of the full 400K. Is there any command in UNIX that could do this for me? I should also mention that I have stored all of the column IDs in one file, which may help to select... (5 Replies)
Discussion started by: sajmar
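Assuming the ID file lists 1-based column numbers separated by whitespace, and the data file is whitespace-delimited (both are assumptions, since the post is cut off before the format is described), the selection is a single pass over the data that keeps only the listed fields. The helper names in this Python sketch are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional


def load_column_ids(lines: Iterable[str]) -> List[int]:
    """Read 1-based column numbers, whitespace-separated (assumed ID-file format)."""
    return [int(tok) for line in lines for tok in line.split()]


def select_columns(rows: Iterable[str], ids: List[int],
                   sep: Optional[str] = None) -> Iterator[str]:
    """Yield each row reduced to the columns in `ids`, in the order listed.

    sep=None splits on runs of whitespace and joins the output with a space.
    """
    idx = [i - 1 for i in ids]  # 1-based column numbers -> 0-based list indices
    out_sep = " " if sep is None else sep
    for row in rows:
        fields = row.rstrip("\n").split(sep)
        yield out_sep.join(fields[i] for i in idx) + "\n"


if __name__ == "__main__":
    import sys
    # Usage (file names are examples): python cols.py ids.txt data.txt > subset.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as id_file:
            ids = load_column_ids(id_file)
        with open(sys.argv[2]) as data:
            sys.stdout.writelines(select_columns(data, ids))
```

The column list is converted to indices once, so each row costs one split plus one lookup per selected column, regardless of how many columns the row has.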
LEARN ABOUT CENTOS
tcftest
TCFTEST(1) Tokyo Cabinet TCFTEST(1)
NAME
tcftest - test cases of the fixed-length database API
DESCRIPTION
The command `tcftest' is a utility for facility test and performance test. This command is used in the following format. `path' specifies
the path of a database file. `rnum' specifies the number of iterations. `width' specifies the width of the value of each record. `limsiz'
specifies the limit size of the database file.
tcftest write [-mt] [-nl|-nb] [-rnd] path rnum [width [limsiz]]
Store records with 8-byte keys, which run as `00000001', `00000002', and so on.
tcftest read [-mt] [-nl|-nb] [-wb] [-rnd] path
Retrieve all records of the database above.
tcftest remove [-mt] [-nl|-nb] [-rnd] path
Remove all records of the database above.
tcftest rcat [-mt] [-nl|-nb] [-pn num] [-dai|-dad|-rl] path rnum [limsiz]
Store records with partway duplicated keys using concatenate mode.
tcftest misc [-mt] [-nl|-nb] path rnum
Perform miscellaneous test of various operations.
tcftest wicked [-mt] [-nl|-nb] path rnum
Perform updating operations selected at random.
Options feature the following.
-mt : call the function `tcfdbsetmutex'.
-nl : enable the option `FDBNOLCK'.
-nb : enable the option `FDBLCKNB'.
-rnd : select keys at random.
-wb : use the function `tcfdbget4' instead of `tcfdbget2'.
-pn num : specify the number of patterns.
-dai : use the function `tcfdbaddint' instead of `tcfdbputcat'.
-dad : use the function `tcfdbadddouble' instead of `tcfdbputcat'.
-rl : set the length of values at random.
-ru : perform random operation on random key.
This command returns 0 on success and another value on failure.
SEE ALSO
tcfmttest(1), tcfmgr(1), tcfdb(3), tokyocabinet(3)
Man Page 2012-08-18 TCFTEST(1)