A script that processes a sample of a file


 
# 1  
Old 06-07-2013

Hi all, I need some help with processing just a sample of a large .txt file.

I have a large file with many lines (say, more than 200,000). I need a script that processes just a sample of it, say 10,000 lines, but as a random sample (taking rows from the top to the bottom).

Could someone help? I would also be happy if I could give the sample size as an input; for example, if I need 20,000 lines instead of 10,000, I would pass 20,000 as input.

Thanks a lot!
# 2  
Old 06-07-2013
Code:
awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' file

Should give you ca 10% of the file
# 3  
Old 06-07-2013
Could you please tell me how you know that it will give ca. 10%? For example, if I know the number of rows beforehand, how should I adjust the

Code:
(rand() <= .01)

parameter, which I think does the trick?

Thank you very much!
# 4  
Old 06-07-2013
I found this using Google. It seems that if you change the parameter, you get more or less data.
# 5  
Old 06-07-2013
Quote:
Originally Posted by Jotne
Code:
awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}' file

Should give you ca 10% of the file
According to the standards, awk's rand() function returns a pseudo-random number x such that 0 <= x < 1. If we assume that the pseudo-random number is uniformly distributed (which is not required by the standards), then (rand() <= .01) should be true approximately 1% of the time. For 10%, the test would be (rand() <= .1) (again, assuming rand() produces a uniformly distributed set of values).
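
To make the fraction an input rather than a hard-coded constant, something like the following might work. This is only a sketch: the function name sample_fraction and the variable p are made up for illustration, and it relies on the same uniformity assumption as above.

```shell
# sample_fraction P FILE  (hypothetical helper name)
# Print each non-empty line of FILE with probability P, where 0 <= P <= 1.
# The number of lines printed is only approximately P times the line count.
sample_fraction() {
    awk -v p="$1" '
        BEGIN { srand() }          # seed from the time of day
        !/^$/ && rand() < p        # default action: print the line
    ' "$2"
}
```

For example, sample_fraction 0.1 file should print roughly 10% of the non-empty lines. Because srand() seeds from the clock, two runs within the same second will probably pick the same lines.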

Note that when using "random" numbers, there is certainly no guarantee that using (rand() <= 10000/200000) as the test in Jotne's script will print exactly 10,000 lines from a 200,000 line file.
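
If an exact count is needed, one option (not what Jotne's script does) is selection sampling: count the non-empty lines first, then keep each line with probability (lines still needed)/(lines still remaining). The sketch below assumes the file can be read twice; the name sample_exact and the variables n and total are hypothetical.

```shell
# sample_exact N FILE  (hypothetical helper name)
# Print exactly N non-empty lines of FILE, chosen at random, in file order.
# If FILE has fewer than N non-empty lines, all of them are printed.
sample_exact() {
    n=$1 file=$2
    # First pass: count the non-empty lines.
    total=$(awk '!/^$/ { c++ } END { print c + 0 }' "$file")
    # Second pass: n = lines still needed, total = candidate lines still left.
    awk -v n="$n" -v total="$total" '
        BEGIN { srand() }
        /^$/ { next }                                # skip empty lines
        n > 0 && rand() * total < n { print; n-- }   # keep with probability n/total
        { total-- }
    ' "$file"
}
```

With this, sample_exact 10000 file > sample.txt should write 10,000 lines from a 200,000 line file, and the sample size is simply the first argument. Since rand() is always below 1, a line is always kept once the lines still needed equal the lines remaining, which is what makes the count exact.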