Randomize a file


 
# 1  
Old 05-20-2013

Hi,

I have a large file that looks like this:

Code:
@FCC189PACXX:2:1101:1420:2139/1
AGCGAGACTCCGTCTCAAAAAGAAAAAATTTTTCAAAATATTGCAATGGGCTTGTAATTTCTGCTTAAATGTCAGGAGGTCTGAGCCATT
+
bbbeeeceggggghiiiiiiiiiihfihihiiihhhghiihhihifhihiihhhhhhhhiiigfggggdceeeeebdcc^``bbcbccbb
@FCC189PACXX:2:1101:1368:2220/1
AGCTGTTGTCCTCCTCATTCTCGCAATCGCTGCAGTGCTCATGGCCGTTCCCGTTCCCTTCCATCATCTGCGGCAGCGGAGGTGCCCGCG
+
___eeceegacegeghihhdf`ggfhi`ebeWeccegcfdgihih_cceefcgghedf_\\`ddZbYZ^RZX`]^BBBBBBBBBBBBBBB
@FCC189PACXX:2:1101:2390:2054/1
NAAGTGGTAAATAATAAGCCTCCTGACTTGTGGTTGTAGCCTCATTTCTTGGTGTATGACAATTGACACTTCTGTAGAAAACCTGCTGGC
+
BP\cccecgggggiiiiiiiiiiiiiiiihhhhhiiiiiiiiihiiiiiiihgfhigiihiegifffhiiiiiehggggggddeeeddcc
@FCC189PACXX:2:1101:2484:2119/1
CAAAATCACTAAAATGCCCTGCCAGTCATTACAACCAGAACCAATAAACACCCCAACACACACAAAACAACAGTTGAAGGCATCCCTGGG
+
bbbeeeeeggggfiiiiiiiiiihifhihiiiihiihihiiihhiiiiiiiiifiiiiiiiigggeeeeccccddb`_bb`abccccc^a
@FCC189PACXX:2:1101:2450:2153/1
AAGCTTTCTTGAAATTAAGTATGTCATAACCTTCATTTCTGTTATGTGTAGCTGGCAGAGAGAGACAAGAATAAGAAACTTTGGAGGGCG
+
bbbeeeeegggggiiiiiiihihiiihiiiiiiiihiiiiihhhhihihhhiiihiihhihihhhhhiifghcbfdfggggggeedecba

Each group of four lines is one feature, for example:

Code:
@FCC189PACXX:2:1101:2450:2153/1
AAGCTTTCTTGAAATTAAGTATGTCATAACCTTCATTTCTGTTATGTGTAGCTGGCAGAGAGAGACAAGAATAAGAAACTTTGGAGGGCG
+
bbbeeeeegggggiiiiiiihihiiihiiiiiiiihiiiiihhhhihihhhiiihiihhihihhhhhiifghcbfdfggggggeedecba

I would like to randomize the order of the features, keeping the four lines of each feature together. I am not sure if I am being clear here.

Thanks
# 2  
Old 05-20-2013
Try this:

Code:
sed 's/@/\x00/g' infile | shuf -z | sed 's/\x00/@/g' > outfile
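
A possible variant, offered as a sketch rather than a tested replacement: `@` can also appear inside the quality lines, and `shuf -z` treats the NUL as a record terminator rather than a leading marker, so the command above may split or re-join records in the wrong place. Assuming every record is exactly four lines and no line contains a tab, `paste` can fold each record onto one line before shuffling. The function name `shuffle_records` is made up for illustration.

```shell
# shuffle_records IN OUT  (hypothetical helper name)
# Assumes: exactly four lines per record, no tab characters in the data.
shuffle_records() {
    paste - - - - < "$1" |   # join each four-line record into one tab-separated line
        shuf |               # shuffle whole records (still holds everything in memory)
        tr '\t' '\n' > "$2"  # restore the original four-line layout
}
```

It would be called as `shuffle_records infile outfile`. Like the command above, it relies on `shuf`, so it does not reduce memory use.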

# 3  
Old 05-22-2013
Hi, thanks for the code. However, I think my file is too big. I get this message:

Code:
shuf: memory exhausted


Is there a way around it? My file is about 7 GB.
# 4  
Old 05-23-2013
Are all the records the same length? In your sample data they all appear to be 216 characters long (counting the newline at the end of each of the four lines).
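
One way to check this on the real file, as a rough sketch: add up the bytes in each group of four lines and count how often each total occurs. The helper name `record_sizes` is hypothetical, and the sketch assumes four lines per record, single-byte characters, and Unix line endings.

```shell
# record_sizes FILE  (hypothetical helper name)
# Prints "<bytes> <count>" for each distinct record size.
# Assumes: four lines per record, single-byte characters, Unix line endings.
record_sizes() {
    awk '{ n += length($0) + 1 }               # +1 for the newline
         NR % 4 == 0 { seen[n]++; n = 0 }      # end of a four-line record
         END { for (s in seen) print s, seen[s] }' "$1"
}
```

Run as `record_sizes infile`. A single output line would mean that every record has the same size; more than one line means the sizes vary.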

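If the records are all the same length, you might be able to use this Perl program. It takes one argument, the name of your input file, and prints the records in random order to standard output. Only the list of record numbers is shuffled in memory; each record is then read from the file by seeking to its position.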

Code:
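#!/usr/bin/perl
use strict;
use warnings;

my $rsz   = 216;             # bytes per record, newlines included
my $fsize = -s $ARGV[0];     # size of the input file in bytes
my $record;

# One index per record; only these indexes are shuffled in memory.
my @recs = (0 .. int($fsize / $rsz) - 1);

fisher_yates_shuffle(\@recs);

open(my $fp, '<', $ARGV[0]) || die "Cannot open $ARGV[0]: $!";
# Visit every record, including the last one ($c <= $#recs).
for (my $c = 0; $c <= $#recs; $c++ ) {
        seek($fp, $recs[$c]*$rsz, 0);
        read($fp, $record, $rsz);
        print $record;
}
close($fp);

# Shuffle the referenced array in place.
sub fisher_yates_shuffle {
    my $array = shift;
    return if @$array < 2;   # nothing to shuffle
    my $i;
    for ($i = @$array; --$i; ) {
        my $j = int rand ($i+1);
        next if $i == $j;
        @$array[$i,$j] = @$array[$j,$i];
    }
}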
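
If the records turn out not to be the same length, the seek-based program above does not apply. One possible alternative, sketched under assumptions and not tried on a 7 GB file: prefix each record with a random key and let `sort` order by that key, since `sort` can spill to temporary files on disk instead of holding everything in memory. The helper name `shuffle_records_external` is hypothetical; the sketch assumes four lines per record and no tab characters in the data.

```shell
# shuffle_records_external IN OUT  (hypothetical helper name)
# Assumes: four lines per record, no tab characters in the data.
shuffle_records_external() {
    paste - - - - < "$1" |      # one record per line, tab-separated
        awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' |  # prefix a random key
        LC_ALL=C sort -k1,1 |   # order by the random key
        cut -f2- |              # drop the key again
        tr '\t' '\n' > "$2"     # restore the four-line layout
}
```

It would be called as `shuffle_records_external infile outfile`. With GNU sort, `-T` selects the directory for temporary files and `-S` the size of the memory buffer. Note that `srand()` seeds from the time of day, and that two records can receive the same key, in which case `sort` falls back to comparing the record text, so the result is only approximately a uniform shuffle.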


Last edited by Chubler_XL; 05-23-2013 at 06:38 AM..