Top Forums Shell Programming and Scripting Very big text file - Too slow! Post 302515291 by fedonMan on Tuesday 19th of April 2011 05:25:24 PM
Quote:
Originally Posted by Corona688
When I asked you why they had to be random, you said they didn't, that's just the only way you knew how to read lines. So I didn't use random lines.

If you've changed your mind, you can get 10,000 random non-duplicate lines with
Code:
sort -R <inputfile | head -n 10000 | awk -v FS='>' '{
        # Get stuff after second >, and convert it all to uppercase
        $0=toupper($3);
        # Substitute every non-alpha char with space
        gsub(/[^A-Z]/, " ");
        # Split on spaces, into array A
        split($0, A, " ");
        # count words
        for(key in A)   count[A[key]]++;
}
END {
        # print words
        for(key in count)
                printf("%s\t%d\n", key, count[key]);
}' < filename

hehe, I didn't mean that I don't want randomness; I just don't know how to do it without sed. :)

Hm, it displays an error: -R is an invalid option for sort on my system. Also, if inputfile is the input file... what is the purpose of filename at the end of the awk command?
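For what it's worth, sort -R is a GNU coreutils extension, so it is missing on many systems (older Solaris, HP-UX, some BSDs). A sketch of a more portable way to pull a few random, non-duplicate lines (the file and sample names here are just placeholders, not anything from the thread):

```shell
# Sample 3 random lines from input.txt without duplicates.
printf 'a\nb\nc\nd\ne\n' > input.txt

if command -v shuf >/dev/null 2>&1; then
    # GNU coreutils: shuf permutes the input; head keeps the sample.
    shuf input.txt | head -n 3 > sample.txt
else
    # Fallback: prefix each line with a random key, sort on the key,
    # then strip the key and keep the first 3 lines.
    awk 'BEGIN{srand()} {printf "%.8f\t%s\n", rand(), $0}' input.txt \
        | sort -n | cut -f2- | head -n 3 > sample.txt
fi

wc -l < sample.txt
```

Either branch draws each input line at most once, so the sample is duplicate-free; swap 3 for 10000 for the original task.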
 

rl(1)								   User Commands							     rl(1)

NAME
       rl - Randomize Lines.

SYNOPSIS
       rl [OPTION]... [FILE]...

DESCRIPTION
       rl reads lines from an input file or stdin, randomizes them, and outputs a
       specified number of lines. It does this in a single pass over the input
       while trying to use as little memory as possible.

       -c, --count=N
              Select the number of lines to be returned in the output. If this
              argument is omitted, all lines in the file are returned in random
              order. If the input contains fewer lines than specified and the
              --reselect option below is not given, a warning is printed and all
              lines are returned in random order.

       -r, --reselect
              With this option a single line may be selected multiple times. The
              default behaviour is that any input line is selected at most once.
              This option makes it possible to specify a --count larger than the
              number of lines the file actually holds.

       -o, --output=FILE
              Send randomized lines to FILE instead of stdout.

       -d, --delimiter=DELIM
              Use the specified character as the "line" delimiter instead of the
              newline character.

       -0, --null
              Input lines are terminated by a null character. This option is
              useful for processing the output of the GNU find -print0 option.

       -n, --line-number
              Number output lines with the line number from the input file.

       -q, --quiet, --silent
              Be quiet about any errors or warnings.

       -h, --help
              Show a short summary of options.

       -v, --version
              Show the version of the program.

EXAMPLES
       Some simple demonstrations of how rl can help with everyday tasks.

       Play a random sound after 4 minutes (perfect for toast):
              sleep 240 ; play `find /sounds -name '*.au' -print | rl --count=1`

       Play the 15 most recent .mp3 files in random order:
              ls -c *.mp3 | head -n 15 | rl | xargs --delimiter=' ' play

       Roll a die:
              seq 6 | rl --count 2

       Roll a die 1000 times and see which number comes up most often:
              seq 6 | rl --reselect --count 1000 | sort | uniq -c | sort -n

       Shuffle the words of a sentence:
              echo -n "The rain in Spain stays mainly in the plain." | rl --delimiter=' '; echo

       Find all movies and play them in random order:
              find . -name '*.avi' -print0 | rl -0 | xargs -n 1 -0 mplayer
       Because -0 is used, filenames containing spaces (even newlines and other
       unusual characters) work.

BUGS
       The program currently does not have very smart memory management. If you
       feed it huge files and expect it to fully randomize all lines, it will
       read the whole file into memory. If you specify the --count option, it
       only uses the memory required to store the specified number of lines.
       Improvements in this area are on the TODO list.

       The program uses the rand() system random function. This function returns
       a number between 0 and RAND_MAX, which may not be very large on some
       systems. This results in non-random output for files containing more
       lines than RAND_MAX.

       Note that if you specify multiple input files, they are randomized per
       file. This differs from cat-ing all the files together and piping the
       result into rl.

COPYRIGHT
       Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Arthur de
       Jong.

       This is free software; see the license for copying conditions. There is
       NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.

Version 0.2.7                          Jul 2008                            rl(1)
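The --count behaviour described under BUGS (memory proportional only to the number of requested lines, in a single pass) is classic reservoir sampling. A minimal awk sketch of the idea (an illustration of the technique, not rl's actual source; file names are placeholders):

```shell
# Reservoir sampling: keep k random lines in O(k) memory, one pass.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > lines.txt

awk -v k=3 'BEGIN{srand()}
{
    if (NR <= k) R[NR] = $0            # fill the reservoir first
    else {
        j = int(rand() * NR) + 1       # random slot in 1..NR
        if (j <= k) R[j] = $0          # replace with probability k/NR
    }
}
END{for (i = 1; i <= k; i++) print R[i]}' lines.txt > sample.txt

wc -l < sample.txt
```

Each input line lands in at most one reservoir slot, so the k output lines are distinct, matching rl's default no-reselect behaviour.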