How to Pick Random records from a large file Post: 302341798

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

how to pick distinct records..........

How can i pick distinct records which consists of duplicate data from a ASCII file using UNIX commands

2. Shell Programming and Scripting

Extract data from large file 80+ million records

Hello, I have got one file with more than 120+ million records(35 GB in size). I have to extract some relevant data from file based on some parameter and generate other output file. What will be the besat and fastest way to extract the ne file. sample file format :--...

3. Shell Programming and Scripting

Pick random file from ls command.

Lets say I want to pick a random file when I do an "ls" command. I don't have set number of files in each directory. ls | head -1 This gives me the first one in each directory, is there a way to do the same but pick a random one.

4. Shell Programming and Scripting

awk - splitting 1 large file into multiple based on same key records

Hello gurus, I am new to "awk" and trying to break a large file having 4 million records into several output files each having half million but at the same time I want to keep the similar key records in the same output file, not to exist accross the files. e.g. my data is like: Row_Num,...

5. Shell Programming and Scripting

Parse large file on line count (random lines)

I have a file that needs to be parsed into multiple files every time there line contains a number 1. the problem i face is the lines are random and the file size is random. an example is that on line 4, 65, 187, 202 & 209 are number 1's so there has to be file breaks between all those to create 4...

6. Shell Programming and Scripting

Need to generate a file with random data. /dev/[u]random doesn't exist.

Need to use dd to generate a large file from a sample file of random data. This is because I don't have /dev/urandom. I create a named pipe then: dd if=mynamed.fifo do=myfile.fifo bs=1024 count=1024 but when I cat a file to the fifo that's 1024 random bytes: cat randomfile.txt >...

7. Shell Programming and Scripting

Split a large file in n records and skip a particular record

Hello All, I have a large file, more than 50,000 lines, and I want to split it in even 5000 records. Which I can do using sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'Now I need to add one more condition that is not to break the file at 5000th record if the 5000th record...

8. Shell Programming and Scripting

[Solved] Help with random pick 1000 number from range 1 to 150000

Hi, Do anybody knows how to use awk or any command to random print out 1000 number which start from range 1 to 150000? I know that "rand" in awk can do similar random selection. But I have no idea how to write a code that can random pick 1000 number from range 1 to 150000 :confused: ...

9. Shell Programming and Scripting

Quick way to select many records from a large file

I have a file, named records.txt, containing large number of records, around 0.5 million records in format below: 28433005 1 1 3 2 2 2 2 2 2 2 2 2 2 2 28433004 0 2 3 2 2 2 2 2 2 1 2 2 2 2 ... Another file is a key file, named key.txt, which is the list of some numbers in the first column of...

10. Shell Programming and Scripting

Selecting random columns from large dataset in UNIX

Dear folks I have a large data set which contains 400K columns. I decide to select 50K determined columns from the whole 400K columns. Is there any command in unix which could do this process for me? I need to also mention that I store all of the columns id in one file which may help to select...

LEARN ABOUT CENTOS

alloc_hugepages

ALLOC_HUGEPAGES(2)					     Linux Programmer's Manual						ALLOC_HUGEPAGES(2)

NAME

       alloc_hugepages, free_hugepages - allocate or free huge pages

SYNOPSIS

       void *alloc_hugepages(int key, void *addr, size_t len,
			     int prot, int flag);

       int free_hugepages(void *addr);

DESCRIPTION

       The  system calls alloc_hugepages() and free_hugepages() were introduced in Linux 2.5.36 and removed again in 2.5.54.  They existed only on
       i386 and ia64 (when built with CONFIG_HUGETLB_PAGE).  In Linux 2.4.20 the syscall numbers exist, but the calls fail with the error ENOSYS.

       On i386 the memory management hardware knows about ordinary pages (4 KiB) and huge pages (2 or 4 MiB).  Similarly  ia64	knows  about  huge
       pages of several sizes.	These system calls serve to map huge pages into the process's memory or to free them again.  Huge pages are locked
       into memory, and are not swapped.

       The key argument is an identifier.  When zero the pages are private, and not inherited by children.  When positive  the	pages  are  shared
       with other applications using the same key, and inherited by child processes.

       The addr argument of free_hugepages() tells which page is being freed: it was the return value of a call to alloc_hugepages().  (The memory
       is first actually freed when all users have released it.)  The addr argument of alloc_hugepages() is a hint, that the kernel may or may not
       follow.	Addresses must be properly aligned.

       The len argument is the length of the required segment.	It must be a multiple of the huge page size.

       The prot argument specifies the memory protection of the segment.  It is one of PROT_READ, PROT_WRITE, PROT_EXEC.

       The  flag  argument  is	ignored, unless key is positive.  In that case, if flag is IPC_CREAT, then a new huge page segment is created when
       none with the given key existed.  If this flag is not set, then ENOENT is returned when no segment with the given key exists.

RETURN VALUE

       On success, alloc_hugepages() returns the allocated virtual address, and free_hugepages() returns zero.	On  error,  -1	is  returned,  and
       errno is set appropriately.

ERRORS

       ENOSYS The system call is not supported on this kernel.

FILES

       /proc/sys/vm/nr_hugepages Number of configured hugetlb pages.  This can be read and written.

       /proc/meminfo  Gives  info  on  the  number  of	configured  hugetlb  pages  and  on  their  size  in  the three variables HugePages_Total,
       HugePages_Free, Hugepagesize.

CONFORMING TO

       These calls are specific to Linux on Intel processors, and should not be used in programs intended to be portable.

NOTES

       These system calls are gone; they existed only in Linux 2.5.36 through to 2.5.54.  Now the hugetlbfs file system can be used instead.  Mem-
       ory backed by huge pages (if the CPU supports them) is obtained by using mmap(2) to map files in this virtual file system.

       The maximal number of huge pages can be specified using the hugepages= boot parameter.

COLOPHON

       This  page is part of release 3.53 of the Linux man-pages project.  A description of the project, and information about reporting bugs, can
       be found at http://www.kernel.org/doc/man-pages/.

Linux								    2007-05-31							ALLOC_HUGEPAGES(2)

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

how to pick distinct records..........

Discussion started by: ss4u

2. Shell Programming and Scripting

Extract data from large file 80+ million records

Discussion started by: learner16s

3. Shell Programming and Scripting

Pick random file from ls command.

Discussion started by: elbombillo

4. Shell Programming and Scripting

awk - splitting 1 large file into multiple based on same key records

Discussion started by: kam66