Extract certain columns from big data Post: 302821485

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to cut some data from big file

How to cut data from a big file? My file is around 30 GB. I tried "head -50022172 filename > newfile.txt" and "tail -5454283 newfile.txt". It's slow. After that I tried sed -n '46467831,50022172p' filename > newfile.txt, which is also slow. Please recommend a faster command to cut some data from...

2. Shell Programming and Scripting

Extract data based on match against one column data from a long list data

My input file:
data_5 Ali 422 2.00E-45 102/253 140/253 24
data_3 Abu 202 60.00E-45 12/23 140/23 28
data_1 Ahmad 256 7.00E-45 120/235 140/235 22
data_4 Aman 365 8.00E-45 15/65 140/65 20
data_10 Jones 869 9.00E-45 65/253 140/253 18...

3. Shell Programming and Scripting

Transpose columns to Rows : Big data

Hi, I did read a few posts on the subjects, tried out a few solutions, but did not solve my problem. https://www.unix.com/302121568-post11.html https://www.unix.com/shell-programming-scripting/137953-large-file-columns-into-rows-etc-4.html Please help. Problem very similar to the second link...

4. Shell Programming and Scripting

Sort a big data file

Hello, I have a big data file (160 MB) full of records whose fields are pipe (|) delimited. I'm sorting the file on the first field. I'm trying to sort with the "sort" command and it takes 6 minutes. I have tried some transformation methods in perl, but that results in "Out of memory". I was...

5. Red Hat

Linux in Big Data projects

Hey guys, we will be interested in learning from your experience in using Linux in Big Data projects. Has anyone used Hadoop, or MapR or Horton Works on Linux and any experiences you may have had on these. I am more interested in knowing if a certain distribution of Linux is better supported for...

6. Shell Programming and Scripting

Extract certain entries from big file:Request to check

Hi all, I have a big file which I have attached here. I have to fetch certain entries and arrange them in 5 columns: Name, Drug, DAP ID, disease, approved or not. In the attached file data is arranged with tab separated columns in this way: and other data is...

7. What is on Your Mind?

Big Data for System Admins

Hello, I have been working as Solaris/Linux Admin since past 8 years. I am looking options for my profile change, but there is some limitation. I worked as 24x7 support for admin, server support, high availability, etc. But been worked on developing side and scripting part. When I search for Big...

8. Shell Programming and Scripting

Compare 2 csv files by columns, then extract certain columns of matching rows

Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with UNIX coding on comparing two large csv files (greater than 10 GB in size), and output a file with matching columns. I want to compare file1 and file2 by 'id' and 'chain' columns, then extract exact matching rows'...

9. Shell Programming and Scripting

Want to extract certain lines from big file

Hi All, I am trying to get some lines from a file. I did it with a while-do loop. Since the files are huge, it is taking a long time. Now I want to make it faster. The requirement is that the file will have 1 million lines. The format is like below. ##transaction, , , ,blah, blah...

10. Shell Programming and Scripting

Extract Big and continuous regions

Hi all, I have a file like this. I want to extract only those regions which are big and continuous:
chr1 3280000 3440000
chr1 3440000 3920000
chr1 3600000 3920000 # region falls within 3440000 3920000, so I don't want it printed in the output
chr1 3920000 4800000
chr1 ...

LEARN ABOUT NETBSD

recno

RECNO(3)						   BSD Library Functions Manual 						  RECNO(3)

NAME

     recno -- record number database access method

SYNOPSIS

     #include <sys/types.h>
     #include <db.h>

DESCRIPTION

     The routine dbopen() is the library interface to database files.  One of the supported file formats is record number files.  The general
     description of the database access methods is in dbopen(3); this manual page describes only the recno specific information.

     The record number data structure is either variable or fixed-length records stored in a flat-file format, accessed by the logical record
     number.  The existence of record number five implies the existence of records one through four, and the deletion of record number one causes
     record number five to be renumbered to record number four, as well as the cursor, if positioned after record number one, to shift down one
     record.
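
     The renumbering rule above can be illustrated with a toy in-memory model.  This is only a sketch of the documented behaviour, not the
     library's implementation; struct renum_file, renum_delete() and RENUM_MAX are hypothetical names that do not appear in <db.h>.

```c
#include <stddef.h>
#include <string.h>

#define RENUM_MAX 16  /* hypothetical capacity for this sketch */

/* Toy stand-in for a recno file: slot i holds logical record i + 1. */
struct renum_file {
    const char *rec[RENUM_MAX];
    size_t count;   /* number of records currently stored */
    size_t cursor;  /* logical record number under the cursor; 0 = unset */
};

/* Delete logical record `recno` (1-based).
 * Returns 0 on success, -1 if no such record exists. */
int renum_delete(struct renum_file *f, size_t recno)
{
    if (recno == 0 || recno > f->count)
        return -1;

    /* Shift every later record down one slot; this is the renumbering. */
    memmove(&f->rec[recno - 1], &f->rec[recno],
            (f->count - recno) * sizeof f->rec[0]);
    f->count--;

    /* A cursor positioned after the deleted record shifts down with it. */
    if (f->cursor > recno)
        f->cursor--;
    return 0;
}
```

     With five records and the cursor on record five, deleting record one leaves four records, and both the old record five and the cursor
     end up at record number four.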

     The recno access method specific data structure provided to dbopen() is defined in the <db.h> include file as follows:

     typedef struct {
	     u_long flags;
	     u_int cachesize;
	     u_int psize;
	     int lorder;
	     size_t reclen;
	     uint8_t bval;
	     char *bfname;
     } RECNOINFO;

     The elements of this structure are defined as follows:

     flags	 The flag value is specified by or'ing any of the following values:

		       R_FIXEDLEN   The records are fixed-length, not byte delimited.  The structure element reclen specifies the length of the
				    record, and the structure element bval is used as the pad character.  Any records, inserted into the database,
				    that are less than reclen bytes long are automatically padded.

		       R_NOKEY	    In the interface specified by dbopen(), the sequential record retrieval fills in both the caller's key and
				    data structures.  If the R_NOKEY flag is specified, the cursor routines are not required to fill in the key
				    structure.	This permits applications to retrieve records at the end of files without reading all of the
				    intervening records.

		       R_SNAPSHOT   This flag requires that a snapshot of the file be taken when dbopen() is called, instead of permitting any
				    unmodified records to be read from the original file.

     cachesize	 A suggested maximum size, in bytes, of the memory cache.  This value is only advisory, and the access method will allocate more
		 memory rather than fail.  If cachesize is 0 (no size is specified) a default cache is used.

     psize	 The recno access method stores the in-memory copies of its records in a btree.  This value is the size (in bytes) of the pages
		 used for nodes in that tree.  If psize is 0 (no page size is specified) a page size is chosen based on the underlying file system
		 I/O block size.  See btree(3) for more information.

     lorder	 The byte order for integers in the stored database metadata.  The number should represent the order as an integer; for example,
		 big endian order would be the number 4321.  If lorder is 0 (no order is specified) the current host order is used.

     reclen	 The length of a fixed-length record.

     bval	 The delimiting byte to be used to mark the end of a record for variable-length records, and the pad character for fixed-length
		 records.  If no value is specified, newlines (``\n'') are used to mark the end of variable-length records and fixed-length
		 records are padded with spaces.

     bfname	 The recno access method stores the in-memory copies of its records in a btree.  If bfname is non-NULL, it specifies the name of
		 the btree file, as if specified as the file name for a dbopen() of a btree file.
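
     As an illustration of the integer convention used by lorder, the sketch below detects the host byte order and lays out a 32-bit value
     for either order.  The names TOY_LITTLE_ENDIAN, TOY_BIG_ENDIAN, host_lorder() and store_u32() are hypothetical; the values 1234 and
     4321 mirror the LITTLE_ENDIAN and BIG_ENDIAN constants found in BSD system headers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the values follow the BSD byte order constants. */
#define TOY_LITTLE_ENDIAN 1234
#define TOY_BIG_ENDIAN    4321

/* Return the host byte order in the integer form that lorder uses.
 * Only big and little endian hosts are distinguished. */
int host_lorder(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* lowest-addressed byte of probe */
    return first == 0x04 ? TOY_LITTLE_ENDIAN : TOY_BIG_ENDIAN;
}

/* Write v into out[] in the byte order named by lorder. */
void store_u32(unsigned char out[4], uint32_t v, int lorder)
{
    for (int i = 0; i < 4; i++) {
        int shift = (lorder == TOY_BIG_ENDIAN) ? 8 * (3 - i) : 8 * i;
        out[i] = (unsigned char)(v >> shift);
    }
}
```

     Passing host_lorder() where an explicit order is wanted should have the same effect as leaving lorder at 0, since 0 selects the
     current host order.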

     The data part of the key/data pair used by the recno access method is the same as other access methods.  The key is different.  The data
     field of the key should be a pointer to a memory location of type recno_t, as defined in the <db.h> include file.	This type is normally the
     largest unsigned integral type available to the implementation.  The size field of the key should be the size of that type.

     Because there can be no meta-data associated with the underlying recno access method files, any changes made to the default values (e.g.,
     fixed record length or byte separator value) must be explicitly specified each time the file is opened.

     In the interface specified by dbopen(), using the put interface to create a new record will cause the creation of multiple, empty records if
     the record number is more than one greater than the largest record currently in the database.
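
     The gap-filling behaviour of put can be modelled in a few lines.  This is a sketch of the documented effect only, assuming an
     in-memory array in place of the file; struct sparse_file, sparse_put() and SPARSE_MAX are hypothetical names, not part of <db.h>.

```c
#include <stddef.h>

#define SPARSE_MAX 16  /* hypothetical capacity for this sketch */

/* Toy stand-in for a recno file: slot i holds logical record i + 1. */
struct sparse_file {
    const char *rec[SPARSE_MAX];
    size_t count;  /* number of records currently stored */
};

/* Store `data` as logical record `recno` (1-based).
 * Returns the number of empty records created to fill the gap,
 * or -1 if recno is out of range for this toy model. */
int sparse_put(struct sparse_file *f, size_t recno, const char *data)
{
    int created = 0;

    if (recno == 0 || recno > SPARSE_MAX)
        return -1;

    /* Records between the current end and recno come into existence empty. */
    while (f->count + 1 < recno) {
        f->rec[f->count++] = "";
        created++;
    }

    f->rec[recno - 1] = data;
    if (recno > f->count)
        f->count = recno;
    return created;
}
```

     Writing record five into a file that holds two records creates empty records three and four first; writing record six afterwards
     creates none.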

ERRORS

     The recno access method routines may fail and set errno for any of the errors specified for the library routine dbopen(3) or the following:

     EINVAL		An attempt was made to add a record to a fixed-length database that was too large to fit.
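
     The fixed-length rules (padding with bval up to reclen, and EINVAL for a record that does not fit) can be sketched as follows.  The
     function pad_record() is a hypothetical helper written for illustration; it is not how the library stores records.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copy `len` bytes of `src` into a slot of `reclen` bytes, filling the
 * remainder with the pad byte `bval`.
 * Returns 0 on success; returns -1 and sets errno to EINVAL when the
 * record is longer than reclen. */
int pad_record(unsigned char *slot, size_t reclen,
               const void *src, size_t len, unsigned char bval)
{
    if (len > reclen) {
        errno = EINVAL;
        return -1;
    }
    memcpy(slot, src, len);
    memset(slot + len, bval, reclen - len);
    return 0;
}
```

     With reclen set to 8 and a space as the pad byte, the three-byte record "abc" is stored as "abc" followed by five spaces, while a
     nine-byte record is rejected.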

SEE ALSO

     btree(3), dbopen(3), hash(3), mpool(3)

     Michael Stonebraker, Heidi Stettner, Joseph Kalash, Antonin Guttman, and Nadene Lynn, "Document Processing in a Relational Database System",
     Memorandum No. UCB/ERL M82/32, May 1982.

BUGS

     Only big and little endian byte order is supported.

BSD								  April 17, 2003							       BSD
