Extract certain columns from big data Post: 302821475

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to cut some data from big file

How can I cut data from a big file? My file is around 30 GB. I tried "head -50022172 filename > newfile.txt" and then "tail -5454283 newfile.txt". It's slow. After that I tried sed -n '46467831,50022172p' filename > newfile.txt, which is also slow. Please recommend a faster command to cut some data from...

2. Shell Programming and Scripting

Extract data based on match against one column data from a long list data

My input file: data_5 Ali 422 2.00E-45 102/253 140/253 24 data_3 Abu 202 60.00E-45 12/23 140/23 28 data_1 Ahmad 256 7.00E-45 120/235 140/235 22 data_4 Aman 365 8.00E-45 15/65 140/65 20 data_10 Jones 869 9.00E-45 65/253 140/253 18...

3. Shell Programming and Scripting

Transpose columns to Rows : Big data

Hi, I did read a few posts on the subjects, tried out a few solutions, but did not solve my problem. https://www.unix.com/302121568-post11.html https://www.unix.com/shell-programming-scripting/137953-large-file-columns-into-rows-etc-4.html Please help. Problem very similar to the second link...

4. Shell Programming and Scripting

Sort a big data file

Hello, I have a big data file (160 MB) full of records with pipe (|) delimited fields. I'm sorting the file on the first field. I'm trying to sort with the "sort" command and it takes 6 minutes. I have tried some transformation methods in Perl, but they result in "Out of memory". I was...

5. Red Hat

Linux in Big Data projects

Hey guys, we will be interested in learning from your experience in using Linux in Big Data projects. Has anyone used Hadoop, or MapR or Horton Works on Linux and any experiences you may have had on these. I am more interested in knowing if a certain distribution of Linux is better supported for...

6. Shell Programming and Scripting

Extract certain entries from big file:Request to check

Hi all, I have a big file which I have attached here. I have to fetch certain entries and arrange them in 5 columns: Name Drug DAP ID disease approved or not. In the attached file, data is arranged with tab-separated columns in this way: and other data is...

7. What is on Your Mind?

Big Data for System Admins

Hello, I have been working as Solaris/Linux Admin since past 8 years. I am looking options for my profile change, but there is some limitation. I worked as 24x7 support for admin, server support, high availability, etc. But been worked on developing side and scripting part. When I search for Big...

8. Shell Programming and Scripting

Compare 2 csv files by columns, then extract certain columns of matching rows

Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with UNIX coding on comparing two large csv files (greater than 10 GB in size), and output a file with matching columns. I want to compare file1 and file2 by 'id' and 'chain' columns, then extract exact matching rows'...

9. Shell Programming and Scripting

Want to extract certain lines from big file

Hi All, I am trying to get some lines from a file. I did it with a while-do loop, but since the files are huge it takes a long time. Now I want to make it faster. The requirement is that the file will have 1 million lines. The format is like below. ##transaction, , , ,blah, blah...

10. Shell Programming and Scripting

Extract Big and continuous regions

Hi all, I have a file like this. I want to extract only those regions which are big and continuous: chr1 3280000 3440000 chr1 3440000 3920000 chr1 3600000 3920000 # region falling within 3440000 3920000, so I don't want it to be printed in the output chr1 3920000 4800000 chr1 ...

LEARN ABOUT DEBIAN

h5fromtxt

H5FROMTXT(1)							      h5utils							      H5FROMTXT(1)

NAME

       h5fromtxt - convert text input to an HDF5 file

SYNOPSIS

       h5fromtxt [OPTION]... [HDF5FILE]

DESCRIPTION

       h5fromtxt takes a series of numbers from standard input and outputs a multi-dimensional numeric dataset in an HDF5 file.

       HDF5 is a free, portable binary format and supporting library developed by the National Center for Supercomputing Applications at the
       University of Illinois in Urbana-Champaign.  A single h5 file can contain multiple data sets; by default, h5fromtxt creates a dataset called
       "data", but this can be changed via the -d option, or by using the syntax HDF5FILE:DATASET.  The -a option can be used to append new
       datasets to an existing HDF5 file.

       All characters besides the numbers (and associated decimal points, et cetera) in the input are ignored.  By default, the data is assumed to
       be a two-dimensional MxN dataset where M is the number of rows (delimited by newlines) and N is the number of columns.  In this case, it is
       an error for the number of columns to vary between rows.  If M or N is 1 then the data is written as a one-dimensional dataset.
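
       Because a varying column count is an error, it can help to check the input before converting it.  The following is a minimal sketch,
       not part of h5utils: check_rect is a hypothetical helper name, and it assumes the input is purely whitespace-delimited numbers (it
       counts awk fields, which only approximates how h5fromtxt itself picks numbers out of the text) and that blank lines can be skipped.

```shell
# check_rect: hypothetical helper, NOT part of h5utils.
# Reads whitespace-delimited rows on stdin. Prints "MxN" (rows x columns)
# if every non-blank row has the same number of fields; otherwise reports
# the first offending row on stderr and returns a non-zero status.
check_rect() {
    awk '
        NF == 0   { next }              # assumption: blank lines are skipped
        rows == 0 { cols = NF }         # the first data row fixes N
        NF != cols {
            printf "row %d has %d columns, expected %d\n", NR, NF, cols > "/dev/stderr"
            bad = 1
            exit 1
        }
        { rows++ }
        END { if (!bad) printf "%dx%d\n", rows, cols }
    '
}
```

       With a hypothetical input file data.txt, this could be used as "check_rect < data.txt && h5fromtxt foo.h5 < data.txt", so that the
       conversion only runs when the rows are consistent.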

       Alternatively, you can specify the dimensions of the data explicitly via the -n size option, where size is e.g. "2x2x2".  In this case,
       newlines are ignored and the data is taken as an array of the given size stored in row-major ("C") order (where the last index varies most
       quickly as you step through the data).  For example, a 2x2x2 array would have its elements listed in the order: (0,0,0), (0,0,1), (0,1,0),
       (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1).
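
       The row-major ordering described above can be illustrated with a small sketch.  The function name row_major_2x2x2 is made up for
       this example and is not part of h5utils; it only prints, for each position in the input stream, the index triple that position
       corresponds to in a 2x2x2 array.

```shell
# row_major_2x2x2: illustrative only, NOT part of h5utils.
# Prints "offset (i,j,k)" for a 2x2x2 array in row-major ("C") order.
# The innermost loop runs over the last index, so it varies fastest.
row_major_2x2x2() {
    for i in 0 1; do
        for j in 0 1; do
            for k in 0 1; do
                # flat offset of element (i,j,k) in a 2x2x2 row-major array
                offset=$(( (i * 2 + j) * 2 + k ))
                printf '%s (%s,%s,%s)\n' "$offset" "$i" "$j" "$k"
            done
        done
    done
}
```

       The offsets come out as 0 through 7 in the same order as the triples listed above, which is the order in which the numbers would
       have to appear in the input.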

       A simple example is:

	   h5fromtxt foo.h5 <<EOF
	   1 2 3 4
	   5 6 7 8
	   EOF

       which reads in a 2x4 space-delimited array from standard input.

OPTIONS

       -h     Display help on the command-line options and usage.

       -V     Print the version number and copyright info for h5fromtxt.

       -v     Verbose output.

       -a     If  the  HDF5  output file already exists, append the data as a new dataset rather than overwriting the file (the default behavior).
	      An existing dataset of the same name within the file is overwritten, however.

       -n size
	      Instead of trying to infer the dimensions of the array from the rows and columns of the input, treat the data as a sequence of
	      numbers in row-major order forming an array of dimensions size.  size is of the form MxNxLx... (with M, N, L being numbers) and may be
	      of any dimensionality.

       -T     Transpose the input when it is written, reversing the dimensions.

       -d name
	      Write to dataset name in the output; otherwise, the output dataset is called "data"  by  default.   Alternatively,  use  the  syntax
	      HDF5FILE:DATASET.
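
       As an illustration of what reversing the dimensions means for a two-dimensional input (the -T option above), the sketch below
       transposes a whitespace-delimited text matrix with awk.  It is not how h5fromtxt is implemented; transpose_text is a hypothetical
       helper name, and the sketch assumes a rectangular input small enough to hold in memory.

```shell
# transpose_text: illustrative only, NOT part of h5utils.
# Reads a whitespace-delimited MxN matrix on stdin and prints its NxM
# transpose. The whole matrix is kept in memory until the end of input.
transpose_text() {
    awk '
        {
            for (c = 1; c <= NF; c++) cell[NR, c] = $c
            if (NF > ncols) ncols = NF
        }
        END {
            # each input column becomes one output row
            for (c = 1; c <= ncols; c++) {
                line = cell[1, c]
                for (r = 2; r <= NR; r++) line = line " " cell[r, c]
                print line
            }
        }
    '
}
```

       Feeding it the 2x4 array from the simple example in the DESCRIPTION yields a 4x2 array whose rows are the original columns.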

BUGS

       Send bug reports to S. G. Johnson, stevenj@alum.mit.edu.

AUTHORS

       Written by Steven G. Johnson.  Copyright (c) 2005 by the Massachusetts Institute of Technology.

h5utils 							   March 9, 2002						      H5FROMTXT(1)

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to cut some data from big file

Discussion started by: almanto

2. Shell Programming and Scripting

Extract data based on match against one column data from a long list data

Discussion started by: patrick87

3. Shell Programming and Scripting

Transpose columns to Rows : Big data

Discussion started by: genehunter

4. Shell Programming and Scripting

Sort a big data file

Discussion started by: rubber08