How to remove a subset of data from a large dataset based on values on one line Post: 302576002

10 More Discussions You Might Find Interesting

1. Programming

I have C++ exe file( no source code) and need to run many large dataset under unix, b

I have a C++ exe file (no source code) and need to run many large datasets under Unix, but how can I find out the memory usage for one dataset? I think "top" is not good, and for a profiler there seems to be no free download. Any ideas?

2. Shell Programming and Scripting

remove a specific line in a LARGE file

Hi guys, I have a really big file, and I want to remove a specific line. I tried sed -i '5d' file, but this doesn't really work, it takes a lot of time... The whole script is supposed to remove every word containing less than 5 characters and currently looks like this: #!/bin/bash line="1"...

3. Shell Programming and Scripting

Remove duplicate line detail based on column one data

My input file: AVI.out <detail>named as the RRM .</detail> AVI.out <detail>Contains 1 RRM .</detail> AR0.out <detail>named as the tellurite-resistance.</detail> AWG.out <detail>Contains 2 HTH .</detail> ADV.out <detail>named as the DENR family.</detail> ADV.out ...

4. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi, All I have a huge file which has 450G. Its tab-delimited format is as below x1 A 50020 1 x1 B 50021 8 x1 C 50022 9 x1 A 50023 10 x2 D 50024 5 x2 C 50025 7 x2 F 50026 8 x2 N 50027 1 : : Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is...

5. Shell Programming and Scripting

Find line number of bad data in large file

Hi Forum. I was trying to search the following scenario on the forum but was not able to. Let's say that I have a very large file that has some bad data in it (for ex: 0.0015 in the 12th column) and I would like to find the line number and remove that particular line. What's the easiest...

6. UNIX for Advanced & Expert Users

How to extract subset file from dataset?

Hello I have a data set which looks like this : progeny sire dam gender 12 1 3 M 13 2 4 F 14 2 5 F 15 6 5 ...

7. Shell Programming and Scripting

How to read file line by line and compare subset of 1st line with 2nd?

8. Shell Programming and Scripting

Selecting random columns from large dataset in UNIX

Dear folks I have a large data set which contains 400K columns. I decide to select 50K determined columns from the whole 400K columns. Is there any command in unix which could do this process for me? I need to also mention that I store all of the columns id in one file which may help to select...

9. Shell Programming and Scripting

Reoccurring peak values in large data file and print the line..

Hi, I have some large data files that contain several fields and rows. The data in one field have numeric values that follow a sine wave pattern. What I would like to do is locate each peak, pick the highest value, and print that complete line. The data looks something like this; it is field nr4 which...

10. Shell Programming and Scripting

Parsing a subset of data from a large matrix

I do have a large matrix of the following format and it is tab delimited ch-ab1-20 ch-bb2-23 ch-ab1-34 ch-ab1-24 er-cc1-45 bv-cc1-78 ch-ab1-20 0 2 3 4 5 6 ch-bb2-23 3 0 5 ...

LEARN ABOUT DEBIAN

h5totxt

H5TOTXT(1)							      h5utils								H5TOTXT(1)

NAME

       h5totxt - generate comma-delimited text from 2d slices of HDF5 files

SYNOPSIS

       h5totxt [OPTION]... [HDF5FILE]...

DESCRIPTION

       h5totxt is a utility to generate comma-delimited text (and similar formats) from one-, two-, or more-dimensional slices of numeric datasets
       in HDF5 files.  This way, the data can easily be imported into spreadsheets and similar programs for analysis and visualization.

       HDF5 is a free, portable binary format and supporting library developed by the National Center for Supercomputing Applications at the
       University of Illinois in Urbana-Champaign.  A single h5 file can contain multiple data sets; by default, h5totxt takes the first dataset, but
       this can be changed via the -d option, or by using the syntax HDF5FILE:DATASET.

       By default, the entire dataset is dumped to the output in row-major order.  For 3d datasets, this corresponds to a sequence of yz slices,
       in order of increasing x, separated by blank lines.  If -T is specified, the data is output in the transposed (column-major) order instead.
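
       The default layout described above can be pictured with a small sketch in plain Python.  This is not h5totxt's implementation; the
       function name dump_row_major is hypothetical, the dataset is a nested list indexed as data[x][y][z], and the sketch assumes that each
       text row of a slice holds one y index with columns running over z.  Number formatting (see the -. option) is not modeled.

```python
def dump_row_major(data, sep=","):
    """Format a 3d nested list data[x][y][z] as delimited text.

    Illustrative only: one block of text per x index (a yz slice),
    blocks separated by a blank line, as the description above states.
    """
    blocks = []
    for yz_slice in data:  # slices in order of increasing x
        # Assumption: one text row per y index, columns run over z.
        rows = [sep.join(str(value) for value in row) for row in yz_slice]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)  # blank line between consecutive slices


data = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]
print(dump_row_major(data))
```

       Passing a different sep, for example "\t", mirrors what the -s option is documented to do for the real tool.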

       Often,  however,  you  want  only a one- or two-dimensional slice of multi-dimensional data.  To do this, you specify coordinates in one or
       more slice dimensions, via the -xyzt options.
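
       As a rough illustration of what fixing a coordinate in a slice dimension means, the following plain-Python sketch selects planes and
       lines from a nested list indexed as data[x][y][z].  The helper take_slice is hypothetical and only mimics the idea; it says nothing
       about how h5totxt reads HDF5 files.  Indices are zero-based, as in the description of the -xyzt options.

```python
def take_slice(data, dim, index):
    """Return nested-list data with dimension `dim` fixed at `index`."""
    if dim == 0:
        return data[index]
    # Descend one level and fix the (dim - 1)-th dimension of each sub-list.
    return [take_slice(sub, dim - 1, index) for sub in data]


data = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]  # indexed as data[x][y][z]

yz_plane = take_slice(data, 0, 1)  # analogous to "-x 1"
xz_plane = take_slice(data, 1, 0)  # analogous to "-y 0"
# Fixing two dimensions leaves a one-dimensional line along z.
# The higher dimension is fixed first so the lower index does not shift.
z_line = take_slice(take_slice(data, 1, 0), 0, 1)  # analogous to "-x 1 -y 0"

print(yz_plane)
print(xz_plane)
print(z_line)
```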

       The most basic usage is something like 'h5totxt foo.h5', which will output comma-delimited text to stdout from the data in foo.h5.

OPTIONS

       -h     Display help on the command-line options and usage.

       -V     Print the version number and copyright info for h5totxt.

       -v     Verbose output.

       -o file
	      Send text output to file rather than to stdout (the default).

       -s sep Use the string sep to separate columns of the output rather than a comma (the default).

       -x ix, -y iy, -z iz, -t it
	      This tells h5totxt to use a particular slice of a multi-dimensional dataset.  For example, -x causes a yz plane (of a 3d dataset) to be
	      used, at an x index of ix (where the indices run from zero to one less than the maximum index in that direction).  Here, x/y/z
	      correspond to the first/second/third dimensions of the HDF5 dataset.  The -t option specifies a slice in the last dimension, whichever
	      that might be.  See also the -0 option to shift the origin of the x/y/z slice coordinates to the dataset center.

       -0     Shift  the  origin  of  the x/y/z slice coordinates to the dataset center, so that e.g. -0 -x 0 (or more compactly -0x0) returns the
	      central x plane of the dataset instead of the edge x plane.  (-t coordinates are not affected.)

       -T     Transpose the data (interchange the dimension ordering).  By default, no transposition is done.

       -. numdigits
	      Output numdigits digits after the decimal point (defaults to 16).

       -d name
	      Use dataset name from the input files; otherwise, the first  dataset  from  each	file  is  used.   Alternatively,  use  the  syntax
	      HDF5FILE:DATASET,  which allows you to specify a different dataset for each file.  You can use the h5ls command (included with hdf5)
	      to find the names of datasets within a file.
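
       When h5totxt is driven from a script, the options above can be assembled into an argument list.  The sketch below is one possible way
       to do that in Python; the function h5totxt_args and its parameter names are hypothetical, and only the flags themselves (-o, -s, -0,
       -x/-y/-z/-t, -T, -., -d) come from this page.  It builds the list without running anything, so it does not require h5totxt to be
       installed.

```python
def h5totxt_args(hdf5file, dataset=None, sep=None, outfile=None,
                 slices=None, center=False, transpose=False, digits=None):
    """Build an argument list for h5totxt from the options documented above.

    `slices` maps an axis letter ("x", "y", "z" or "t") to an integer index.
    """
    args = ["h5totxt"]
    if outfile is not None:
        args += ["-o", outfile]          # write to a file instead of stdout
    if sep is not None:
        args += ["-s", sep]              # column separator (default is a comma)
    if center:
        args.append("-0")                # measure x/y/z indices from the center
    for axis, index in (slices or {}).items():
        if axis not in ("x", "y", "z", "t"):
            raise ValueError("unknown slice axis: " + repr(axis))
        args += ["-" + axis, str(index)]
    if transpose:
        args.append("-T")                # transposed dimension ordering
    if digits is not None:
        args += ["-.", str(digits)]      # digits after the decimal point
    if dataset is not None:
        args += ["-d", dataset]          # dataset name within the file
    args.append(hdf5file)
    return args


cmd = h5totxt_args("foo.h5", sep="\t", slices={"x": 0}, center=True)
print(cmd)
```

       The resulting list can be handed to subprocess.run(cmd, check=True) on a system where h5totxt is available.  A per-file dataset can
       instead be given with the HDF5FILE:DATASET syntax by passing a string of that form as hdf5file.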

BUGS

       Send bug reports to S. G. Johnson, stevenj@alum.mit.edu.

AUTHORS

       Written by Steven G. Johnson.  Copyright (c) 2005 by the Massachusetts Institute of Technology.

h5utils 							   March 9, 2002							H5TOTXT(1)