How to remove a subset of data from a large dataset based on values on one line

10 More Discussions You Might Find Interesting

1. Programming

I have C++ exe file( no source code) and need to run many large dataset under unix, b

I have a C++ exe file (no source code) and need to run many large datasets under Unix, but how can I find out the memory usage for one dataset? I think "top" is not good, and for a profiler there seems to be no free download. Any ideas?

2. Shell Programming and Scripting

remove a specific line in a LARGE file

Hi guys, I have a really big file, and I want to remove a specific line: sed -i '5d' file. This doesn't really work, it takes a lot of time... The whole script is supposed to remove every word containing less than 5 characters and currently looks like this: #!/bin/bash line="1"...

3. Shell Programming and Scripting

Remove duplicate line detail based on column one data

My input file:

AVI.out <detail>named as the RRM .</detail>
AVI.out <detail>Contains 1 RRM .</detail>
AR0.out <detail>named as the tellurite-resistance.</detail>
AWG.out <detail>Contains 2 HTH .</detail>
ADV.out <detail>named as the DENR family.</detail>
ADV.out ...

4. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi all, I have a huge file of 450G. Its tab-delimited format is as below:

x1 A 50020 1
x1 B 50021 8
x1 C 50022 9
x1 A 50023 10
x2 D 50024 5
x2 C 50025 7
x2 F 50026 8
x2 N 50027 1
: :

Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is...

5. Shell Programming and Scripting

Find line number of bad data in large file

Hi Forum. I was trying to search the following scenario on the forum but was not able to. Let's say that I have a very large file that has some bad data in it (for ex: 0.0015 in the 12th column) and I would like to find the line number and remove that particular line. What's the easiest...

6. UNIX for Advanced & Expert Users

How to extract subset file from dataset?

Hello, I have a data set which looks like this:

progeny sire dam gender
12      1    3   M
13      2    4   F
14      2    5   F
15      6    5   ...

7. Shell Programming and Scripting

How to read file line by line and compare subset of 1st line with 2nd?

8. Shell Programming and Scripting

Selecting random columns from large dataset in UNIX

Dear folks, I have a large data set which contains 400K columns. I have decided to select 50K specific columns from the whole 400K columns. Is there any command in Unix which could do this for me? I should also mention that I store all of the column IDs in one file, which may help to select...

9. Shell Programming and Scripting

Reoccurring peak values in large data file and print the line..

Hi, I have some large data files that contain several fields and rows. The data in one field have numeric values that follow a sine wave pattern. What I would like to do is locate each peak, pick the highest value, and print that complete line. The data looks something like this; it is field nr4 which...

10. Shell Programming and Scripting

Parsing a subset of data from a large matrix

I have a large tab-delimited matrix in the following format:

          ch-ab1-20 ch-bb2-23 ch-ab1-34 ch-ab1-24 er-cc1-45 bv-cc1-78
ch-ab1-20 0 2 3 4 5 6
ch-bb2-23 3 0 5 ...

h5tovtk

H5TOVTK(1)							      h5utils								H5TOVTK(1)

NAME

       h5tovtk - convert datasets in HDF5 files to VTK format

SYNOPSIS

       h5tovtk [OPTION]... [HDF5FILE]...

DESCRIPTION

       h5tovtk is a program to generate VTK data files from multidimensional datasets in HDF5 files.  VTK, the Visualization ToolKit, is an
       open-source, freely available software system for 3D computer graphics, image processing, and visualization.  VTK itself is a
       programming library, but it is also the basis for a number of end-user graphical visualization programs.

       HDF5 is a free, portable binary format and supporting library developed by the National Center for Supercomputing Applications at the
       University of Illinois in Urbana-Champaign.  A single h5 file can contain multiple datasets; by default, h5tovtk takes the first
       dataset, but this can be changed via the -d option, or by using the syntax HDF5FILE:DATASET.

       1d/2d/3d datasets are converted into 3d VTK datasets.  Normally, a single scalar VTK dataset is output, but vectors and fields can be
       output via the -o option below.

       A typical invocation is of the form 'h5tovtk foo.h5', which will output a VTK data file foo.vtk from the data in foo.h5.

OPTIONS

       -h     Display help on the command-line options and usage.

       -V     Print the version number and copyright info for h5tovtk.

       -v     Verbose output.

       -o file
	      Save all the input datasets to a single VTK file.  If there is only one dataset, it is output to a VTK scalar dataset; if there  are
	      three datasets, they are output as a VTK vector dataset; all other numbers of datasets are combined into a VTK field dataset.

	      Otherwise,  the  default behavior is to save each dataset to a separate VTK file, with the .h5 suffix of the input filename replaced
	      by .vtk in the output filename.

              Only three-dimensional datasets may be written to the VTK file.  If you have a four (or more) dimensional data set, then you must
              take a three-dimensional "slice" of the multi-dimensional data.  To do this, you specify coordinates in one (or more) slice
              dimension(s), via the -xyzt options.
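
The output rules for -o described above can be summarized in a short Python sketch. This is only an illustration of the documented behavior, not h5tovtk's own code; the function names vtk_kind and default_output_name are hypothetical, and the handling of input names that lack the .h5 suffix is an assumption.

```python
import os


def vtk_kind(num_datasets):
    """Kind of VTK dataset written with -o, following the text above."""
    if num_datasets == 1:
        return "scalar"
    if num_datasets == 3:
        return "vector"
    return "field"  # all other numbers of datasets


def default_output_name(h5_path):
    """Without -o: replace the .h5 suffix of the input name by .vtk."""
    root, ext = os.path.splitext(h5_path)
    # Assumption: a name without the .h5 suffix just gets .vtk appended.
    return (root if ext == ".h5" else h5_path) + ".vtk"


print(vtk_kind(1), vtk_kind(3), vtk_kind(2))
print(default_output_name("foo.h5"))
```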

       -1, -2, -4
              Use 1, 2, or 4 bytes to store each data point in the output file.  Fewer bytes require less storage and memory, but will decrease
              the resolution in the values.  -1 will break up the data values into one of 256 possible values (on a linear scale from the minimum
              to the maximum value in your data), -2 will allow 65536 possible values, and -4 (the default) will use 4-byte floating-point numbers
              for an "exact" representation.
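
To make the loss of resolution with -1 and -2 concrete, here is a minimal Python sketch of a linear mapping from the data range onto 256 or 65536 integer levels. The function name quantize is hypothetical, and rounding to the nearest level is an assumption; the man page does not say how h5tovtk rounds.

```python
def quantize(values, nbytes):
    """Map values linearly onto 0 .. 256**nbytes - 1 (nbytes is 1 or 2)."""
    levels = 256 ** nbytes            # 256 for -1, 65536 for -2
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant data: avoid dividing by zero
        return [0 for _ in values]
    scale = (levels - 1) / (hi - lo)
    # Assumption: round to the nearest level.
    return [int(round((v - lo) * scale)) for v in values]


data = [0.0, 0.25, 1.0]
print(quantize(data, 1))
print(quantize(data, 2))
```

With one byte, any two values closer together than about 1/255 of the data range can land on the same level, which is the decreased resolution mentioned above.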

       -a     Output in ASCII format; otherwise, VTK's more compact, but less readable and somewhat less portable binary format is used.

       -n     For binary output (see -a above), by default the data is written in big-endian byte order, which is normally the order that VTK
              expects.  However, some external tools and a few VTK classes use the native byte ordering instead (which may not be big-endian),
              and the -n option causes h5tovtk to output binary data in the native ordering.
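
The difference between big-endian and native byte order can be seen with Python's standard struct module. This sketch only illustrates the byte layouts of a 4-byte float; it does not read or write VTK files.

```python
import struct
import sys

value = 1.0
big = struct.pack(">f", value)      # big-endian, the default described above
little = struct.pack("<f", value)   # little-endian, for comparison
native = struct.pack("=f", value)   # native byte order, as requested by -n

print("big-endian:", big.hex())
print("native (%s-endian): %s" % (sys.byteorder, native.hex()))
```

On a little-endian machine the two printed byte sequences are reversed relative to each other, so a reader that assumes the wrong order sees different numbers.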

       -m min, -M max
              When -1 or -2 are used, the input data are converted to a linear integer scale.  Normally, the bottom and top of this scale
              correspond to the minimum and maximum values in the data.  Using the -m and -M options, you can make the bottom and top of the
              scale correspond to min and max instead, respectively.  Data values below or above this range will be treated as if they were min
              or max respectively.  See also the -Z option.
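
The effect of -m and -M can be sketched in Python as clamping followed by the linear mapping. The function name quantize_with_range and the round-to-nearest step are assumptions made for illustration, not taken from h5tovtk.

```python
def quantize_with_range(values, nbytes, vmin, vmax):
    """Quantize onto 0 .. 256**nbytes - 1 using a fixed [vmin, vmax] scale."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    top = 256 ** nbytes - 1
    out = []
    for v in values:
        v = min(max(v, vmin), vmax)   # out-of-range values become min or max
        # Assumption: round to the nearest integer level.
        out.append(int(round((v - vmin) / (vmax - vmin) * top)))
    return out


print(quantize_with_range([-5.0, 2.0, 20.0], 1, 0.0, 10.0))
```

Fixing the range this way keeps the scale identical across several files, at the cost of saturating values outside it.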

       -Z     For -1 or -2 output, center the linear integer scale on the value zero in the data.

       -r     Invert the output values (map the minimum to the maximum and vice versa).

       -x ix, -y iy, -z iz, -t it
              This tells h5tovtk to use a particular slice of a multi-dimensional dataset.  e.g.  -x uses the subset (with one less dimension) at
              an x index of ix (where the indices run from zero to one less than the maximum index in that direction).  Here, x/y/z correspond to
              the first/second/third dimensions of the HDF5 dataset.  The -t option specifies a slice in the last dimension, whichever that might
              be.  See also the -0 option to shift the origin of the x/y/z slice coordinates to the dataset center.

       -0     Shift  the  origin  of  the x/y/z slice coordinates to the dataset center, so that e.g. -0 -x 0 (or more compactly -0x0) returns the
	      central x plane of the dataset instead of the edge x plane.  (-t coordinates are not affected.)
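
The slice options and the -0 shift can be pictured with plain nested lists in Python. The function name take_x_slice is hypothetical, and placing the shifted origin at index nx // 2 is an assumption; the man page only says that the origin moves to the dataset center.

```python
def take_x_slice(data, ix, centered=False):
    """Return the plane at x index ix from a nested list data[x][y][z]."""
    nx = len(data)
    if centered:
        # Assumption: -0 moves index 0 to the middle plane, nx // 2.
        ix += nx // 2
    if not 0 <= ix < nx:
        raise IndexError("x index out of range")
    return data[ix]


# A 3 x 2 x 2 dataset whose entries record their own x index.
data = [[[x for _ in range(2)] for _ in range(2)] for x in range(3)]
print(take_x_slice(data, 0))                 # edge plane, like -x 0
print(take_x_slice(data, 0, centered=True))  # central plane, like -0 -x 0
```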

       -d name
              Use dataset name from the input files; otherwise, the first dataset from each file is used.  Alternatively, use the syntax
              HDF5FILE:DATASET, which allows you to specify a different dataset for each file.  You can use the h5ls command (included with hdf5)
              to find the names of datasets within a file.
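
As a rough illustration of the HDF5FILE:DATASET syntax, the Python sketch below splits such an argument into a file name and a dataset name. The function name split_spec and the dataset name "density" are hypothetical, and splitting at the last colon is an assumption; the man page does not describe how h5tovtk parses the argument.

```python
def split_spec(spec, default_dataset=None):
    """Split 'HDF5FILE:DATASET' into (filename, dataset name or default)."""
    # Assumption: the last colon separates the file name from the dataset.
    filename, sep, dataset = spec.rpartition(":")
    if not sep:
        # No colon: fall back to the -d name, or None for "first dataset".
        return spec, default_dataset
    return filename, dataset


print(split_spec("foo.h5:density"))
print(split_spec("foo.h5"))
print(split_spec("foo.h5", default_dataset="density"))
```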

BUGS

       Send bug reports to S. G. Johnson, stevenj@alum.mit.edu.

AUTHORS

       Written by Steven G. Johnson.  Copyright (c) 2005 by the Massachusetts Institute of Technology.

h5utils 							   March 9, 2002							H5TOVTK(1)