How to subset data? Post: 302790005

funtable

funtable(1)							SAORD Documentation						       funtable(1)

NAME

       funtable - copy selected rows from a Funtools file to a FITS binary table

SYNOPSIS

       funtable [-a] [-i|-z] [-m] [-s cols] <iname> <oname> [columns]

OPTIONS

	 -a    # append to existing output file as a table extension
	 -i    # for image data, only generate X and Y columns
	 -m    # for tables, write a separate file for each region
	 -s "col1 ..." # columns on which to sort
	 -z    # for image data, output zero-valued pixels

DESCRIPTION

       funtable selects rows from the specified FITS extension (binary table only) of a FITS file, or from a non-FITS raw event file, and writes
       those rows to a FITS binary table file. It can also create a FITS binary table from an image or a raw array file.

       The first argument to the program specifies the FITS file, raw event file, or raw array file.  If "stdin" is specified, data are read from
       the standard input. Use Funtools Bracket Notation to specify FITS extensions and filters.  The second argument is the output FITS file.
       If "stdout" is specified, the FITS binary table is written to the standard output.  By default, all columns of the input file are copied to
       the output file.  Selected columns can be output using an optional third argument in the form:

	 "column1 column2 ... columnN"

       The funtable program generally is used to select rows from a FITS binary table using Table Filters and/or Spatial Region Filters.  For
       example, you can copy only selected rows (and output only selected columns) by executing a command such as:

	 [sh] funtable "test.ev[pha==1&&pi==10]" stdout "x y pi pha" | fundisp stdin
		X	Y     PHA	 PI
	  ------- ------- ------- ---------
		1      10	1	 10
		1      10	1	 10
		1      10	1	 10
		1      10	1	 10
		1      10	1	 10
		1      10	1	 10
		1      10	1	 10
		1      10	1	 10
		1      10	1	 10
		1      10	1	 10

       The special column $REGION can be specified to write the region id of each row:

	 [sh $] funtable "test.ev[time-(int)time>=.99&&annulus(0 0 0 10 n=3)]" stdout 'x y time $REGION' | fundisp stdin
		 X	  Y		     TIME     REGION
	  -------- -------- --------------------- ----------
		 5	 -6	      40.99000000	   3
		 4	 -5	      59.99000000	   2
		-1	  0	     154.99000000	   1
		-2	  1	     168.99000000	   1
		-3	  2	     183.99000000	   2
		-4	  3	     199.99000000	   2
		-5	  4	     216.99000000	   2
		-6	  5	     234.99000000	   3
		-7	  6	     253.99000000	   3

       Here, only rows whose fractional time is at least .99 and whose position lies within one of the three annuli are written.
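
       The selection in this example can be modeled in a few lines of Python. This is only a sketch, not Funtools code: it assumes that
       annulus(0 0 0 10 n=3) means three equal-width annuli centered at (0,0) between radius 0 and radius 10, and the helper names
       annulus_id and select_rows are hypothetical. Under that reading, the region ids computed for the positions shown above match the
       REGION column of the sample output.

```python
import math


def annulus_id(x, y, xc, yc, r_in, r_out, n):
    """Return the 1-based id of the annulus containing (x, y), or 0 if none.

    Assumes n equal-width annuli between r_in and r_out. A point exactly on a
    shared boundary goes to the outer annulus (an assumption of this sketch,
    not documented Funtools behavior).
    """
    r = math.hypot(x - xc, y - yc)
    if r < r_in or r > r_out:
        return 0
    width = (r_out - r_in) / n
    # min() keeps a point lying exactly on r_out inside the last annulus
    return min(int((r - r_in) // width) + 1, n)


def select_rows(rows, xc=0, yc=0, r_in=0, r_out=10, n=3):
    """Mimic "time-(int)time>=.99 && annulus(...)" and append a REGION value.

    rows: iterable of (x, y, time) tuples (a hypothetical in-memory table).
    """
    out = []
    for x, y, t in rows:
        # int() truncates toward zero, like the C cast (int)time
        if t - int(t) < 0.99:
            continue
        region = annulus_id(x, y, xc, yc, r_in, r_out, n)
        if region:  # keep only rows that fall inside some annulus
            out.append((x, y, t, region))
    return out
```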

       Columns can be excluded from the output by placing a minus sign before the column name:

	 [sh $] funtable "test.ev[time-(int)time>=.99]" stdout "-time" | fundisp stdin
		 X	  Y	 PHA	     PI 	 DX	     DY
	  -------- -------- -------- ---------- ----------- -----------
		 5	 -6	   5	     -6        5.50	  -6.50
		 4	 -5	   4	     -5        4.50	  -5.50
		-1	  0	  -1	      0       -1.50	   0.50
		-2	  1	  -2	      1       -2.50	   1.50
		-3	  2	  -3	      2       -3.50	   2.50
		-4	  3	  -4	      3       -4.50	   3.50
		-5	  4	  -5	      4       -5.50	   4.50
		-6	  5	  -6	      5       -6.50	   5.50
		-7	  6	  -7	      6       -7.50	   6.50

       All columns except the time column are written.

       In general, the rules for activating and de-activating columns are:

       o   If only exclude columns are specified, then all columns but the exclude columns will be activated.

       o   If only include columns are specified, then only the specified columns are activated.

       o   If a mixture of include and exclude columns is specified, then all but the exclude columns will be active; this last case is ambiguous
           and the rule is arbitrary.

       In addition to specifying column names explicitly, the special symbols + and - can be used to activate and de-activate all columns. This
       is useful if you want to activate the $REGION column along with all other columns.  According to the rules, the syntax "$REGION" only
       activates the region column and de-activates the rest. Use "+ $REGION" to activate all columns as well as the region column.
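
       The activation rules can be summarized as a small Python function. It is an illustrative model only: active_columns is a hypothetical
       name, the lone "-" (de-activate all) token is not modeled, and it assumes that active columns keep their input-file order with REGION
       appended last, which is consistent with the sample outputs above (for example, "x y pi pha" was displayed as X Y PHA PI).

```python
def active_columns(input_cols, spec=""):
    """Model funtable's column activation rules (hypothetical helper).

    input_cols: input column names in file order (assumed).
    spec: the optional third argument, e.g. "x y pi pha", "-time", "+ $REGION".
    The lone "-" token is ignored here rather than modeled.
    """
    tokens = spec.split()
    include_all = "+" in tokens
    excludes = {t[1:].lower() for t in tokens if t.startswith("-") and len(t) > 1}
    includes = {t.lower() for t in tokens if t != "+" and not t.startswith("-")}

    # $REGION is not an input column; it is appended when requested.
    want_region = "$region" in includes
    includes.discard("$region")

    if excludes:
        # Only excludes, or a mixture: everything but the excluded columns.
        cols = [c for c in input_cols if c.lower() not in excludes]
    elif include_all or includes or want_region:
        # Only includes: just the named columns ("+" names every column).
        cols = [c for c in input_cols if include_all or c.lower() in includes]
    else:
        # No column list at all: copy every column.
        cols = list(input_cols)

    if want_region:
        cols.append("REGION")
    return cols
```

       For example, with assumed input columns X Y PHA PI DX DY TIME, "-time" yields X Y PHA PI DX DY, while "+ $REGION" yields all seven
       columns followed by REGION.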

       Ordinarily, only the selected table is copied to the output file.  In a FITS binary table, it sometimes is desirable to copy all of the
       other FITS extensions to the output file as well. This can be done by appending a '+' sign to the name of the extension in the input file
       name. For example, the first command below copies only the EVENTS table, while the second command copies other extensions as well:

	 [sh] funtable "/proj/rd/data/snr.ev[EVENTS]" events.ev
	 [sh] funtable "/proj/rd/data/snr.ev[EVENTS+]" eventsandmore.ev

       If the input file is an image or a raw array file, then funtable will generate a FITS binary table from the pixel values in the image. Note
       that it is not possible to specify the columns to output (using command-line argument 3). Instead, there are two ways to create such a
       binary table from an image. By default, a 3-column table is generated, where the columns are "X", "Y", and "VALUE". For each pixel in the
       image, a single row (event) is generated with the "X" and "Y" columns assigned the dim1 and dim2 values of the image pixel, respectively,
       and the "VALUE" column assigned the value of the pixel. With this sort of table, running funhist on the "VALUE" column will give the same
       results as running funhist on the original image.
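
       A minimal Python sketch of the default image-to-table conversion follows. The nested-list image layout, the 1-based pixel coordinates,
       and the function name image_to_table are assumptions for illustration; the zero flag mirrors the -z switch, since without -z
       zero-valued pixels are not output.

```python
def image_to_table(image, zero=False):
    """Sketch of the default conversion: one (X, Y, VALUE) row per pixel.

    image: list of rows; image[j][i] is the pixel at dim1 = i + 1, dim2 = j + 1
    (1-based FITS-style coordinates are an assumption of this sketch).
    zero: mimic -z; when False, zero-valued pixels are skipped.
    """
    rows = []
    for j, line in enumerate(image, start=1):      # j runs over dim2 (Y)
        for i, value in enumerate(line, start=1):  # i runs over dim1 (X)
            if value == 0 and not zero:
                continue
            rows.append((i, j, value))
    return rows
```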

       If the -i ("individual" rows) switch is specified, then only the "X" and "Y" columns are generated. In this case, each positive pixel
       value in the image generates n rows (events), where n is equal to the integerized value of that pixel (plus 0.5, for floating point data).
       In effect, -i approximately recreates the rows of a table that would have been binned into the input image. (Of course, this is only
       approximately correct, since the resulting x,y positions are integerized.)
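
       The -i expansion can be sketched the same way. This hypothetical image_to_events function assumes 1-based coordinates and treats
       Python floats as floating point data, so a float pixel contributes int(value + 0.5) rows and an integer pixel contributes int(value)
       rows.

```python
def image_to_events(image):
    """Sketch of -i: expand each positive pixel into n (X, Y) rows.

    n is int(value) for integer data and int(value + 0.5) for floating point
    data; coordinates are 1-based (an assumption of this sketch).
    """
    rows = []
    for j, line in enumerate(image, start=1):
        for i, value in enumerate(line, start=1):
            if value <= 0:
                continue  # only positive pixels generate events
            n = int(value + 0.5) if isinstance(value, float) else int(value)
            rows.extend([(i, j)] * n)
    return rows
```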

       If the -s [col1 col2 ... coln] ("sort") switch is specified, the output rows of a binary table will be sorted using the specified columns
       as sort keys. The sort keys must be scalar columns and also must be part of the output file (i.e. you cannot sort on a column but not
       include it in the output). This facility uses the _sort program (included with funtools), which must be accessible via your path.
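
       The constraint on sort keys can be illustrated with a short Python sketch. sort_rows is a hypothetical helper, not the _sort program;
       it assumes rows are tuples laid out like the output columns and that sorting is ascending.

```python
def sort_rows(rows, out_cols, sort_cols):
    """Sketch of -s: sort output rows on the given key columns, in order.

    Refuses to sort on a column that is not part of the output, as the text
    above requires. Ascending order is an assumption of this sketch.
    """
    missing = [c for c in sort_cols if c not in out_cols]
    if missing:
        raise ValueError(f"sort columns not in output: {missing}")
    idx = [out_cols.index(c) for c in sort_cols]
    return sorted(rows, key=lambda row: tuple(row[k] for k in idx))
```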

       For binary tables, the -m ("multiple files") switch will generate a separate file for each region in the filter specification, i.e., each
       file contains only the rows from that region. Rows that pass the filter but are not in any region also are put in a separate file.

       The separate output file names generated by the -m switch are derived automatically from the root output file name and contain the region
       id of the associated region. (Note that region ids start at 1, so that the file name associated with id 0 contains rows that pass the
       filter but are not in any given region.) Output file names are generated as follows:

       o   A $n specification can be used anywhere in the root file name (suitably quoted to protect it from the shell) and will be expanded to be
	   the id number of the associated region. For example:

	     funtable -m input.fits'[cir(512,512,1);cir(520,520,1)...]' 'foo.goo_$n.fits'

	   will generate files named foo.goo_0.fits (for rows not in any region but still passing the filter), foo.goo_1.fits (rows in region id
	   #1, the first region), foo.goo_2.fits (rows in region id #2), etc. Note that single quotes in the output root are required to protect
	   the '$' from the shell.

       o   If $n is not specified, then the region id will be placed before the first dot (.) in the filename. Thus:

	     funtable -m input.fits'[cir(512,512,1);cir(520,520,1)...]' foo.evt.fits

	   will generate files named foo0.evt.fits (for rows not in any region but still passing the filter), foo1.evt.fits (rows in region id
	   #1), foo2.evt.fits (rows in region id #2), etc.

       o   If no dot is specified in the root output file name, then the region id will be appended to the filename. Thus:

	     funtable -m input.fits'[cir(512,512,1);cir(520,520,1)...]' 'foo_evt'

	   will generate files named foo_evt0 (for rows not in any region but still passing the filter), foo_evt1 (rows in region id #1), foo_evt2
	   (rows in region id #2), etc.
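
       The three naming rules, and the routing of each row to its region's file, can be sketched in Python as follows. The functions
       region_filename and split_by_region are hypothetical; the sketch also assumes that every occurrence of $n is expanded and that
       "first dot" refers to the file name rather than any directory part of the path.

```python
import os
from collections import defaultdict


def region_filename(root, region_id):
    """Build the -m output name for one region id (0 = no region)."""
    if "$n" in root:
        # Rule 1: expand $n (every occurrence, by assumption) to the id.
        return root.replace("$n", str(region_id))
    head, name = os.path.split(root)
    dot = name.find(".")
    if dot == -1:
        # Rule 3: no dot in the file name, so append the id.
        name += str(region_id)
    else:
        # Rule 2: insert the id before the first dot.
        name = name[:dot] + str(region_id) + name[dot:]
    return os.path.join(head, name)


def split_by_region(rows, region_of, root):
    """Group rows by output file in a single pass over the data.

    region_of: hypothetical callable returning a row's region id (0 if the
    row passed the filter but lies in no region).
    """
    files = defaultdict(list)
    for row in rows:
        files[region_filename(root, region_of(row))].append(row)
    return dict(files)
```

       For example, region_filename("foo.evt.fits", 1) returns "foo1.evt.fits", matching the second rule.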

       The multiple file mechanism provides a simple way to generate individual source data files with a single pass through the data.

       By default, a new FITS file is created and the binary table is written to the first extension.  If the -a (append) switch is specified,
       the table is appended to an existing FITS file as a BINTABLE extension. Note that the output FITS file must already exist.

       If the -z ("zero" pixel values) switch is specified and -i is not specified, then pixels having a zero value will be output with their
       "VALUE" column set to zero. Obviously, this switch does not make sense when individual events are output.

SEE ALSO

       See funtools(7) for a list of Funtools help pages

version 1.4.2							  January 2, 2008						       funtable(1)