Sponsored Content
Top Forums Shell Programming and Scripting File transformation - what is most efficient method Post 302366084 by 1superdork on Wednesday 28th of October 2009 07:29:30 PM
Old 10-28-2009
File transformation - what is most efficient method

I've done quite a bit of searching on this but cannot seem to find exactly what I'm looking for. Say I have a | delimited input file with 6 columns and I need to change the value of a few columns and create an output file. With my limited knowledge I can do this with many lines of code but want some expert opinions on what would be the most efficient as I'll be working with some large files.

Example input row:
12345|employee1|customer2|d. gibbins|20091028|10000

column1 - leave as is
column2 - if employee is in lookup1.txt, set to Y else N
column3 - if customer is in lookup2.txt, set to Y else N
column4 - upcase
column5 - change date format to MM/DD/YYYY
column6 - add explicit decimal

Example output row:
12345|Y|N|D. GIBBINS|10/28/2009|100.00

I'm not looking for exact syntax, but a general idea of commands you would use and/or workflow.
Thanks!
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Need more efficient log file grep

I'm writing a script that at one point needs to check the contents of another script's log file to determine how to proceed. An example record from the log file is: "mcref04152006","060417","ANTH0415","282","272","476,983.37","465,268.44","loaded" I want my script to return this record if: ... (3 Replies)
Discussion started by: Glenn Arndt
3 Replies

2. Shell Programming and Scripting

file name transformation

I've got a multitude of text data files that carry exactly the same kind of data. Unfortunately some of them have a different filename format some are: 'category'_'month'-'year'_act.txt an example being: daf_Apr-1961_act.txt and some are: 'category'_ 'year'-'month'_act.txt an... (16 Replies)
Discussion started by: vrms
16 Replies

3. UNIX for Dummies Questions & Answers

efficient raid file server

I need to put together a RAID1 file server for use by Windoze systems. I've built zillions of windows systems from components. I was a HPUX SE for a long time at HP, but have been out of the game for years. I've got an old workhorse mobo FIC PA-2013 with a 450 MHz K6 III+ I could use, but I'd... (2 Replies)
Discussion started by: pcmacd
2 Replies

4. Shell Programming and Scripting

XML file transformation

Hi all, I have to transform a XML file like this: <?xml version="1.0"?> <vocabulary> <voc_id>102</voc_id> <name>Vocabulary Name</name> <description>Voc description</description> <relations>3</relations> <hierarchy>5</hierarchy> <word> <word_id>1</word_id> ... (1 Reply)
Discussion started by: aLittleBeat
1 Replies

5. UNIX for Dummies Questions & Answers

Efficient way of extracting data from file

I am having a file, around 500 lines. which contains one letter words, two letters words,...and so on(up to 15 letter words and words are not seprated by line). I need to compare all 1 letter words with 3,4,5 and 6 letters word, all 2 letters words with 2,3,4 and 5 letters words and all 3 letters... (3 Replies)
Discussion started by: akhay_ms
3 Replies

6. Homework & Coursework Questions

Efficient Text File Writing

Use and complete the template provided. The entire template must be completed. If you don't, your post may be deleted! 1. The problem statement, all variables and given/known data: Write a template main.c file via shell script to make it easier for yourself later. The issue here isn't writing... (2 Replies)
Discussion started by: george3isme
2 Replies

7. UNIX for Dummies Questions & Answers

file transformation using fixed width file

Hi Gurus! I need to make some file transformations. Please help. This is my input file. It has four columns with fixed width. 1 aaa bbbb cccc 2 eee dddd jjjj 3 fff gggg jjjj 4 hhh iiii cccc 5 kkk llll cccc 6 mmm nnnn oooo 7 ppp qqqq xxxx 8 rrr ... (1 Reply)
Discussion started by: kokoro
1 Replies

8. Shell Programming and Scripting

Efficient population of array from text file

Hi, I am trying to populate an array with data from a text file. I have a working method using awk but it is too slow and inefficent. See below. The text file has 70,000 lines. As awk is a line editor it reads each line of the file until it gets to the required line and then processes it.... (3 Replies)
Discussion started by: carlr
3 Replies

9. Shell Programming and Scripting

Efficient method of determining if a string is in a file.

Hi, I was hoping someone could suggest an alternative to code I currently have as mine takes up far too much processor time and it to slow. The situation: I have a programme that runs on some files just before they are zipped up and archived, the program appends a one line summary of the... (4 Replies)
Discussion started by: RECrerar
4 Replies

10. Shell Programming and Scripting

Most efficient method to extract values from text files

I have a list of files defined in a single file , one on each line.(No.of files may wary each time) eg. content of ETL_LOOKUP.dat /data/project/randomname /data/project/ramname /data/project/raname /data/project/radomname /data/project/raame /data/project/andomname size of these... (5 Replies)
Discussion started by: h0x0r21
5 Replies
funcone(1)							SAORD Documentation							funcone(1)

NAME
funcone - cone search of a binary table containing RA, Dec columns SYNOPSIS
funcone <switches> <iname> <oname> <ra[hdr]> <dec[hdr]> <radius[dr'"]> [columns] OPTIONS
-d deccol:[hdr] # Dec column name, units (def: DEC:d) -j # join columns from list file -J # join columns from list file, output all rows -l listfile # read centers and radii from a list -L listfile # read centers and radii from a list, output list rows -n # don't use cone limits as a filter -r racol:[hdr] # RA column name, units (def: RA:h) -x # append RA_CEN, DEC_CEN, RAD_CEN, CONE_KEY cols -X # append RA_CEN, DEC_CEN, RAD_CEN, CONE_KEY cols, output all rows DESCRIPTION
Funcone performs a cone search on the RA and Dec columns of a FITS binary table. The distance from the center RA, Dec position to the RA, Dec in each row in the table is calculated. Rows whose distance is less than the specified radius are output. The first argument to the program specifies the FITS file, raw event file, or raw array file. If "stdin" is specified, data are read from the standard input. Use Funtools Bracket Notation to specify FITS extensions, and filters. The second argument is the output FITS file. If "stdout" is specified, the FITS binary table is written to the standard output. The third and fourth required arguments are the RA and Dec center position. By default, RA is specified in hours while Dec is specified in degrees. You can change the units of either of these by appending the character "d" (degrees), "h" (hours) or "r" (radians). Sexagesimal notation is supported, with colons or spaces separating hms and dms. (When using spaces, please ensure that the entire string is quoted.) The fifth required argument is the radius of the cone search. By default, the radius value is given in degrees. The units can be changed by appending the character "d" (degrees), "r" (radians), "'" (arc minutes) or '"' (arc seconds). By default, all columns of the input file are copied to the output file. Selected columns can be output using an optional sixth argument in the form: "column1 column1 ... columnN" A seventh argument allows you to output selected columns from the list file when -j switch is used. Note that the RA and Dec columns used in the cone calculation must not be de-selected. Also by default, the RA and Dec column names are named "RA" and "Dec", and are given in units of hours and degrees respectively. You can change both the name and the units using the -r [RA] and/or -d [Dec] switches. Once again, one of "h", "d", or "r" is appended to the column name to specify units but in this case, there must be a colon ":" between the name and the unit specification. If the -l [listfile] switch is used, then one or more of the center RA, center Dec, and radius can be taken from a list file (which can be a FITS table or an ASCII column text file). In this case, the third (center RA), fourth (center Dec), and fifth (radius) command line argu- ments can either be a column name in the list file (if that parameter varies) or else a numeric value (if that parameter is static). When a column name is specified for the RA, Dec, or radius, you can append a colon followed by "h", "d", or "r" to specify units (also ' and " for radius). The cone search algorithm is run once for each row in the list, taking RA, Dec, and radius values from the specified columns or from static numeric values specified on the command line. When using a list, all valid rows from each iteration are written to a single output file. Use the -x switch to help delineate which line of the list file was used to produce the given output row(s). This switch causes the values for the center RA, Dec, radius, and row number to be appended to the output file, in columns called RA_CEN, DEC_CEN, RAD_CEN and CONE_KEY, respectively. Alternatively, the -j (join) switch will append all columns from the list row to the output row (essentially a join of the list row and input row), along with the CONE_KEY row number. These two switches are mutually exclusive. The -X and -J switches write out the same data as their lower case counterparts for each row satisfying a cone search. In addition, these switches also write out rows from the event file that do not satisfy any cone search. In such cases, that CONE_KEY column will be given a value of -1 and the center and list position information will be set to zero for the given row. Thus, all rows of the input event file are guaranteed to be output, with rows satisfying at least one cone search having additional search information. The -L switch acts similarly to the -l switch in that it takes centers from a list file. However, it also implicitly sets the -j switch, so that output rows are the join of the input event row and the center position row. In addition, this switch also writes out all center position rows for which no event satisfies the cone search criteria of that row. The CONE_KEY column will be given a value of -2 for cen- ter rows that were not close to any data row and the event columns will be zeroed out for such rows. In this way, all centers rows are guaranteed to be output at least once. If any of "all row" switches (-X, -J, or -L) are specified, then a new column named JSTAT is added to the output table. The positive values in this column indicate the center position row number (starting from 1) in the list file that this data row successful matched in a cone search. A value of -1 means that the data row did not match any center position. A value of -2 means that the center position was not matched by any data row. Given a center position and radius, the cone search algorithm calculates limit parameters for a box enclosing the specified cone, and only tests rows whose positions values lie within those limits. For small files, the overhead associated with this cone limit filtering can cause the program to run more slowly than if all events were tested. You can turn off cone limit filtering using the -n switch to see if this speeds up the processing (especially useful when processing a large list of positions). For example, the default cone search uses columns "RA" and "Dec" in hours and degrees (respectively) and RA position in hours, Dec and radius in degrees: funone in.fits out.fits 23.45 34.56 0.01 To specify the RA position in degrees: funcone in.fits out.fits 23.45d 34.56 0.01 To get RA and Dec from a list but use a static value for radius (and also write identifying info for each row in the list): funcone -x -l list.txt in.fits out.fits MYRA MYDec 0.01 User specified columns in degrees, RA position in hours (sexagesimal notation), Dec position in degrees (sexagesimal notation) and radius in arc minutes: funcone -r myRa:d -d myDec in.fits out.fits 12:30:15.5 30:12 15' SEE ALSO
See funtools(7) for a list of Funtools help pages version 1.4.2 January 2, 2008 funcone(1)
All times are GMT -4. The time now is 05:40 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy