Full Discussion: text manipulation
Post 302547266 by unlx on Monday 15th of August 2011 03:23:36 AM
Java text manipulation

Hi all,
I need to do some text processing.
I have a file, file1.txt:
>>>>>>>>>>>>
30 2 23 some
30 2 22 text
30 2 21 xyz
30 2 20 ttttt
30 2 19 ttttt-1
30 2 18 xryz
30 2 17 xyzr
30 2 16 xy111z
30 2 15 xanyyz
30 2 14 xzz
30 2 13 xyy
30 2 0 zzz-w
50 3 25 zzz-w
50 3 12 productw
50 3 10 xyz20
50 3 9 eeeee
50 3 8 rrrr-1-77
50 3 7 producti
50 3 5 xyz
50 3 4 xyz40
50 3 3 xyz30
50 3 2 xyz
50 3 1 asdf-2
50 3 21 xasdf
50 3 22 xy30
50 3 23 product-2
50 5 24 asdf-2
50 5 4 ttttt-1-77
50 3 19 ttttt-77
50 3 18 xyz77
50 3 17 xyz
50 3 15 prod-cc
60 1 2 aaa
60 1 5 bbb
60 1 10 ccc
>>>>>>>>>>>>>>>>>>

The required output is the same lines sorted by the 3rd column within each pair of 1st and 2nd column values, where the 3rd column must run from 0 to 50 for every pair.

In other words: for each pair of 1st and 2nd column values, sort the 3rd column from 0 to 50. If a value between 0 and 50 is missing from the 3rd column, add a line with the same 1st and 2nd column values, the missing value in the 3rd column, and "nothing" in the 4th column.


Example: for the pair (60, 1) in file1.txt above, we have:
60 1 2 aaa
60 1 5 bbb
60 1 10 ccc

The output for the (60, 1) lines alone should be:
60 1 0 nothing
60 1 1 nothing
60 1 2 aaa
60 1 3 nothing
60 1 4 nothing
60 1 5 bbb
......
60 1 6-9 nothing
.....
60 1 10 ccc
60 1 11 nothing
.....
....
60 1 50 nothing

Please help.
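
One possible way to do this, sketched in Python. This is only a sketch under a few assumptions that the post does not settle: groups are printed in the order in which each (1st, 2nd) pair first appears in the file, rows of the same pair do not have to be contiguous (the 50 3 rows above are interrupted by 50 5 rows), the filler word is spelled "nothing" as in the example output, and any 3rd-column value outside 0 to 50 is silently dropped. The names fill_gaps, LOW, HIGH and FILLER are made up for this illustration.

```python
LOW, HIGH = 0, 50      # inclusive range for the 3rd column, per the post
FILLER = "nothing"     # assumed spelling, taken from the example output


def fill_gaps(lines, low=LOW, high=HIGH, filler=FILLER):
    """Expand every (col1, col2) group so its 3rd column runs low..high."""
    groups = {}  # (col1, col2) -> {col3 as int: col4 text}
    for line in lines:
        # Split into at most 4 fields so the 4th field keeps any inner spaces.
        parts = line.split(None, 3)
        # Skip blank lines and the ">>>>" delimiter lines shown in the post.
        if len(parts) < 4 or not parts[2].isdigit():
            continue
        col1, col2, col3, text = parts
        groups.setdefault((col1, col2), {})[int(col3)] = text.rstrip()

    out = []
    # dicts keep insertion order, so groups come out in first-seen order.
    for (col1, col2), seen in groups.items():
        for n in range(low, high + 1):
            # Values outside low..high are never visited, so they are dropped.
            out.append(f"{col1} {col2} {n} {seen.get(n, filler)}")
    return out


# Small demo using the (60, 1) rows from the post.
sample = ["60 1 2 aaa", "60 1 5 bbb", "60 1 10 ccc"]
result = fill_gaps(sample)
print("\n".join(result[:6]))
```

The demo prints the first six lines of the (60, 1) group, which match the start of the expected output above. To process the real file, something like fill_gaps(open("file1.txt")) should work, since the function accepts any iterable of lines. If the groups should instead be ordered numerically by 1st and 2nd column, sorting the dictionary keys before the final loop would do it.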
 
