help! script to select line with greatest value 2 between columns Post: 302566973

Post 302566973 by wolf_blue on Friday 21st of October 2011 03:12:42 PM

still have duplicates

Thanks.
It works, but I realized that some genes appear in the output file more than once, because some isoforms happen to have the same length.
So now I need to take that file and create a new one with only one line per gene.
I used the code below on the original file.

[Code]
nawk 'NR<2{next}{c=($NF-$(NF-1))}(!($1 in A))||(c>m[$1]&&($1 in A)){m[$1]=c;A[$1]=$0 FS m[$1]}END{for(i in A) print A[i]}' tst.txt

Output file:
test.txt

gene accession chr chr_st begin end length
TNFRSF18 NM_004195 chr1 - 1138887 1142089 3202
TNFRSF18 NM_148902 chr1 - 1138887 1142089 3202
TNFRSF18 NM_148901 chr1 - 1138887 1142089 3202
MIB2 NM_080875 chr1 + 1550794 1565990 15196
MIB2 NM_001170688 chr1 + 1550794 1565990 15196
MIB2 NM_001170687 chr1 + 1550794 1565990 15196
MIB2 NM_001170686 chr1 + 1550794 1565990 15196
CDK11A NM_024011 chr1 - 1634169 1655791 21622
CDK11A NM_033529 chr1 - 1634169 1655791 21622
WASH7P NR_024540 chr1 - 14361 29370 15009
FAM138F NR_026820 chr1 - 34610 36081 1471
FAM138A NR_026818 chr1 - 34610 36081 1471

So from this final file, how can I produce a file that has only one line per gene?
I would want output.txt to look like this:

Desired final file
gene accession chr chr_st begin end length
TNFRSF18 NM_004195 chr1 - 1138887 1142089 3202
MIB2 NM_080875 chr1 + 1550794 1565990 15196
CDK11A NM_024011 chr1 - 1634169 1655791 21622
WASH7P NR_024540 chr1 - 14361 29370 15009
FAM138F NR_026820 chr1 - 34610 36081 1471
FAM138A NR_026818 chr1 - 34610 36081 1471

I hope this is clearer.
Thanks!
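
One possible approach, offered as a sketch rather than a definitive answer: have awk print a line only the first time its first field (the gene name) is seen. This assumes the gene name is always the first whitespace-separated column and that, among isoforms tied on length, keeping whichever one is listed first is acceptable. The array name `seen` is just an illustrative choice, and the sample rows are copied from the file shown above.

```shell
# Sample data copied from the post, saved as test.txt for this sketch.
cat > test.txt <<'EOF'
gene accession chr chr_st begin end length
TNFRSF18 NM_004195 chr1 - 1138887 1142089 3202
TNFRSF18 NM_148902 chr1 - 1138887 1142089 3202
TNFRSF18 NM_148901 chr1 - 1138887 1142089 3202
MIB2 NM_080875 chr1 + 1550794 1565990 15196
MIB2 NM_001170688 chr1 + 1550794 1565990 15196
MIB2 NM_001170687 chr1 + 1550794 1565990 15196
MIB2 NM_001170686 chr1 + 1550794 1565990 15196
CDK11A NM_024011 chr1 - 1634169 1655791 21622
CDK11A NM_033529 chr1 - 1634169 1655791 21622
WASH7P NR_024540 chr1 - 14361 29370 15009
FAM138F NR_026820 chr1 - 34610 36081 1471
FAM138A NR_026818 chr1 - 34610 36081 1471
EOF

# seen[$1]++ is 0 (false) the first time a gene name appears and non-zero
# afterwards, so !seen[$1]++ is true only for the first line of each gene.
# The header passes through because "gene" is also a first-seen key.
awk '!seen[$1]++' test.txt > output.txt

cat output.txt
```

Because only the first occurrence is kept, the input order decides which accession survives for each gene, and the output stays in the same order as the input.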

Last edited by wolf_blue; 10-21-2011 at 04:27 PM..
