Sponsored Content
Top Forums UNIX for Dummies Questions & Answers help! script to select line with greatest value 2 between columns Post 302566973 by wolf_blue on Friday 21st of October 2011 03:12:42 PM
Old 10-21-2011
still have duplicates

Thanks.
It works but I realized that there are genes that are in the output file more than once because some isoforms happen to have the same length.
So now I would have to take that file and create a new one with only one gene per line.
I used the code below on the original file.

[Code]
nawk 'NR<2{next}{c=($NF-$(NF-1))}(!($1 in A))||(c>m[$1]&&($1 in A)){m[$1]=c;A[$1]=$0 FS m[$1]}END{for(i in A) print A[i]}' tst.txt

Output file:
test.txt

gene accession chr chr_st begin end length
TNFRSF18 NM_004195 chr1 - 1138887 1142089 3202
TNFRSF18 NM_148902 chr1 - 1138887 1142089 3202
TNFRSF18 NM_148901 chr1 - 1138887 1142089 3202
MIB2 NM_080875 chr1 + 1550794 1565990 15196
MIB2 NM_001170688 chr1 + 1550794 1565990 15196
MIB2 NM_001170687 chr1 + 1550794 1565990 15196
MIB2 NM_001170686 chr1 + 1550794 1565990 15196
CDK11A NM_024011 chr1 - 1634169 1655791 21622
CDK11A NM_033529 chr1 - 1634169 1655791 21622
WASH7P NR_024540 chr1 - 14361 29370 15009
FAM138F NR_026820 chr1 - 34610 36081 1471
FAM138A NR_026818 chr1 - 34610 36081 1471


So from this final file, how can I get it to make a file that has only one gene per line?
So I would want output.txt to be modified as:

Desired final file
gene accession chr chr_st begin end length
TNFRSF18 NM_004195 chr1 - 1138887 1142089 3202
MIB2 NM_080875 chr1 + 1550794 1565990 15196
CDK11A NM_024011 chr1 - 1634169 1655791 21622
WASH7P NR_024540 chr1 - 14361 29370 15009
FAM138F NR_026820 chr1 - 34610 36081 1471
FAM138A NR_026818 chr1 - 34610 36081 1471

I hope this is clearer.
Thanks!

Last edited by wolf_blue; 10-21-2011 at 04:27 PM..
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Select and display sum depending upon even columns

i have a input as : 2898 | homy | pune | 7/4/09 1 :6298 | anna | chennai | 7/4/08 2 :3728 | gonna | kol | 8/2/10 3 :3987 | hogja | mumbai | 8/5/09 4 :6187 | galma | london | 9/5/01 5 :9167 | tamina | ny | 8/3/10 6 :3981 | dastan | bagh | 8/2/07 7 :4617 | vazir | ny now,i want to get... (2 Replies)
Discussion started by: adityamitra
2 Replies

2. Shell Programming and Scripting

Select and display sum depending upon even columns

Select and display sum depending upon even columns i have a input as : 2898 | homy | pune | 7/4/09 1 :6298 | anna | chennai | 7/4/08 2 :3728 | gonna | kol | 8/2/10 3 :3987 | hogja | mumbai | 8/5/09 4 :6187 | galma | london | 9/5/01 5 :9167 | tamina | ny | 8/3/10 6 :3981 | dastan | bagh |... (1 Reply)
Discussion started by: adityamitra
1 Replies

3. Shell Programming and Scripting

[Solved] Select the columns which have value greater than particular number

i have a file of the form 9488 14392 1 1.8586e-07 5702 7729 1 1.8586e-07 9048 14018 1 1.8586e-07 5992 12556 1 1.8586e-07 9488 14393 1 1.8586e-07 9048 14019 1 1.8586e-07 5992 12557 1 1.8586e-07 9488 14394 ... (1 Reply)
Discussion started by: vaibhavkorde
1 Replies

4. Shell Programming and Scripting

Select columns from a matrix given within a range in BASH

I have a huge matrix file which looks like this (example matrix): 1 2 3 5 4 5 6 7 7 6 8 9 1 2 4 2 7 6 5 1 3 2 1 9 As one can see, this matrix has 4 columns and 6 rows. But my original matrix has some 3 million rows and 6000 columns. For example, on this matrix I can define my task as... (2 Replies)
Discussion started by: shoaibjameel123
2 Replies

5. Shell Programming and Scripting

Select lines where at least x columns above threshold value

I have a file with 20 columns. I'd like to retain only the lines for which the values in at least x columns, looking only at columns 6-20, are above a threshold. For example, I'd like to retain only the lines in the file below that have at least 8 columns (again, looking only at columns 6-20)... (3 Replies)
Discussion started by: pathunkathunk
3 Replies

6. Shell Programming and Scripting

Take greatest value from second column

Dear All, Please help me, I have file input like this, 1 2142 215 2162 217 2842 285 2862 287 4002 401 4022 403 4822 1 2142 215 2162 217 2842 285 2862 287 4002 401 4022 403 4882 1 4801 (8 Replies)
Discussion started by: attila
8 Replies

7. Shell Programming and Scripting

Comparing Select Columns from two CSV files in UNIX and create a third file based on comparision

Hi , I want to compare first 3 columns of File A and File B and create a new file File C which will have all rows from File B and will include rows that are present in File A and not in File B based on First 3 column comparison. Thanks in advance for your help. File A A,B,C,45,46... (2 Replies)
Discussion started by: ady_koolz
2 Replies

8. Shell Programming and Scripting

Select all the even columns from a file

Hi, I can select all the even columns from a file like this: awk '{ for (i=1;i<=NF;i+=2) $i="" }1' file > new file How can I select the 1st and all the even columns using awk? Thanks! (1 Reply)
Discussion started by: forU
1 Replies

9. Shell Programming and Scripting

How do I select certain columns with matching pattern and rest of the lines?

I want to select 2nd, 3rd columns if line has "key3" and print rest of the lines as is. # This is my sample input key1="val1" key2="val2" key3="val3" key4="val4" some text some text some text some text key1="val1" key2="val2" key3="val3" key4="val4" some text some text some text some... (3 Replies)
Discussion started by: kchinnam
3 Replies

10. UNIX for Beginners Questions & Answers

How to select rows that have opposite values (A vs B, or B vs A) on first two columns?

I have a dateset like this: Gly1 Gly2 2 1 0 Gly3 Gly4 3 4 5 Gly3 Gly5 1 3 2 Gly2 Gly1 3 6 2 Gly4 Gly3 2 2 1 Gly6 Gly4 4 2 1what I expected is: Gly1 Gly2 2 1 0 Gly2 Gly1 3 6 2 Gly3 Gly4 3 4 5 Gly4 Gly3 2 2 1 A vs B, or B vs A are the same... (7 Replies)
Discussion started by: nengcheng
7 Replies
CUT(1)							    BSD General Commands Manual 						    CUT(1)

NAME
cut -- select portions of each line of a file SYNOPSIS
cut -b list [-n] [file ...] cut -c list [file ...] cut -f list [-d delim] [-s] [file ...] DESCRIPTION
The cut utility selects portions of each line (as specified by list) from each file and writes them to the standard output. If no file argu- ments are specified, or a file argument is a single dash ('-'), cut reads from from the standard input. The items specified by list can be in terms of column position or in terms of fields delimited by a special character. Column numbering starts from 1. The list option argument is a comma or whitespace separated set of increasing numbers and/or number ranges. Number ranges consist of a num- ber, a dash ('-'), and a second number and select the fields or columns from the first number to the second, inclusive. Numbers or number ranges may be preceded by a dash, which selects all fields or columns from 1 to the first number. Numbers or number ranges may be followed by a dash, which selects all fields or columns from the last number to the end of the line. Numbers and number ranges may be repeated, over- lapping, and in any order. It is not an error to select fields or columns not present in the input line. The options are as follows: -b list The list specifies byte positions. -c list The list specifies character positions. -d delim Use the first character of delim as the field delimiter character instead of the tab character. -f list The list specifies fields, delimited in the input by a single tab character. Output fields are separated by a single tab character. -n Do not split multi-byte characters. -s Suppress lines with no field delimiter characters. Unless specified, lines with no delimiters are passed through unmodified. ENVIRONMENT
The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of cut if the -n option is specified. Their effect is described in environ(7). EXAMPLES
Extract users' login names and shells from the system passwd(5) file as ``name:shell'' pairs: cut -d : -f 1,7 /etc/passwd Show the names and login times of the currently logged in users: who | cut -c 1-16,26-38 DIAGNOSTICS
The cut utility exits 0 on success, and >0 if an error occurs. SEE ALSO
paste(1) STANDARDS
The cut utility conforms to IEEE Std 1003.2-1992 (``POSIX.2''). HISTORY
A cut command appeared in AT&T System III UNIX. BUGS
The -c option is a synonym for the -b option, which causes incorrect behaviour in locales that support multibyte characters. When operating on fields (-f option is specified), cut does not recognise multibyte characters, and the delim character is recognised in the middle of multibyte sequences. BSD
June 6, 1993 BSD
All times are GMT -4. The time now is 11:13 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy