Use a string in one column to get the largest or the smallest of another column

# 1  
03-05-2013

I have data that looks like this:

Code:
chr1    mm9_knownGene   exon    155747075       155747189       0.000000        +       .       gene_id "Glul"; transcript_id "uc007daq.1";
chr1    mm9_knownGene   exon    155750064       155750076       0.000000        +       .       gene_id "Glul"; transcript_id "uc007daq.1";
chr1    mm9_knownGene   exon    155755310       155756844       0.000000        +       .       gene_id "Glul"; transcript_id "uc007daq.1";
chr1    mm9_knownGene   exon    164759692       164760105       0.000000        -       .       gene_id "Fmo1"; transcript_id "uc007dgx.1";
chr1    mm9_knownGene   exon    164789688       164789693       0.000000        -       .       gene_id "Fmo1"; transcript_id "uc007dgx.1";
chr1    mm9_knownGene   exon    164796370       164796679       0.000000        -       .       gene_id "Fmo1"; transcript_id "uc007dgx.1";
chr1    mm9_knownGene   exon    164789688       164789693       0.000000        -       .       gene_id "Fmo1"; transcript_id "uc007dgy.1";
chr1    mm9_knownGene   exon    164789932       164790085       0.000000        -       .       gene_id "Fmo1"; transcript_id "uc007dgy.1";

The data is grouped by the 9th column, which holds the gene and transcript IDs.
If the 7th column is a plus, I want to choose the line where the 4th column has the highest value within each group (as defined by the 9th column).
If the 7th column is a minus, I want to choose the line where the 4th column has the lowest value within each group.

I've tried using awk arrays to parse the data, but I keep getting nonsensical results and I'm a bit lost.
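The attempted awk code isn't shown, so this is only a guess at the cause. With awk's default field separator (runs of blanks or tabs), the 9th field of these lines is the literal word gene_id, not the ID itself, so an array keyed on $9 puts every line into one group. A quick way to check what awk actually sees, wrapped in a small shell function (the name show_fields is made up for illustration):

```shell
# Hypothetical helper: print what awk sees with the default FS
# (runs of blanks or tabs): line number, strand ($7), $9 and the last field ($NF).
show_fields() {
    awk '{ printf "%d\t%s\t%s\t%s\n", NR, $7, $9, $NF }' "$@"
}
```

On the sample above, the third column of the output reads gene_id on all eight lines, while the last field ($NF) carries the transcript ID. If the file is genuinely tab-separated GTF, awk -F'\t' makes $9 the whole attribute string instead.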
# 2  
03-05-2013
try:
Code:
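awk '
# Key on the transcript ID in the last field ($NF): with the default FS,
# $9 is the literal word gene_id on every line and cannot tell groups apart.
{ k = $NF }
# The first line of each group seeds its running best.
!seen[k]++ { best[k] = $4 + 0; out[k] = $0; next }
# "+" strand: keep the highest start ($4); "-" strand: keep the lowest.
$7 == "+" && $4 + 0 > best[k] { best[k] = $4 + 0; out[k] = $0 }
$7 == "-" && $4 + 0 < best[k] { best[k] = $4 + 0; out[k] = $0 }
# One line per group; the order of "for (k in out)" is unspecified.
END { for (k in out) print out[k] }
' infile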
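A sketch of the same keep-the-best-per-group idea for the case where the file really is tab-separated GTF (an assumption; the forum display shows spaces). With -F'\t', $9 is the whole attribute column, so it can serve as the group key exactly as the question describes. Unlike a for (k in ...) loop, this version prints the groups in the order they first appear, and lines of a group do not need to be adjacent. The function name pick_gtf is hypothetical.

```shell
# Hypothetical helper for tab-separated GTF input (assumption).
# With -F'\t', $9 is the full attribute column, e.g.
#   gene_id "Glul"; transcript_id "uc007daq.1";
# Groups need not be adjacent; output follows first-seen group order.
pick_gtf() {
    awk -F'\t' '
    !($9 in best) {                      # first line of a new group
        order[++n] = $9
        best[$9] = $4 + 0; out[$9] = $0
        next
    }
    ($7 == "+" && $4 + 0 > best[$9]) || ($7 == "-" && $4 + 0 < best[$9]) {
        best[$9] = $4 + 0; out[$9] = $0  # new highest (+) or lowest (-) start
    }
    END { for (i = 1; i <= n; i++) print out[order[i]] }
    ' "$@"
}
```

Usage would be along the lines of pick_gtf infile; on the sample data it keeps one line per transcript.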

# 3  
03-05-2013
Code:
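awk '
# Assumes all lines of a group (same transcript ID in $NF) are adjacent.
p != $NF {                      # first line of a new group
        if (NR > 1)
                print v         # flush the previous group
        p = $NF
        v = $0                  # seed with this line
        b = $4 + 0
        next
}
# Compare with the running best of the group, not just the previous line.
($7 == "+" && $4 + 0 > b) || ($7 == "-" && $4 + 0 < b) {
        v = $0
        b = $4 + 0
}
END {
        if (NR > 0)
                print v
} ' file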
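The script above treats a run of adjacent lines with the same last field as one group, so it assumes the file is already grouped, as the sample is. A hypothetical guard, check_grouped (not part of the original answer), reports any key that reappears after its run has ended and exits non-zero, so the pick step can be skipped on unsuitable input:

```shell
# Hypothetical helper: verify that all lines sharing a last field ($NF)
# form one contiguous run. Prints offending keys; exit status 1 if any.
check_grouped() {
    awk '
    $NF != prev {                       # a new run of this key starts here
        if ($NF in done) {
            print "not grouped: " $NF " reappears at line " NR
            bad = 1
        }
        if (NR > 1) done[prev] = 1      # the previous run is now closed
        prev = $NF
    }
    END { exit (bad ? 1 : 0) }
    ' "$@"
}
```

For example: check_grouped file && awk '...' file, where '...' stands for the script above.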

This User Gave Thanks to Yoda For This Post:
# 4  
03-06-2013
Thanks, bipinajith. That worked perfectly.