Awk: get upper and lower bound per group


 
# 1  
Old 11-14-2017
Hi all,

I have data like this:
Code:
22      51018157        51018157        exonic  CHKB    nonsynonymous SNV
22      51018204        51018204        exonic  CHKB    nonsynonymous SNV
22      51018428        51018428        exonic  CHKB    nonsynonymous SNV
22      51018814        51018814        exonic  CHKB    nonsynonymous SNV
22      51019001        51019001        exonic  CHKB    nonsynonymous SNV
22      51019849        51019849        exonic  CHKB    nonsynonymous SNV
22      51020736        51020736        exonic  CHKB    nonsynonymous SNV
22      51021027        51021027        exonic  CHKB    nonsynonymous SNV
22      51021197        51021197        exonic  CHKB    nonsynonymous SNV
22      51063758        51063758        exonic  ARSA    nonsynonymous SNV
22      51063778        51063778        exonic  ARSA    nonsynonymous SNV
22      51063820        51063820        exonic  ARSA    nonsynonymous SNV
22      51063845        51063845        exonic  ARSA    nonsynonymous SNV
22      51064416        51064416        exonic  ARSA    nonsynonymous SNV
22      51064489        51064489        exonic  ARSA    nonsynonymous SNV
22      51065266        51065266        exonic  ARSA    nonsynonymous SNV
22      51065287        51065287        exonic  ARSA    nonsynonymous SNV
22      51065341        51065341        exonic  ARSA    nonsynonymous SNV
22      51065361        51065361        exonic  ARSA    nonsynonymous SNV
22      51066194        51066194        exonic  ARSA    nonsynonymous SNV
22      51143462        51143462        exonic  SHANK3  nonsynonymous SNV
22      51153371        51153371        exonic  SHANK3  nonsynonymous SNV
22      51159778        51159778        exonic  SHANK3  nonsynonymous SNV
22      51160154        51160154        exonic  SHANK3  nonsynonymous SNV
22      51169684        51169684        exonic  SHANK3  nonsynonymous SNV
22      51176664        51176664        exonic  ACR     nonsynonymous SNV
22      51176734        51176734        exonic  ACR     nonsynonymous SNV
22      51177812        51177812        exonic  ACR     nonsynonymous SNV
22      51178286        51178286        exonic  ACR     nonsynonymous SNV

It's tab-separated data.
Column one is the chromosome.
Column two is the start position and column three is the end position. Column five is the gene name.

My desired output is

Code:
22 CHKB 51018157 51021197
22 ARSA 51063758 51066194
22 SHANK3 51143462 51169684
22 ACR 51176664 51178286

That is, for each gene, I get the smallest number from column 2 and largest from column 3.

I could only get my head around this much:
Code:
 cat small_d.txt | awk '{a[$5]=$1} END {for (i in a) {print i,a[i]}}'

I simply can't think in awk. I could write a Python script, but I would like to learn these magic tricks.

Last edited by Scrutinizer; 11-14-2017 at 09:32 PM.. Reason: edited output; mod: changed quote tags to code tags
# 2  
Old 11-14-2017
First off, you don't need cat's help to read a file, awk can read perfectly fine on its own. Same goes for nearly any other program.

Code:
$ awk '($3 > MAX[$5]) { MAX[$5]=$3 }
        (!($5 in MIN) || ($2 < MIN[$5] )) { MIN[$5]=$2 }
        END { for(X in MIN) print X, MIN[X], MAX[X] }' inputfile

ARSA 51063758 51066194
ACR 51176664 51178286
CHKB 51018157 51021197
SHANK3 51143462 51169684

$
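For anyone wondering how the min/max one-liner above works, here is an annotated sketch of the same idea. Two things are assumptions on my part and not in the original: the `sample_data` helper (a made-up function that just prints a few of the thread's rows in shuffled order) and the `key` variable, which groups on chromosome plus gene so the chromosome is printed too. The `-F'\t'` is there because the input was described as tab-separated.

```shell
# Hypothetical helper: a few rows from the thread, deliberately out of order.
sample_data() {
    printf '22\t51018204\t51018204\texonic\tCHKB\tnonsynonymous SNV\n'
    printf '22\t51066194\t51066194\texonic\tARSA\tnonsynonymous SNV\n'
    printf '22\t51018157\t51018157\texonic\tCHKB\tnonsynonymous SNV\n'
    printf '22\t51063758\t51063758\texonic\tARSA\tnonsynonymous SNV\n'
    printf '22\t51021197\t51021197\texonic\tCHKB\tnonsynonymous SNV\n'
}

minmax=$(sample_data | awk -F'\t' '
    # awk runs every "condition { action }" pair once per input line.
    {
        key = $1 " " $5            # group key: chromosome plus gene name
    }
    # A key never seen before has an empty MAX entry, which compares as 0,
    # so any positive end position replaces it.
    ($3 > MAX[key]) { MAX[key] = $3 }
    # The same trick would fail for MIN (no position is smaller than 0),
    # hence the explicit "is this key already in MIN?" test.
    (!(key in MIN) || $2 < MIN[key]) { MIN[key] = $2 }
    # After the last line, print one row per key; the order is unspecified.
    END { for (k in MIN) print k, MIN[k], MAX[k] }
' | sort)

printf '%s\n' "$minmax"
```

The trailing `sort` is only there to make the output order predictable, since `for (k in MIN)` may visit the keys in any order.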

# 3  
Old 11-14-2017
You need a min and a max variable per group to track the smallest and largest position:
Code:
awk -F'\t' '
        {
                idx = $1 FS $5
                if ( idx in A_min )
                {
                        if ( A_min[idx] > $2 )
                                A_min[idx] = $2
                        if ( A_max[idx] < $3 )
                                A_max[idx] = $3
                }
                else
                {
                        A_min[idx] = $2
                        A_max[idx] = $3
                }
        }
        END {
                for ( k in A_min )
                        print k, A_min[k], A_max[k]
        }
' small_d.txt
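Note that a `for ( k in ... )` loop visits the keys in no particular order, so the groups may not come out in the order they appear in the file. Below is a minimal sketch of one way to keep first-seen order even when the rows are not sorted. The names `sample_data`, `order`, `n`, `lo` and `hi` are made up for this illustration and are not from the posts above.

```shell
# Hypothetical helper: rows from the thread with the groups interleaved.
sample_data() {
    printf '22\t51018204\t51018204\texonic\tCHKB\tnonsynonymous SNV\n'
    printf '22\t51066194\t51066194\texonic\tARSA\tnonsynonymous SNV\n'
    printf '22\t51018157\t51018157\texonic\tCHKB\tnonsynonymous SNV\n'
    printf '22\t51063758\t51063758\texonic\tARSA\tnonsynonymous SNV\n'
    printf '22\t51021197\t51021197\texonic\tCHKB\tnonsynonymous SNV\n'
}

ordered=$(sample_data | awk -F'\t' '
    {
        key = $1 " " $5
        if (!(key in lo)) {          # first time this group appears
            order[++n] = key         # remember its position in the input
            lo[key] = $2
            hi[key] = $3
            next
        }
        if ($2 < lo[key]) lo[key] = $2
        if ($3 > hi[key]) hi[key] = $3
    }
    END {
        # Walk the groups in first-seen order instead of using "for (k in lo)".
        for (i = 1; i <= n; i++)
            print order[i], lo[order[i]], hi[order[i]]
    }
')

printf '%s\n' "$ordered"
```

The extra `order` array costs one entry per group, which is negligible next to the min and max arrays.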

# 4  
Old 11-14-2017
If the files are always grouped and in sorted/increasing order per group, then something like this might suffice:
Code:
awk '{i=$1 FS $5; if(i!=p) {if(p) print p,l,h; l=$2; p=i} h=$3} END{print p,l,h}' file

Which would keep the group order of the input file
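The one-liner above, spread out with comments, looks roughly like the sketch below. It only gives correct results under the stated condition (rows grouped, and increasing within each group). The longer variable names, the `sample_data` helper and the `-F'\t'` option are my own additions for illustration, and I test `prev != ""` rather than `if(p)` so that empty input prints nothing.

```shell
# Hypothetical helper: rows from the thread, grouped and sorted per group.
sample_data() {
    printf '22\t51018157\t51018157\texonic\tCHKB\tnonsynonymous SNV\n'
    printf '22\t51018204\t51018204\texonic\tCHKB\tnonsynonymous SNV\n'
    printf '22\t51021197\t51021197\texonic\tCHKB\tnonsynonymous SNV\n'
    printf '22\t51063758\t51063758\texonic\tARSA\tnonsynonymous SNV\n'
    printf '22\t51066194\t51066194\texonic\tARSA\tnonsynonymous SNV\n'
    printf '22\t51176664\t51176664\texonic\tACR\tnonsynonymous SNV\n'
}

ranges=$(sample_data | awk -F'\t' '
    {
        key = $1 " " $5              # current group: chromosome plus gene
        if (key != prev) {           # the group changed on this line
            if (prev != "")          # nothing to flush before the first group
                print prev, low, high
            low  = $2                # first row of a sorted group holds the minimum
            prev = key
        }
        high = $3                    # the latest row of the group holds the maximum
    }
    END { if (prev != "") print prev, low, high }   # flush the final group
')

printf '%s\n' "$ranges"
```

Because nothing is stored per group, this uses constant memory and prints each group as soon as the next one starts.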

Last edited by Scrutinizer; 11-15-2017 at 12:55 AM..
# 5  
Old 11-15-2017
Quote:
Originally Posted by Corona688
First off, you don't need cat's help to read a file, awk can read perfectly fine on its own. Same goes for nearly any other program.

Code:
$ awk '($3 > MAX[$5]) { MAX[$5]=$3 }
        (!($5 in MIN) || ($2 < MIN[$5] )) { MIN[$5]=$2 }
        END { for(X in MIN) print X, MIN[X], MAX[X] }' inputfile

ARSA 51063758 51066194
ACR 51176664 51178286
CHKB 51018157 51021197
SHANK3 51143462 51169684

$

Thank you corona.
Do you think you can help me understand how this is working?

Quote:
Originally Posted by Scrutinizer
If the files are always grouped and in sorted/increasing order per group, then something like this might suffice:
Code:
awk '{i=$1 FS $5; if(i!=p) {if(p) print p,l,h; l=$2; p=i} h=$3} END{print p,l,h}' file

Which would keep the group order of the input file
Hi Scrutinizer

Thank you. This works exactly as I needed and prints the chromosome number as well.
Can you please help me understand your code?