Building intervals


 
Top Forums UNIX for Dummies Questions & Answers Building intervals
# 1  
06-04-2014

Hi all,

I hope you can help me with the following question:

I have multiple tables like this:

Code:
      Chr    Start    End    Zygosity    Gene
chr1    153233510    153233510    het    LOR
chr1    153233615    153233615    hom    LOR
chr1    153233701    153233701    hom    LOR
chr1    153233728    153233728    hom    LOR
chr1    153233734    153233734    hom    LOR
chr1    153234295    153234295    het    LOR
chr1    153234602    153234602    hom    LOR
chr1    155205331    155205331    hom    GBA
chr1    155205669    155205669    hom    GBA
chr1    155208647    155208647    het    GBA
chr1    155208647    155208647    hom    GBA
chr1    155209338    155209341    het    GBA
chr1    155209341    155209341    hom    GBA
chr1    155209360    155209360    het    GBA
chr1    155214473    155214473    hom    GBA
chr2    159831015    159831015    hom    TANC1
chr2    128018063    128018063    het    ERCC3
chr2    128018192    128018192    hom    ERCC3
chr2    128018916    128018917    hom    ERCC3
chr2    128018919    128018919    hom    ERCC3
chr2    128018920    128018920    hom    ERCC3
chr2    128018926    128018927    het    ERCC3
chr2    128018928    128018928    het    ERCC3
chr2    128018930    128018930    het    ERCC3
chr2    128019178    128019178    hom    ERCC3
chr2    128047101    128047101    hom    ERCC3
chr2    128048142    128048142    hom    ERCC3
chr2    128050134    128050134    hom    ERCC3

What I need to do is find, for every group in the "Gene" column, consecutive lines where "Zygosity" is hom (a run must contain more than one such line). Each run then defines an interval: the "Chr" value, the "Start" value of the first line and the "End" value of the last line.

The expected output for the example above should be:
Code:
Chr     Start        End          Gene     Width
chr1    153233615    153233734    LOR      119
chr1    153234602    155205669    GBA      1971067
chr2    128018192    128018920    ERCC3    728
chr2    128019178    128050134    ERCC3    30956

I'm not asking for a complete solution but a hint to start. All suggestions are welcome.

Thanks!

Last edited by Franklin52; 06-04-2014 at 05:52 AM.. Reason: Please use code tags
# 2  
06-04-2014
You could first print an empty line whenever column 4 is not "hom". Then pipe that output to a second awk that uses empty lines as the record separator; in each record, the Start is in $2 (field 2) and the last End is in $(NF-2).
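A minimal sketch of the two-awk idea, written as a shell function with a hypothetical name (`hom_intervals_pipeline`). It assumes the first line of each table is the header and, going slightly beyond the hint, also breaks a run when Chr or Gene changes; without that break, the lone LOR hom line at 153234602 would be merged with the first two GBA lines into one interval.

```shell
#!/bin/sh
# Sketch of the two-awk hint: split the table into blank-line-separated runs,
# then read each run as one record in paragraph mode.
# Assumptions beyond the hint: the first line of each file is a header, and a
# change of Chr or Gene also ends a run (so two genes are never merged).
hom_intervals_pipeline() {
  awk 'FNR == 1 { print ""; key = ""; next }    # header: skip it, end any run
       $4 != "hom" { print ""; key = ""; next } # non-hom line ends the run
       { k = $1 FS $5                           # run key: Chr + Gene
         if (k != key) print ""                 # a new Chr/Gene starts a new run
         key = k; print }' "$@" |
  awk 'BEGIN { RS = ""                          # paragraph mode: one run per record
               print "Chr", "Start", "End", "Gene", "Width" }
       # newlines also split fields, so Start of the first line is $2 and
       # End of the last line is $(NF-2); NF > 5 means two or more lines
       NF > 5 { print $1, $2, $(NF-2), $5, $(NF-2) - $2 }'
}
```

Run it as `hom_intervals_pipeline table.txt`. Because paragraph mode collapses consecutive empty lines, it does not matter that stage 1 sometimes prints two in a row.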

An alternative is to use a single awk that starts a new array item each time $4 equals "hom" for the first time in a run, and then does the calculations and the printing in the END section.
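The single-awk variant can be sketched the same way. The function name `hom_intervals_single` is hypothetical, and the same assumptions apply (header on the first line of each file, a Chr or Gene change ends a run). Each run gets its own array slot plus a counter, so the END section can drop runs that contain only one hom line.

```shell
#!/bin/sh
# Sketch of the single-awk hint: open a new array slot whenever a hom run
# starts, extend it while the run continues, and print in the END section.
# Assumptions: the first line of each file is a header, and a change of
# Chr or Gene also ends a run.
hom_intervals_single() {
  awk 'FNR == 1 { inrun = 0; next }           # header: skip it, end any run
       $4 != "hom" { inrun = 0; next }        # non-hom line ends the run
       {
         k = $1 FS $5                         # run key: Chr + Gene
         if (!inrun || k != key) {            # first hom line of a new run
           n++; chr[n] = $1; start[n] = $2; gene[n] = $5; cnt[n] = 0
         }
         inrun = 1; key = k
         stop[n] = $3; cnt[n]++               # extend the run to this line
       }
       END {
         print "Chr", "Start", "End", "Gene", "Width"
         # keep only runs that contain two or more hom lines
         for (i = 1; i <= n; i++)
           if (cnt[i] > 1)
             print chr[i], start[i], stop[i], gene[i], stop[i] - start[i]
       }' "$@"
}
```

Storing every run and filtering in END keeps the per-line logic simple, at the cost of holding all runs in memory; for tables of this size that should not matter.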
# 3  
06-04-2014
It looks like the following line from the expected output is incorrect:
Code:
chr1   153234602   155205669   GBA   1971067

It should be:
Code:
chr1 155205331 155205669 GBA 338
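The corrected width is simply End minus Start of the two consecutive GBA hom lines. The original expected line instead starts at 153234602, which is a single LOR hom line, so that interval spans two genes. A quick arithmetic check:

```shell
#!/bin/sh
# Width = End of the last hom line - Start of the first hom line.
# The two consecutive GBA hom lines from the question:
start=155205331
end=155205669
echo "chr1 $start $end GBA $((end - start))"   # prints: chr1 155205331 155205669 GBA 338

# The original expected line starts at 153234602, a lone LOR hom line:
echo $((155205669 - 153234602))                # prints: 1971067
```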

If my assumption is correct, the code below should do it:
Code:
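awk 'NR == 1 {print "Chr", "Start", "End", "Gene", "Width"; next}  # output header
{ if (t == ($1 FS $5) && $4 == "hom") {FLG++}                  # next hom in the same Chr+Gene run
  else if ($4 == "hom"){t = ($1 FS $5); START = $2; FLG = 0}   # first hom of a possible run
  else {t = ""; FLG = 0}                                       # non-hom line ends the run
  if (FLG == 1) {STR[++n] = START; CHR[n] = $1; STP[n] = $3; GEN[n] = $5}  # 2nd hom: new interval
  else if (FLG > 1) {STP[n] = $3}}                             # later hom: move the End
END {for(i = 1; i <= n; i++)
  {print CHR[i], STR[i], STP[i], GEN[i], (STP[i] - STR[i])}}' file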

# 4  
06-04-2014
@SriniShoo, please note: the OP specifically states that he is not asking for a complete solution, but for a hint to start.
# 5  
06-04-2014
SOLVED! Thank you so much

Thank you, Scrutinizer and SriniShoo, for your quick and useful replies.

SriniShoo, your code works nicely.

Thank you so much!

---------- Post updated at 12:10 PM ---------- Previous update was at 12:01 PM ----------

Quote:
Originally Posted by Scrutinizer
@SriniShoo, please note: the OP specifically states that he is not asking for a complete solution, but for a hint to start.
I understand that the policy of this forum is not to ask for complete solutions. I'm sure SriniShoo did it with the best intentions. Now, after seeing the code, I can say that I could never have solved the problem on my own. Please do not consider this valuable help a policy violation.

Thanks
# 6  
06-04-2014
Thank you for your concern, Isantome. No worries, it is not a policy violation or anything like that. I was merely pointing out that you had specifically asked for a hint.
 