Removing repeated sequences


 
# 1  
Old 10-08-2010

Hi,
How can I remove 'Chr' entries that are repeated across different samples? In the example below, Chr19 appears in two samples with the same position, +52245923. I want to keep only one of the entries (it does not matter which sample it stays in) and replace each remaining position with a range: the position minus 20 as the minimum and the position plus 120 as the maximum. For Chr19, given +52245923, the output is Chr19:52245903-52246043.
The sign (+ or -) in the input data is not significant. The expected output is also given below for clarity.


INPUT FILE:
>sample1:1:1:1058:8130#0 5 830
Chr19 +52245923 1
Chr17 +69679873 1
Chr23 +52121254 1
>sample1:1:1:1060:5177#0 5 67
Chr19 +52245923 1
Chr17 -69679873 1
Chr15 +82202352 1
Chr5 +30440548 1

OUTPUT FILE:
>sample1:1:1:1058:8130#0 5 830
Chr19:52245903-52246043
Chr17:69679853-69679993
Chr23:52121234-52121374

>sample1:1:1:1060:5177#0 5 67
Chr15:82202332-82202472
Chr5:30440528-30440668

That is, Chr19 and Chr17 are removed from the second sample because they are repeated, and every Chr position is replaced with a range in the format shown in the output.

Please help me write a shell script for this; it would help a lot in bioinformatics research.
Thanks in advance.
# 2  
Old 10-08-2010
Code:
awk '!a[$1]++ { if (!/^Chr/) print; else { p = $2; sub(/^[+-]/, "", p); print $1 ":" (p-20) "-" (p+120) } }' file
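
For anyone who wants output closer to the expected file in post #1, here is a minimal sketch of the same idea spread over several lines. It rests on a few assumptions: header lines start with `>`, an entry counts as repeated when its Chr name alone has been seen before (which is what the expected output implies for Chr17), and the leading sign is simply dropped. The function name `dedup_ranges` is made up for illustration.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if none.
dedup_ranges() {
  awk '
    /^>/ {                         # sample header line
      if (headers++) print ""      # blank line before every header but the first
      print
      next
    }
    NF >= 2 {                      # data line: name, signed position, count
      pos = $2
      sub(/^[+-]/, "", pos)        # assumption: the sign carries no meaning
      if (!seen[$1]++)             # keep only the first occurrence of each Chr
        print $1 ":" (pos - 20) "-" (pos + 120)
    }
  ' "$@"
}
```

It could be called as `dedup_ranges file > output_file`. If an entry should only count as repeated when both the name and the position match, the key `seen[$1]` could be changed to `seen[$1, pos]`.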

