RNA-seq analysis


 
# 1  

I am processing RNA-seq data files that have been aligned using RUM. One of the output files is a *.sam file that includes:

Unique alignments
Non-unique alignments
original read files

I want to extract only the unique alignments by pulling out alignments that have "IH:i:1" (indicates this read had only one alignment).

I have tried.... grep "IH:i:1" file.sam > filtered.sam

but this will also return "IH:i:11 IH:i:12" etc etc. I have also tried "IH:i:1 " which returns nothing (I believe it's tab delimited)

Any suggestions?
# 2  
Try grep -w to match the exact word:
grep -w 'IH:i:1' file.sam > filtered.sam
# 3  
Quote:
Originally Posted by genGirl23
I have tried.... grep "IH:i:1" file.sam > filtered.sam

but this will also return "IH:i:11 IH:i:12" etc etc. I have also tried "IH:i:1 " which returns nothing (I believe it's tab delimited)
As you didn't post a sample of your file i am left guessing, but no problem: just give me a moment while i inspect your file with my crystal ball....

OK, somehow the device must be broken, but i can still give you some pointers:

"grep" always searches for the expression you give it, anywhere in the line. "1" will find "1", but also "11", "12", "1blabla", etc. The trick therefore is to make the expression longer, up to the point where all the wrong lines are excluded.

Having said this: if you are sure (instead of just guessing) that the expression is followed by a tab you could add this tab as part of the expression. grep accepts this, but you will have to enclose your expression in single quotes to protect it from the shell evaluation process. In the following "<TAB>" means a literal tab character:

Code:
grep 'IH:i:1<TAB>' /path/to/infile > filtered.file
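
Typing a literal tab on the command line can be awkward, because many interactive shells use the Tab key for completion. One possible workaround, sketched below, is to let "printf" produce the tab and splice it into the pattern. The sample records, the "workdir" directory and the file names are made up for illustration and are not real RUM output.

```shell
# Build a tiny tab-delimited sample (hypothetical records, not real RUM output).
workdir=$(mktemp -d)
printf 'read1\t0\tchr1\t100\tIH:i:1\tHI:i:1\n'  >  "$workdir/sample.sam"
printf 'read2\t0\tchr1\t200\tIH:i:11\tHI:i:3\n' >> "$workdir/sample.sam"
printf 'read3\t0\tchr2\t300\tIH:i:12\tHI:i:5\n' >> "$workdir/sample.sam"

# Let printf generate the tab character and store it in a variable.
tab=$(printf '\t')

# Double quotes so the shell expands ${tab} before grep sees the pattern.
grep "IH:i:1${tab}" "$workdir/sample.sam" > "$workdir/filtered.sam"

# Only read1 should remain.
cat "$workdir/filtered.sam"
```

In bash, ksh93 and zsh the same pattern can also be written as $'IH:i:1\t'. Either form only matches when a tab really follows the tag, so a tag at the very end of a line would still be missed.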

You can verify that the character following it is a tab by opening the file in the "vi" editor and entering ":set list", which displays all the non-printable characters. Tabs will be displayed as "^I".
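
If opening a large SAM file in an editor is inconvenient, the same check can be done non-interactively. This is a small sketch using the "l" command of sed, which prints a line in an unambiguous form; the sample line is invented for the example.

```shell
# One hypothetical tab-delimited line to inspect.
sample=$(mktemp)
printf 'read1\t0\tIH:i:1\tHI:i:1\n' > "$sample"

# sed's 'l' command shows tabs as \t and marks the end of the line with $.
shown=$(head -n 1 "$sample" | sed -n l)
printf '%s\n' "$shown"
# prints: read1\t0\tIH:i:1\tHI:i:1$

# With GNU coreutils, "cat -A" gives a similar view, showing tabs as ^I.
```

Very long lines are folded by "sed -n l", so for real SAM records it may be easier to look at only the trailing fields.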

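In case you just don't want any digit to follow your expression, you can simply forbid that instead of trying to find out what does follow it:

Code:
grep -E 'IH:i:1([^0-9]|$)' /path/to/infile > filtered.file

"[^0-9]" means "any one character except the digits 0-9": "[...]" matches any one of the characters inside, and a "^" at the beginning inverts the set. Because "[^0-9]" still has to match some character, the "|$" alternative (which needs "-E" for extended regular expressions) also accepts the tag when it is the last thing on the line.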
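
Another way to avoid guessing what follows the tag is to compare whole tab-separated fields. The sketch below uses awk for that and also keeps the "@" header lines, which downstream tools usually expect in a SAM file. It assumes the tag appears among the optional fields, which start at column 12 in the SAM format; the records and file names are hypothetical.

```shell
# Hypothetical SAM content: one header line and three alignment records.
workdir=$(mktemp -d)
in="$workdir/in.sam"
out="$workdir/unique.sam"
printf '@HD\tVN:1.0\n' > "$in"
printf 'r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tIH:i:1\n' >> "$in"
printf 'r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\tIH:i:11\tHI:i:2\n' >> "$in"
printf 'r3\t0\tchr2\t300\t255\t4M\t*\t0\t0\tACGT\tIIII\tHI:i:1\tIH:i:1\n' >> "$in"

awk -F '\t' '
  # Pass header lines through unchanged.
  /^@/ { print; next }
  # Optional tags start at field 12; keep the record if one is exactly IH:i:1.
  { for (i = 12; i <= NF; i++) if ($i == "IH:i:1") { print; next } }
' "$in" > "$out"

# Header, r1 and r3 are kept; r2 (IH:i:11) is dropped.
cut -f1 "$out"
```

Because each field is compared as a whole string, "IH:i:11" can never match, and it does not matter whether the tag is in the middle of the record or at the end of the line.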

I hope this helps.

bakunin