Regular Expression problem


 
# 1  
10-06-2010
Regular Expression problem

I have two input files (given below). I need to compare each line of File1 with each line of File2 that starts with '>sample1'. If a match occurs and the matched line in File2 is followed by one or more lines starting with 'Chr', the matched line and its 'Chr' lines have to be written to the output file. If a match occurs but the matched line in File2 is not followed by any 'Chr' line, it has to be omitted from the output file. For easy understanding, I marked in blue the matched lines in File1 and File2 that are taken into account for the final output, and in red the matched lines (with no 'Chr' lines in File2) that are not taken into account. The expected output is also given below. Please kindly help me with the shell scripting for this.



File1:
>sample1:1:1:1057:7503#0 0 0
>sample1:1:1:1057:12664#0 0 0
>sample1:1:1058:8130#0 5 830
>sample1:1:1:1059:6357#0 0 0
>sample1:1:1:1059:10418#0 0 0
>sample1:1:1:1059:12084#0 1 1
>sample1:1:1:1060:11510#0 0 0
>sample1:1:1:1060:5177#0 0 0
>sample1:1:1:1061:8105#0 0 0
>sample1:1:1:1063:6105#0 0 0
>sample1:1:1:1064:11266#0 0 0
>sample1:1:1:1066:5654#0 0 0
>sample1:1:1:1067:10266#0 0 0
>sample1:1:1:1068:2100#0 0 0
>sample1:1:1:1069:3450#0 0 0
>sample1:1:1:1070:7530#0 0 0
>sample1:1:1:1071:8627#0 0 0
>sample1:1:1:1071:8552#0 0 0
>sample1:1:1:1072:7060#0 0 0
>sample1:1:1:1073:7329#0 0 0
>sample1:1:1:1073:20394#0 0 0
>sample1:1:1:1074:7081#0 0 0
>sample1:1:1:1076:1654#0 0 0
>sample1:1:1:1077:15575#0 0 0
>sample1:1:1:1077:15683#0 0 0

File2:
>sample1:1:1:1056:8164#0 1 1
Chr21 +25913822 2
>sample1:1:1:1057:7503#0 0 0
>sample1:1:1:1057:18666#0 1 1
Chr21 +25913822 2
>sample1:1:1:1057:1725#0 1 1
Chr21 +25913822 2
>sample1:1:1:1057:12664#0 0 0
>sample1:1:1:1057:18537#0 1 1
Chr21 +25913822 2
>sample1:1:1:1058:8130#0 5 830
Chr19 +52245923 1
Chr17 +69679873 1
Chr23 +52121254 1
Chr11 +100949523 1
Chr8 +28333267 1

>sample1:1:1:1058:19619#0 1 1
Chr21 +25913822 2
>sample1:1:1:1059:6357#0 0 0
>sample1:1:1:1059:10418#0 0 0
>sample1:1:1:1059:12084#0 1 1
Chr12 -19596251 2

>sample1:1:1:1060:13498#0 1 1

Output:
sample1:1:1:1058:8130#0 5 830
Chr19 +52245923 1
Chr17 +69679873 1
Chr23 +52121254 1
Chr11 +100949523 1
Chr8 +28333267 1
sample1:1:1:1059:12084#0 1 1
Chr12 -19596251 2

Last edited by hravisankar; 10-08-2010 at 03:27 PM.. Reason: New posting for shell scripting. Help me
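One possible way to do the File1/File2 matching, sketched as an awk program wrapped in a shell function. The function name match_with_chr is hypothetical, and two details are assumptions rather than anything stated in the thread: headers are compared on their first whitespace-separated field (the read name, including its '>'), and the '>' is stripped from printed headers because the expected output above shows them without it. Each File2 header is buffered with the 'Chr' lines that follow it, and the block is printed only if the read name occurs in File1 and at least one 'Chr' line was collected. Blank lines are ignored.

```shell
# match_with_chr FILE1 FILE2  (hypothetical helper name)
# Prints each FILE2 header whose read name (first field) occurs in FILE1,
# together with its "Chr" lines, but only when it has at least one of them.
match_with_chr() {
  awk '
    # First file: remember the read name of every non-blank line.
    NR == FNR { if (NF) want[$1] = 1; next }

    # Second file: a new header first flushes the previous block.
    /^>/   { flush(); hdr = $0; keep = ($1 in want); next }

    # Collect the Chr lines that belong to the current header.
    /^Chr/ { if (hdr != "") body = body $0 "\n"; next }

    END { flush() }

    # Print the buffered block if it matched and has Chr lines, then reset.
    function flush(    h) {
      if (hdr != "" && keep && body != "") {
        h = hdr
        sub(/^>/, "", h)          # assumption: output headers without ">"
        printf "%s\n%s", h, body
      }
      hdr = ""; body = ""
    }
  ' "$1" "$2"
}
```

Usage would be: match_with_chr File1 File2 > output.txt. Because the comparison is an exact string match, File1's third line as posted (>sample1:1:1058:8130#0, one ':1' short of File2's >sample1:1:1:1058:8130#0) would not be paired with its File2 block until that typo is corrected. Removing the sub() call keeps the '>' on the headers.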
# 2  
10-06-2010
You posted the input data, but it would be kind of you to also post the expected result/output.

Regards
# 3  
10-06-2010
Quote:
Originally Posted by hravisankar
I have to read a file line by line using a shell script. The format should be exactly the same as given below:

Chr18:4000-4010
Chr20:393939-400303
Chr30:38838-30020

I already posted a thread, and the 2 answers did not read the data like that. ...
That's because your problem statement is quite vague. A "read" operation does not "have" a format, which is why "read" cannot "be" in any format.

Your input data in a file or from a pipe "has" or "is in" some particular format. So, I'll assume that this -

Code:
Chr18:4000-4010
Chr20:393939-400303
Chr30:38838-30020

is the format of your input data. Again, I'll assume that this data is in a file, as opposed to a pipe stream.

In the shell, you'd read a file like so -

Code:
$
$ # display the content of the file. My file is called "f32", yours may be different.
$ cat f32
Chr18:4000-4010
Chr20:393939-400303
Chr30:38838-30020
$
$ # read data from input file "f32"
$ while read LINE; do   echo "Oh my! I've now read this line => $LINE"; done < f32
Oh my! I've now read this line => Chr18:4000-4010
Oh my! I've now read this line => Chr20:393939-400303
Oh my! I've now read this line => Chr30:38838-30020
$
$

HTH,
tyler_durden
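A slightly more defensive version of the same loop, as a sketch: plain read strips leading and trailing whitespace and treats backslashes as escape characters, and a last line without a trailing newline never reaches the loop body. Setting IFS to empty, adding -r, and re-checking the variable when read fails keep every line exactly as stored. The function name print_lines is only for illustration.

```shell
# print_lines FILE  (illustrative name)
# Prints every line of FILE unchanged, prefixed with "Read: ".
print_lines() {
  # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
  # '|| [ -n "$line" ]' still processes a last line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "Read: $line"
  done < "$1"
}
```

printf is used rather than echo because some shells' echo interprets backslash sequences in its arguments.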
# 4  
10-07-2010
Shell scripting problem

My input file (data1) looks like this:
Code:
Chr8:4000-4500
Chr10:4000-4600

I wrote a shell script like this:
Code:
while read LINE;
do   echo "$LINE";
samtools faidx Bos_taurusUMD3.fa "$line";
done < data1


It has to read a line like Chr8:4000-4500 from the input file and run it in a command like 'samtools faidx Bos_taurusUMD3.fa Chr8:4000-4500'. Then I get the sequence like this:
Code:
>Chr8:4000-4500
TAATTCGTTTTTCTTTTTTCCTCTCTGACTCATTTATTTGTACCATTCTATCTTCTAATT
CACTAATCTTATCTTCTGCCTCTGTTATTCTACTATTTGTCGCCTCCAGAGTGTTTTTGA
TCTCATTTATTGCATTATTCATTATATATTGACTCTTTTTTATGTCTTCTAGGTCCTTGT
TAAACCTTTCTTGCATCTTCTCAATCCTTGTCTCCAGGTTATTTATCTGTGATTCCATTT
TGATTTCAAGATTTTGGATCAATTTCACTATCATTATTCAGAATTCTTTATCAGGTAGAT
TCCTTATCTCTTCCTCTTTTGTTTTGTTTGGTGGGCATTTATCCTGTTCCTTTACCTGCT
GGGTATTCCTCTGTCTCTTCATCTTGTTTATATTGCTGAGTTTGGGGTGTCCTTTCTGTA
TTCTGGCAGTTTGTGGAGTTCTCTTTATTGTGGAGTTTCCTCGCTGTGTATGGGTTTGTA
CAGGTGGCTTGTCAAGGTTTC

But when I run the script, it displays output like this:
Code:
Chr8:4000-4500
>
Chr10:4000-4600
>

It does not display the sequence. But when I run the command at the $ prompt, like
$ samtools faidx Bos_taurusUMD3.fa Chr8:4000-4500
it runs and displays the sequence.

How can I execute it in my script, reading each line from the input file, running it in the command and displaying the sequences?
Kindly help me.

Last edited by Scott; 10-07-2010 at 08:09 PM.. Reason: CODE TAGS PLEASE...
# 5  
10-07-2010
Try it like this:
Code:
samtools faidx Bos_taurusUMD3.fa "$line"

# 6  
10-07-2010
Reading input line problem

Thank you very much for your reply. I used "$line" in the samtools command as you specified: samtools faidx Bos_taurusUMD3.fa "$line";

It gives output like this. The sequence is not given; it displays only >.

Code:
Chr8:86884850-86884997
>
ChrX:96383583-96383703
>
Chr15:33347613-33347720
>

Kindly help me: how do I execute it so that it generates the sequence?

Last edited by Scott; 10-07-2010 at 08:10 PM..
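A likely cause of the bare '>' lines, judging from the script in post #4: shell variable names are case-sensitive, so read LINE fills LINE, while "$line" refers to a different variable that was never set. samtools faidx is therefore handed an empty region, which is consistent with the lone '>' in the output, whereas echo "$LINE" works because it uses the same name as read. A corrected loop could look like the sketch below; the function name extract_regions is hypothetical, and the FASTA file and region list are passed as arguments.

```shell
# extract_regions FASTA REGION_FILE  (hypothetical helper name)
# Runs "samtools faidx FASTA REGION" once per non-blank line of REGION_FILE.
extract_regions() {
  fasta=$1
  regions=$2
  # Read into "region" and pass exactly "$region": same name, same case.
  while IFS= read -r region || [ -n "$region" ]; do
    [ -n "$region" ] || continue   # skip blank lines
    samtools faidx "$fasta" "$region"
  done < "$regions"
}
```

Usage would be: extract_regions Bos_taurusUMD3.fa data1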
# 7  
10-07-2010
Quote:
Originally Posted by hravisankar
I have an input file in the format shown below. I have to select the lines that are not followed by a 'miR-' line and save them into an output file. For easy identification, they are shown here in blue. Please help me write a shell script that selects the lines not followed by 'miR-' and writes them to a file.
Kindly help.

>sample1:1:1:1056:8164#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1057:7503#0 0 0
>sample1:1:1:1057:18666#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1057:1725#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1057:12664#0 0 0
>sample1:1:1:1057:18537#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1058:8130#0 1 1
miR-2396;Chr26:42482649-42482717 -1 2
>sample1:1:1:1058:19619#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1059:6357#0 0 0
>sample1:1:1:1059:10418#0 0 0

>sample1:1:1:1059:12084#0 1 1
miR-16-1;Chr12:19596200-19596290 -52 2
>sample1:1:1:1060:13498#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1060:11510#0 0 0
>sample1:1:1:1060:2691#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1060:5177#0 0 0
>sample1:1:1:1060:13599#0 1 1
miR-16-1;Chr12:19596200-19596290 -52 2
>sample1:1:1:1060:12022#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1061:8105#0 0 0
>sample1:1:1:1062:4635#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1062:2052#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1062:17129#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1063:6105#0 0 0
>sample1:1:1:1064:11266#0 0 0

>sample1:1:1:1065:5224#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1065:14605#0 1 1
miR-152;Chr19:39081165-39081250 +53 2
>sample1:1:1:1066:5654#0 0 0
>sample1:1:1:1066:10310#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1067:3521#0 1 1
miR-184;Chr21:25913771-25913853 +52 2
>sample1:1:1:1067:1055#0 1 1
A small problem, and the code is simple:

Code:
awk 'BEGIN{RS=ORS=">";FS="\n"} NF==2' infile

sample1:1:1:1057:7503#0 0 0
>sample1:1:1:1057:12664#0 0 0
>sample1:1:1:1059:6357#0 0 0
>sample1:1:1:1059:10418#0 0 0
>sample1:1:1:1060:11510#0 0 0
>sample1:1:1:1060:5177#0 0 0
>sample1:1:1:1061:8105#0 0 0
>sample1:1:1:1063:6105#0 0 0
>sample1:1:1:1064:11266#0 0 0
>sample1:1:1:1066:5654#0 0 0
>sample1:1:1:1067:1055#0 1 1
>
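How the one-liner above works, as far as it can be read from the code: RS=">" makes each header plus any lines after it one record, FS="\n" makes each line a field, and a header with nothing after it leaves two fields (the header and the empty string after its newline), hence NF==2. Because ORS=">" is written after each record, the output starts without a '>' and ends with a stray one, and a blank line after a header would add a field and drop that header. A line-by-line alternative, sketched below with the hypothetical function name headers_without_mir, keeps each header as written and ignores blank lines.

```shell
# headers_without_mir FILE  (hypothetical helper name)
# Prints every ">" header whose next non-blank line does not start with "miR-".
headers_without_mir() {
  awk '
    NF == 0 { next }                                             # skip blank lines
    /^>/    { if (pending != "") print pending; pending = $0; next }  # hold header
    /^miR-/ { pending = ""; next }                               # followed by miR-: drop
    END     { if (pending != "") print pending }
  ' "$1"
}
```

Usage would be: headers_without_mir infile > outfile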


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Problem with Regular expression in awk

Hi, I have a file with two fields in it as shown below 14,30 28,30 16,30 22,30 21,30 3,30 Fields are separated by comma ",". I've been trying to validate the file based on the condition "each field must be a numeric value" I am using HP-UX OS. I have tried the following awk... (4 Replies)
Discussion started by: meetsriharsha
4 Replies

2. Programming

Perl: How to read from a file, do regular expression and then replace the found regular expression

Hi all, how do I read a file, find the matching regular expression and overwrite the same file? open DESTINATION_FILE, "<tmptravl.dat" or die "tmptravl.dat"; open NEW_DESTINATION_FILE, ">new_tmptravl.dat" or die "new_tmptravl.dat"; while (<DESTINATION_FILE>) { # print... (1 Reply)
Discussion started by: jessy83
1 Replies

3. Shell Programming and Scripting

SED (regular expression) problem ---

Hello, I would like to replace Line 187 of my file named run_example. The original line is below, including the spaces: celldm(1) = 6.00, I want it to become something like celldm(1) = 6.05, or celldm(1) = 6.10, where the number is stored in a variable called... (6 Replies)
Discussion started by: bluesmodular
6 Replies

4. Shell Programming and Scripting

Problem with regular expression

Hello, I have this string of characters: 8275610268 + 9012383215 =unneededtext. I need to extract both numbers in bash (they have the same number of digits) and save them into separate variables, but something isn't working. I've been trying with grep, but I can't put the regular expression together, and then assign it to... (7 Replies)
Discussion started by: menda90
7 Replies

5. Shell Programming and Scripting

problem with Regular expression as input in shell script

Hi, I have a script that takes a string as input and searches for it in a file. But when I search for a pattern that has special characters, the script ignores them. For example, I want to search for the pattern "\.tumblr\.com", but the shell script is removing the \ (backslash) and trying to search... (7 Replies)
Discussion started by: Anjan1
7 Replies

6. Shell Programming and Scripting

Problem with a regular expression

Hello! I'm working with AWK, and I have this code: /<LOOP_TIME>/,/<\/LOOP_TIME>/ I want it to match everything between <LOOP_TIME> and </LOOP_TIME>, but not if the line has a "#" before the tags. Can someone help me? Thanks! (6 Replies)
Discussion started by: claw82
6 Replies

7. Shell Programming and Scripting

New line problem of regular expression

Could anybody tell me how I can add/append a new line using a regular expression in vi on AIX? I've tried several ways before, but all of them failed, e.g. :%s/$/\n/ :%s/^/\v\r/ :( (1 Reply)
Discussion started by: wrl
1 Replies

8. UNIX for Dummies Questions & Answers

Regular Expression Problem

This is what my xyz.log file looks like: info ( 816): CORE1116: Sun ONE Web Server 6.1SP5 B08/17/2005 22:09 info ( 817): CORE5076: Using from info ( 817): WEB0100: Loading web module in virtual server at info ( 817): WEB0100: Loading web module in virtual server at perl... (12 Replies)
Discussion started by: chris1234
12 Replies

9. Shell Programming and Scripting

Regular Expression problem

Hi guys, I've been trying to write a regular expression. If I'm trying to validate a sequence of characters as follows... AB1-232-623482-743 43/3 where a) any character after the "AB" can be any alphanumeric character b) the " 43/3" part is optional, is there a quick neat way for me... (5 Replies)
Discussion started by: djkane
5 Replies

10. UNIX for Dummies Questions & Answers

Regular Expression Problem

Display all of the lines in a file that contain "Raspberry" followed later in the line by the letter "a". I tried: grep Raspberry*a filename, but that didn't work. Does anyone know a solution? (1 Reply)
Discussion started by: netmaster
1 Replies