Shell scripting for this sequence to compare


 
# 1  
Old 10-08-2010

I have two input files (given below). I need to compare each line of File1 with each line of File2 that starts with '>sample1'. If a match occurs and the matched line in File2 is followed by one or more lines starting with "Chr", the matched line and its "Chr" lines have to be written to the output file. If a match occurs but the matched line in File2 is not followed by any "Chr" line, it has to be omitted from the output file. For easy understanding, I marked in blue the matched lines in File1 and File2 that are taken into account for the final output, and in red the matched lines that have no "Chr" lines in File2 and are therefore not taken into account. The expected final output is also given below. Please help me with the shell scripting for this.



File1:
>sample1:1:1:1057:7503#0 0 0
>sample1:1:1:1057:12664#0 0 0
>sample1:1:1:1058:8130#0 5 830
>sample1:1:1:1059:6357#0 0 0
>sample1:1:1:1059:10418#0 0 0
>sample1:1:1:1059:12084#0 1 1
>sample1:1:1:1060:11510#0 0 0
>sample1:1:1:1060:5177#0 0 0
>sample1:1:1:1061:8105#0 0 0
>sample1:1:1:1063:6105#0 0 0
>sample1:1:1:1064:11266#0 0 0
>sample1:1:1:1066:5654#0 0 0
>sample1:1:1:1067:10266#0 0 0
>sample1:1:1:1068:2100#0 0 0
>sample1:1:1:1069:3450#0 0 0
>sample1:1:1:1070:7530#0 0 0
>sample1:1:1:1071:8627#0 0 0
>sample1:1:1:1071:8552#0 0 0
>sample1:1:1:1072:7060#0 0 0
>sample1:1:1:1073:7329#0 0 0
>sample1:1:1:1073:20394#0 0 0
>sample1:1:1:1074:7081#0 0 0
>sample1:1:1:1076:1654#0 0 0
>sample1:1:1:1077:15575#0 0 0
>sample1:1:1:1077:15683#0 0 0

File2:
>sample1:1:1:1056:8164#0 1 1
Chr21 +25913822 2
>sample1:1:1:1057:7503#0 0 0
>sample1:1:1:1057:18666#0 1 1
Chr21 +25913822 2
>sample1:1:1:1057:1725#0 1 1
Chr21 +25913822 2
>sample1:1:1:1057:12664#0 0 0
>sample1:1:1:1057:18537#0 1 1
Chr21 +25913822 2
>sample1:1:1:1058:8130#0 5 830
Chr19 +52245923 1
Chr17 +69679873 1
Chr23 +52121254 1
Chr11 +100949523 1
Chr8 +28333267 1
>sample1:1:1:1058:19619#0 1 1
Chr21 +25913822 2
>sample1:1:1:1059:6357#0 0 0
>sample1:1:1:1059:10418#0 0 0
>sample1:1:1:1059:12084#0 1 1
Chr12 -19596251 2
>sample1:1:1:1060:13498#0 1 1

Output:
sample1:1:1:1058:8130#0 5 830
Chr19 +52245923 1
Chr17 +69679873 1
Chr23 +52121254 1
Chr11 +100949523 1
Chr8 +28333267 1
sample1:1:1:1059:12084#0 1 1
Chr12 -19596251 2
# 2  
Old 10-08-2010
Code:
awk 'BEGIN{FS=RS;RS=">"}NR==FNR{a[$1]=1;next}/Chr/ && ($1 in a){printf "%s",$0}'  file1 file2
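
For anyone reading along, here is a minimal sketch of how the one-liner above can be unpacked into a commented script. The function name `chr_records` is an assumption made for illustration and is not from the thread. The sketch stores only the header line as an array key, skips to the next record while the first file is being read, and prints with an explicit format string, which should behave the same across awk implementations.

```shell
#!/bin/sh
# chr_records FILE1 FILE2
# Hypothetical helper (name chosen for illustration): print every record of
# FILE2 whose header line also appears in FILE1 and which has "Chr" lines.
chr_records() {
    awk '
        # One record per ">" block; each line of the block is one field.
        BEGIN { FS = "\n"; RS = ">" }

        # First file: remember every non-empty header line, then move on.
        NR == FNR { if ($1 != "") seen[$1] = 1; next }

        # Second file: header known from the first file and followed by
        # at least one line that starts with "Chr".
        ($1 in seen) && /\nChr/ { printf "%s", $0 }
    ' "$1" "$2"
}

# Usage (file names as in the thread):
#   chr_records File1 File2 > output.txt
```

Because ">" is the record separator, the printed headers come out without the leading ">", as in the Output listing of post #1. The comparison is an exact string match on the whole header line, so stray trailing spaces in either file would prevent a match.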

# 3  
Old 10-08-2010

Code:
sed "" File1 | while read -r l ; do sed -n "/$l/,/>sample1/p" File2 | sed -e 'N;' -e '/>sample1.*\n>sample1/d' | 
sed -e '/Chr.*/N;' -e 's/\(Chr.*\)\n>sample1.*/\1/' ; done

Output:
>sample1:1:1:1058:8130#0 5 830
Chr19 +52245923 1
Chr17 +69679873 1
Chr23 +52121254 1
Chr11 +100949523 1
Chr8 +28333267 1
>sample1:1:1:1059:12084#0 1 1
Chr12 -19596251 2

# 4  
Old 10-08-2010
Problem in reading larger sequences

Thanks a lot for this script. It worked well with a small data file. My data files file1 and file2 are at least 370 MB in size. When I execute this command, I get the $ prompt back after 5 minutes without any result. If possible, could you tell me what to add so that it can handle more data?

Quote:
Originally Posted by danmero
Code:
awk 'BEGIN{FS=RS;RS=">"}NR==FNR{a[$1]=1;next}/Chr/ && ($1 in a){printf "%s",$0}'  file1 file2

# 5  
Old 10-08-2010
Quote:
Originally Posted by hravisankar
Thanks a lot for this script. It worked well with a small data file. My data files file1 and file2 are at least 370 MB in size.
Before asking for a solution, you should state all the relevant variables up front: file size(s), system specs, your own level of knowledge, etc.
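
On the large-file question, here is a minimal sketch of a line-by-line variant, assuming the real files look like the samples in post #1 (header lines start with ">", hit lines start with "Chr"). The function name `chr_stream` is an assumption made for illustration. It reads with the default record separator, so awk never has to hold a multi-line record, and it stores only the header lines of the first file. That array still has to fit in memory, and whether this is fast enough for 370 MB inputs would need testing on the actual system. `LC_ALL=C` is a common way to make awk treat input as plain bytes; how much it helps depends on the awk implementation.

```shell
#!/bin/sh
# chr_stream FILE1 FILE2
# Hypothetical helper (name chosen for illustration). Reads both files line by
# line with the default record separator, so no multi-line record is buffered.
chr_stream() {
    LC_ALL=C awk '
        # First file: remember each header line as an array key.
        NR == FNR { seen[$0] = 1; next }

        # Second file, header line: hold it back until a Chr line confirms it.
        /^>/ {
            pending = ($0 in seen) ? $0 : ""
            keep = 0
            next
        }

        # Second file, Chr line: print the held header once, then the line.
        /^Chr/ {
            if (pending != "") { print pending; pending = ""; keep = 1 }
            if (keep) print
        }
    ' "$1" "$2"
}

# Usage (file names as in the thread):
#   chr_stream File1 File2 > output.txt
```

This version keeps the leading ">" on the header lines, like the sed output in post #3. Printing `substr(pending, 2)` instead of `pending` would drop it.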