Sponsored Content
Top Forums UNIX for Dummies Questions & Answers Matching string and assembling Post 302969875 by Xterra on Tuesday 29th of March 2016 01:23:36 PM
Old 03-29-2016
Matching string and assembling

I have been thinking how to address this particular task but is way beyond my knowledge.
I have a reference sequence, something like this:
Code:
>Reference
AGAGAGACCTGGAGAGAGAGTGACGATGAGCAGTGACGATGACGTACGATAGCAGTAGACGCA

and a input.txt file with thousand of short sequences, something like this
Code:
>read1 ori 498
AGAGAGACCTGGAGAGAGAGT
>read2 ori-rep 500
AGAGAGACCTGGAGAGAGAGT
>read3 1-misma 456
GGAGAGACCTGGAGAGAGAGT
>read4 2-misma 456
TGAGAGACCTGGAGAGAGAGA
>read5 ori-rev 532
ACTCTCTCTCCAGGTCTCTCT
>read6 ori-rev-1-misma 499
ACTCTCTCTCCAGGTCTCTCC
>read7 medium 512
AGAGAGAGTGACGATGAGCAG
>read8 last 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
>read9 last rep 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
>read10 last gap 488
AGTGACGATGACGTACGATAGCAGTAGACGA
>read11 nomatch1 500
GGGGGGAAAAAGCGTGCGT
>read12 nomatch2 500
CCCCGGGATGACGATGACGATGACGATGACGATGAC
>read13 nomatch3 550
GGGGTGCGAAAAAACCCCCGGGGTGG
>read14 nomatch4 543
TTTTTTTTTTAAAAAGCCGCGCTTTTTTT
>read15 nomatch5 543
TTTTTTTTTTAAAAAGCCGCGCTTTTTAA

The output file should contain the following:
1. All sequences that match the reference sequence 100% (in my example, sequences 1, 2, 7, 8 and 9)
2. If a sequence does not match the reference, it should reversed and complemented (A=>T; T=>A; C=>G; G=>C), and run against the reference sequence for a second time. If it matches, it should be included in the output file as reversed/complemented sequence (sequences 5)
3. All sequences containing 1 or 2 mismatches should be included without changes (sequences 3 and 4)
4. All sequences that after being reversed and complemented contain 1 or 2 mismatches should also be included as reversed/complemented sequences (sequences 6)
5. All sequences missing 1 character (sequence 10)

Resulting in the following outfile
Code:
>read1 ori 498
AGAGAGACCTGGAGAGAGAGT
>read2 ori-rep 500
AGAGAGACCTGGAGAGAGAGT
>read3 1-misma 456
GGAGAGACCTGGAGAGAGAGT
>read4 2-misma 456
TGAGAGACCTGGAGAGAGAGA
>read5 ori-rev 532
AGAGAGACCTGGAGAGAGAGT
>read6 ori-rev-1-misma 499
GGAGAGACCTGGAGAGAGAGT
>read7 medium 512
AGAGAGAGTGACGATGAGCAG
>read8 last 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
>read9 last rep 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
>read10 last gap 488
AGTGACGATGACGTACGATAGCAGTAGACGA

The second outfile should be based on the first outfile. Here, I would like to assemble all sequences into one by overlapping the matching portions and name the new reference with the input file name. An "N" should be inserted if a variable position is found:
Code:
>input
NGAGAGACCTGGAGAGAGAGNGACGATGAGCAGTGACGATGACGTACGATAGCAGTAGACGCA

I know perl will probably be the best way to go. However, my understanding about perl is quite limited and I do not think AWK would be the best way to solve this task
Any help will be greatly appreciated
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

sed problem - replacement string should be same length as matching string.

Hi guys, I hope you can help me with my problem. I have a text file that contains lines like this: 78 ANGELO -809.05 79 ANGELO2 -5,000.06 I need to find all occurences of amounts that are negative and replace them with x's 78 ANGELO xxxxxxx 79... (4 Replies)
Discussion started by: amangeles
4 Replies

2. Shell Programming and Scripting

String matching

for a certain directory, I want to grep a particular file called ABCD so what I do is ls /my/dir | grep -i "ABCD" | awk '{print $9}' however, there is also this file called ABCDEFG, the above command would reurn both file when I only want ABCD, please help! (3 Replies)
Discussion started by: mpang_
3 Replies

3. UNIX for Dummies Questions & Answers

Matching string

Hello, i have a program where i have to get a character from the user and check it against the word i have and then replace the character in a blank at the same position it is in the word. (7 Replies)
Discussion started by: nehaquick
7 Replies

4. Shell Programming and Scripting

Help assembling script

I am trying to figure out how to write a bash script to process a file in order to make it more user readable. The file to be processed is quite uniform, every line starts with a 32 bit Unix timestamp in hexadecimal format, then a single tab charcter (0x09) then a string of text. What I want to... (1 Reply)
Discussion started by: stumpyuk
1 Replies

5. Shell Programming and Scripting

matching a string

I have a requirement of shell script where i need to read the File name i.e ls -t | head -1 and Match that Filename with some delimited values which are in a separate File. For Example i am reading the File name i.e (ls -t | head -1) after that i need to read one more sequential file which... (2 Replies)
Discussion started by: dsdev_123
2 Replies

6. Shell Programming and Scripting

String matching

I have a string like ab or abc of whatever length. But i want to know whether another string ( for example, abcfghijkl, OR a<space> bcfghijkl ab<space> cfghijkl OR a<space>bcfghijkl OR ab<space> c<space> fghijkl ) starts with ab or abc... space might existing on the longer string... If so, i... (4 Replies)
Discussion started by: nram_krishna@ya
4 Replies

7. Shell Programming and Scripting

Matching string from input to string of file

Hi, i want to know how to compare string of file with input string im trying following code: file_no=`paste -s -d "||||\n" a.txt | cut -c 1` #it will return collection number from file echo "enter number" read " curr_no" if ; then echo " current number already present" fi ... (4 Replies)
Discussion started by: a_smith
4 Replies

8. UNIX for Dummies Questions & Answers

finding, copying, assembling

Hi everybody, I've been running some analyses, the results of which have been stored in a sequential manner with a directory structure like step0, step1, step2, ... for iterations 0-2, for example. Each iteration contains several nested folders, with three pieces of information I need. I need to... (1 Reply)
Discussion started by: JDenton
1 Replies

9. Shell Programming and Scripting

Assembling the Pieces of a Regular Expression

Hello all. I'm scripting in ksh and trying to put together a regular expression. I think my logic is sound, but I'm doing the head-against-the-wall routine while trying to put the individual pieces together. Can anybody lend some suggestions to the below problem? I'm taking a date in the... (2 Replies)
Discussion started by: Michael_K
2 Replies

10. UNIX for Dummies Questions & Answers

Matching string

Hello all, i am trying to match a string and based on that proceed with my script or error out... i have a file called /tmp/sta.log that will be populated by oracle's spooling..it can have a output of either 2 of the below (OPEN or errors/ORACLE not avaiable) $ cat /tmp/sta.log OPEN $ $... (2 Replies)
Discussion started by: abdul.irfan2
2 Replies
All times are GMT -4. The time now is 05:28 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy