The output file should contain the following:
1. All sequences that match the reference sequence 100% (in my example, sequences 1, 2, 7, 8 and 9)
2. If a sequence does not match the reference, it should be reversed and complemented (A=>T; T=>A; C=>G; G=>C) and run against the reference sequence a second time. If it then matches, it should be included in the output file as the reversed/complemented sequence (sequence 5)
3. All sequences containing 1 or 2 mismatches should be included without changes (sequences 3 and 4)
4. All sequences that contain 1 or 2 mismatches after being reversed and complemented should also be included as reversed/complemented sequences (sequence 6)
5. All sequences missing one character, included without changes (sequence 10)
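The five rules above can be sketched roughly as follows. This is a minimal illustration, not the scripts discussed later in the thread: the reference sequence is not shown in the post, so it is passed in as a plain string, and the names `revcomp`, `min_mismatches`, `one_char_missing` and `classify` are hypothetical. "Matching" is assumed to mean that the read lines up with some window of the (longer) reference, and rule 5 is only checked on the forward strand because that is the only case described.

```python
# Hypothetical sketch of the five inclusion rules; the reference is an input
# string because the real one is not shown in the thread.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse the sequence and complement each base (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(read: str, ref: str) -> int:
    """Fewest mismatches between read and any same-length window of ref."""
    n = len(read)
    return min(
        (sum(a != b for a, b in zip(read, ref[i:i + n]))
         for i in range(len(ref) - n + 1)),
        default=n + 1,  # read longer than ref: it cannot be placed at all
    )


def one_char_missing(read: str, ref: str) -> bool:
    """True if deleting one base from some window of ref yields read."""
    n = len(read) + 1
    for i in range(len(ref) - n + 1):
        window = ref[i:i + n]
        if any(window[:j] + window[j + 1:] == read for j in range(n)):
            return True
    return False


def classify(read: str, ref: str, max_mm: int = 2):
    """Return the sequence to write out (as-is or reverse-complemented), or None."""
    rc = revcomp(read)
    d_fwd, d_rc = min_mismatches(read, ref), min_mismatches(rc, ref)
    if d_fwd == 0:
        return read  # rule 1: exact match, kept as-is
    if d_rc == 0:
        return rc    # rule 2: exact match after reverse complement
    if d_fwd <= max_mm:
        return read  # rule 3: 1-2 mismatches, kept as-is
    if d_rc <= max_mm:
        return rc    # rule 4: 1-2 mismatches after reverse complement
    if one_char_missing(read, ref):
        return read  # rule 5: one base missing (forward strand only, an assumption)
    return None      # no rule applies: drop the read
```

One caveat of this window-based approach: a base missing at the very end of a read (as in read10) looks like a single mismatch at the last position, so such a read is caught by rule 3 rather than rule 5. It is still reported unchanged, so the output is the same.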
This should result in the following output file:
Code:
>read1 ori 498
AGAGAGACCTGGAGAGAGAGT
>read2 ori-rep 500
AGAGAGACCTGGAGAGAGAGT
>read3 1-misma 456
GGAGAGACCTGGAGAGAGAGT
>read4 2-misma 456
TGAGAGACCTGGAGAGAGAGA
>read5 ori-rev 532
AGAGAGACCTGGAGAGAGAGT
>read6 ori-rev-1-misma 499
GGAGAGACCTGGAGAGAGAGT
>read7 medium 512
AGAGAGAGTGACGATGAGCAG
>read8 last 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
>read9 last rep 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
>read10 last gap 488
AGTGACGATGACGTACGATAGCAGTAGACGA
The second output file should be based on the first one. Here, I would like to assemble all the sequences into one by overlapping their matching portions, and name the new reference after the input file. An "N" should be inserted wherever a variable position is found.
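One way to approach this second step, sketched under explicit assumptions: every kept read is placed at its best ungapped offset on the same reference used in the first step (a simplification; a gapped read is not realigned), the bases stacked in each column are compared, and any column with disagreeing bases becomes "N". Positions inside the assembled span that no read covers are also written as "N", which is an assumption rather than something stated in the post. The FASTA header is taken from the input file name without its extension. All function names and the example path are hypothetical.

```python
import os
from collections import defaultdict


def best_offset(read: str, ref: str) -> int:
    """Start position in ref where read has the fewest mismatches (no gaps)."""
    n = len(read)
    return min(
        range(len(ref) - n + 1),
        key=lambda i: sum(a != b for a, b in zip(read, ref[i:i + n])),
    )


def consensus(reads, ref: str) -> str:
    """Pile reads up at their reference offsets; 'N' where bases disagree."""
    columns = defaultdict(set)  # reference position -> distinct bases seen there
    for read in reads:
        start = best_offset(read, ref)
        for k, base in enumerate(read):
            columns[start + k].add(base)
    lo, hi = min(columns), max(columns)
    # One distinct base: keep it. Several bases, or no coverage at all
    # (empty set), become 'N'.
    return "".join(
        next(iter(columns[p])) if len(columns[p]) == 1 else "N"
        for p in range(lo, hi + 1)
    )


def to_fasta(input_path: str, reads, ref: str) -> str:
    """Name the assembled sequence after the input file (extension stripped)."""
    name = os.path.splitext(os.path.basename(input_path))[0]
    return f">{name}\n{consensus(reads, ref)}\n"
```

For example, piling read1, read3 and read4 onto a reference equal to read1 gives `NGAGAGACCTGGAGAGAGAGN`: the first column holds A, G and T and the last column holds T and A, so both become "N".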
I know Perl will probably be the best way to go. However, my understanding of Perl is quite limited, and I do not think AWK would be the best tool for this task.
Any help will be greatly appreciated.
Rudi
Thanks! The first script is outputting the right sequences:
Code:
>read1 ori 498
AGAGAGACCTGGAGAGAGAGT
>read2 ori-rep 500
AGAGAGACCTGGAGAGAGAGT
>read3 1-misma 456
GGAGAGACCTGGAGAGAGAGT
>read4 2-misma 456
TGAGAGACCTGGAGAGAGAGA
>read5 ori-rev 532
ACTCTCTCTCCAGGTCTCTCT
>read6 ori-rev-1-misma 499
ACTCTCTCTCCAGGTCTCTCC
>read7 medium 512
AGAGAGAGTGACGATGAGCAG
>read8 last 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
>read9 last rep 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
>read10 last gap 488
AGTGACGATGACGTACGATAGCAGTAGACGA
However, sequences 5 and 6 should be reported reversed and complemented, since that is the form in which they meet the criteria; all the others should be reported "as is".
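As a quick check of what sequences 5 and 6 should look like, here is a minimal sketch that applies the reverse complement to the two sequences as the first script printed them. It assumes only plain A/C/G/T bases; `revcomp` and the dictionary of records are illustrative names, not part of either script.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences 5 and 6 exactly as the first script reported them.
first_script_output = {
    "read5 ori-rev 532": "ACTCTCTCTCCAGGTCTCTCT",
    "read6 ori-rev-1-misma 499": "ACTCTCTCTCCAGGTCTCTCC",
}

for header, seq in first_script_output.items():
    print(f">{header}\n{revcomp(seq)}")
```

This prints `AGAGAGACCTGGAGAGAGAGT` for read5 and `GGAGAGACCTGGAGAGAGAGT` for read6, which are the sequences listed for those reads in the desired output file.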
The second script is not giving the desired output:
Code:
>read1 ori 498
AGAGAGACCTGGAGAGAGAGT
ACTCTCTCTCCAGGTCTCTCT
>read2 ori-rep 500
AGAGAGACCTGGAGAGAGAGT
ACTCTCTCTCCAGGTCTCTCT
>read3 1-misma 456
GGAGAGACCTGGAGAGAGAGT
ACTCTCTCTCCAGGTCTCTCC
>read4 2-misma 456
TGAGAGACCTGGAGAGAGAGA
TCTCTCTCTCCAGGTCTCTCA
>read5 ori-rev 532
ACTCTCTCTCCAGGTCTCTCT
AGAGAGACCTGGAGAGAGAGT
>read6 ori-rev-1-misma 499
ACTCTCTCTCCAGGTCTCTCC
GGAGAGACCTGGAGAGAGAGT
>read7 medium 512
AGAGAGAGTGACGATGAGCAG
CTGCTCATCGTCACTCTCTCT
>read8 last 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
TGCGTCTACTGCTATCGTACGTCATCGTCACT
>read9 last rep 488
AGTGACGATGACGTACGATAGCAGTAGACGCA
TGCGTCTACTGCTATCGTACGTCATCGTCACT
>read10 last gap 488
AGTGACGATGACGTACGATAGCAGTAGACGA
TCGTCTACTGCTATCGTACGTCATCGTCACT
I will go over your first script to see if I can modify it to meet my needs.
Thanks a bunch!