I have been thinking how to address this particular task but is way beyond my knowledge.
I have a reference sequence, something like this:
and a input.txt file with thousand of short sequences, something like this
The output file should contain the following:
1. All sequences that match the reference sequence 100% (in my example, sequences 1, 2, 7, 8 and 9)
2. If a sequence does not match the reference, it should reversed and complemented (A=>T; T=>A; C=>G; G=>C), and run against the reference sequence for a second time. If it matches, it should be included in the output file as reversed/complemented sequence (sequences 5)
3. All sequences containing 1 or 2 mismatches should be included without changes (sequences 3 and 4)
4. All sequences that after being reversed and complemented contain 1 or 2 mismatches should also be included as reversed/complemented sequences (sequences 6)
5. All sequences missing 1 character (sequence 10)
Resulting in the following outfile
The second outfile should be based on the first outfile. Here, I would like to assemble all sequences into one by overlapping the matching portions and name the new reference with the input file name. An "N" should be inserted if a variable position is found:
I know perl will probably be the best way to go. However, my understanding about perl is quite limited and I do not think AWK would be the best way to solve this task
Any help will be greatly appreciated
This is a tedious process, only for the first file, and it may take a while for the result to be returned - try
Not sure if sequence 10 is reported correctly for the condition you defined or just as it fits one of the conditions named before...
---------- Post updated at 22:24 ---------- Previous update was at 21:47 ----------
Minor simplification for the reverse/complement code:
Rudi
Thanks! First script is outputting the right sequences.
However, sequences 5 and 6 should be reversed and complemented to meet the criteria -all other ones should be reported "as is"
The second script is not given the desired output:
I will go over your first script to see if I can modify it to meet my needs.
Thanks a bunch!
Hello all,
i am trying to match a string and based on that proceed with my script or error out...
i have a file called /tmp/sta.log that will be populated by oracle's spooling..it can
have a output of either 2 of the below (OPEN or errors/ORACLE not avaiable)
$ cat /tmp/sta.log
OPEN
$
$... (2 Replies)
Hello all.
I'm scripting in ksh and trying to put together a regular expression. I think my logic is sound, but I'm doing the head-against-the-wall routine while trying to put the individual pieces together. Can anybody lend some suggestions to the below problem?
I'm taking a date in the... (2 Replies)
Hi everybody,
I've been running some analyses, the results of which have been stored in a sequential manner with a directory structure like step0, step1, step2, ... for iterations 0-2, for example. Each iteration contains several nested folders, with three pieces of information I need. I need to... (1 Reply)
Hi,
i want to know how to compare string of file with input string
im trying following code:
file_no=`paste -s -d "||||\n" a.txt | cut -c 1`
#it will return collection number from file
echo "enter number"
read " curr_no"
if ; then
echo " current number already present"
fi
... (4 Replies)
I have a string like ab or abc of whatever length. But i want to know whether another string ( for example, abcfghijkl, OR a<space> bcfghijkl ab<space> cfghijkl OR a<space>bcfghijkl OR ab<space> c<space> fghijkl ) starts with ab or abc... space might existing on the longer string... If so, i... (4 Replies)
I have a requirement of shell script where i need to read the File name i.e ls -t | head -1 and Match that Filename with some delimited values which are in a separate File.
For Example i am reading the File name i.e (ls -t | head -1) after that i need to read one more sequential file which... (2 Replies)
I am trying to figure out how to write a bash script to process a file in order to make it more user readable. The file to be processed is quite uniform, every line starts with a 32 bit Unix timestamp in hexadecimal format, then a single tab charcter (0x09) then a string of text.
What I want to... (1 Reply)
Hello,
i have a program where i have to get a character from the user and check it against the word i have and then replace the character in a blank at the same position it is in the word. (7 Replies)
for a certain directory, I want to grep a particular file called ABCD so what I do is
ls /my/dir | grep -i "ABCD" | awk '{print $9}'
however, there is also this file called ABCDEFG, the above command would reurn both file when I only want ABCD, please help! (3 Replies)
Hi guys, I hope you can help me with my problem.
I have a text file that contains lines like this:
78 ANGELO -809.05
79 ANGELO2 -5,000.06
I need to find all occurences of amounts that are negative and replace them with x's
78 ANGELO xxxxxxx
79... (4 Replies)