Program to match the id and replace one letter in the content
Hi all,
I have one file containing sequences and a second file that lists the position and the letter to be changed. I need to match the two files and replace the content. The example below shows what I want to do. File 1 has many sequences; a few are shown below.
sequence file:
The second file has the id, which appears between the | characters in the header line of the first file, and the letters to be changed. I have to match the id column against the header line in the sequence file (for example |P78363|). If they are the same, make the change as specified and write the result to the output file.
The output file should contain the sequences with the changed letters, and the name of the change in the header, as highlighted (bold) in the code below.
Could anyone please help with a script that can do this? It would be very helpful.
Note: this is not a school exercise.
How long are the lines in the "sequences"? Is the position to be changed always in the 1st line of a sequence? If not, is the <newline> at the end of each line included in the position count?
Does each "record" consist of one header line and two sequence lines?
Hi Don,
I expect the output file to be around 250 KB to 300 KB. Every sequence has one header line starting with >sp.... Each sequence line has 60 letters. The change can happen anywhere; it is not restricted to the first line. A new record starts on a new line with >sp, and the end of the sequence will have *.
Thanks, Kaavya
---------- Post updated at 03:35 PM ---------- Previous update was at 03:35 PM ----------
Hi Don,
The position count should start after the header line.
There is no asterisk in your sample input. What do you mean by "the end of the sequence will have *."?
I know the position starts with 1 being the 1st character of the sequence. What I asked was what is the position of the 1st character of the second line of the sequence? Is it 61 or 62? (Do the newlines in the sequence count?)
Hi Don,
I am sorry about the asterisk. There is no asterisk in any of the sequences. Also, the length of a sequence varies; it is not necessarily two lines. The newline is not included in the position count, so the first character of the second line should be counted as position 61.
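With those details settled (one header line starting with >sp, 60 letters per sequence line, positions counted from 1 with newlines excluded), the task could be sketched roughly as below. This is only a sketch under assumptions, since the sample files are not shown here: the change file is assumed to hold three whitespace-separated columns (id, position, new letter), one output record is written per change, and the change name added to the header is assumed to look like "C61G" (old letter, position, new letter). The function names are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs; sequence lines are joined without newlines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_changes(lines):
    """Parse rows of the ASSUMED form 'id position new_letter' into a dict."""
    changes = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        seq_id, pos, letter = fields
        changes.setdefault(seq_id, []).append((int(pos), letter))
    return changes


def apply_changes(fasta_lines, change_lines, width=60):
    """Return output lines for every record whose id has a listed change."""
    changes = read_changes(change_lines)
    out = []
    for header, seq in read_fasta(fasta_lines):
        parts = header.split("|")
        seq_id = parts[1] if len(parts) > 2 else None  # id sits between the | characters
        for pos, letter in changes.get(seq_id, []):
            if not 1 <= pos <= len(seq):
                continue  # position outside the sequence: ignore it
            old = seq[pos - 1]  # positions are 1-based, newlines not counted
            new_seq = seq[:pos - 1] + letter + seq[pos:]
            # ASSUMED header tag format: old letter, position, new letter
            out.append(f"{header} {old}{pos}{letter}")
            # re-wrap the sequence at the original line width
            out.extend(new_seq[i:i + width] for i in range(0, len(new_seq), width))
    return out
```

It could be called with two open files, for example `apply_changes(open("sequences.txt"), open("changes.txt"))`, where both file names are placeholders, and the returned lines written out joined with newlines. Joining each sequence into one string before editing is what makes position 61 land on the first letter of the second line.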