Removing duplicate sequences and modifying a text file
Hi. I've tried several different programs to solve this problem, but none of them does exactly what I want (and I need the file in a very specific format). I have a large multi-FASTA file of DNA sequences, with around 15,000 genes, that looks like this:
I'd like to do two things to the file. First, some of the genes (each XLOC is a gene) have multiple entries (e.g. XLOC_00024543 might have 3 entries). I'd like to modify the file so that there is just one XLOC entry for each gene, i.e. delete all but one entry for each XLOC.
Second, I'd like to delete the whole TCONS word, so that each line simply has an XLOC number. I've tried:
But it only deletes the letters TCONS, rather than the whole identifier (e.g. TCONS_00012343), including the underscore and digits that follow.
Essentially, I'd like the equivalent file to look like this
I hope I've explained myself clearly. I've spent a long time trying to do this, but to no avail!
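One possible approach, as a minimal Python sketch. The example file is not shown above, so this assumes every header line starts with ">" and contains an ID of the form XLOC_ followed by digits somewhere on the line, with the TCONS ID on the same line. The function name dedupe_by_xloc is made up for illustration. It keeps the first entry seen for each XLOC (an arbitrary choice, since the post only asks for "all but one") and rewrites the header to just the XLOC ID, which also drops the whole TCONS_... identifier rather than only the letters.

```python
import re

# Assumption: gene IDs look like XLOC_ followed by digits, as in the post.
XLOC_RE = re.compile(r"XLOC_\d+")

def dedupe_by_xloc(lines):
    """Keep the first record per XLOC ID and reduce its header to '>XLOC_...'."""
    seen = set()
    keep = False  # lines before the first header are dropped
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = XLOC_RE.search(line)
            # Headers without an XLOC ID are kept as-is and
            # deduplicated on their full header text instead.
            gene = match.group(0) if match else line[1:]
            keep = gene not in seen
            if keep:
                seen.add(gene)
                out.append(">" + gene)
        elif keep:
            # Sequence line belonging to a record we decided to keep.
            out.append(line)
    return out

if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python dedupe.py genes.fa > genes_unique.fa
    with open(sys.argv[1]) as handle:
        print("\n".join(dedupe_by_xloc(handle)))
```

If a different entry should survive (for example the longest sequence per gene), the records would need to be collected per XLOC first and chosen afterwards; this sketch does not do that.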
Hello friends,
Could anyone please advise on how to remove escape sequences from a text file?
$ file input.txt
input.txt: ASCII English text, with escape sequences
I can see those escape characters when I open the file in the vi editor, as shown below:
^
but not when I run more... (6 Replies)
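A rough Python sketch for this kind of cleanup, assuming the escape sequences are ANSI CSI sequences (ESC followed by "[", optional parameters, and a final letter, as used for terminal colours). The thread excerpt does not show the actual bytes, so that is a guess, and the function name strip_escape_sequences is made up for illustration.

```python
import re

# Assumption: the file contains ANSI CSI sequences such as ESC[31m or ESC[0m.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def strip_escape_sequences(text):
    """Return text with every CSI escape sequence removed."""
    return ANSI_CSI.sub("", text)
```

Other kinds of escape sequences (for example OSC title sequences) would need additional patterns.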
Hi everybody
I have a .txt file that contains some assembly code. To optimize it, I need to remove some duplicated parts.
For example, I have:
e_li r0,-1
e_li r25,-1
e_lis r25,0000
add r31, r31 ,r0
e_li r28,-1
e_lis r28,0000
add r31, r31 ,r0
e_li r28,-1 ... (3 Replies)
Hi,
How can I remove the repeated 'Chr' entries in different sequences? In the given example, Chr19 is repeated in two samples
with the same number, i.e. +52245923. How can I remove one of the entries in any of the samples and give the range for each
Chr, which is -20 for the minimum range value and +120 for... (1 Reply)
My file looks like this
But I need to remove the entry with the identifier >Reference1 along with the entire sequence. Thus, I will end up having the following file
Thanks in advance! (2 Replies)
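A small Python sketch of one way to drop a single FASTA record, assuming the identifier is the first whitespace-delimited word after ">" on the header line. The function name drop_record is made up for illustration.

```python
def drop_record(lines, unwanted_id):
    """Return the lines with the record named unwanted_id (header and sequence) removed."""
    out = []
    skipping = False
    for line in lines:
        if line.startswith(">"):
            # Compare only the first word of the header with the unwanted ID.
            tokens = line[1:].split()
            skipping = bool(tokens) and tokens[0] == unwanted_id
        if not skipping:
            out.append(line)
    return out
```

Calling drop_record(open_file, "Reference1") would skip the >Reference1 header and every sequence line up to the next header.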
If I have a file with the following information
And I would like to remove all the sequences with Freq less than 3, so I end up having the following file:
I am currently using awk to accomplish this task but I am not getting the results I actually want.
Any help will be greatly appreciated. (3 Replies)
Hi there, I'm new to the board and I did try a search, but couldn't quite find what I was looking for.
I deal in mostly large sets of sequential files, usually images. I was wondering if someone has modified the standard ls() command, or created another command that would display standardly... (9 Replies)
Hi,
I need to concatenate three files into one destination file. If any duplicate data occurs, it should be deleted.
eg:
file1:
-----
data1 value1
data2 value2
data3 value3
file2:
-----
data1 value1
data4 value4
data5 value5
file3:
-----
data1 value1
data4 value4 (3 Replies)
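A minimal Python sketch of this merge, assuming a "duplicate" means an identical line and that the first occurrence should be kept in its original order. The function name merge_unique is made up for illustration; it accepts any iterables of lines, such as open file handles.

```python
def merge_unique(*sources):
    """Concatenate the sources in order, keeping only the first copy of each line."""
    seen = set()
    merged = []
    for source in sources:
        for line in source:
            line = line.rstrip("\n")
            # Skip blank lines and anything already emitted.
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged
```

With the three example files above, this would yield data1 through data5, one line each.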
Hi,
I am trying to remove duplicate lines from a file. For example, the contents of example.txt are:
this is a test
2342
this is a test
34343
this is a test
43434
and I want to remove only the "this is a test" lines and end up with just the numbers in the file, that is, end up with:
2342... (4 Replies)
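A short Python sketch of one reading of this request: drop every line that occurs more than once and keep the lines that occur exactly once. That interpretation is an assumption based on the example, and the function name drop_repeated_lines is made up for illustration.

```python
from collections import Counter

def drop_repeated_lines(lines):
    """Keep only the lines that appear exactly once, in their original order."""
    lines = [line.rstrip("\n") for line in lines]
    counts = Counter(lines)
    return [line for line in lines if counts[line] == 1]
```

If the goal is instead to remove one known literal line, a simple comparison against that text would be enough.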