How can I remove these duplicate sequences in UNIX? What command should I type?
The input is:
>HWI-EAS382_30FC7AAXX:4:1:1580:1465
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:4:1:1062:1640
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:4:1:272:629
AAAAAAAAGCTATAGTCTCGTCACACATACTCACAA
>HWI-EAS382_30FC7AAXX:4:1:1033:1135
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:4:1:1421:27
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
My desired output is:
>HWI-EAS382_30FC7AAXX:4:1:1580:1465
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:4:1:272:629
AAAAAAAAGCTATAGTCTCGTCACACATACTCACAA
What command should I type to remove these duplicated sequences?
Thanks for all of your advice.
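Here is a minimal awk sketch. It assumes each record is exactly one header line followed by one sequence line, as in the sample above. Wrapped, multi-line sequences would need different handling. The function name `dedup_fasta` is hypothetical. The function remembers the most recent header and prints a header/sequence pair only the first time that sequence is seen, so the first header of each duplicate group is the one kept.

```sh
# dedup_fasta (hypothetical name): keep only the first record for each
# distinct sequence. Assumes one header line + one sequence line per record.
dedup_fasta() {
  awk '
    /^>/ { header = $0; next }          # remember the most recent header
    !seen[$0]++ { print header; print } # first time this sequence appears
  ' "$@"
}
```

Usage, with hypothetical file names: `dedup_fasta input.fa > output.fa`. The `seen` array holds every distinct sequence, so memory grows with the number of unique reads.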
Hi, fajohnson...
Your command worked, but it still leaves all the headers of the nucleotide sequence. Do you have a better idea, so that I keep only the first header of each identical nucleotide sequence?
My input:
>HWI-EAS382_30FC7AAXX:4:1:631:449
>HWI-EAS382_30FC7AAXX:4:1:93:1407
>HWI-EAS382_30FC7AAXX:4:1:154:1123
>HWI-EAS382_30FC7AAXX:4:1:912:1008
>HWI-EAS382_30FC7AAXX:4:1:57:316
>HWI-EAS382_30FC7AAXX:4:1:1287:1193
>HWI-EAS382_30FC7AAXX:4:1:1451:1559
>HWI-EAS382_30FC7AAXX:4:1:1431:1913
TTTCCGCGAACTGCAAAAGACGTTTCGTATGCCGTT
I only want this left in my output:
>HWI-EAS382_30FC7AAXX:4:1:631:449
TTTCCGCGAACTGCAAAAGACGTTTCGTATGCCGTT
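If the file already looks like the input above, a small awk filter can keep just the first header of each run. This sketch assumes that shape: runs of header lines followed by a single sequence line. `first_header_only` is a hypothetical name.

```sh
# first_header_only (hypothetical name): when several header lines precede
# one sequence line, keep only the first header of that run.
first_header_only() {
  awk '
    /^>/ { if (header == "") header = $0; next }     # keep first header of a run
    { if (header != "") print header; print; header = "" }
  ' "$@"
}
```

For example: `first_header_only current_output.fa > fixed.fa` (hypothetical file names).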
Hello,
I have a file that has several duplicate entries on the same line:
File
ID source
1 GM GF GM
2 GM GF GM GF GM GF GM GF GM GF
3 GM GF GM SF GM GF GM SF
4 FF FF FF FF
5 FF GM FF ... (2 Replies)
So I have a bunch of files that look like this:
>gi|33332323
MMKCRGVIMVVEKVMKRDGRIVPFDESRIRWAVQ---
>gi|45235353
MMKCR----VEKMRDVFFDESIRWAVQ
They go on... The sequences are much longer, but they are all in two-line (FASTA) format.
I want to remove duplicate pairs of ID (GI) number and sequence. I tried... (12 Replies)
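Here is one possible awk approach. It relies on the strict two-line format described above. The key is the header and the sequence together, so a repeated GI number with a different sequence is kept. `dedup_pairs` is a hypothetical name.

```sh
# dedup_pairs (hypothetical name): drop records whose header AND sequence
# both match an earlier record. Assumes strict two-line FASTA records.
dedup_pairs() {
  awk '
    /^>/ { header = $0; next }
    !seen[header SUBSEP $0]++ { print header; print }
  ' "$@"
}
```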
Hi
I've been scratching my head over this for some time with no solution.
I have a file like this:
1 bla bla 1
2 bla bla 2
4 bla bla 3
5 bla bla 1
6 bla bla 1
I want to remove consecutive occurrences of lines like "bla bla 1", even though the first column may differ.
Any ideas? (23 Replies)
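Here is a sketch with awk. It assumes columns are separated by spaces or tabs and that the first line of each run should be kept. It compares everything after the first column with the same part of the previous line. `squeeze_ignoring_col1` is a hypothetical name.

```sh
# squeeze_ignoring_col1 (hypothetical name): drop a line when everything
# after its first column equals the same part of the previous line.
squeeze_ignoring_col1() {
  awk '
    {
      key = $0
      sub(/^[ \t]*[^ \t]+[ \t]*/, "", key)  # strip the first column
      if (NR == 1 || key != prev) print
      prev = key
    }
  ' "$@"
}
```

On the sample above, this keeps lines 1, 2, 4 and 5 and drops line 6.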
I have a file, a.txt, with content like:
deepak
ram
sham
deepram
sita
kumar
I want to delete the first line containing "deep"...
I tried using...
grep -i 'deep' a.txt
It gives me 2 rows... I want to delete the first one.
I also need to know the command to delete the line from... (5 Replies)
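`grep` only selects lines and cannot delete one. Here is a minimal awk sketch that drops only the first line containing the text, ignoring case as `grep -i` does. `delete_first_match` is a hypothetical name. The pattern is matched as a plain string, not as a regular expression.

```sh
# delete_first_match (hypothetical name): print every line except the
# first one containing the given text; matching ignores case, like grep -i.
delete_first_match() {
  pat=$1; shift
  awk -v pat="$pat" '
    !done && index(tolower($0), tolower(pat)) { done = 1; next }
    { print }
  ' "$@"
}
```

POSIX awk has no in-place editing option. To change `a.txt` itself, write to a temporary file and move it back: `delete_first_match deep a.txt > a.txt.tmp && mv a.txt.tmp a.txt`.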
Hi,
Please help!
I have a file with duplicate words on some lines, and I want to remove the duplicate words.
The order of the words in the output file doesn't matter.
INPUT_FILE
pink_kite red_pen ball pink_kite ball
yellow_flower white no white no
cloud nine_pen pink cloud pink nine_pen... (6 Replies)
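The order of the words doesn't matter, so one approach is to split each line into one word per line, deduplicate with `sort -u`, and join the result back with `paste`. This sketch assumes words are separated by whitespace. `dedup_words_unordered` is a hypothetical name. It starts a few processes per input line, so it suits only modest file sizes.

```sh
# dedup_words_unordered (hypothetical name): per input line, print each
# distinct word once; words come out sorted because order is irrelevant.
dedup_words_unordered() (
  set -f                          # disable globbing so words like * stay literal
  while IFS= read -r line || [ -n "$line" ]; do
    # unquoted $line is split on whitespace into one word per output line
    printf '%s\n' $line | LC_ALL=C sort -u | paste -s -d ' ' -
  done
)
```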
For example, if I have a file whose contents are:
>HWI-EAS382_30FC7AAXX:7:1:927:1368
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:7:1:924:1373
ACGAACTTTAAAGCACCTCTTGGCTCGTATGCCGTC
I want my output to show the total number of nucleotides. So my output should look like this:... (2 Replies)
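The desired output is cut off above. This sketch assumes "the total of nucleotide" means the total number of bases across all sequence lines, and the one-number output format is a guess. `count_bases` is a hypothetical name.

```sh
# count_bases (hypothetical name): total number of sequence characters over
# all non-header lines; spaces, tabs and carriage returns are not counted.
count_bases() {
  awk '
    !/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
    END { print total + 0 }
  ' "$@"
}
```

On the two records shown above (36 bases each), this would print 72.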
Can anyone help me print only the unique entries on each line?
MI_AP MI_AP MI_CM MI_MF
RC_NAP MBS_AP SF_RAN MBS_AP NT_CAR
so that the output contains each entry only once per line:
MI_AP MI_CM MI_MF
RC_NAP MBS_AP SF_RAN NT_CAR
I can't find the same situation on the knowledge... (5 Replies)
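Here is an awk sketch that keeps the first occurrence of each entry on a line and preserves the original order. It assumes entries are separated by whitespace. `uniq_per_line` is a hypothetical name.

```sh
# uniq_per_line (hypothetical name): keep the first occurrence of each
# whitespace-separated entry on a line, preserving the original order.
uniq_per_line() {
  awk '
    {
      split("", seen)                  # portable way to empty the array
      out = ""
      for (i = 1; i <= NF; i++)
        if (!seen[$i]++) out = (out == "" ? $i : out " " $i)
      print out
    }
  ' "$@"
}
```

Applied to the two sample lines, this should produce the desired output shown above.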
Hi,
I have created a flat file with the information below. As you can see, the file is displayed in three columns:
1st column is FileNameString
2nd column is Report_Name (this has spaces)
3rd column is Flag
In the result file, I need removal of duplicate... (1 Reply)