Delete some words


 
# 1  
Old 08-21-2014

Hi, I have a FASTA file like this:
Code:
>contig00003  length=363  numreads=45  gene=isogroup00001  status=it_thresh
GATTTTTTACCCTGGGAGTGAGGAGGACGAGGTTGAGGATGAAGAAAAGAGAAAGATGAAGAGGTTGAGGATGTT
GTAGTCGGCGGTGGAATTAGGGGGAGCCGGCGAGCCCAAGTATTTTGCAGAGGTGTCTTCATCATCCAAACAACA
CGAGAGGGTGCAATTTGGTCTCTGCGTTGTTATAGATCCAAAGTTTTTGGACCCTGTTTGGCATCGTGTATCAAGTA
TTGGTTACACAGTCTATATTTTCAAGAACGAGACTGTGAAAGCTGTAAGCAACTTTTTATTtATCTATTTATTTTTATG
CTATAGCTTAtattaaactta
>contig00010  length=760  numreads=49  gene=isogroup00001  status=it_thresh
TCAAAGTTTTAGGTTCCAATTTGTATGGCTCAACTTAAGAAGTTTGTTGTAAAAAaGGAAATTCTTTCTGATCTATTA
GGGGCAGAAGTGCCACAATATATGAAGTTGAGAAATTAAaTAAAGTAATCATAGTACATTGTCTCGTTTGGATAGAC
GTAGGCTCTCAaGAAAAAAaGTTCTCATAGTTCTTGATGATGTGGATGATTTAGTGCGGCAAGTAGAACCTGGTCAA
GGGAGTAGAATAATTATGACAAGCAGAGATAGACAATTGAGTGAAGCTCTCTGCCTGTTTTGCAAGCATGCCTTCAA
GCGACAATTTCTAAGAACAGGATATTTAATAAGGCAATTGATCATGCTCAGGG
>G383C4U02H6B5W length=257
CCGGGCTCCCCATCTTCTCTATCTCTTGTGTGATTGTTGCAGAATACATCAAAGACTTGGGGTTGAGAGAGACAGCA
TCATAAACCTGATCACGGAAGCCCCTTTGAAGCATGCAGTCCACCTCATCTAGCCTTGTTGTCGTTGAAATAGTCCAT
CTGCCATCTTTAAATACATGCGCAACATAATGCCCGCATTGCGTATCTAGTCCAATATCATGCTTTATTAAAAAGATCA
ATAAGCCTTCCTGGAGTCCCCACAATCAAGTTCCAACTCCTTGCTGAAATGCGGTAGAGTTGTCCAGCCATCA
>G383C4U02IH1AO length=105
TTCAAGGAACTTTCATCCATCCAATGATCTAACCAATTTGAACCTAGTTTTGATTCATCTCTGAAGTTCGAATTTGAAC
CACATTCTTAAGAATTGAGGGCCCATCAAATTTAGTACTATAATCATGAAGTAGGTGATCCTCTCTTGTCACTCTTTTC
ATCATCAGCAAGATGACTTCTCATTGGAATGCTACCATGCTTGTTCCAAAA

.....

How can I remove the additional information from each sequence header, keeping only the sequence ID, to get a file like this:
Code:
>contig00003
GATTTTTTACCCTGGGAGTGAGGAGGACGAGGTTGAGGATGAAGAAAAGAGAAAGATGAAGAGGTTGAGGATGTT
GTAGTCGGCGGTGGAATTAGGGGGAGCCGGCGAGCCCAAGTATTTTGCAGAGGTGTCTTCATCATCCAAACAACA
CGAGAGGGTGCAATTTGGTCTCTGCGTTGTTATAGATCCAAAGTTTTTGGACCCTGTTTGGCATCGTGTATCAAGTA
TTGGTTACACAGTCTATATTTTCAAGAACGAGACTGTGAAAGCTGTAAGCAACTTTTTATTtATCTATTTATTTTTATG
CTATAGCTTAtattaaactta
>contig00010
TCAAAGTTTTAGGTTCCAATTTGTATGGCTCAACTTAAGAAGTTTGTTGTAAAAAaGGAAATTCTTTCTGATCTATTA
GGGGCAGAAGTGCCACAATATATGAAGTTGAGAAATTAAaTAAAGTAATCATAGTACATTGTCTCGTTTGGATAGAC
GTAGGCTCTCAaGAAAAAAaGTTCTCATAGTTCTTGATGATGTGGATGATTTAGTGCGGCAAGTAGAACCTGGTCAA
GGGAGTAGAATAATTATGACAAGCAGAGATAGACAATTGAGTGAAGCTCTCTGCCTGTTTTGCAAGCATGCCTTCAA
GCGACAATTTCTAAGAACAGGATATTTAATAAGGCAATTGATCATGCTCAGGG
>G383C4U02H6B5W
CCGGGCTCCCCATCTTCTCTATCTCTTGTGTGATTGTTGCAGAATACATCAAAGACTTGGGGTTGAGAGAGACAGCA
TCATAAACCTGATCACGGAAGCCCCTTTGAAGCATGCAGTCCACCTCATCTAGCCTTGTTGTCGTTGAAATAGTCCAT
CTGCCATCTTTAAATACATGCGCAACATAATGCCCGCATTGCGTATCTAGTCCAATATCATGCTTTATTAAAAAGATCA
ATAAGCCTTCCTGGAGTCCCCACAATCAAGTTCCAACTCCTTGCTGAAATGCGGTAGAGTTGTCCAGCCATCA
>G383C4U02IH1AO
TTCAAGGAACTTTCATCCATCCAATGATCTAACCAATTTGAACCTAGTTTTGATTCATCTCTGAAGTTCGAATTTGAAC
CACATTCTTAAGAATTGAGGGCCCATCAAATTTAGTACTATAATCATGAAGTAGGTGATCCTCTCTTGTCACTCTTTTC
ATCATCAGCAAGATGACTTCTCATTGGAATGCTACCATGCTTGTTCCAAAA

.....


Thanks
# 2  
Old 08-21-2014
Code:
awk '/^>/ { NF=1 } 1' inputfile > outputfile
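
For anyone wondering how the one-liner works: on lines starting with ">", setting NF=1 truncates the record to its first whitespace-separated field and awk rebuilds $0 from it; the trailing 1 is an always-true pattern, so every line (modified or not) is printed. Below is a minimal sketch on a made-up two-record sample. The sample data and the temp file are assumptions for illustration only, and rebuilding $0 on assignment to NF is what gawk and mawk do, though some very old awk implementations may behave differently.

```shell
# Made-up FASTA sample, for illustration only (not the poster's data).
sample=$(mktemp)
printf '%s\n' \
  '>seq1  length=8  numreads=2' \
  'ACGTACGT' \
  '>seq2 length=4' \
  'TTGA' > "$sample"

# Header lines (starting with ">") are cut down to their first field;
# the bare "1" prints every line, sequence lines pass through unchanged.
out=$(awk '/^>/ { NF=1 } 1' "$sample")
printf '%s\n' "$out"
# prints:
# >seq1
# ACGTACGT
# >seq2
# TTGA

rm -f "$sample"
```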

# 3  
Old 08-21-2014
Thanks, that works perfectly.
Quote:
Originally Posted by Corona688
Code:
awk '/^>/ { NF=1 } 1' inputfile > outputfile

# 4  
Old 08-21-2014
Hello the_simpsons,

The following may also help.

Code:
awk '/^>/ {$0=$1} 1'  filename

Output will be as follows.


Code:
>contig00003
GATTTTTTACCCTGGGAGTGAGGAGGACGAGGTTGAGGATGAAGAAAAGAGAAAGATGAAGAGGTTGAGGATGTT
GTAGTCGGCGGTGGAATTAGGGGGAGCCGGCGAGCCCAAGTATTTTGCAGAGGTGTCTTCATCATCCAAACAACA
CGAGAGGGTGCAATTTGGTCTCTGCGTTGTTATAGATCCAAAGTTTTTGGACCCTGTTTGGCATCGTGTATCAAGTA
TTGGTTACACAGTCTATATTTTCAAGAACGAGACTGTGAAAGCTGTAAGCAACTTTTTATTtATCTATTTATTTTTATG
CTATAGCTTAtattaaactta
>contig00010
TCAAAGTTTTAGGTTCCAATTTGTATGGCTCAACTTAAGAAGTTTGTTGTAAAAAaGGAAATTCTTTCTGATCTATTA
GGGGCAGAAGTGCCACAATATATGAAGTTGAGAAATTAAaTAAAGTAATCATAGTACATTGTCTCGTTTGGATAGAC
GTAGGCTCTCAaGAAAAAAaGTTCTCATAGTTCTTGATGATGTGGATGATTTAGTGCGGCAAGTAGAACCTGGTCAA
GGGAGTAGAATAATTATGACAAGCAGAGATAGACAATTGAGTGAAGCTCTCTGCCTGTTTTGCAAGCATGCCTTCAA
GCGACAATTTCTAAGAACAGGATATTTAATAAGGCAATTGATCATGCTCAGGG
>G383C4U02H6B5W
CCGGGCTCCCCATCTTCTCTATCTCTTGTGTGATTGTTGCAGAATACATCAAAGACTTGGGGTTGAGAGAGACAGCA
TCATAAACCTGATCACGGAAGCCCCTTTGAAGCATGCAGTCCACCTCATCTAGCCTTGTTGTCGTTGAAATAGTCCAT
CTGCCATCTTTAAATACATGCGCAACATAATGCCCGCATTGCGTATCTAGTCCAATATCATGCTTTATTAAAAAGATCA
ATAAGCCTTCCTGGAGTCCCCACAATCAAGTTCCAACTCCTTGCTGAAATGCGGTAGAGTTGTCCAGCCATCA
>G383C4U02IH1AO
TTCAAGGAACTTTCATCCATCCAATGATCTAACCAATTTGAACCTAGTTTTGATTCATCTCTGAAGTTCGAATTTGAAC
CACATTCTTAAGAATTGAGGGCCCATCAAATTTAGTACTATAATCATGAAGTAGGTGATCCTCTCTTGTCACTCTTTTC
ATCATCAGCAAGATGACTTCTCATTGGAATGCTACCATGCTTGTTCCAAAA


EDIT: Adding one more solution for the same problem.

Code:
awk '/^>/ {print $1} !/^>/ {print $0}' filename

Code:
>contig00003
GATTTTTTACCCTGGGAGTGAGGAGGACGAGGTTGAGGATGAAGAAAAGAGAAAGATGAAGAGGTTGAGGATGTT
GTAGTCGGCGGTGGAATTAGGGGGAGCCGGCGAGCCCAAGTATTTTGCAGAGGTGTCTTCATCATCCAAACAACA
CGAGAGGGTGCAATTTGGTCTCTGCGTTGTTATAGATCCAAAGTTTTTGGACCCTGTTTGGCATCGTGTATCAAGTA
TTGGTTACACAGTCTATATTTTCAAGAACGAGACTGTGAAAGCTGTAAGCAACTTTTTATTtATCTATTTATTTTTATG
CTATAGCTTAtattaaactta
>contig00010
TCAAAGTTTTAGGTTCCAATTTGTATGGCTCAACTTAAGAAGTTTGTTGTAAAAAaGGAAATTCTTTCTGATCTATTA
GGGGCAGAAGTGCCACAATATATGAAGTTGAGAAATTAAaTAAAGTAATCATAGTACATTGTCTCGTTTGGATAGAC
GTAGGCTCTCAaGAAAAAAaGTTCTCATAGTTCTTGATGATGTGGATGATTTAGTGCGGCAAGTAGAACCTGGTCAA
GGGAGTAGAATAATTATGACAAGCAGAGATAGACAATTGAGTGAAGCTCTCTGCCTGTTTTGCAAGCATGCCTTCAA
GCGACAATTTCTAAGAACAGGATATTTAATAAGGCAATTGATCATGCTCAGGG
>G383C4U02H6B5W
CCGGGCTCCCCATCTTCTCTATCTCTTGTGTGATTGTTGCAGAATACATCAAAGACTTGGGGTTGAGAGAGACAGCA
TCATAAACCTGATCACGGAAGCCCCTTTGAAGCATGCAGTCCACCTCATCTAGCCTTGTTGTCGTTGAAATAGTCCAT
CTGCCATCTTTAAATACATGCGCAACATAATGCCCGCATTGCGTATCTAGTCCAATATCATGCTTTATTAAAAAGATCA
ATAAGCCTTCCTGGAGTCCCCACAATCAAGTTCCAACTCCTTGCTGAAATGCGGTAGAGTTGTCCAGCCATCA
>G383C4U02IH1AO
TTCAAGGAACTTTCATCCATCCAATGATCTAACCAATTTGAACCTAGTTTTGATTCATCTCTGAAGTTCGAATTTGAAC
CACATTCTTAAGAATTGAGGGCCCATCAAATTTAGTACTATAATCATGAAGTAGGTGATCCTCTCTTGTCACTCTTTTC
ATCATCAGCAAGATGACTTCTCATTGGAATGCTACCATGCTTGTTCCAAAA
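
As a side note, the same header trimming can probably be done with sed as well. This is not one of the solutions posted above, just an alternative sketch, assuming every header starts with ">" and the sequence ID ends at the first space or tab. The sample data and temp file are made up for illustration.

```shell
# Made-up FASTA sample, for illustration only (not the poster's data).
sample=$(mktemp)
printf '%s\n' \
  '>seq1  length=8  numreads=2' \
  'ACGTACGT' \
  '>seq2 length=4' \
  'TTGA' > "$sample"

# On header lines only, delete everything from the first whitespace
# character to the end of the line; other lines are left untouched.
out=$(sed '/^>/ s/[[:space:]].*//' "$sample")
printf '%s\n' "$out"
# prints:
# >seq1
# ACGTACGT
# >seq2
# TTGA

rm -f "$sample"
```

Unlike the awk versions, this does not split the line into fields at all, so it may be marginally simpler when only the header needs to change.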

Thanks,
R. Singh

Last edited by RavinderSingh13; 08-21-2014 at 01:31 PM..