04-30-2010
Text Manipulation
Greetings. I'm a biologist and I don't have much knowledge of Unix/Linux, but I need to use Cygwin to convert some documents from GenBank format to FASTA format. GenBank format goes something like this:
LOCUS NM_013964 2568 bp mRNA linear PRI 26-APR-2009
DEFINITION Homo sapiens neuregulin 1 (NRG1), transcript variant HRG-alpha,
mRNA.
ACCESSION NM_013964
VERSION NM_013964.2 GI:116006963
KEYWORDS .
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
TITLE Genetic variation in the schizophrenia-risk gene neuregulin 1
correlates with brain activation and impaired speech production in
a verbal fluency task in healthy individuals
JOURNAL Hum Brain Mapp (2009) In press
HRGA; NDF;
SMDF"
polyA_site 2568
/gene="NRG1"
/gene_synonym="ARIA; GGF; GGF2; HGL; HRG; HRG1; HRGA; NDF;
SMDF"
ORIGIN
1 gcgcctgcct ccaacctgcg ggcgggaggt gggtggctgc ggggcaattg aaaaagagcc
61 ggcgaggagt tccccgaaac ttgttggaac tccgggctcg cgcggaggcc aggagctgag
121 cggcggcggc tgccggacga tgggagcgtg agcaggacgg tgataacctc tccccgatcg
181 ggttgcgagg gcgccgggca gaggccagga cgcgagccgc cagcggtggg acccatcgac
But all I would need to get a FASTA format is this:
>NM_013964
1 gcgcctgcct ccaacctgcg ggcgggaggt gggtggctgc ggggcaattg aaaaagagcc
61 ggcgaggagt tccccgaaac ttgttggaac tccgggctcg cgcggaggcc aggagctgag
121 cggcggcggc tgccggacga tgggagcgtg agcaggacgg tgataacctc tccccgatcg
181 ggttgcgagg gcgccgggca gaggccagga cgcgagccgc cagcggtggg acccatcgac
This is just the ">" symbol followed by the LOCUS ID and, on a separate line, the sequence of the gene, which runs from the ORIGIN line to the end of the document. How can I achieve this? Can you please help me?
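One possible way to do this is sketched below in Python, assuming Python is installed in your Cygwin setup (it is an optional Cygwin package, so that is an assumption about your machine). The function name `genbank_to_fasta` is made up for this illustration. It takes the second word of each LOCUS line as the header and then collects every line after ORIGIN. Note one deliberate difference from the output shown above: standard FASTA normally has no position numbers or spaces inside the sequence, so this sketch strips them. It also treats a `//` line as the end of a record, which is the usual GenBank terminator, although the sample above does not show one.

```python
def genbank_to_fasta(lines):
    """Yield FASTA lines for each record found in GenBank-formatted lines.

    Hypothetical helper for illustration; assumes the layout shown in the
    post: a LOCUS line carrying the ID, and sequence lines after ORIGIN.
    """
    in_sequence = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "LOCUS" and len(fields) > 1:
            # Header line: ">" followed by the LOCUS ID
            yield ">" + fields[1]
            in_sequence = False
        elif fields[0] == "ORIGIN":
            in_sequence = True
        elif fields[0] == "//":
            # Usual GenBank end-of-record marker (assumed, not in the sample)
            in_sequence = False
        elif in_sequence:
            # Drop the leading position number and join the 10-base blocks
            yield "".join(fields[1:])


# Small demonstration on an abridged record taken from the post
sample = """LOCUS NM_013964 2568 bp mRNA linear PRI 26-APR-2009
ACCESSION NM_013964
ORIGIN
1 gcgcctgcct ccaacctgcg
61 ggcgaggagt tccccgaaac
"""
for fasta_line in genbank_to_fasta(sample.splitlines()):
    print(fasta_line)
```

To run it on a real file, you could replace the demonstration with `open("myfile.gb")` passed to the function and redirect the printed output to a new file; the file name here is only an example. If you do want to keep the numbers and spaces exactly as in the post, yielding `line.strip()` instead of the joined fields should do it.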