Corona,
I am still trying to understand your script, but I cannot get it to do exactly what I need. CTSGNB's script is working, but I want to understand the logic behind yours since I believe it might help in the future. So, this is the code:
And this is the outfile:
As you can see I still have the ">" at the end of the file which completely messes up the FASTA format. I have been trying to get rid of it by modifying your script but I just cannot get the job done. Can you help me one more time?
Thanks!
If awk's default handling of ORS doesn't do what you want, you'll have to print the >'s yourself:
[edit] adding a reply that explains in more detail.
---------- Post updated at 12:15 PM ---------- Previous update was at 12:05 PM ----------
You know how the FS and OFS variables control what awk considers fields for input, and what awk prints as fields for output?
RS and ORS are the exact same thing, but for records (which are lines by default). So when we do RS=">"; FS="\n" we're telling awk "each time you see >, that is a new record", and "each time you see \n, that's a new field".
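To see that splitting in action, here is a minimal sketch. The function name fasta_fields and the two-record sample input are made up for illustration; they are not from the scripts earlier in the thread. It only shows that with RS=">" and FS="\n", $1 becomes the header line and $2 the first sequence line.

```shell
# Hypothetical helper: show the header and first sequence line of each record.
fasta_fields() {
    awk 'BEGIN { RS = ">"; FS = "\n" }   # ">" separates records, "\n" separates fields
         $1 != "" { print "header=" $1 " seq=" $2 }'   # skip any record with an empty header
}

# Made-up sample input with two FASTA records.
printf '>seq1\nACGT\n>seq2\nAC-T\n' | fasta_fields
```

The $1 != "" guard is there because whatever sits before the first > (often nothing at all) can show up as a record of its own.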
When you have a statement like EXPRESSION { code }, the { code } part is only executed when EXPRESSION is true. If you drop an unadorned /regex/ into there, it assumes you want $0 ~ /regex/. BEGIN and END are just special expressions that are true before any processing, and after all records have been processed.
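A small sketch of those pattern forms side by side, assuming the same kind of FASTA input. The name count_gapped and the sample data are hypothetical. It uses a BEGIN block, a bare regex, an ordinary expression as a pattern, and an END block.

```shell
# Hypothetical helper: count records with and without a "-" in them.
count_gapped() {
    awk 'BEGIN { RS = ">"; FS = "\n" }            # true once, before any input is read
         /-/   { gapped++ }                        # bare regex, shorthand for $0 ~ /-/
         $1 != "" && $0 !~ /-/ { clean++ }         # any expression can serve as the pattern
         END   { printf "gapped=%d clean=%d\n", gapped, clean }'   # true once, after the last record
}

# Made-up sample input: one record with a gap, two without.
printf '>seq1\nACGT\n>seq2\nAC-T\n>seq3\nGGCC\n' | count_gapped
```

For this input it should report gapped=1 clean=2.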
My first try puts extra >'s on the end because the record separator gets printed at the end of the record, not the beginning -- the same place you'd expect a newline. So it ends up kind of off by one.
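The off-by-one is easy to reproduce. This is only a sketch of the ORS approach, not necessarily the exact first script; the name keep_gapped_ors, the /-/ filter and the sample data are assumptions for illustration.

```shell
# Hypothetical helper: keep records containing "-", letting ORS supply the ">".
keep_gapped_ors() {
    awk 'BEGIN { RS = ">"; FS = "\n"; ORS = ">" }
         /-/   { print }'    # print appends ORS, so the ">" lands AFTER the record
}

# Made-up sample input; only seq2 contains a "-".
printf '>seq1\nACGT\n>seq2\nAC-T\n>seq3\nGGCC\n' | keep_gapped_ors
```

The output is seq2, then AC-T, then a lone > on the last line: the leading > is missing and a stray one trails the record.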
My improved version here just prepends a > to the input string and prints it, so it gets them in the correct place.
So:
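A minimal sketch of that idea, assuming the filter is "keep records that contain a -". The filter, the name keep_gapped and the sample data are illustrative assumptions, so adapt the pattern to whatever you are actually matching.

```shell
# Hypothetical helper: keep records containing "-" and put the ">" back in front.
keep_gapped() {
    awk 'BEGIN { RS = ">"; FS = "\n" }
         /-/   { printf ">%s", $0 }'   # leave ORS alone; write the ">" ourselves, before the record
}

# Made-up sample input; only seq2 contains a "-".
printf '>seq1\nACGT\n>seq2\nAC-T\n>seq3\nGGCC\n' | keep_gapped
```

Because each record still carries its own trailing newline, printf needs no "\n" of its own, and nothing is left dangling at the end of the file.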
Last edited by Corona688; 10-07-2011 at 03:26 PM..
However, adding "!" to the script changes the output slightly:
You might have blank lines at the start of the file. It sees whatever comes before the first > as the first record and, since it contains no -, happily prints it.
Either that, or your version of awk is quite happy to believe that > at the beginning of the file implies a completely blank record before it. Mine doesn't, but an easy fix anyway -- just tell it not to print the first record.
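Something along these lines, as a sketch. The name drop_gapped and the sample data are made up, and it assumes your awk treats the text before the first > as record 1, as described above; the sample input starts with a blank line so that holds either way.

```shell
# Hypothetical helper: drop records containing "-", and never print record 1.
drop_gapped() {
    awk 'BEGIN { RS = ">"; FS = "\n" }
         NR > 1 && !/-/ { printf ">%s", $0 }'   # NR > 1 skips whatever precedes the first ">"
}

# Made-up sample input with a leading blank line; seq2 contains a "-".
printf '\n>seq1\nACGT\n>seq2\nAC-T\n>seq3\nGGCC\n' | drop_gapped
```

If your awk really does not produce that leading record, NR > 1 would drop the first real sequence instead, so check the output on a small file first.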
Last edited by Corona688; 10-07-2011 at 03:49 PM..
Reason: many edits, hopefully not stealth ones.