But I want the new result file in the same folder where my other files, file1 and file2, are present. The terminal shows nothing, and no new file is created.
Kindly help me out; this is a major step of my research and I have been stuck here for 3 weeks.
Post#9 is the result of the script in #7, lines chopped at 178 chars to keep the post reasonably small. The real result has the desired -100 to +100 chars.
To get the output format requested in post #6 (no > at the start of the output lines, line breaks at 70 characters in the last field, and double quotes around the last field), you could also try:
which, with the data provided in post #1 in this thread produces the output:
I also have a version that will work with versions of awk that can't handle "long" strings, but it splits the last field on boundaries based on the uncombined input lines instead of the hard coded 70 character maximum segments used by the code above.
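To illustrate the fixed 70-character segmentation of the last field described above, here is a minimal Python sketch (not the awk code from this thread). The function name `format_last_field` is hypothetical, and it assumes the double quotes open before the first segment and close after the last one, which is only one possible reading of the format requested in post #6.

```python
def format_last_field(seq, width=70):
    """Split seq into chunks of at most `width` characters, join the
    chunks with newlines, and wrap the whole field in double quotes.

    Assumption: one opening quote before the first chunk and one closing
    quote after the last chunk.
    """
    segments = [seq[i:i + width] for i in range(0, len(seq), width)]
    return '"' + "\n".join(segments) + '"'


if __name__ == "__main__":
    # A 150-character sequence is printed as lines of 70, 70 and 10 characters.
    print(format_last_field("ACGTA" * 30))
```

Splitting on a fixed width keeps every output line the same length except the last, regardless of how the sequence was broken across lines in the input file.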
Note also that the above code will work even if the requested region starts before the 101st character. In that case it will truncate the leading context to start at character position 1. All of the code provided so far obviously truncates the trailing context if less than 100 characters of trailing context are present in the input.
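The truncation behaviour described above can be sketched in Python (again, not the awk code from this thread). The function name `extract_with_context` and its parameters are hypothetical; the sketch assumes 1-based, inclusive start and end positions and a sequence whose lines have already been joined into a single string.

```python
def extract_with_context(seq, start, end, flank=100):
    """Return the region start..end (1-based, inclusive) plus up to
    `flank` characters of context on each side.

    The leading context is truncated so it never starts before position 1,
    and the trailing context is truncated at the end of the sequence.
    """
    lo = max(1, start - flank)          # clamp leading context at position 1
    hi = min(len(seq), end + flank)     # clamp trailing context at sequence end
    return seq[lo - 1:hi]               # convert to 0-based slice


if __name__ == "__main__":
    seq = "ACGT" * 75                   # made-up 300-character sequence
    # Region starts before the 101st character: leading context is shortened.
    print(len(extract_with_context(seq, 50, 60)))
    # Region close to the end: trailing context is shortened.
    print(len(extract_with_context(seq, 250, 290)))
```

With a 300-character sequence, a region at 150..160 gets the full 100 characters on both sides, while regions near either end get only as much context as exists.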
I don't understand how this output is useful when there are multiple outputs for a single sequence ID and nothing in the output identifies what range from the input sequence is included in the output; but this is what you said you wanted...
cragun sir, how do I run this command from the terminal? I mean, do I need to save it with a .awk extension, or paste these lines directly into the terminal? I did the latter, but it seems to go into an infinite loop.
---------- Post updated at 12:31 AM ---------- Previous update was at 12:29 AM ----------
Your output format and output are excellent, just like rudic sir's in posts 7 and 9, but I also want to run this one successfully.
I want to extract a specific region of interest from a big file. I have only the start position, end position and seq id. My query is:
My file1 is this
>GL3482.1
GAACTTGAGATCCGGGGA
GCAGTGGATCTCCACCAG
CGGCCAGAACTGGTGCAC
CTCCAGGCCAGCCTCGTC
CTGCGTGTC
>GL3550.1... (14 Replies)
HI,
I have a Complete genome fasta file and I have list of sub sequence regions
in the format as :
4353..5633
6795..9354
1034..14456
I want a script which can mask these regions in a single complete genome fasta file with the letter N
kindly help (2 Replies)
Old skool UNIX and Linux geek here, but newbie to the world of DNS and bind. I've recently been tasked with replacing our DNS infrastructure, currently on Windows, with a RHEL based solution. And I assume that means using bind, which I've not used before. Here's my question:
Suppose our company... (3 Replies)
Hi I have 2 files; usually the end position in the file1 is the start position in the file2 and the end position in file2 will be the start position in file1 (flanks)
file1
Id start end
aaa1 0 3000070
aaa1 3095270 3095341
aaa1 3100822 3100894
aaa1 ... (1 Reply)
FILE_ID extraction from file name and save it in CSV file after looping through each folders
My files are located on a UNIX server. I want to extract file_id and file_name from each file and save them in a CSV file. How do I do that?
I have folders in unix environment, directory structure is... (15 Replies)
Hi all,
I have a file like this
ID 3BP5L_HUMAN Reviewed; 393 AA.
AC Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3;
DT 05-FEB-2008, integrated into UniProtKB/Swiss-Prot.
DT 05-JUL-2004, sequence version 1.
DT 05-SEP-2012, entry version 71.
FT COILED 59 140 ... (1 Reply)
Hi, I have a file1 of many long sequences, each preceded by a unique header line. file2 is 3-columns list: headers name, start position, end position. I'd like to extract the sequence region of file1 specified in file2.
Based on a post elsewhere, I found the code:
awk... (2 Replies)
Hi everyone,
I have a large text file containing DNA sequences in fasta format as follows:
>someseq
GAACTTGAGATCCGGGGAGCAGTGGATCTC
CACCAGCGGCCAGAACTGGTGCACCTCCAG
GCCAGCCTCGTCCTGCGTGTC
>another seq
GGCATTTTTGTGTAATTTTTGGCTGGATGAGGT
GACATTTTCATTACTACCATTTTGGAGTACA
>seq3450... (4 Replies)
Hi all,
I am having difficulty solving the following problem.
mydata:
StartPoint EndPoint
22 55
2222 2230
33 66
44 58
222 240
11 25
22 60
33 45
The union of above... (2 Replies)
Hi,
I'm working hard on SQL and I came across a hurdle I'm hoping you can help me out with.
I have two tables
table1
headers: chrom start end name score strand
11 9720685 9720721 U0 0 +
21 9721043 9721079 U0 0 -
1 9721093 9721129 U0 0 +
20 ... (2 Replies)