Extract regions of a file based on start and end positions
Hi, I have a file1 containing many long sequences, each preceded by a unique header line. file2 is a three-column list: header name, start position, end position. I'd like to extract from each sequence in file1 the region specified in file2.
Based on a post elsewhere, I found the code:
But with the files I have, regions are extracted from only a subset of the specified sequences.
file1 (my real file is much longer, >47000 lines, and each sequence is much longer too):
So the specified region is only extracted for 3 out of 10 queries. I have checked and all headers that appear in file2 are also represented in file1. The sequences are long enough to contain all of the beginning and end points. Any ideas on what's going wrong?
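Since the original code and sample files are not shown here, the following is only a minimal sketch of the extraction, not a diagnosis. It assumes header lines start with ">", a sequence may wrap over several lines, and positions in file2 are 1-based and inclusive. The function names `read_sequences` and `extract_regions` are made up for illustration. It strips whitespace and carriage returns and keeps only the first word of each header, since stray characters like these can stop headers from matching even when they look identical; whether that is the cause here is a guess.

```python
def read_sequences(lines):
    """Map each header name to its full sequence (wrapped lines are joined)."""
    seqs, name = {}, None
    for raw in lines:
        line = raw.strip()  # also drops a trailing '\r' from Windows line endings
        if not line:
            continue
        if line.startswith(">"):  # assumption: FASTA-style header lines
            parts = line[1:].split()
            name = parts[0] if parts else ""  # keep only the first word
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(chunks) for key, chunks in seqs.items()}


def extract_regions(seqs, region_lines):
    """Return (found, missing) for 'name start end' lines (1-based, inclusive)."""
    found, missing = [], []
    for raw in region_lines:
        fields = raw.split()  # splits on tabs or spaces, ignores '\r'
        if len(fields) < 3:
            continue
        name = fields[0].lstrip(">")
        start, end = int(fields[1]), int(fields[2])
        if name in seqs:
            found.append((name, start, end, seqs[name][start - 1:end]))
        else:
            missing.append(name)  # report instead of skipping silently
    return found, missing


if __name__ == "__main__":
    # Made-up sample data, not the poster's files.
    file1 = [">seqA some description\r\n", "ACGTAC\r\n", "GTTTGA\r\n",
             ">seqB\n", "TTTTCCCCGGGG\n"]
    file2 = ["seqA\t3\t8\n", "seqB 1 4\n", "seqC 2 5\n"]
    found, missing = extract_regions(read_sequences(file1), file2)
    for name, start, end, region in found:
        print(f">{name}:{start}-{end}\n{region}")
    for name in missing:
        print(f"no header found for {name}")
```

Reporting the unmatched headers rather than skipping them shows which queries fail, which may help narrow down why only a subset of regions is extracted.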
Hi,
In the following example the whole text is on a single line....
I want to extract the text from IPTel to RTCPBase.h.
I want to use this across the whole file.
Updated: IPTel\platform\core\include\RTCPBase.h \main\MWS2051_Sablime_Int\1... (7 Replies)
Hi,
I am a newbie in unix programming so maybe this is a simple question.
I would like to know how I can make a script that outputs only the values that are not between any of the given start and end positions.
Example
file1:
2 30
40 80
82 100
file2:
ID1 1
ID2 35
ID3 80
ID4 81
ID6... (9 Replies)
Hello People,
I have the following contents in an XML file
...........
...........
..........
...........
<Details = "Sample Details">
<Name>Bob</Name>
<Age>34</Age>
<Address>CA</Address>
<ContactNumber>1234</ContactNumber>
</Details>
...........
.............
.............. (4 Replies)
Hi Guys,
While I was writing a shell script, I got stuck at this point.
I need to extract words from a file at a specified position, do a comparison, and then replace the extracted word with another word.
E.g.: I like Orange very much.
I need to replace... (19 Replies)
The file has a record length of 200. I have 100 unique search strings, each ten characters long (characters 1 to 10), that need to be searched for in the file. Please help me pull the records based on position (say, positions 1-10).
test data
1FAHP2DW0BG115206RASHEED ... (6 Replies)
Hello All,
Could you please help with this.
This is what I have:
506234.222 2
506234.222 2
506234.222 2
506234.222 2
508212.200 2
508212.200 2
333456.111 2
333456.111 2
333456.111 2
333456.111 2
But this is what I want:
506234.222 1
506234.222 2
506234.222 2
506234.222 3 (5 Replies)
Hi,
I have a log file (log.txt) that contains lines of date/time.
I need to create a script that produces a CSV file (out.csv) grouping all sequential times (only 1 minute apart) together and stating the start time and end time of each period.
Sample log file (log.txt)
... (7 Replies)
Hi all,
I have a file like this. I want to extract only those regions which are big and continuous.
chr1 3280000 3440000
chr1 3440000 3920000
chr1 3600000 3920000 # this region falls within 3440000 3920000, so I don't want it printed in the output
chr1 3920000 4800000
chr1 ... (2 Replies)
Hi all,
I have a FASTA file of a reference sequence. I would like to retrieve the sequences corresponding to a list of start and end positions in another file.
>my_ref_seq
GCCCTATAAGGGCAGAAGCTTGTCCTTCTTGTGCCAGTTATGACGTTTGTCCTAACTGCACATCTGGTAG... (4 Replies)
Below are my custom period start and end dates, based on a calendar. These dates are placed in a file, and for each period row I need to split the period into three weeks. An example is given below.
Could you please help me achieve a solution through a shell script?
File content:
... (2 Replies)