08-05-2015
With your two sample input files, the combined length of the lines in each group that do not start with a > is less than 100 characters. I don't see how you could expect any output when the substring you are trying to extract starts more than 40,000 characters into that string and, in two of the three cases, has an ending position that comes before the starting position (thereby requesting a substring with a negative length).
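As a rough illustration of both failure modes (not your actual script), here is a small sketch. The sequence is a short made-up sample, and the start, end, and length values are assumptions chosen only to show a start position past the end of the string and an end position that comes before the start:

```shell
# A short sample sequence, far shorter than the requested start position.
seq="GAACTTGAGATCCGGGGAGCAGTGGATCTC"

# Start position beyond the end of the string: substr() returns an empty string.
past_end=$(awk -v s="$seq" 'BEGIN { print "[" substr(s, 40000, 25) "]" }')

# End position before the start position: the computed length is negative,
# and substr() again returns an empty string.
negative=$(awk -v s="$seq" 'BEGIN { start = 20; end = 10; print "[" substr(s, start, end - start + 1) "]" }')

echo "past end: $past_end"
echo "negative length: $negative"
```

In both cases awk prints only the empty brackets, so no error is reported and the output is simply missing.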
In addition to those problems, as Scrutinizer said, your script specifies that the input field separator for file2 is a tab character, but there are no tab characters in the data you showed us. With no tabs, each line is a single field, so the fields your script uses for the start and end positions are empty and evaluate to 0. Therefore, you are requesting a substring of 1 character starting at position 0 (when character positions in awk strings start at 1).
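A quick way to see the effect of the wrong field separator is the sketch below. The sample line is hypothetical, and I am assuming the length is computed as end - start + 1 from the second and third fields, which may not match your script exactly:

```shell
# One line of space-separated data with no tab characters (hypothetical sample).
line="contig1 120 180"

# With FS set to a tab, the whole line becomes field 1, and $2 and $3 are empty.
# Empty fields evaluate to 0 in arithmetic, so start = 0 and length = 0 - 0 + 1 = 1.
result=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF, ($2 + 0), ($3 - $2 + 1) }')

echo "$result"
```

The three numbers printed are the field count, the start position, and the length that awk actually ends up using.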
Note also that although you might be able to create an array element in awk or gawk on Ubuntu that is more than 323,000 characters long, on most UNIX and BSD-based systems awk won't let you read a line, write a single output string, or create a variable whose value is much more than LINE_MAX bytes long (on most systems LINE_MAX is 2,048).
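If you want to check the limit on a given system, something like the following sketch should work. It assumes getconf is available and falls back to 2048 (the POSIX minimum for LINE_MAX) when it is not. The 323000 figure is just the string length mentioned above:

```shell
# Ask the system for LINE_MAX; fall back to the POSIX minimum if getconf is unavailable.
limit=$(getconf LINE_MAX 2>/dev/null || echo 2048)

# Length of the string the script would need to hold.
needed=323000

if [ "$needed" -gt "$limit" ]; then
    verdict="exceeds LINE_MAX, so a portable awk may not handle it"
else
    verdict="fits within LINE_MAX"
fi

echo "LINE_MAX=$limit, needed=$needed: $verdict"
```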
This User Gave Thanks to Don Cragun For This Post:
Bio::ASN1::Sequence::Indexer(3pm) User Contributed Perl Documentation Bio::ASN1::Sequence::Indexer(3pm)
NAME
Bio::ASN1::Sequence::Indexer - Indexes NCBI Sequence files.
SYNOPSIS
    use Bio::ASN1::Sequence::Indexer;

    # creating & using the index is just a few lines
    my $inx = Bio::ASN1::Sequence::Indexer->new(
      -filename   => 'seq.idx',
      -write_flag => 'WRITE'); # needed for make_index call, but if opening
                               # existing index file, don't set write flag!
    $inx->make_index('seq1.asn', 'seq2.asn');
    my $seq = $inx->fetch('AF093062'); # Bio::Seq obj for Sequence (doesn't work yet)
    # alternatively, if one prefers just a data structure instead of objects
    $seq = $inx->fetch_hash('AF093062'); # a hash produced by Bio::ASN1::Sequence
                                         # that contains all data in the Sequence record
PREREQUISITE
Bio::ASN1::Sequence, Bioperl and all dependencies therein.
INSTALLATION
Same as Bio::ASN1::EntrezGene
DESCRIPTION
Bio::ASN1::Sequence::Indexer is a Perl Indexer for NCBI Sequence genome databases. It processes an ASN.1-formatted Sequence record and
stores the file position for each record in a way compliant with the Bioperl standard (in fact it's a subclass of Bioperl's index objects).
Note that this module does not parse records, because it needs to run fast and grab only the gene ids. To parse records, use
Bio::ASN1::Sequence.
As with Bio::ASN1::Sequence, this module is best thought of as a beta version - it works, but is not fully tested.
SEE ALSO
Please check out perldoc for Bio::ASN1::EntrezGene for more info.
AUTHOR
Dr. Mingyi Liu <mingyi.liu@gpc-biotech.com>
COPYRIGHT
The Bio::ASN1::EntrezGene module and its related modules and scripts are copyright (c) 2005 Mingyi Liu, GPC Biotech AG and Altana Research
Institute. All rights reserved. I created these modules when working on a collaboration project between these two companies. Therefore, a
special thanks to the two companies for allowing the release of the code into the public domain.
You may use and distribute them under the terms of Perl itself or the GPL (<http://www.gnu.org/copyleft/gpl.html>).
CITATION
Liu, M and Grigoriev, A(2005) "Fast Parsers for Entrez Gene" Bioinformatics. In press
OPERATING SYSTEMS SUPPORTED
Any OS that Perl & Bioperl run on.
METHODS
fetch
Parameters: $geneid - id for the Sequence record to be retrieved
Example: my $hash = $indexer->fetch(10); # get Sequence #10
Function: fetch the data for the given Sequence id.
Returns: A Bio::Seq object produced by Bio::SeqIO::sequence
Notes: Bio::SeqIO::sequence does not exist and probably won't
exist for a while! So call fetch_hash instead
fetch_hash
Parameters: $seqid - id for the Sequence record to be retrieved
Example: my $hash = $indexer->fetch_hash('AF093062');
Function: fetch a hash produced by Bio::ASN1::Sequence for given id
Returns: A data structure containing all data items from the Sequence
record.
Notes: Alternative to fetch()
_file_handle
Title : _file_handle
Usage : $fh = $index->_file_handle( INT )
Function: Returns an open filehandle for the file
index INT. On opening a new filehandle it
caches it in the @{$index->_filehandle} array.
If the requested filehandle is already open,
it simply returns it from the array.
Example : $first_file_indexed = $index->_file_handle( 0 );
Returns : ref to a filehandle
Args : INT
Notes : This function is copied from Bio::Index::Abstract. Once that module
changes its file handle code as I do below to fit perl 5.005_03, this
sub will be removed from this module
perl v5.14.2 2005-05-04 Bio::ASN1::Sequence::Indexer(3pm)