We don't need you to post your whole data file. We need you to post two small sample input files, and the exact output that should be produced when given those two sample input files. If the Start and End positions are sometimes the End and Start positions instead, you need to make that explicit up front, not assume that we will guess that the data you're showing us is corrupt and then guess what should be done with that corrupt data.
Do not show us sample data that does not match the sample output you provide. Doing that just confuses anyone who might want to help you!
Telling us that you want exactly 23 characters and showing us 138 doesn't make us understand that it is a small sequence; it makes us understand that you are trying to confuse us OR that you can't be bothered to explain what you are trying to do.
If Scrutinizer's script is producing the output you want, but not redirecting it to the file in which you want that output saved, add the redirection operator:
to the end of the awk command he suggested!
[..]Note also that although you might be able to create an array element in awk or gawk on Ubuntu that is more than 323,000 characters long, on most UNIX systems and BSD-based systems, awk won't let you read a line, write a single output string, or create a variable whose value is much more than LINE_MAX bytes long (on most systems LINE_MAX is 2,048).
Hi Don, I don't think this is the case on "most systems", but rather on some systems.
For awk, LINE_MAX is a minimum requirement specified by POSIX, but I found no systems with a limit equal to LINE_MAX. A few systems have a low limit, but higher than LINE_MAX and most awk implementations on various platforms have a much higher limit or perhaps no limit.
A small test on Solaris:
I found these cases to have a high limit, if any:
The lower limits I found were:
--
Interestingly on Solaris nawk has a high limit, whereas early POSIX compliant /usr/xpg4/bin/awk has a low limit.
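For anyone who wants to repeat this kind of check on their own system, here is a minimal sketch of one way to probe the limit. It is not the test used above: the function name `awk_line_test` and the sizes are hypothetical, and it assumes only a POSIX `printf`, `tr`, and whichever awk you name.

```shell
# Hypothetical helper (not from the original posts): feed awk one line of
# n bytes and print the length awk reports for it.
# Usage: awk_line_test <n> [awk-command]
awk_line_test() {
    n=$1
    awk_cmd=${2:-awk}
    # printf pads an empty string to n spaces; tr turns the padding into 'x'.
    # The trailing newline passes through tr untouched, so awk sees one record.
    printf "%${n}s\n" '' | tr ' ' 'x' | "$awk_cmd" '{ print length($0) }'
}

# Try a few sizes around the limits discussed in this thread.
for size in 2048 3000 100000; do
    echo "size $size: $(awk_line_test "$size")"
done
```

An awk with a high limit (or none) should report the same length it was given. An awk with a low limit may print a diagnostic, split the input, or truncate the line, so compare the reported value with the requested size. A second argument such as `/usr/xpg4/bin/awk` or `nawk` selects a different implementation where one is installed.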
Hi Scrutinizer,
Thanks for the information. I knew that the Solaris /usr/xpg4/bin/awk had a limit larger than LINE_MAX, but still "relatively" small. I didn't remember that nawk was unlimited.
The OS X 10.9 BSD-based awk also had a 3000 byte limit. I hadn't checked the limit lately, not realizing that it had changed. Sometime between OS X version 10.9 and OS X Yosemite, version 10.10.4, that limit was raised considerably or removed. And, looking at the OS X awk man page, the usual BSD banner has disappeared. The command:
now returns:
while the sed utility (whose man page still has the BSD General Commands Manual banner) command:
still returns:
so I'm guessing that awk isn't from BSD anymore.
Hi Don, I am not sure about awk on OS X; I seem to remember it always had that 20070501 version label. And to me it seems to still behave like before:
If I look at the man page of OS X 10.6.2, it looks like my current 10.10.4 man page, and there is no BSD label in there. It also looks identical to the FreeBSD 11.0 awk man page and the NetBSD 6.5 awk man page, and they also do not have BSD banners.
Hi Scrutinizer,
The OS X 10.10.4 awk also still rejects -v options with the option-argument in the same argument as the option specifier. I.e., awk -v a="abc" sets the awk variable a to abc, but awk -va="abc" fails with the diagnostic:
The standards require conforming implementations of awk to accept both forms as valid ways to set a to abc.
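To check how a given awk handles the two forms, a small sketch like the following can be used. The function names `set_var_separate` and `set_var_attached` are hypothetical and exist only for illustration. What the attached form does depends on the awk being tested.

```shell
# Hypothetical helpers (not from the original posts) wrapping the two -v forms.

# Option and option-argument as separate arguments: awk -v a=VALUE
set_var_separate() {
    awk -v a="$1" 'BEGIN { print a }'
}

# Option-argument attached to the option specifier: awk -va=VALUE
set_var_attached() {
    awk -va="$1" 'BEGIN { print a }'
}

set_var_separate "abc"

# On an awk that rejects the attached form, this prints the fallback message
# after awk's own diagnostic.
set_var_attached "abc" || echo "this awk does not accept the attached -va=... form"
```

A portable script can sidestep the problem by always writing the option and its argument separately.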
I could swear that at some point in the past year, awk on OS X gave me a diagnostic and exited when it read a line from a file that was longer than 3000 bytes, when I tried to set a variable to a string longer than 3000 bytes, and when I tried to use print or printf to write more than 3000 bytes in a single call. But, I successfully read a line that contained more than 350Mb a few minutes ago. So, if it did have a lower limit before, it doesn't in OS X Yosemite, version 10.10.4.