Sequence extraction Post: 302951371

Post 302951371 by Don Cragun on Wednesday 5th of August 2015 04:40:45 PM

Quote:

Originally Posted by Scrutinizer

Hi Don, I don't think this is the case on "most systems", but rather on some systems.

For awk, LINE_MAX is a minimum requirement specified by POSIX, but I found no systems with a limit equal to LINE_MAX. A few systems have a low limit that is still higher than LINE_MAX, and most awk implementations on various platforms have a much higher limit or perhaps no limit at all.

A small test on Solaris:

Code:

$ getconf LINE_MAX
2048
$ LANG=C tr -dc '[a-z]' < /dev/urandom | dd count=1000 2>/dev/null | nawk '{foo=substr($0,1,409600); print foo}' | wc -c
  409601
$

I found these cases to have a high limit, if any:

Code:

Linux      : gawk, mawk
AIX 7      : awk
Solaris 10 : nawk
OSX 10.10  : BSD awk, gawk, mawk

The lower limits I found were:

Code:

Solaris 10 : /usr/xpg4/bin/awk : 19999 bytes
HPUX 11.11 : awk               :  3000 bytes
IRIX 6.5   : awk               :  3000 bytes

--
Interestingly, on Solaris nawk has a high limit, whereas the early POSIX-compliant /usr/xpg4/bin/awk has a low limit.

Hi Scrutinizer,
Thanks for the information. I knew that the Solaris /usr/xpg4/bin/awk had a limit larger than LINE_MAX, but still "relatively" small. I didn't remember that nawk was unlimited.
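For anyone who wants to repeat this kind of check on another system, here is a minimal sketch of a probe. It is not the test Scrutinizer ran: the function name awk_line_probe is made up for illustration, and it builds the long line from /dev/zero instead of /dev/urandom so the length is exact. It feeds one line of the requested length to the awk you name and prints what that awk reports as length($0).

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Usage: awk_line_probe AWK_COMMAND LENGTH
# Prints the record length the given awk sees for a single line of
# LENGTH 'a' characters. A smaller number, an error, or no output
# suggests that the awk being tested has a lower record-length limit.
awk_line_probe() {
    probe_awk=$1
    probe_len=$2
    {
        # Exactly $probe_len NUL bytes, turned into 'a' characters ...
        dd if=/dev/zero bs="$probe_len" count=1 2>/dev/null | tr '\000' 'a'
        # ... followed by the newline that terminates the record.
        echo
    } | "$probe_awk" '{ print length($0) }'
}
```

For example, awk_line_probe /usr/xpg4/bin/awk 409600 and awk_line_probe nawk 409600 could be compared on the same machine. What an awk does when a line exceeds its limit (truncate, split, or abort with a diagnostic) varies, so treat anything other than the requested length as "limit reached" and look at stderr for details.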

The OS X 10.9 BSD-based awk also had a 3000 byte limit. I hadn't checked the limit lately, not realizing that it had changed. Sometime between OS X version 10.9 and OS X Yosemite version 10.10.4, that limit was raised considerably or removed. And, looking at the OS X awk man page, the usual BSD banner has disappeared. The command:

Code:

awk --version

now returns:

Code:

awk version 20070501

while for the sed utility (whose man page still has the BSD General Commands Manual banner), the command:

Code:

sed --version

still returns:

Code:

sed: illegal option -- -
usage: sed script [-Ealn] [-i extension] [file ...]
       sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...]

so I'm guessing that awk isn't from BSD anymore.
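Since different awk implementations spell their version option differently, a rough way to see which awk you have is to try the common spellings in turn. The sketch below is only an illustration under that assumption; the function name awk_ident is made up, and an awk that supports none of these options is reported as "unknown".

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Usage: awk_ident AWK_COMMAND
# Prints the first line of the version banner of the given awk, trying
# option spellings used by common implementations, or "unknown".
awk_ident() {
    ident_awk=$1
    for ident_opt in '--version' '-version' '-W version' '-V'; do
        # $ident_opt is deliberately unquoted so "-W version" becomes two words.
        # stdin is /dev/null so an awk that ignores the option cannot hang.
        ident_out=$("$ident_awk" $ident_opt 2>/dev/null </dev/null | sed 1q)
        if [ -n "$ident_out" ]; then
            printf '%s\n' "$ident_out"
            return 0
        fi
    done
    echo unknown
    return 1
}
```

Calling awk_ident awk shows whichever banner the first awk in your PATH produces. The banner only hints at where the code came from, so this supports a guess like the one above rather than proving it.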

Don Cragun
