Full Discussion: Sequence extraction
Post 302951371 by Don Cragun on Wednesday 5th of August 2015 04:40:45 PM
Quote:
Originally Posted by Scrutinizer
Hi Don, I don't think this is the case on "most systems", but rather on some systems.

For awk, LINE_MAX is a minimum requirement specified by POSIX, but I found no systems with a limit equal to LINE_MAX. A few systems have a low limit (though still higher than LINE_MAX), and most awk implementations on various platforms have a much higher limit or perhaps no limit.

A small test on Solaris:
Code:
$ getconf LINE_MAX
2048
$ LANG=C tr -dc '[a-z]' < /dev/urandom | dd count=1000 2>/dev/null | nawk '{foo=substr($0,1,409600); print foo}' | wc -c
  409601
$

I found these cases to have a high limit, if any:
Code:
Linux      : gawk, mawk
AIX 7      : awk
Solaris 10 : nawk
OSX 10.10  : BSD awk, gawk, mawk

The lower limits I found were:
Code:
Solaris 10 : /usr/xpg4/bin/awk: 19999 Bytes
HPUX 11.11 : awk :               3000 Bytes
IRIX 6.5   : awk :               3000 Bytes

--
Interestingly, on Solaris, nawk has a high limit, whereas the early POSIX-compliant /usr/xpg4/bin/awk has a low limit.

Hi Scrutinizer,
Thanks for the information. I knew that the Solaris /usr/xpg4/bin/awk had a limit larger than LINE_MAX, but still "relatively" small. I didn't remember that nawk was unlimited.
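
For anyone who wants to check a particular awk, here is a minimal sketch of a probe. It is not the test Scrutinizer ran above; the function name probe_awk_len is made up for illustration, and it assumes /dev/zero, dd and tr are available. It feeds awk a single line of N bytes and prints the record length awk reports:

```shell
# probe_awk_len AWK N (hypothetical helper): feed AWK one input line made of
# N 'a' characters and print the length AWK reports for that record.
probe_awk_len() {
    awk_cmd=$1
    n=$2
    # Build exactly N bytes with dd, turn the NUL bytes into 'a',
    # then terminate the line with a newline from echo.
    { dd if=/dev/zero bs="$n" count=1 2>/dev/null | tr '\000' 'a'; echo; } |
        "$awk_cmd" '{ print length($0) }'
}
```

Running something like probe_awk_len awk 5000 and comparing the result with the output of getconf LINE_MAX shows whether that awk accepts records longer than LINE_MAX. An implementation with a lower limit might print an error or a different length; how it fails is implementation specific.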

The OS X 10.9 BSD-based awk also had a 3000 byte limit. I hadn't checked the limit lately, not realizing that it had changed. Sometime between OS X 10.9 and OS X Yosemite (version 10.10.4), that limit was raised considerably or removed. And, looking at the OS X awk man page, the usual BSD banner has disappeared. The command:
Code:
awk --version

now returns:
Code:
awk version 20070501

while the equivalent command for the sed utility (whose man page still has the BSD General Commands Manual banner):
Code:
sed --version

still returns:
Code:
sed: illegal option -- -
usage: sed script [-Ealn] [-i extension] [file ...]
       sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...]

so I'm guessing that awk isn't from BSD anymore.
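
Since not every awk accepts --version, a small sketch like the following can help identify which implementation is installed. The function name awk_ident is made up for illustration. It assumes the awk in question answers to either --version (as gawk and the OS X awk above do) or -W version (the mawk form); anything else is reported as unknown:

```shell
# awk_ident AWK (hypothetical helper): print the first line of the version
# banner AWK offers, or "unknown" if neither option form is accepted.
awk_ident() {
    for opt in "--version" "-W version"; do
        # $opt is deliberately unquoted so "-W version" splits into two words.
        # stdin comes from /dev/null so an awk that ignores the option cannot hang.
        if banner=$("$1" $opt 2>/dev/null </dev/null) && [ -n "$banner" ]; then
            printf '%s\n' "$banner" | head -n 1
            return 0
        fi
    done
    echo unknown
}
```

Calling awk_ident awk (or awk_ident nawk, awk_ident /usr/xpg4/bin/awk, and so on) prints one line per implementation, which is easier to compare across systems than reading each man page banner.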
 
