Full Discussion: Sequence extraction
Post 302951310 by Don Cragun on Wednesday 5th of August 2015 03:48:28 AM
Old 08-05-2015
With your two sample input files, the combined length of the lines in each group that do not start with a > is less than 100 characters. I don't see how you would expect any output when the substring you are trying to extract starts more than 40,000 characters into that string. In two of the three cases, the ending position also comes before the starting position, which requests a substring of negative length.
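A quick sanity check along these lines would flag both problems before any extraction is attempted. This is only a sketch: the function name check_range and its three arguments (start, end, and string length) are made up for illustration and are not taken from your script.

```shell
# check_range START END LENGTH
# Hypothetical helper (not part of the original script): reports whether the
# 1-based, inclusive range START..END fits inside a string of LENGTH characters.
check_range() {
    awk -v s="$1" -v e="$2" -v len="$3" 'BEGIN {
        # Force numeric comparison regardless of how the values were passed.
        s += 0; e += 0; len += 0
        if (s < 1)   { print "start before position 1"; exit }
        if (e < s)   { print "negative length";         exit }
        if (e > len) { print "past end of string";      exit }
        print "ok"
    }'
}
```

Called as check_range 40000 39000 100, it reports the negative length. With an end position that follows the start, such as check_range 40000 41000 100, it reports that the range lies past the end of the string.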

In addition to those problems, as Scrutinizer said, your script specifies that the input field separator for file2 is a tab character, but there are no tab characters in the data you showed us. Every field after the first is therefore empty, so you are requesting a substring of 1 character starting at position 0 (and character positions in awk strings start at 1).
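The effect of the wrong field separator is easy to see in isolation. The record below is a made-up, space-separated line used only for illustration; it is not taken from file2.

```shell
# Hypothetical one-line record separated by spaces, not tabs.
line='contig1 40210 40950'

# With FS set to a tab, the whole line is field 1, so $2 is empty and
# evaluates to 0 when it is used as a number.
tab_result=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF, $2 + 0 }')

# With the default FS (runs of blanks), all three fields are seen.
default_result=$(printf '%s\n' "$line" | awk '{ print NF, $2 + 0 }')

printf 'tab FS:     %s\n' "$tab_result"
printf 'default FS: %s\n' "$default_result"
```

With the tab separator, awk reports one field and a numeric value of 0 for $2; with the default separator it reports three fields and 40210.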

Note also that although you might be able to create an array element in awk or gawk on Ubuntu that is more than 323,000 characters long, on most UNIX and BSD-based systems awk won't let you read a line, write a single output string, or create a variable whose value is much more than LINE_MAX bytes long (on most systems LINE_MAX is 2,048).
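To see whether a given file is likely to run into that limit, something like the following could be used. It is a rough sketch: longest_line is a hypothetical helper name, and on a system whose awk enforces the limit, the command may itself fail or truncate on an over-long line, which already answers the question.

```shell
# longest_line FILE
# Hypothetical helper: prints the length of the longest line in FILE as
# reported by awk's length(); prints 0 for an empty file.
longest_line() {
    awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}
```

Compare the number it prints with the output of getconf LINE_MAX on the system where the script has to run.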

StringLabels(3)                  OCaml library                  StringLabels(3)

NAME
       StringLabels - String operations.

Module
       Module StringLabels

Documentation
       Module StringLabels : sig end

       String operations.

       val length : string -> int
              Return the length (number of characters) of the given string.

       val get : string -> int -> char
              String.get s n returns character number n in string s. The
              first character is character number 0. The last character is
              character number String.length s - 1. You can also write s.[n]
              instead of String.get s n.
              Raise Invalid_argument index out of bounds if n is outside the
              range 0 to (String.length s - 1).

       val set : string -> int -> char -> unit
              String.set s n c modifies string s in place, replacing the
              character number n by c. You can also write s.[n] <- c instead
              of String.set s n c.
              Raise Invalid_argument index out of bounds if n is outside the
              range 0 to (String.length s - 1).

       val create : int -> string
              String.create n returns a fresh string of length n. The string
              initially contains arbitrary characters.
              Raise Invalid_argument if n < 0 or n > Sys.max_string_length.

       val make : int -> char -> string
              String.make n c returns a fresh string of length n, filled with
              the character c.
              Raise Invalid_argument if n < 0 or n > Sys.max_string_length.

       val copy : string -> string
              Return a copy of the given string.

       val sub : string -> pos:int -> len:int -> string
              String.sub s start len returns a fresh string of length len,
              containing the characters number start to start + len - 1 of
              string s.
              Raise Invalid_argument if start and len do not designate a valid
              substring of s; that is, if start < 0, or len < 0, or
              start + len > StringLabels.length s.

       val fill : string -> pos:int -> len:int -> char -> unit
              String.fill s start len c modifies string s in place, replacing
              the characters number start to start + len - 1 by c.
              Raise Invalid_argument if start and len do not designate a valid
              substring of s.

       val blit : src:string -> src_pos:int -> dst:string -> dst_pos:int -> len:int -> unit
              String.blit src srcoff dst dstoff len copies len characters from
              string src, starting at character number srcoff, to string dst,
              starting at character number dstoff. It works correctly even if
              src and dst are the same string, and the source and destination
              chunks overlap.
              Raise Invalid_argument if srcoff and len do not designate a
              valid substring of src, or if dstoff and len do not designate a
              valid substring of dst.

       val concat : sep:string -> string list -> string
              String.concat sep sl concatenates the list of strings sl,
              inserting the separator string sep between each.

       val iter : f:(char -> unit) -> string -> unit
              String.iter f s applies function f in turn to all the characters
              of s. It is equivalent to
              f s.[0]; f s.[1]; ...; f s.[String.length s - 1]; ().

       val iteri : f:(int -> char -> unit) -> string -> unit
              Same as String.iter, but the function is applied to the index of
              the element as first argument (counting from 0), and the
              character itself as second argument.
              Since 4.00.0

       val map : f:(char -> char) -> string -> string
              String.map f s applies function f in turn to all the characters
              of s and stores the results in a new string that is returned.
              Since 4.00.0

       val trim : string -> string
              Return a copy of the argument, without leading and trailing
              whitespace. The characters regarded as whitespace are: ' ',
              '\012', '\n', '\r', and '\t'. If there is no whitespace
              character in the argument, return the original string itself,
              not a copy.
              Since 4.00.0

       val escaped : string -> string
              Return a copy of the argument, with special characters
              represented by escape sequences, following the lexical
              conventions of OCaml. If there is no special character in the
              argument, return the original string itself, not a copy.

       val index : string -> char -> int
              String.index s c returns the position of the leftmost occurrence
              of character c in string s.
              Raise Not_found if c does not occur in s.

       val rindex : string -> char -> int
              String.rindex s c returns the position of the rightmost
              occurrence of character c in string s.
              Raise Not_found if c does not occur in s.

       val index_from : string -> int -> char -> int
              Same as StringLabels.index, but start searching at the character
              position given as second argument. String.index s c is
              equivalent to String.index_from s 0 c.

       val rindex_from : string -> int -> char -> int
              Same as StringLabels.rindex, but start searching at the
              character position given as second argument. String.rindex s c
              is equivalent to String.rindex_from s (String.length s - 1) c.

       val contains : string -> char -> bool
              String.contains s c tests if character c appears in the string
              s.

       val contains_from : string -> int -> char -> bool
              String.contains_from s start c tests if character c appears in
              the substring of s starting from start to the end of s.
              Raise Invalid_argument if start is not a valid index of s.

       val rcontains_from : string -> int -> char -> bool
              String.rcontains_from s stop c tests if character c appears in
              the substring of s starting from the beginning of s to index
              stop.
              Raise Invalid_argument if stop is not a valid index of s.

       val uppercase : string -> string
              Return a copy of the argument, with all lowercase letters
              translated to uppercase, including accented letters of the ISO
              Latin-1 (8859-1) character set.

       val lowercase : string -> string
              Return a copy of the argument, with all uppercase letters
              translated to lowercase, including accented letters of the ISO
              Latin-1 (8859-1) character set.

       val capitalize : string -> string
              Return a copy of the argument, with the first character set to
              uppercase.

       val uncapitalize : string -> string
              Return a copy of the argument, with the first character set to
              lowercase.

       type t = string
              An alias for the type of strings.

       val compare : t -> t -> int
              The comparison function for strings, with the same specification
              as Pervasives.compare. Along with the type t, this function
              compare allows the module String to be passed as argument to the
              functors Set.Make and Map.Make.

OCamldoc                            2014-06-09                  StringLabels(3)