Sponsored Content
Full Discussion: Sequence extraction
Top Forums Shell Programming and Scripting Sequence extraction Post 302951310 by Don Cragun on Wednesday 5th of August 2015 03:48:28 AM
Old 08-05-2015
With your two sample input files (with the combined lengths of the lines in each group that do not start with a > being less than 100 characters), I don't see how you would expect any output when the substring you are trying to extract from those strings starts more than 40,000 characters into that string, and in two of the three cases has an ending position in the string that comes before the starting position (thereby requesting a substring that has negative length).

In addition to those problems, as Scrutinizer said, your script specifies that the input field separator for file2 is a tab character, but there are no tab characters in the data you showed us. Therefore, you are requesting a substring of 1 character starting at position 0 (when arrays of characters in awk start at position 1).

Note also that although you might be able to create an array element in awk or gawk on Ubuntu that is more than 323,000 characters long; on most UNIX systems and BSD-based systems, awk won't let you read a line, write a single output string, or create a variable whose value is much more that LINE_MAX bytes long (on most systems LINE_MAX is 2,048).
This User Gave Thanks to Don Cragun For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Help with tar extraction!

I have this tar file which has files of (.ksh, .ini &.sql) and their hard and soft links. Later when the original files and their directories are deleted (or rather lost as in a system crash), I have this tar file as the only source to restore all of them. In such a case when I do, tar... (4 Replies)
Discussion started by: manthasirisha
4 Replies

2. Shell Programming and Scripting

AWK extraction

Hi all, I have a data file from which i would like to extract only certain fields, which are not adjacent to each other. Following is the format of data file (data.txt) that i have, which has about 6 fields delimited by "|" HARRIS|23|IT|PROGRAMMER|CHICAGO|EMP JOHN|35|IT|JAVA|NY|CON... (2 Replies)
Discussion started by: harris2107
2 Replies

3. Shell Programming and Scripting

extraction of last but one char

I need to extract the character before the last "|" in the following lines, which are 'N' and 'U'. The last "|" shouldn't be extracted. Also the no.s of "|" may vary in a line, but I need only the character before the last one. ... (5 Replies)
Discussion started by: hidnana
5 Replies

4. Shell Programming and Scripting

Regex extraction

Hello, I need your help to extract text from following: ./sherg_fyd_rur:blkabl="R23.21_BL2008_0122_1" ./serge_a75:rlwual="/main/r23.21=26-Mar-2008.05:00:20UTC@R11.31_BL2008_0325" ./serge_a75:blkabl="R23.21_BL2008_0325" ./sherg_proto_npiv:bkguals="R23.21_BL2008_0302 I80_11.31_LR" I... (11 Replies)
Discussion started by: abdurrouf
11 Replies

5. Programming

extraction from a path

Hi, Can you help me on this two problems? how can i get : from input: /ect/exp/hom/bin ==> output: exp and from input: aex1234 =====>output: ex thanks, (1 Reply)
Discussion started by: yeclota
1 Replies

6. Shell Programming and Scripting

extraction

I have following input @xxxxxx@ I want to extract what's between @....@ that is : xxxx using SED command (6 Replies)
Discussion started by: xerox
6 Replies

7. UNIX for Dummies Questions & Answers

fast sequence extraction

Hi everyone, I have a large text file containing DNA sequences in fasta format as follows: >someseq GAACTTGAGATCCGGGGAGCAGTGGATCTC CACCAGCGGCCAGAACTGGTGCACCTCCAG GCCAGCCTCGTCCTGCGTGTC >another seq GGCATTTTTGTGTAATTTTTGGCTGGATGAGGT GACATTTTCATTACTACCATTTTGGAGTACA >seq3450... (4 Replies)
Discussion started by: Fahmida
4 Replies

8. Shell Programming and Scripting

find common entries and match the number with long sequence and cut that sequence in output

Hi all, I have a file like this ID 3BP5L_HUMAN Reviewed; 393 AA. AC Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3; DT 05-FEB-2008, integrated into UniProtKB/Swiss-Prot. DT 05-JUL-2004, sequence version 1. DT 05-SEP-2012, entry version 71. FT COILED 59 140 ... (1 Reply)
Discussion started by: manigrover
1 Replies

9. Shell Programming and Scripting

String Extraction

I am trying to extract a time from the below string in perl but not able to get the time properly I just want to extract the time from the above line I am using the below syntax x=~ /(.*) (\d+)\:(\d+)\:(\d+),(.*)\.com/ $time = $2 . ':' . $3 . ':' . $4; print $time Can... (1 Reply)
Discussion started by: karan8810
1 Replies

10. Shell Programming and Scripting

Extraction of upstream and downstream regions from long sequence file

Hello, here I am posting my query again with modified data input files. see my query is : i have two input files file1 and file2. file1 is smalldata.fasta >gi|546671471|gb|AWWX01449637.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig449636, whole genome shotgun sequence... (20 Replies)
Discussion started by: harpreetmanku04
20 Replies
paste(1)						      General Commands Manual							  paste(1)

Name
       paste - merge file data

Syntax
       paste file1 file2...
       paste -dlist file1 file2...
       paste -s [-dlist] file1 file2...

Description
       In  the	first  two forms, concatenates corresponding lines of the given input files file1, file2, etc.	It treats each file as a column or
       columns of a table and pastes them together horizontally (parallel merging).

       In the last form, the command combines subsequent lines of the input file (serial merging).

       In all cases, lines are glued together with the tab character, or with characters from an optionally specified  list.   Output  is  to  the
       standard output, so it can be used as the start of a pipe, or as a filter, if - is used in place of a file name.

Options
       -       Used in place of any file name, to read a line from the standard input.	(There is no prompting).

       -dlist  Replaces  characters  of  all but last file with nontabs characters (default tab).  One or more characters immediately following -d
	       replace the default tab as the line concatenation character.  The list is used circularly, i. e. when exhausted, it is reused.	In
	       parallel  merging  (i. e. no -s option), the lines from the last file are always terminated with a new-line character, not from the
	       list.  The list may contain the special escape sequences: 
 (new-line), 	 (tab), \ (backslash), and  (empty string, not a null
	       character).   Quoting  may  be  necessary,  if characters have special meaning to the shell (for example, to get one backslash, use
	       -d"\\" ).
	       Without this option, the new-line characters of each but the last file (or last line in case of the -s option) are  replaced  by  a
	       tab character.  This option allows replacing the tab character by one or more alternate characters (see below).

       -s      Merges  subsequent  lines  rather  than	one  from  each input file.  Use tab for concatenation, unless a list is specified with -d
	       option.	Regardless of the list, the very last character of the file is forced to be a new-line.

Examples
       ls | paste -d" " -
       list directory in one column
       ls | paste - - - -
       list directory in four columns
       paste -s -d"	
" file
       combine pairs of lines into lines

Diagnostics
       line too long
		 Output lines are restricted to 511 characters.

       too many files
		 Except for -s option, no more than 12 input files may be specified.

See Also
       cut(1), grep(1), pr(1)

																	  paste(1)
All times are GMT -4. The time now is 05:21 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy