Filling positions based on consensus character
Post 302434202 by Xterra on Thursday 1st of July 2010 08:13:08 PM
Thanks once again for your help. This is what I am getting:
Code:
awk: cmd. line:1: {n=split($0,b,"");for (i=1;i<=n;i++) if (b[i]=="-" b[i]=a[i]; for (i=1;i<=n;i++) printf b[i];\
awk: cmd. line:1: ^ syntax error

As a result, the output file is empty.
I ran the code both in its entirety and as the two individual awk commands.
Code:
$ awk '/>/{fr=$3;getline;n=split ($0,a,""); for (i=1;i<=n;i++) b[i"-"a[i]]+=fr}\
END{for (i in b) {split (i,c,"-"); if (d[c[1]]<=b[i]){e[c[1]]=c[2];d[c[1]]=b[i]}}\
for (i in e) print i" "e[i]}' Input.txt | awk 'NR==FNR{a[$1]=$2;next}\
{n=split($0,b,"");for (i=1;i<=n;i++) if (b[i]=="-" b[i]=a[i]; for (i=1;i<=n;i++) printf b[i];\
printf "\n"}' - Input.txt > Output.txt

Perhaps Perl?
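
For what it is worth, the syntax error above points at `if (b[i]=="-" b[i]=a[i];`, where the closing parenthesis after `"-"` is missing. It should read `if (b[i]=="-") b[i]=a[i];`. Below is a minimal awk-only sketch of the same idea in a single command, so Perl should not be needed. It rests on several assumptions that are not confirmed in the thread: every header line looks like `>name Freq N` with the frequency in the third field, every header is followed by exactly one sequence line, and dashes are left out of the consensus count (the code above counts them). The function name `fill_gaps` is made up for illustration. It uses `substr` rather than `split($0,b,"")` because splitting on an empty separator does not behave the same way in every awk.

```shell
# fill_gaps FILE
# Reads FILE twice: pass 1 builds a frequency-weighted consensus per position,
# pass 2 prints the file with each "-" replaced by the consensus character.
fill_gaps() {
  awk '
    NR == FNR {                          # pass 1: first read of the file
      if (/^>/) { fr = $3; next }        # remember the frequency from the header
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "-") continue           # gaps do not vote (assumption)
        cnt[i, c] += fr
        if (cnt[i, c] > best[i]) {       # track the heaviest character so far
          best[i] = cnt[i, c]
          cons[i] = c
        }
      }
      next
    }
    /^>/ { print; next }                 # pass 2: headers are printed unchanged
    {
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "-" && (i in cons)) c = cons[i]
        out = out c
      }
      print out
    }
  ' "$1" "$1"
}
```

It would be called as `fill_gaps Input.txt > Output.txt`. A position where every sequence has a dash is left as a dash, since there is nothing to build a consensus from.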
 

8 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk script replace positions if certain positions equal prescribed value

I am attempting to replace positions 44-46 with YYY if positions 48-50 = XXX. awk -F "" '{if (substr($0,48,3)=="XXX") $44="YYY"}1' OFS="" $filename > $tempfile But this is not working, 44-46 is still spaces in my tempfile instead of YYY. Any suggestions would be greatly appreciated. (9 Replies)
Discussion started by: halplessProblem

2. Shell Programming and Scripting

filling a character

In my file the data looks like this: 1,2,3 3,4,5,6,7,8 10,11,23,24. I want to make it: 1,2,3,?,?,? 3,4,5,6,7,8 10,11,23,24,?,?. The maximum number of words (separated by commas) in a line is 6, so every line should contain 6 words. Lines which have fewer than 6 words should be padded with '?' as a word. i have... (3 Replies)
Discussion started by: new2ubuntulinux

3. Shell Programming and Scripting

seds to extract fields based on positions

Hi My file has a series of rows up to 160 characters in length. There are 7 columns for each row. In each row, column 1 starts at position 4 column 2 starts at position 12 column 3 starts at position 43 column 4 starts at position 82 column 5 starts at... (7 Replies)
Discussion started by: malts18

4. Shell Programming and Scripting

Extract text between two character positions

Greetings. I need to extract text between two character positions, e.g: all text between character 4921 and 6534. The text blocks are FASTA-format sequence of whole chromosomes, so basically a million A, T, G, C, combinations. E.g: >Chr_1 ACCTGTTCAACTCTCAGGACTCTCAGGTCAACTCTCAG... (3 Replies)
Discussion started by: Twinklefingers

5. Shell Programming and Scripting

Sort based on positions in flat file

Hello, For example: 12........6789101112..............20212223242526..................50 ( Positions) LName FName DOB (Lastname starts from 1 to 6 , FName from 8 to 15 and date of birth from 21 to29) CURTIS KENNETH ... (5 Replies)
Discussion started by: duplicate

6. Shell Programming and Scripting

Join based on positions

I have two text files as shown below cat file1.txt Id leng sal mon 25671 34343 56565 5565 44888 56565 45554 6868 23343 23423 26226 6224 77765 88688 87464 6848 66776 23343 63463 4534 cat file2.txt Id number 25671 34343 76767 34234 23343 23423 66776 23343 (4 Replies)
Discussion started by: halfafringe

7. UNIX for Dummies Questions & Answers

Filling positions based on frequency

I have files with hundreds of sequences with frequency values reported as "Freq X" and missing characters represented by a dash ("-"), something like this >39sample Freq 4 TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTAGCAGCACTA >22sample Freq 15 T-GATGTCGTGGGTTTCCCGTCAACACCGGCAAATAGTAGCAGCACTA... (12 Replies)
Discussion started by: Xterra

8. Shell Programming and Scripting

Filter lines based on values at specific positions

hi. I have a Fixed Length text file as input where the character positions 4-5(two character positions starting from 4th position) indicates the LOB indicator. The file structure is something like below: 10126Apple DrinkOmaha 10231Milkshake New Jersey 103 Billabong Illinois ... (6 Replies)
Discussion started by: kumarjt

iso2022(5)							File Formats Manual							iso2022(5)

NAME
iso2022, iso-2022, ISO-2022 - A character encoding mechanism standardized by the International Standards Organization (ISO)

DESCRIPTION
The ISO-2022 standard defines a mechanism for handling single-byte and multibyte characters. The standard specifies four classes of character sets:

  o  The 94-charset class, which contains character sets with 94 positions (single-byte characters). Examples are the ASCII and JIS X0201 character sets.
  o  The 96-charset class, which contains character sets with 96 positions (single-byte characters). Examples are the ISO Latin series of character sets.
  o  The 94x94-charset class, which contains character sets with 94x94 positions (2-byte characters). Examples are the GB 2312 and the CNS 11643 character sets.
  o  The 96x96-charset class, which contains character sets with 96x96 positions (2-byte characters).

In the ISO-2022 standard, four registers, called G0, G1, G2 and G3, are used to reference a character set. Before a character set can be used, the character set must be assigned, or designated, to one of these registers. The designation of a character set is done by using an escape sequence in the following format:

  ESC [I] F

In this format:

  I    Is an intermediate character that is used to designate a character set to one of the registers (G0, G1, G2, or G3).
  F    Is a unique final character of a particular character set.

The designation of a character set, whose final character is F, to different registers is as follows:

  ESC $ F    Designates a multibyte character set (94x94 or 96x96) to G0.
  ESC ( F    Designates a character set in the 94-charset class to G0.
  ESC ) F    Designates a character set in the 94-charset class to G1.
  ESC * F    Designates a character set in the 94-charset class to G2.
  ESC + F    Designates a character set in the 94-charset class to G3.
  ESC - F    Designates a character set in the 96-charset class to G1.
  ESC . F    Designates a character set in the 96-charset class to G2.
  ESC / F    Designates a character set in the 96-charset class to G3.

SEE ALSO
Commands: locale(1)
Others: ascii(5), i18n_intro(5), iso2022jp(5), l10n_intro(5)