Sponsored Content
Top Forums Shell Programming and Scripting Filling positions based on consensus character Post 302434225 by bartus11 on Friday 2nd of July 2010 12:20:20 AM
Old 07-02-2010
Try AWK again, but this time, 1.awk:
Code:
/>/{fr=$3;
getline;
n=split ($0,a,""); 
for (i=1;i<=n;i++) b[i"-"a[i]]+=fr}
END{for (i in b) {split (i,c,"-"); 
if (d[c[1]]<=b[i]) {e[c[1]]=c[2];
d[c[1]]=b[i]}}
for (i in e) print i" "e[i]}

2.awk:
Code:
NR==FNR{a[$1]=$2;next}
{n=split($0,b,"");
for (i=1;i<=n;i++) if (b[i]=="-" b[i]=a[i]; 
for (i=1;i<=n;i++) printf b[i];
printf "\n"}

command:
Code:
awk -f 1.awk Input.txt | awk -f 2.awk - Input.txt > Output.txt

 

8 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk script replace positions if certain positions equal prescribed value

I am attempting to replace positions 44-46 with YYY if positions 48-50 = XXX. awk -F "" '{if (substr($0,48,3)=="XXX") $44="YYY"}1' OFS="" $filename > $tempfile But this is not working, 44-46 is still spaces in my tempfile instead of YYY. Any suggestions would be greatly appreciated. (9 Replies)
Discussion started by: halplessProblem
9 Replies

2. Shell Programming and Scripting

filling a character

in my file data is like this 1,2,3 3,4,5,6,7,8 10,11,23,24 i want to make as 1,2,3,?,?,? 3,4,5,6,7,8 10,11,23,24,?,? here max no of words(separated by comma) in a line is 6.so every line contains 6 words.Line which have less than 6 words replaced with '?' as a word i have... (3 Replies)
Discussion started by: new2ubuntulinux
3 Replies

3. Shell Programming and Scripting

seds to extract fields based on positions

Hi My file has a series of rows up to 160 characters in length. There are 7 columns for each row. In each row, column 1 starts at position 4 column 2 starts at position 12 column 3 starts at position 43 column 4 starts at position 82 column 5 starts at... (7 Replies)
Discussion started by: malts18
7 Replies

4. Shell Programming and Scripting

Extract text between two character positions

Greetings. I need to extract text between two character positions, e.g: all text between character 4921 and 6534. The text blocks are FASTA-format sequence of whole chromosomes, so basically a million A, T, G, C, combinations. E.g: >Chr_1 ACCTGTTCAACTCTCAGGACTCTCAGGTCAACTCTCAG... (3 Replies)
Discussion started by: Twinklefingers
3 Replies

5. Shell Programming and Scripting

Sort based on positions in flat file

Hello, For example: 12........6789101112..............20212223242526..................50 ( Positions) LName FName DOB (Lastname starts from 1 to 6 , FName from 8 to 15 and date of birth from 21 to29) CURTIS KENNETH ... (5 Replies)
Discussion started by: duplicate
5 Replies

6. Shell Programming and Scripting

Join based on positions

I have two text files as shown below cat file1.txt Id leng sal mon 25671 34343 56565 5565 44888 56565 45554 6868 23343 23423 26226 6224 77765 88688 87464 6848 66776 23343 63463 4534 cat file2.txt Id number 25671 34343 76767 34234 23343 23423 66776 23343 (4 Replies)
Discussion started by: halfafringe
4 Replies

7. UNIX for Dummies Questions & Answers

Filling positions based on frequency

I have files with hundreds of sequences with frequency values reported as "Freq X" and missing characters represented by a dash ("-"), something like this >39sample Freq 4 TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTAGCAGCACTA >22sample Freq 15 T-GATGTCGTGGGTTTCCCGTCAACACCGGCAAATAGTAGCAGCACTA... (12 Replies)
Discussion started by: Xterra
12 Replies

8. Shell Programming and Scripting

Filter lines based on values at specific positions

hi. I have a Fixed Length text file as input where the character positions 4-5(two character positions starting from 4th position) indicates the LOB indicator. The file structure is something like below: 10126Apple DrinkOmaha 10231Milkshake New Jersey 103 Billabong Illinois ... (6 Replies)
Discussion started by: kumarjt
6 Replies
CUT(1)							    BSD General Commands Manual 						    CUT(1)

NAME
cut -- cut out selected portions of each line of a file SYNOPSIS
cut -b list [-n] [file ...] cut -c list [file ...] cut -f list [-d delim] [-s] [file ...] DESCRIPTION
The cut utility cuts out selected portions of each line (as specified by list) from each file and writes them to the standard output. If no file arguments are specified, or a file argument is a single dash ('-'), cut reads from the standard input. The items specified by list can be in terms of column position or in terms of fields delimited by a special character. Column numbering starts from 1. The list option argument is a comma or whitespace separated set of numbers and/or number ranges. Number ranges consist of a number, a dash ('-'), and a second number and select the fields or columns from the first number to the second, inclusive. Numbers or number ranges may be preceded by a dash, which selects all fields or columns from 1 to the last number. Numbers or number ranges may be followed by a dash, which selects all fields or columns from the last number to the end of the line. Numbers and number ranges may be repeated, overlapping, and in any order. If a field or column is specified multiple times, it will appear only once in the output. It is not an error to select fields or columns not present in the input line. The options are as follows: -b list The list specifies byte positions. -c list The list specifies character positions. -d delim Use delim as the field delimiter character instead of the tab character. -f list The list specifies fields, separated in the input by the field delimiter character (see the -d option.) Output fields are separated by a single occurrence of the field delimiter character. -n Do not split multi-byte characters. Characters will only be output if at least one byte is selected, and, after a prefix of zero or more unselected bytes, the rest of the bytes that form the character are selected. -s Suppress lines with no field delimiter characters. Unless specified, lines with no delimiters are passed through unmodified. ENVIRONMENT
The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of cut as described in environ(7). EXIT STATUS
The cut utility exits 0 on success, and >0 if an error occurs. EXAMPLES
Extract users' login names and shells from the system passwd(5) file as ``name:shell'' pairs: cut -d : -f 1,7 /etc/passwd Show the names and login times of the currently logged in users: who | cut -c 1-16,26-38 SEE ALSO
colrm(1), paste(1) STANDARDS
The cut utility conforms to IEEE Std 1003.2-1992 (``POSIX.2''). HISTORY
A cut command appeared in AT&T System III UNIX. BSD
December 21, 2006 BSD
All times are GMT -4. The time now is 08:34 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy