Filling positions based on frequency


 
# 8  
Old 06-20-2015
My bad, I meant 80%
# 9  
Old 06-20-2015
Quote:
Originally Posted by Xterra
My bad, I meant 80%
This is the part that controls the percentage:
Code:
p90 = tot * .9

Change it to your heart's content.
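As a rough illustration of what that multiplier does (the total of 50 and the counts 42 and 40 below are made up, not taken from the thread's data): the script keeps a replacement character only when its accumulated count is strictly greater than the threshold, so a count exactly equal to the threshold does not qualify.

```shell
# Hypothetical numbers only: tot = 50 is not from the thread's input.
out=$(awk 'BEGIN {
    tot = 50
    p90 = tot * .9    # 90% threshold, as in the original line
    p80 = tot * .8    # 80% threshold, as requested in post #8
    # The script compares with ">", so a count equal to the threshold fails.
    print p90, p80, (42 > p90), (42 > p80), (40 > p80)
}')
echo "$out"
```

A count of 42 misses the 90% cutoff (45) but passes the 80% cutoff (40), while a count of exactly 40 still fails at 80%.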
# 10  
Old 06-20-2015
I removed all the comments, changed elsefor to else for, and set p90 = tot * .8:
Code:
$ awk '
FNR == 1 && cnt {
p90 = tot * .8
for(i in cc) {
if(cc[i] > p90) {
off = index(i, SUBSEP)
rep[substr(i, 1, off - 1)] = substr(i, off + 1)
}
delete cc[i]
}
}
FNR == NR {
if($2 == "Freq") {
cnt = $3
tot += cnt
} else for(i = length($0); i > 0; i--) {
if((c = substr($0, i, 1)) == "-") continue
cc[i, c] += cnt
}
next
}
NF == 1 {
for(i = length($0); i > 0; i--)
if((substr($0, i, 1) == "-") && (i in rep))
$0 = (i > 1 ? substr($0, 1, i - 1) : "") rep[i] \
substr($0, i + 1)

}
1' Input.txt Input.txt


Last edited by Xterra; 06-20-2015 at 01:34 PM..
# 11  
Old 06-20-2015
Quote:
Originally Posted by Xterra
I removed all the comments, changed elsefor to else for, and set p90 = tot * .8:

I'm glad you got something that works for you.

If you look back at the code I supplied in post #6 in this thread, you might note that there is no elsefor anywhere in it. And, I would NOT have removed the tabs since the structure of the code is hidden without them.

For my own sanity, if I had been updating the code to your new requirements, I would have changed p90 = tot * .9 to p80 = tot * .8 and globally changed every other reference to p90 to p80. Or, if the value is likely to change again, I would have renamed the variable to something like threshold, LowLimit, or limit.
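A minimal sketch of that renaming, under a few assumptions: the names limit, pct, and fill, the file sample.txt, and its four lines of data are all invented for illustration. Only the header layout (2nd field "Freq", 3rd field a count) is inferred from the script itself. Passing the fraction in with -v means the awk code does not have to be edited when the requirement changes again.

```shell
# Hypothetical sample data in the layout the script appears to expect:
# a header line with "Freq" and a count, followed by a sequence line.
printf '%s\n' '>a Freq 9' 'ACGT' '>b Freq 1' 'A-GA' > sample.txt

# fill FRACTION: run the two-pass script with the given threshold fraction.
fill() {
    awk -v pct="$1" '
    FNR == 1 && cnt {            # first line of the 2nd pass: build rep[]
        limit = tot * pct
        for(i in cc) {
            if(cc[i] > limit) {
                off = index(i, SUBSEP)
                rep[substr(i, 1, off - 1)] = substr(i, off + 1)
            }
            delete cc[i]
        }
    }
    FNR == NR {                  # 1st pass: accumulate counts per position
        if($2 == "Freq") {
            cnt = $3
            tot += cnt
        } else for(i = length($0); i > 0; i--) {
            if((c = substr($0, i, 1)) == "-") continue
            cc[i, c] += cnt
        }
        next
    }
    NF == 1 {                    # 2nd pass: fill dashes that have a replacement
        for(i = length($0); i > 0; i--)
            if((substr($0, i, 1) == "-") && (i in rep))
                $0 = (i > 1 ? substr($0, 1, i - 1) : "") rep[i] \
                    substr($0, i + 1)
    }
    1' sample.txt sample.txt
}

fill .8
fill .9
```

With this made-up data the total is 10 and C has a count of 9 at position 2, so fill .8 prints ACGA as the last line, while fill .9 leaves A-GA alone because 9 is not greater than 9.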

I left in the comments so you could see the calculations that were accumulated during the 1st pass through the data and see the replacements that could be performed by the 2nd pass. If you didn't understand how the code worked, uncommenting those lines would have given you a peek under the covers at what is going on. And, if you run into some anomalous data in the future, it might be nice to have that debugging code as a backup to quickly get a look at what the script gathered from your data.
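The commented-out debugging lines from post #6 are not shown in this part of the thread, so the following is only a guess at the kind of peek being described, not the original code. It runs just the first pass on invented sample data and prints what was accumulated for each position and character; the output format and the names dump, k, and part are assumptions.

```shell
# Hypothetical data; same header layout ("Freq" in field 2, count in field 3).
dump=$(printf '%s\n' '>a Freq 9' 'ACGT' '>b Freq 1' 'A-GA' |
awk '
$2 == "Freq" { cnt = $3; tot += cnt; next }
{
    for(i = length($0); i > 0; i--) {
        if((c = substr($0, i, 1)) == "-") continue
        cc[i, c] += cnt
    }
}
END {
    printf("tot=%d\n", tot)
    for(k in cc) {
        split(k, part, SUBSEP)    # part[1] = position, part[2] = character
        printf("pos %d char %s count %d\n", part[1], part[2], cc[k])
    }
}' | sort)
echo "$dump"
```

The sort is there only because for(k in cc) visits the elements in an unspecified order.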

Oh, well...
# 12  
Old 06-22-2015
This works:
Code:
awk 'FNR == 1 && cnt {p80 = tot * .99; for(i in cc) { if(cc[i] > p80) { off = index(i, SUBSEP); rep[substr(i, 1, off - 1)] = substr(i, off + 1)} delete cc[i]}} FNR == NR {if($2 == "Freq") {cnt = $3; tot += cnt} else for(i = length($0); i > 0; i--) {if((c = substr($0, i, 1)) == "-") continue; cc[i, c] += cnt}next} NF == 1 { for(i = length($0); i > 0; i--) if((substr($0, i, 1) == "-") && (i in rep)) $0 = (i > 1 ? substr($0, 1, i - 1) : "") rep[i] \
substr($0, i + 1)}1' Input.txt Input.txt

But this doesn't:
Code:
awk 'FNR == 1 && cnt {p80 = tot * .99; for(i in cc) { if(cc[i] > p80) { off = index(i, SUBSEP); rep[substr(i, 1, off - 1)] = substr(i, off + 1)} delete cc[i]}} FNR == NR {if($2 == "Freq") {cnt = $3; tot += cnt} else for(i = length($0); i > 0; i--) {if((c = substr($0, i, 1)) == "-") continue; cc[i, c] += cnt}next} NF == 1 { for(i = length($0); i > 0; i--) if((substr($0, i, 1) == "-") && (i in rep)) $0 = (i > 1 ? substr($0, 1, i - 1) : "") rep[i] \ substr($0, i + 1)}1' Input.txt Input.txt

The syntax is wrong, but I just cannot nail it down.
# 13  
Old 06-22-2015
Code:
awk 'FNR == 1 && cnt {p80 = tot * .99; for(i in cc) { if(cc[i] > p80) { off = index(i, SUBSEP); rep[substr(i, 1, off - 1)] = substr(i, off + 1)} delete cc[i]}} FNR == NR {if($2 == "Freq") {cnt = $3; tot += cnt} else for(i = length($0); i > 0; i--) {if((c = substr($0, i, 1)) == "-") continue; cc[i, c] += cnt}next} NF == 1 { for(i = length($0); i > 0; i--) if((substr($0, i, 1) == "-") && (i in rep)) $0 = (i > 1 ? substr($0, 1, i - 1) : "") rep[i] \ substr($0, i + 1)}1' Input.txt Input.txt

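Remove the `\' just before the final substr($0, i + 1) (it was highlighted in red in the original post). A backslash continues a statement only if it is the last character on the line.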
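A small stand-alone illustration of that rule, using a made-up string concatenation rather than the script from this thread:

```shell
# Backslash is the last character on its line: the statement continues.
multi=$(awk 'BEGIN { s = "a" \
"b"; print s }')

# Joined onto one line, no continuation is needed; the backslash is dropped.
single=$(awk 'BEGIN { s = "a" "b"; print s }')

echo "$multi $single"
```

Both forms print ab. Leaving the backslash in the middle of the single-line version is what triggers the syntax error reported in post #12.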