Matrix with Percentage


 
# 8  
Old 06-16-2016
OK. Seeing your code and the output it produces makes it very clear that none of the sample input files you have shown us so far in this thread were used as input to this script. You have never said that your input files contain an additional field at the start of each line that is to be ignored by your script. Your descriptions up until now have said that you are looking at the strings in the first two fields in your input, but your script never looks at the first field in your input file; it only looks at the second and third fields.

You have now shown us your script and the output it produces. Thank you for both.

Now, PLEASE show us three more things:
  1. the input that was given to your script to produce the output you supplied in post #7 in this thread,
  2. the output that you WANT your script to produce (including the percentages that you want to be produced), and
  3. a clear description of what determines whether or not the string in field three of your input file is "similar" to the string in field two of your input file. (Is the requirement that the string in field 3 is a substring of field 2 where the comparison performed is case insensitive?)
# 9  
Old 06-22-2016
Code:
INPUT 1 (i/p 1)|INPUT 2 (i/p 2)
Bharat Bazar|Bharat Bazar
Binny's Sales| 
|Binny's
|
Bharat bazar|Bharat
binny's|binny
state|country

The above is my input.

The output I want:

Code:
INPUT 1 (i/p 1)|INPUT 2 (i/p 2)
Bharat Bazar|Bharat Bazar|TRUE POSITIVE
Binny's Sales| |FALSE NEGATIVE
|Binny's|FALSE POSITIVE
||TRUE NEGATIVE
Bharat bazar|Bharat|TRUE POSITIVE
binny's|binny|TRUE POSITIVE
state|country|FALSE NEGATIVE

Don,

Field 3 should be a substring of field 2 and vice versa, and the comparison should be case insensitive.

Please let me know if there are any more questions I need to answer; I will be glad to do so.
# 10  
Old 06-22-2016
This is becoming ridiculous. Your script from post #7 applied to the sample in post #9 yields
Code:
||FALSE NEGATIVE
||FALSE NEGATIVE
||FALSE NEGATIVE
||FALSE NEGATIVE
||FALSE NEGATIVE
||FALSE NEGATIVE
||FALSE NEGATIVE
||FALSE NEGATIVE

I'm not prepared to continue following this incredibly incomplete thread.
# 11  
Old 06-22-2016
Rudi,

Please find everything below. I have copied and pasted my whole project; maybe an extra space or something similar crept in when you copied and pasted it.

Code:
[sdp@blr-qe101 TDE]$ ./confusionmatrix1.sh test.txt 
INPUT 1 (i/p 1)|INPUT 2 (i/p 2)|TRUE POSITIVE
Bharat Bazar|Bharat Bazar|TRUE POSITIVE
Binny's Sales| |TRUE POSITIVE
|Binny's|FALSE POSITIVE
||TRUE NEGATIVE
Bharat bazar|Bharat|TRUE POSITIVE
binny's|binny|TRUE POSITIVE
state|country|TRUE POSITIVE
[sdp@blr-qe101 TDE]$ 
[sdp@blr-qe101 TDE]$ 
[sdp@blr-qe101 TDE]$ cat confusionmatrix1.sh 
awk -F "|" '
BEGIN{IGNORECASE=1} 
{ if ($1 == $2 && $1 != "" && $2 != "" ) { print $1 "|" $2 "|TRUE POSITIVE"; } 
   else if ($2 == "" && $1 != "") {print $1"|"$2"|FALSE NEGATIVE";} 
   else if ($1 == "" && $2 != "" ){print $1"|"$2"|FALSE POSITIVE";} 
   else if ($1 == "" && $2 == "") { print $1 "|" $2"|TRUE NEGATIVE"; } 
   else {print $1 "|" $2 "|TRUE POSITIVE";}
 
}' $1
[sdp@blr-qe101 TDE]$ cat test.txt 
INPUT 1 (i/p 1)|INPUT 2 (i/p 2)
Bharat Bazar|Bharat Bazar
Binny's Sales| 
|Binny's
|
Bharat bazar|Bharat
binny's|binny
state|country

# 12  
Old 06-22-2016
If you look at the code in your post #11 (which is included above) and compare it to the code you showed us in your post #7, you might note that (except for the lines:
Code:
awk -F "|" '
BEGIN{IGNORECASE=1} 
}' $1

(i.e., the 1st two lines and the last line of your code), EVERYTHING is different. The code above references fields 1 and 2, while the code in post #7 references fields 2 and 3. The differences in YOUR code between posts #7 and #11 are not differences in spacing. They are differences in YOUR code.

When you originally posted your data, empty fields were shown as the literal text <BLANK>, later posts of your data change <BLANK> to a single <space> character, and your latest posts have used empty fields. The following code assumes that they are empty fields, but I have no reason to believe that that assumption is correct. (That assumption just makes more sense in my view of what might be reasonable.)

The title of this thread is "Matrix with Percentage" and you have asked for us to provide code that computes percentages in three of your posts. You have been asked to show us sample output containing percentages as you want them to be displayed three times, but you still have not shown us any example of what you want. You did show us sample code. And, if that code were added to the code you have shown us, it would not produce any output at all. If you refuse to clearly explain what you want and you refuse to show us any example of what you want, how do you expect us to be able to produce code that will produce anything remotely similar to what you want?

Some of your sample output uses mixed case in the 3rd output field (i.e., True Positive, True Negative, False Positive, and False Negative). Your descriptions of the output that should be produced sometimes use all lowercase, sometimes mixed case, and sometimes all uppercase. The following code uses all uppercase since that is what is used in your latest sample outputs and in all of the code you have shown us.

The awk that I'm using doesn't have an IGNORECASE variable that your code depends upon (although, if I understand what you're trying to do, case insensitivity would not matter with any of your sample data). The code below only uses features of awk that are required by the POSIX standards.

The descriptions you have supplied for what should appear in the 3rd output field still are not clear. The following code assumes that what you want (except for the header line in the output) is two words of output in the 3rd field. The 1st word is "TRUE" if the 2nd input field is equal to the 1st input field or if the 2nd input field is a non-empty substring of the 1st input field (when performing a case insensitive match); otherwise it is the word "FALSE". The 2nd word is "POSITIVE" if the 2nd input field is not an empty string, or "NEGATIVE" if it is an empty string. For the output header line, the following code assumes that you want the output shown in your post #1 (again, because that kind of made sense to me when the output header in most of your other posts did not).

So, even though this code produces output that is not at all similar to the output you said you wanted in post #9 in this thread, maybe the following code will give you something you can work with to get what you want:
Code:
awk '
FNR == 1 {
	FS = OFS = "|"
	print $0, "OUTPUT (o/p)"
	next
}
{	
	f3 = (($1 == $2 || ($2 != "" && index(tolower($1), tolower($2)))) ? \
		"TRUE " : "FALSE ")
	f3 = f3 ($2 == "" ? "NEGATIVE" : "POSITIVE")
	print $0, f3
}' "$1"

which, when given the name of your sample input file you provided in post #11 in this thread as a command-line argument, produces the output:
Code:
INPUT 1 (i/p 1)|INPUT 2 (i/p 2)|OUTPUT (o/p)
Bharat Bazar|Bharat Bazar|TRUE POSITIVE
Binny's Sales| |TRUE POSITIVE
|Binny's|FALSE POSITIVE
||TRUE NEGATIVE
Bharat bazar|Bharat|TRUE POSITIVE
binny's|binny|TRUE POSITIVE
state|country|FALSE POSITIVE

I believe the output here is correct even though it differs from the output you said you wanted for the following reasons:
  • Line 1 includes a title for the 3rd output field (as shown in some of your requested output, but not in post #9).
  • Line 3 reports TRUE POSITIVE (instead of FALSE NEGATIVE) because the <space> character in field 2 on that line is a substring of field 1 on that line AND field 2 is not an empty string.
  • And the last line of the output reports FALSE POSITIVE instead of FALSE NEGATIVE because the string country is not an empty string and is not a substring of the string state.
You have also been asked what operating system and shell you're using, but you haven't answered that question either. If you are trying to run this on a Solaris/SunOS system, change awk in the above script to /usr/xpg4/bin/awk or nawk.
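
Since the requested percentage output was never shown in this thread, the following is only a sketch under a guessed requirement: for each distinct label in the 3rd field of already labelled output, print the label, its count, and its share of the data lines (the header line is skipped). The function name label_percentages and the label|count|percent output layout are assumptions made for illustration, not something taken from any post:
```shell
#!/bin/sh
# label_percentages FILE   (hypothetical helper, not from the thread)
# FILE is "|"-separated output with a header on line 1 and a label
# such as "TRUE POSITIVE" in field 3 of every other line.
label_percentages() {
	awk -F'|' '
	FNR == 1 { next }                 # skip the header line
	{ count[$3]++; total++ }          # tally each label
	END {
		if (total == 0) exit          # nothing to report
		for (label in count)
			printf "%s|%d|%.2f%%\n", label, count[label], 100 * count[label] / total
	}' "$1" | sort                    # sort for a stable order
}
```

Run against the 8 lines of output shown above (1 header plus 7 data lines), this would report 4 of 7 lines as TRUE POSITIVE (57.14%), 2 as FALSE POSITIVE (28.57%), and 1 as TRUE NEGATIVE (14.29%). If a different denominator or layout is wanted, the printf line is the place to change it.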
# 13  
Old 06-23-2016
Don,

I ran your script on my whole file, and it does not report TRUE POSITIVE in every place it should. One small example is shown below.

Input:
Code:
TRADER JOESE|Trader Joe's 
TRADER JOESE|Trader Joe's 
TARGET CORPORATION|Target 
IN-N-OUT-BURGER|In-N-Out Burger 
WALMART|Vudu 
ROSS STORES, INC|Ross Stores Inc. 
CHIPOTLE MEXICAN GRILL, INC|Chipotle 
SPORTS AUTHORITY|The Sports Authority


Output after executing your script:
Code:
TRADER JOESE|Trader Joe's|FALSE POSITIVE 
TRADER JOESE|Trader Joe's|FALSE POSITIVE 
TARGET CORPORATION|Target|TRUE POSITIVE 
IN-N-OUT-BURGER|In-N-Out Burger|FALSE POSITIVE 
WALMART|Vudu|FALSE POSITIVE 
ROSS STORES, INC|Ross Stores Inc.|FALSE POSITIVE 
CHIPOTLE MEXICAN GRILL, INC|Chipotle|TRUE POSITIVE 
SPORTS AUTHORITY|The Sports Authority|FALSE POSITIVE

TRADER JOESE should match Trader Joe's, but the script reports FALSE POSITIVE instead of TRUE POSITIVE, unlike the AMAZON lines shown below:

Code:
AMAZON.COM, INC|Amazon.com|TRUE POSITIVE
AMAZON.COM, INC|Amazon.com|TRUE POSITIVE
AMAZON.COM, INC|Amazon.com|TRUE POSITIVE
AMAZON.COM, INC|Amazon.com|TRUE POSITIVE

# 14  
Old 06-24-2016
I apologize for wasting your time trying to help you.

Where in the string TRADER JOESE do you find the apostrophe (or single-quote) that is in Trader Joe's? There is no match for that character, so the field 2 value can't possibly be a substring of the field 1 value.

Similarly for:
Code:
IN-N-OUT-BURGER|In-N-Out Burger|

the <space> before Burger in the 2nd field is not present in the 1st field; so it is not a substring of the 1st field. A <space> character is not a match for a <hyphen> character.

Except for the two lines whose 3rd output field is TRUE POSITIVE, none of the lines you have shown in post #13 meet the criteria you specified (i.e., having a string in field 2 that is a substring of field 1 with a case insensitive match).
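
The substring test can be checked in isolation. The sketch below wraps the same index() and tolower() comparison used in the script from post #12 in a small shell function; the name substring_pos is made up for this illustration. It prints the 1-based position at which the second string occurs in the first (ignoring case), or 0 if it does not occur at all:
```shell
#!/bin/sh
# substring_pos HAYSTACK NEEDLE   (hypothetical helper for illustration)
# Prints the position of NEEDLE in HAYSTACK, compared case-insensitively,
# or 0 when NEEDLE is not a substring of HAYSTACK.
substring_pos() {
	awk -v hay="$1" -v needle="$2" \
		'BEGIN { print index(tolower(hay), tolower(needle)) }'
}

substring_pos "TRADER JOESE" "Trader Joe's"          # prints 0 (no apostrophe in field 1)
substring_pos "IN-N-OUT-BURGER" "In-N-Out Burger"    # prints 0 (space is not a hyphen)
substring_pos "TARGET CORPORATION" "Target "         # prints 1 (trailing space is matched)
substring_pos "AMAZON.COM, INC" "Amazon.com"         # prints 1
```

Note that awk processes backslash escapes in values passed with -v, so this helper is only suitable for strings without backslashes, which is true of all the sample data in this thread.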

If you want to change your requirements again as to what constitutes a match, YOU HAVE to ACTUALLY state your exact requirements instead of assuming that we can read your mind and guess what all of your unstated requirements might be.
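
Purely as an illustration of what a restated requirement might look like, and not something anyone in this thread has asked for, here is a sketch of one looser rule: convert both fields to lowercase, drop every character that is not a letter or digit, and call it a match if either normalized field is a non-empty substring of the other. The function name loose_match, the norm() helper, and the MATCH / NO MATCH labels are all assumptions:
```shell
#!/bin/sh
# loose_match FILE   (hypothetical; the matching rule is a guess)
# FILE holds "|"-separated lines with the two strings in fields 1 and 2.
loose_match() {
	awk -F'|' '
	# Lowercase s and remove everything except letters and digits.
	function norm(s) {
		s = tolower(s)
		gsub(/[^a-z0-9]/, "", s)
		return s
	}
	{
		a = norm($1)
		b = norm($2)
		# Match if either normalized field contains the other.
		hit = (a != "" && b != "" && (index(a, b) || index(b, a)))
		print $0 "|" (hit ? "MATCH" : "NO MATCH")
	}' "$1"
}
```

Under this rule TRADER JOESE would match Trader Joe's and SPORTS AUTHORITY would match The Sports Authority, while WALMART and Vudu still would not. Whether that is the intended behavior is exactly the kind of thing that has to be stated explicitly.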