Vector base Cosine Similarity for two Matrices -- R in UNIX
# 1  
Old 02-21-2013

Dear all,
I am facing a problem and would be thankful for your help. I hope this is the right place to ask.
I have two matrices, each with 10 rows and 3 columns, stored in two files. I want the cosine similarity between row i of the first file and row i of the second, so the result should be a 10 x 1 column of cosine measures.
I am using the cosine function from the lsa package in R, called from UNIX, but I am having problems with it.
If each file had only one row, I could calculate the cosine similarity as follows:
Code:
library(lsa)  # provides cosine()
# Flatten each single-row CSV file into a plain numeric vector
data01 <- c(t(read.table(file = "data01.csv", sep = ",", header = FALSE)))
data02 <- c(t(read.table(file = "data02.csv", sep = ",", header = FALSE)))
result <- cosine(data01, data02)
write.csv(result, "result.csv")

But I am having trouble reading the lines of the two files into vectors to do the same thing.
I have tried writing some code; it does not give any error, but it does not create anything either, and I don't know what I am doing wrong (I am new to R).
Code:
con  <- file('data01.txt', open="r")
con2 <- file('data02.txt', open="r")
a <- list();
b <- list();
test <- list();
current.line01 <- 1
current.line02 <- 1
while (length(data01 <- readLines(con, n = 10, warn = FALSE)) > 0) {
   while (length(data02 <- readLines(con2, n = 10, warn = FALSE)) > 0) {
		a[[current.line01]]<- c(data01)
		b[[current.line02]]<- c(data02)
		test <-cosine(a[[current.line01]], b[[current.line02]])
		write.table(test , "test.txt")
		current.line01 <- current.line + 1
		current.line02 <- current.line + 1
  } 
  } 
close(con)
close(con2)

Can you please help me?
# 2  
Old 02-21-2013
I have no knowledge of R, but I see you have two while loops nested.
Shouldn't it be just one while loop from file1,
and within that you read one record from file2?
Something like
Code:
while (length(data01 <- readLines(con, n = 10, warn = FALSE)) > 0) {
  data02 <- readLines(con2, n = 10, warn = FALSE)
  ...
}
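
For what it's worth, here is a minimal sketch of that single-loop idea in R. Note that n = 10 asks readLines() for up to ten lines per call, so the sketch uses n = 1 to take one line from each file per iteration. The helpers cosine_sim(), parse_line() and paired_cosine() are hypothetical names made up for illustration; cosine_sim() is a base-R stand-in for cosine() from lsa, and the temporary files stand in for data01.csv and data02.csv.

```r
# Hypothetical helper: cosine similarity in base R (stand-in for lsa's cosine()).
cosine_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Hypothetical helper: turn one comma-separated text line into a numeric vector.
parse_line <- function(line) as.numeric(strsplit(line, ",", fixed = TRUE)[[1]])

paired_cosine <- function(path1, path2) {
  con1 <- file(path1, open = "r")
  con2 <- file(path2, open = "r")
  on.exit({ close(con1); close(con2) })
  result <- numeric(0)
  # n = 1 reads a single line per call; the loop ends when file 1 runs out.
  while (length(line1 <- readLines(con1, n = 1, warn = FALSE)) > 0) {
    line2 <- readLines(con2, n = 1, warn = FALSE)
    if (length(line2) == 0) break  # file 2 is shorter than file 1
    result <- c(result, cosine_sim(parse_line(line1), parse_line(line2)))
  }
  result
}

# Small demonstration with made-up data in temporary files.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("1,0,0", "1,2,3"), f1)
writeLines(c("0,1,0", "2,4,6"), f2)
sims <- paired_cosine(f1, f2)
print(sims)
```

The first pair of rows is orthogonal and the second pair is parallel, so the two similarities should come out as 0 and 1.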

# 3  
Old 02-21-2013
Yes, you forgot to rewind the tape on that inner file before reusing it, if what you want is an n-squared Cartesian product. But perhaps you want something more like a paste: line N of both files only.
# 4  
Old 02-22-2013
DGPickett: Can you please explain more?
# 5  
Old 02-22-2013
If you read a file to EOF with the inner while, then when the outer while loops again, the inner file handle is still at EOF. Sequential disk files are like tape drives, and FILE* in C has a redundant command rewind(), which is an fseek to absolute offset 0 (see the man page for rewind, opensolaris Section 3). Of course, R may rewind for you, but that seems a bit too magic.
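
A small sketch of what that looks like in R, assuming a throwaway temporary file: once an open connection has been read to the end, a second readLines() call on it returns nothing, and closing and re-opening the connection is one simple way to start again from the first line.

```r
# Create a throwaway two-line file for the demonstration.
path <- tempfile()
writeLines(c("a", "b"), path)

con <- file(path, open = "r")
first_pass  <- readLines(con)   # reads both lines, leaving con at EOF
second_pass <- readLines(con)   # character(0): nothing left to read
close(con)

con <- file(path, open = "r")   # re-opening starts again at line 1
third_pass <- readLines(con)
close(con)

print(length(second_pass))
print(third_pass)
```

So with the nested loops, the inner while would only do any work during the first pass of the outer loop.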
# 6  
Old 02-22-2013
I have changed a few things, including the inner loop, but now I get an error.
Personally, I can't see how it is going to read each of the second file's lines.
Code:
con  <- file('data01.csv', open="r")
con2 <- file('data02.csv', open="r")
current.line<- 1
while (length(data01 <- readLines(con, n = 10, warn = FALSE)) > 0) {
	data02 <- readLines(con2, n = 10, warn = FALSE)
	a[[current.line]]<- as.vector(data01)
	b[[current.line]]<- as.vector(data02)
	test<- cosine (a, b)
	write.csv(test, file="test.txt", sep=",")
	current.line <- current.line+ 1
  } 
close(con)
close(con2)

The error I get:
Code:
Error in crossprod(x, y) : 
  requires numeric/complex matrix/vector arguments

# 7  
Old 02-22-2013
OK, now we are matching line 1 of each file, and so on: a paste, not a product.

Is there a header line in either file?

a and b both have 1 element the first time cosine is called, two the second time, etc. Does that do the right thing?
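
On the error itself: crossprod() wants numeric input, but readLines() returns character strings, and a and b are lists rather than numeric vectors, so that is most likely why cosine(a, b) fails. A rough sketch of one way around it, assuming the files are plain comma-separated numbers with no header, is to read each file as a numeric matrix and compute all the row-wise cosines at once. The function row_cosine() is a hypothetical base-R helper, not part of lsa, and the temporary files stand in for data01.csv and data02.csv.

```r
# Hypothetical helper: cosine similarity between matching rows of A and B.
row_cosine <- function(A, B) {
  stopifnot(all(dim(A) == dim(B)))
  rowSums(A * B) / sqrt(rowSums(A^2) * rowSums(B^2))
}

# Made-up data in temporary files, two rows of three values each.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("1,0,0", "1,2,3"), f1)
writeLines(c("0,1,0", "2,4,6"), f2)

# read.table() parses the numbers, so no manual string splitting is needed.
A <- as.matrix(read.table(f1, sep = ",", header = FALSE))
B <- as.matrix(read.table(f2, sep = ",", header = FALSE))

sims <- row_cosine(A, B)   # one cosine value per row

# Write the result as a single column, one line per input row.
out <- tempfile(fileext = ".csv")
write.csv(data.frame(cosine = sims), out, row.names = FALSE)
print(sims)
```

With two 10 x 3 input files this would give ten values, which matches the (10,1) result asked for in the first post.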