Collaborative Filtering on Skewed Datasets


 
Special Forums > News, Links, Events and Announcements > UNIX and Linux RSS News
# 1, 05-22-2008
Collaborative Filtering on Skewed Datasets

HPL-2008-50 Collaborative Filtering on Skewed Datasets - Banerjee, Somnath; Ramanathan, Krishnan
Keyword(s): collaborative filtering, skewed dataset, pLSA
Abstract: Many real-life datasets have skewed distributions of events, where the probability of observing a few events far exceeds that of the others. In this paper, we observe that on skewed datasets the state-of-the-art collaborative filtering methods perform worse than a simple probabilistic model. Our test bench inc ...
Full Report
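The excerpt does not spell out what the report's "simple probabilistic model" is, so, as a hedged illustration only, here is a toy popularity-based baseline in Python. The data, the user/item names, and the `recommend` helper are all hypothetical; the point is that under a skewed item distribution, recommending globally frequent unseen items is the kind of simple model that can be hard to beat:

```python
from collections import Counter

# Hypothetical toy data: (user, item) interaction events with a skewed
# item distribution -- one item ("a") accounts for most observations.
events = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b"),
          ("u2", "a"), ("u3", "b"), ("u4", "a"), ("u4", "c")]

# Simple probabilistic baseline: estimate P(item) from event counts and
# recommend the globally most likely items the user has not seen yet.
counts = Counter(item for _, item in events)

def recommend(user, k=2):
    seen = {i for u, i in events if u == user}
    ranked = [i for i, _ in counts.most_common() if i not in seen]
    return ranked[:k]

print(recommend("u1"))  # most popular items u1 has not interacted with
```

When a few items dominate the event counts, this baseline captures most of the predictable signal, which is one plausible reading of why more elaborate collaborative filtering methods can underperform it on skewed data.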



5 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Who is opening my datasets?

Hi, I need a command/script that tells me who opened my dataset. Consider a situation like this: if a user opened the dataset a few days back, the command/script should list his/her id. I don't want an audit on my dataset; I only need the list of users who are using it. Thank you. (10 Replies)
Discussion started by: subbarao12

2. AIX

Problem FTPing datasets containing COMP values from mainframe to AIX

Hi, when I try to FTP COBOL-generated data sets which contain COMP values to AIX in ASCII mode, the COMP values get corrupted. If I FTP the data set in binary mode it works properly, but for this I have to change some compiler options in COBOL. Also, if I want to use the... (5 Replies)
Discussion started by: sekhar gajjala

3. UNIX for Dummies Questions & Answers

ls command showing skewed listing

Hello, I'm running the ls command on an HP-UX 11i platform and getting skewed listings. In other words, I see 3 columns of perfectly aligned file names, except one file is shifted by 2 or 3 bytes. The file to its immediate left seems to be causing the problem, for when I do an ls on... (1 Reply)
Discussion started by: bsp18974

4. Solaris

Copy data from zfs datasets

I have a few data sets in my zfs pool which have been exported to the non-global zones, and I want to copy the data on those datasets/file systems to my datasets in a new pool mounted on the global zone. How can I do that? (2 Replies)
Discussion started by: fugitive

5. UNIX for Advanced & Expert Users

Planning on writing a Guide to Working with Large Datasets

In a recent research experiment I was handling, I faced the task of managing huge amounts of data, on the order of terabytes, and with the help of many people here I managed to learn quite a lot in the process. I am sure that many people will keep facing these situations quite often... (2 Replies)
Discussion started by: Legend986
H5FROMTXT(1)							      h5utils							      H5FROMTXT(1)

NAME
       h5fromtxt - convert text input to an HDF5 file

SYNOPSIS
       h5fromtxt [OPTION]... [HDF5FILE]

DESCRIPTION
       h5fromtxt takes a series of numbers from standard input and outputs a multi-dimensional numeric dataset in an HDF5 file.  HDF5 is
       a free, portable binary format and supporting library developed by the National Center for Supercomputing Applications at the
       University of Illinois in Urbana-Champaign.

       A single h5 file can contain multiple data sets; by default, h5fromtxt creates a dataset called "data", but this can be changed
       via the -d option, or by using the syntax HDF5FILE:DATASET.  The -a option can be used to append new datasets to an existing HDF5
       file.

       All characters besides the numbers (and associated decimal points, etcetera) in the input are ignored.  By default, the data is
       assumed to be a two-dimensional MxN dataset where M is the number of rows (delimited by newlines) and N is the number of columns.
       In this case, it is an error for the number of columns to vary between rows.  If M or N is 1 then the data is written as a
       one-dimensional dataset.

       Alternatively, you can specify the dimensions of the data explicitly via the -n size option, where size is e.g. "2x2x2".  In this
       case, newlines are ignored and the data is taken as an array of the given size stored in row-major ("C") order (where the last
       index varies most quickly as you step through the data).  e.g. a 2x2x2 array would have the elements listed in the order:
       (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1).

       A simple example is:

           h5fromtxt foo.h5 <<EOF
           1 2 3 4
           5 6 7 8
           EOF

       which reads in a 2x4 space-delimited array from standard input.

OPTIONS
       -h     Display help on the command-line options and usage.

       -V     Print the version number and copyright info for h5fromtxt.

       -v     Verbose output.

       -a     If the HDF5 output file already exists, append the data as a new dataset rather than overwriting the file (the default
              behavior).  An existing dataset of the same name within the file is overwritten, however.

       -n size
              Instead of trying to infer the dimensions of the array from the rows and columns of the input, treat the data as a
              sequence of numbers in row-major order forming an array of dimensions size.  size is of the form MxNxLx... (with M, N, L
              being numbers) and may be of any dimensionality.

       -T     Transpose the input when it is written, reversing the dimensions.

       -d name
              Write to dataset name in the output; otherwise, the output dataset is called "data" by default.  Alternatively, use the
              syntax HDF5FILE:DATASET.

BUGS
       Send bug reports to S. G. Johnson, stevenj@alum.mit.edu.

AUTHORS
       Written by Steven G. Johnson.  Copyright (c) 2005 by the Massachusetts Institute of Technology.

h5utils                                                         March 9, 2002                                                 H5FROMTXT(1)
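The row-major ("C") ordering that the man page describes for -n can be reproduced in a few lines. This is a sketch in Python (not part of h5utils) that enumerates the indices of a 2x2x2 array in the same order in which h5fromtxt assigns a flat sequence of input numbers to array elements:

```python
from itertools import product

# Row-major ("C") order: the last index varies fastest, so a flat list
# of 8 numbers read by `h5fromtxt -n 2x2x2` fills indices in this order.
dims = (2, 2, 2)
order = list(product(*(range(d) for d in dims)))
print(order)
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
#  (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
```

The printed sequence matches the element order listed in the DESCRIPTION section above, which is also the default memory layout of C arrays and NumPy arrays.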