Collaborative Filtering on Skewed Datasets


 
Posted 05-22-2008

HPL-2008-50 Collaborative Filtering on Skewed Datasets - Banerjee, Somnath; Ramanathan, Krishnan
Keyword(s): collaborative filtering, skewed dataset, pLSA
Abstract: Many real-life datasets have skewed distributions of events, where the probability of observing a few events far exceeds that of the others. In this paper, we observed that on skewed datasets, state-of-the-art collaborative filtering methods perform worse than a simple probabilistic model. Our test bench inc ...

H5TOTXT(1)                          h5utils                          H5TOTXT(1)

NAME
       h5totxt - generate comma-delimited text from 2d slices of HDF5 files

SYNOPSIS
       h5totxt [OPTION]... [HDF5FILE]...

DESCRIPTION
       h5totxt is a utility to generate comma-delimited text (and similar
       formats) from one-, two-, or more-dimensional slices of numeric
       datasets in HDF5 files. This way, the data can easily be imported into
       spreadsheets and similar programs for analysis and visualization.

       HDF5 is a free, portable binary format and supporting library developed
       by the National Center for Supercomputing Applications at the
       University of Illinois in Urbana-Champaign. A single h5 file can
       contain multiple data sets; by default, h5totxt takes the first
       dataset, but this can be changed via the -d option, or by using the
       syntax HDF5FILE:DATASET.

       By default, the entire dataset is dumped to the output in row-major
       order. For 3d datasets, this corresponds to a sequence of yz slices, in
       order of increasing x, separated by blank lines. If -T is specified,
       outputs in the transposed (column-major) order instead.

       Often, however, you want only a one- or two-dimensional slice of
       multi-dimensional data. To do this, you specify coordinates in one or
       more slice dimensions, via the -xyzt options.

       The most basic usage is something like 'h5totxt foo.h5', which will
       output comma-delimited text to stdout from the data in foo.h5.

OPTIONS
       -h     Display help on the command-line options and usage.

       -V     Print the version number and copyright info for h5totxt.

       -v     Verbose output.

       -o file
              Send text output to file rather than to stdout (the default).

       -s sep Use the string sep to separate columns of the output rather than
              a comma (the default).

       -x ix, -y iy, -z iz, -t it
              This tells h5totxt to use a particular slice of a
              multi-dimensional dataset. e.g. -x causes a yz plane (of a 3d
              dataset) to be used, at an x index of ix (where the indices run
              from zero to one less than the maximum index in that direction).
              Here, x/y/z correspond to the first/second/third dimensions of
              the HDF5 dataset. The -t option specifies a slice in the last
              dimension, whichever that might be.

              See also the -0 option to shift the origin of the x/y/z slice
              coordinates to the dataset center.

       -0     Shift the origin of the x/y/z slice coordinates to the dataset
              center, so that e.g. -0 -x 0 (or more compactly -0x0) returns
              the central x plane of the dataset instead of the edge x plane.
              (-t coordinates are not affected.)

       -T     Transpose the data (interchange the dimension ordering). By
              default, no transposition is done.

       -. numdigits
              Output numdigits digits after the decimal point (defaults to
              16).

       -d name
              Use dataset name from the input files; otherwise, the first
              dataset from each file is used. Alternatively, use the syntax
              HDF5FILE:DATASET, which allows you to specify a different
              dataset for each file. You can use the h5ls command (included
              with hdf5) to find the names of datasets within a file.

BUGS
       Send bug reports to S. G. Johnson, stevenj@alum.mit.edu.

AUTHORS
       Written by Steven G. Johnson. Copyright (c) 2005 by the Massachusetts
       Institute of Technology.

h5utils                          March 9, 2002                       H5TOTXT(1)
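
The row-major dump described in the h5totxt DESCRIPTION can be pictured with a small Python sketch. This is only an illustration of the output layout as read from the man page (yz slices in order of increasing x, separated by blank lines, columns joined by a separator), not h5totxt's actual implementation. The function names dump_row_major and take_x_slice are hypothetical, the dataset is a plain nested list rather than an HDF5 file, and the assumption that y runs down the rows and z across the columns of each slice is an interpretation of "row-major", not something the man page states. Number formatting (the -. option) is ignored.

```python
def dump_row_major(data, sep=","):
    """Format a 3d nested list indexed as data[x][y][z] as delimited text.

    Assumed layout (an interpretation of the man page, not verified against
    h5totxt): one yz slice per x index, y down the rows, z across the
    columns, with a blank line between consecutive slices.
    """
    slices = []
    for yz_plane in data:  # x increases from one slice to the next
        rows = [sep.join(str(value) for value in row) for row in yz_plane]
        slices.append("\n".join(rows))
    return "\n\n".join(slices)


def take_x_slice(data, ix):
    """Return the yz plane at x index ix, loosely analogous to '-x ix'."""
    return data[ix]


# A tiny 2x2x3 dataset whose values encode their (x, y, z) position.
data = [
    [[100 * x + 10 * y + z for z in range(3)] for y in range(2)]
    for x in range(2)
]

text = dump_row_major(data)
print(text)
```

Passing a different sep, for example dump_row_major(data, sep=";"), mirrors what the -s option is described as doing for the real tool.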