A first taste of collaborative filtering


Recently I have become interested in collaborative filtering, the technique that lets Amazon.com recommend books based on your preferences and Last.fm recommend songs and artists based on what you’re listening to. Taste is an open source Java library for building your own recommender system. In this blog entry, I’ll describe my first steps with the framework.

Getting the stuff

To get started, download the latest release of Taste from the SourceForge project page. The zip file contains both the source code and the ready-built jar file taste.jar.

Next, we need some test data. Luckily, the GroupLens project offers a free test set of movie preference data. Download the 1,000,000 Data Set from their site (be sure to read the usage license). This gives us more than enough data to play with.

First recommendations

I always like to try a new library by writing some unit tests for it, so that’s what we will do now. Create a new project in your favorite IDE and add this unit test:

package nl.luminis.tastetest;

import static org.junit.Assert.assertEquals;

import java.util.List;

import org.junit.BeforeClass;
import org.junit.Test;

import com.planetj.taste.example.grouplens.GroupLensRecommender;
import com.planetj.taste.recommender.RecommendedItem;
import com.planetj.taste.recommender.Recommender;

public class GroupLensRecommenderTest {

	private static Recommender m_recommender;

	@BeforeClass
	public static void setUp() throws Exception {
		long start = System.currentTimeMillis();
		m_recommender = new GroupLensRecommender();
		System.out.println("Startup took " + (System.currentTimeMillis() - start) + " ms");
	}

	@Test
	public void testGetRecommendations() throws Exception {
		long start = System.currentTimeMillis();
		String userId = "6040";	// Some user ID from users.dat
		int itemsToRecommend = 5;

		List<RecommendedItem> recommendations = m_recommender.recommend(userId, itemsToRecommend);

		assertEquals(itemsToRecommend, recommendations.size());
		for (RecommendedItem recommendedItem : recommendations) {
			System.out.println("Item: " + recommendedItem.getItem().getID()
					+ " value " + recommendedItem.getValue());
		}
		System.out.println("Recommendation took " + (System.currentTimeMillis() - start) + " ms");
	}

}

To get this to compile, you need the example code from the Taste distribution on your classpath. I’m using Eclipse, so I imported the Taste distribution as an Eclipse project and added that project to the classpath of my test project. You also need the files movies.dat and users.dat on your classpath, as well as JUnit 4.

Running this test, the first result I got was:

java.lang.OutOfMemoryError: Java heap space

Generating recommendations for a dataset this big takes a lot of memory, so let’s give the JVM more. Run the unit test again, this time with the JVM parameters -Xms1024m -Xmx1024m. After a while, you’ll get something like this:

27-sep-2007 12:00:05 com.planetj.taste.impl.model.file.FileDataModel <init>
INFO: Creating FileDataModel for file /tmp/taste.ratings.txt
27-sep-2007 12:00:05 com.planetj.taste.impl.model.file.FileDataModel processFile
INFO: Reading file info...
27-sep-2007 12:00:09 com.planetj.taste.impl.model.file.FileDataModel reload
INFO: Applying transforms...
27-sep-2007 12:00:09 com.planetj.taste.impl.recommender.slopeone.MemoryDiffStorage buildAverageDiffs
INFO: Building average diffs...
Startup took 83793 ms
Item: 3022 value 0.5435813125718508
Item: 2351 value 0.4284638732361068
Item: 922 value 0.4226795625213073
Item: 2203 value 0.3747321044214305
Item: 2731 value 0.3640838125137015
Recommendation took 1219 ms

As you can see, initializing the engine takes quite some time: 84 seconds on my MacBook Pro. After that, recommendations are generated in 1.2 seconds. The item numbers in the recommendations are the movie IDs from the movies.dat file. For user 6040, the top recommended item, 3022, is ‘The General’, a 1927 Buster Keaton movie (you can watch it on Google Video; it’s quite funny).
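Looking up the titles by hand gets tedious quickly. The following is a minimal plain-Java sketch (the class MovieTitles is hypothetical, not part of Taste) that turns movies.dat into an ID-to-title map, assuming the MovieLens line format MovieID::Title::Genres. The ISO-8859-1 encoding is also an assumption about the file; adjust it if titles come out garbled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Maps MovieLens movie IDs to titles, assuming "MovieID::Title::Genres" lines. */
class MovieTitles {

    /** Parses movies.dat lines into an ID -> title map; lines without "::" are skipped. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> titles = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split("::");
            if (fields.length >= 2) {
                titles.put(fields[0].trim(), fields[1]);
            }
        }
        return titles;
    }

    /** Reads movies.dat from disk; ISO-8859-1 is an assumed encoding. */
    static Map<String, String> load(String path) throws IOException {
        return parse(Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> titles = load(args.length > 0 ? args[0] : "movies.dat");
        // The item IDs recommended for user 6040 in the run above
        for (String id : new String[] {"3022", "2351", "922", "2203", "2731"}) {
            System.out.println(id + " -> " + titles.get(id));
        }
    }
}
```

The same map could be used in the unit test to print titles next to the recommended item IDs.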

Using your own data model

Now that we have seen Taste in action, the next step is to use our own data for the recommendations. Taste expects the data to be provided through the DataModel interface.

The key abstractions behind this DataModel are:

  • User – a user having some ID
  • Item – an item that a user has preferences for, and that can be recommended
  • Preference – a preference value of a user for an item. The preference value can be on any scale. The only rule is that a higher value means the user likes the item more.
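To make these three abstractions concrete without any Taste classes, here is a minimal sketch in plain Java. TinyPreferenceModel and its methods are hypothetical illustrations, not Taste’s API: users and items are plain string IDs, and a preference is a number stored under a (user, item) pair. Because only the ordering of values matters, preferences are compared with >, never interpreted on a fixed scale.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, Taste-independent model of users, items and preferences. */
class TinyPreferenceModel {

    // userID -> (itemID -> preference value)
    private final Map<String, Map<String, Double>> preferences = new HashMap<>();
    // Every item that at least one user has expressed a preference for
    private final Set<String> items = new TreeSet<>();

    void setPreference(String userId, String itemId, double value) {
        preferences.computeIfAbsent(userId, id -> new HashMap<>()).put(itemId, value);
        items.add(itemId);
    }

    /** True if the user likes item a more than item b; only the order of values matters. */
    boolean prefers(String userId, String a, String b) {
        Map<String, Double> own = preferences.getOrDefault(userId, Collections.emptyMap());
        return own.containsKey(a) && own.containsKey(b) && own.get(a) > own.get(b);
    }

    /** Items the user has not rated yet: the candidates a recommender could suggest. */
    Set<String> candidatesFor(String userId) {
        Set<String> result = new TreeSet<>(items);
        result.removeAll(preferences.getOrDefault(userId, Collections.emptyMap()).keySet());
        return result;
    }

    public static void main(String[] args) {
        TinyPreferenceModel model = new TinyPreferenceModel();
        model.setPreference("pieter", "Top Gun", 3);
        model.setPreference("pieter", "Quiz Show", 2);
        model.setPreference("niels", "Top Gun", 2);
        model.setPreference("niels", "Toy Story", 2);
        System.out.println(model.candidatesFor("pieter"));                    // [Toy Story]
        System.out.println(model.prefers("pieter", "Top Gun", "Quiz Show")); // true
    }
}
```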

We will create a very simple DataModel called SimpleDataModel. Let’s say we also want to build a movie recommendation service, so our Items are movies, and users rate the movies on a 1-to-5 scale (not very original, I know).

For simple scenarios like this one, Taste provides generic implementations of DataModel and its related interfaces, called GenericDataModel, GenericUser, GenericItem and (you guessed it) GenericPreference. We will use these implementations to construct our own data model.

We will use a static list of users and movies (items):

import java.util.HashMap;
import java.util.Map;

import com.planetj.taste.model.DataModel;

public class SimpleDataModel implements DataModel {

	// Movie titles and user names, mapped to the numeric IDs we hand to Taste
	private Map<String, Integer> m_movies;
	private Map<String, Integer> m_users;

	public SimpleDataModel() {
		initMovies();
		initUserIds();
	}

	private void initMovies() {
		m_movies = new HashMap<String, Integer>();
		m_movies.put("Toy Story", 1);
		m_movies.put("U-571", 2);
		m_movies.put("Gladiator", 3);
		m_movies.put("Godfather, The", 4);
		m_movies.put("Back to the Future", 5);
		m_movies.put("Quiz Show", 6);
		m_movies.put("Sleepless in Seattle", 7);
		m_movies.put("Close Shave, A", 8);
		m_movies.put("2001: A Space Odyssey", 9);
		m_movies.put("Top Gun", 10);
	}

	private void initUserIds() {
		m_users = new HashMap<String, Integer>();
		m_users.put("pieter", 1);
		m_users.put("niels", 2);
	}

	// The DataModel interface methods still have to be added;
	// they will delegate to a wrapped GenericDataModel.
}

SimpleDataModel will be a wrapper around GenericDataModel, which expects a list of Users in its constructor. A User in turn needs Preferences to be constructed, while a Preference needs an Item. So we start by creating Items:

	private Item createMovieItem(String movieTitle) {
		Integer movieId = m_movies.get(movieTitle);
		if (movieId == null) {
			// Fail fast on a title that is not in our movie list
			throw new IllegalArgumentException("No ID for " + movieTitle);
		}
		return new GenericItem<Integer>(movieId);
	}

We use these Items to construct Preferences, taking a Map of movie titles to preference values:

	private List<Preference> createPreferences(Map<String, Integer> prefmap) {
		List<Preference> result = new ArrayList<Preference>(prefmap.size());
		for (Entry<String, Integer> entry : prefmap.entrySet()) {
			String movie = entry.getKey();
			int rating = entry.getValue();
			Preference pref = new GenericPreference(null, createMovieItem(movie), rating);
			result.add(pref);
		}
		return result;
	}

We use this in turn to create Users, taking a user name and the preferences map:

	private GenericUser<Integer> createUser(String username, Map<String, Integer> prefmap) {
		GenericUser<Integer> user
		 	= new GenericUser<Integer>(m_users.get(username), createPreferences(prefmap));
		return user;
	}

We now have an easy way of creating a List of Users, which we use to initialize our GenericDataModel. We use it to create two users with their preferences and add them to the DataModel:

	private DataModel m_model;

	public SimpleDataModel() {
		initMovies();
		initUserIds();
		m_model = new GenericDataModel(createUsers());
	}

	private List<GenericUser<Integer>> createUsers() {
		List<GenericUser<Integer>> users = new ArrayList<GenericUser<Integer>>();

		Map<String, Integer> ratingsPieter = new HashMap<String, Integer>();
		ratingsPieter.put("Top Gun", 3);
		ratingsPieter.put("Close Shave, A", 5);
		ratingsPieter.put("Quiz Show", 2);
		ratingsPieter.put("Sleepless in Seattle", 3);
		users.add(createUser("pieter", ratingsPieter));

		Map<String, Integer> ratingsNiels = new HashMap<String, Integer>();
		ratingsNiels.put("Toy Story", 2);
		ratingsNiels.put("U-571", 1);
		ratingsNiels.put("Gladiator", 1);
		ratingsNiels.put("Godfather, The", 1);
		ratingsNiels.put("Back to the Future", 2);
		ratingsNiels.put("Top Gun", 2);
		ratingsNiels.put("Close Shave, A", 4);
		ratingsNiels.put("Quiz Show", 2);
		ratingsNiels.put("Sleepless in Seattle", 3);
		ratingsNiels.put("2001: A Space Odyssey", 2);
		users.add(createUser("niels", ratingsNiels));

		return users;
	}

The remaining methods are those of the DataModel interface, and they simply delegate to the embedded GenericDataModel. The full code is added as an attachment.

We can use this DataModel to generate recommendations in the same way as with the MovieLens data. Of course, we need a different Recommender, but this is easily done:

	DataModel datamodel = new SimpleDataModel();
	m_recommender = new SlopeOneRecommender(datamodel, false, false,
			new MemoryDiffStorage(datamodel, false, false, 100));

Here we use a Slope One recommender, one of several recommenders available in Taste. The false, false parameters in the constructor disable the weighting options of the algorithm, which is necessary because we only have two users in our dataset. The MemoryDiffStorage parameter makes the recommender build its matrix of preference differences in memory (rather than, for example, in a database), which is what we want for our simple test.

The test case for our SimpleDataModel recommender then becomes:

package nl.luminis.tastetest;

import static org.junit.Assert.assertEquals;

import java.util.List;

import org.junit.BeforeClass;
import org.junit.Test;

import com.planetj.taste.impl.recommender.slopeone.MemoryDiffStorage;
import com.planetj.taste.impl.recommender.slopeone.SlopeOneRecommender;
import com.planetj.taste.model.DataModel;
import com.planetj.taste.recommender.RecommendedItem;
import com.planetj.taste.recommender.Recommender;

public class SimpleDataModelRecommenderTest {

	private static Recommender m_recommender;

	@BeforeClass
	public static void setUp() throws Exception {
		DataModel datamodel = new SimpleDataModel();
		m_recommender = new SlopeOneRecommender(datamodel, false, false,
				new MemoryDiffStorage(datamodel, false, false, 100));
	}

	@Test
	public void testGetRecommendations() throws Exception {
		Integer userId = 1;
		int itemsToRecommend = 5;

		long start = System.currentTimeMillis();
		List<RecommendedItem> recommendations = m_recommender.recommend(userId, itemsToRecommend);

		assertEquals(itemsToRecommend, recommendations.size());
		for (RecommendedItem recommendedItem : recommendations) {
			System.out.println("Item: " + recommendedItem.getItem().getID()
					+ " value " + recommendedItem.getValue());
		}
		System.out.println("Recommendation took " + (System.currentTimeMillis() - start) + " ms");
	}
}

Running the test results in:

27-sep-2007 15:41:28 com.planetj.taste.impl.recommender.slopeone.MemoryDiffStorage buildAverageDiffs
INFO: Building average diffs...
Item: 9 value 2.5
Item: 1 value 2.5
Item: 5 value 2.5
Item: 2 value 1.5
Item: 4 value 1.5
Recommendation took 9 ms

The recommendations are hardly useful, but with only two users that’s expected.
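To see where these numbers come from, recall the idea behind Slope One: to predict user u’s rating for an unrated item j, take each item i that u has rated, add the average difference between j and i among users who rated both, and average the results. The sketch below is an unweighted Slope One in plain Java; it is not Taste’s implementation, and the class and method names are made up for illustration. Fed with the two users from SimpleDataModel, it gives 2.5 for items 1, 5 and 9 and 1.5 for items 2, 3 and 4, matching the scores above (item 3 ties with 2 and 4 but falls outside the top five). Pieter rated the four shared movies on average 0.5 higher than Niels, so every prediction is just Niels’ rating plus 0.5, which is why the recommendations say so little.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Minimal unweighted Slope One predictor (a sketch, not Taste's code).
 *   prediction(u, j) = mean over items i rated by u of ( r(u, i) + dev(j, i) )
 *   dev(j, i)        = mean over users v who rated both i and j of ( r(v, j) - r(v, i) )
 */
class SlopeOneSketch {

    static Map<Integer, Double> predict(Map<String, Map<Integer, Double>> ratings, String user) {
        Map<Integer, Double> own = ratings.get(user);
        Set<Integer> allItems = new TreeSet<>();
        for (Map<Integer, Double> r : ratings.values()) {
            allItems.addAll(r.keySet());
        }
        Map<Integer, Double> predictions = new TreeMap<>();
        for (int j : allItems) {
            if (own.containsKey(j)) {
                continue; // only predict items the user has not rated
            }
            double sum = 0;
            int terms = 0;
            for (Map.Entry<Integer, Double> rated : own.entrySet()) {
                int i = rated.getKey();
                // Average deviation of item j relative to item i over co-raters
                double diffSum = 0;
                int coRaters = 0;
                for (Map<Integer, Double> r : ratings.values()) {
                    if (r.containsKey(i) && r.containsKey(j)) {
                        diffSum += r.get(j) - r.get(i);
                        coRaters++;
                    }
                }
                if (coRaters > 0) {
                    sum += rated.getValue() + diffSum / coRaters;
                    terms++;
                }
            }
            if (terms > 0) {
                predictions.put(j, sum / terms);
            }
        }
        return predictions;
    }

    /** The two users from SimpleDataModel, keyed by the movie IDs from initMovies(). */
    static Map<String, Map<Integer, Double>> exampleRatings() {
        Map<String, Map<Integer, Double>> ratings = new HashMap<>();
        ratings.put("pieter", ratingsOf(10, 3, 8, 5, 6, 2, 7, 3));
        ratings.put("niels", ratingsOf(1, 2, 2, 1, 3, 1, 4, 1, 5, 2, 10, 2, 8, 4, 6, 2, 7, 3, 9, 2));
        return ratings;
    }

    // Builds an itemID -> rating map from alternating (itemID, rating) arguments
    private static Map<Integer, Double> ratingsOf(int... pairs) {
        Map<Integer, Double> result = new HashMap<>();
        for (int k = 0; k < pairs.length; k += 2) {
            result.put(pairs[k], (double) pairs[k + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {1=2.5, 2=1.5, 3=1.5, 4=1.5, 5=2.5, 9=2.5}
        System.out.println(predict(exampleRatings(), "pieter"));
    }
}
```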

Next steps

Now that you’ve seen how to use a Recommender and how to model your own data for it, the next step is to experiment with the different algorithms and optimizations that Taste has to offer. Unfortunately, there is no single best configuration for every situation; you’ll have to find the optimal settings for your own data.
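One way to compare settings on your own data is leave-one-out evaluation with the mean absolute error (MAE): hide each known rating in turn, predict it from the remaining ratings, and average the absolute errors; lower is better. The sketch below shows the idea in plain Java. The Predictor interface and the two baseline predictors (user mean and item mean) are hypothetical stand-ins, not Taste classes; you would plug in whatever predictors you want to compare.

```java
import java.util.HashMap;
import java.util.Map;

/** Leave-one-out MAE for comparing rating predictors (hypothetical helper, not Taste). */
class LeaveOneOutMae {

    interface Predictor {
        /** Returns an estimate of the user's rating for the item, or null if it cannot. */
        Double predict(Map<String, Map<Integer, Double>> ratings, String user, int item);
    }

    static double mae(Map<String, Map<Integer, Double>> ratings, Predictor predictor) {
        double totalError = 0;
        int predicted = 0;
        for (Map.Entry<String, Map<Integer, Double>> user : ratings.entrySet()) {
            for (Map.Entry<Integer, Double> held : user.getValue().entrySet()) {
                // Deep-copy the data, then remove the held-out rating from the copy
                Map<String, Map<Integer, Double>> train = new HashMap<>();
                for (Map.Entry<String, Map<Integer, Double>> e : ratings.entrySet()) {
                    train.put(e.getKey(), new HashMap<>(e.getValue()));
                }
                train.get(user.getKey()).remove(held.getKey());
                Double estimate = predictor.predict(train, user.getKey(), held.getKey());
                if (estimate != null) { // ratings that cannot be predicted are skipped
                    totalError += Math.abs(estimate - held.getValue());
                    predicted++;
                }
            }
        }
        return predicted == 0 ? Double.NaN : totalError / predicted;
    }

    /** Baseline: the user's average rating in the training data. */
    static Double userMean(Map<String, Map<Integer, Double>> ratings, String user, int item) {
        Map<Integer, Double> own = ratings.get(user);
        if (own == null || own.isEmpty()) {
            return null;
        }
        double sum = 0;
        for (double v : own.values()) {
            sum += v;
        }
        return sum / own.size();
    }

    /** Baseline: the item's average rating over all users in the training data. */
    static Double itemMean(Map<String, Map<Integer, Double>> ratings, String user, int item) {
        double sum = 0;
        int n = 0;
        for (Map<Integer, Double> r : ratings.values()) {
            Double v = r.get(item);
            if (v != null) {
                sum += v;
                n++;
            }
        }
        if (n == 0) {
            return null;
        }
        return sum / n;
    }

    /** A tiny made-up data set: user a rated items 1 and 2, user b rated item 1. */
    static Map<String, Map<Integer, Double>> toyRatings() {
        Map<Integer, Double> a = new HashMap<>();
        a.put(1, 1.0);
        a.put(2, 3.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(1, 4.0);
        Map<String, Map<Integer, Double>> ratings = new HashMap<>();
        ratings.put("a", a);
        ratings.put("b", b);
        return ratings;
    }

    public static void main(String[] args) {
        System.out.println("user mean MAE: " + mae(toyRatings(), LeaveOneOutMae::userMean)); // 2.0
        System.out.println("item mean MAE: " + mae(toyRatings(), LeaveOneOutMae::itemMean)); // 3.0
    }
}
```

Copying the whole data set for every held-out rating is fine for toy data like this, but far too slow for the 1,000,000-rating set; there, a random split into training and test ratings is the usual approach.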

More information can be found on the Taste homepage, which also includes a list of useful links worth taking a look at. The forum is also a great source of information.
