Belated highlights from RecSys 2021

#conference #recommender-systems

A 19th-century oil painting of Amsterdam by Charles Euphrasie Kuwasseg.

From Scene of Amsterdam (19th century) by Charles Euphrasie Kuwasseg, via Dorotheum.

In September, I virtually attended ACM’s RecSys conference, which was held in Amsterdam this year. I watched nearly 70 talks, ranging from keynotes to quick poster sessions, and below I’ve collected thoughts on some of the talks that I most enjoyed. I’ll also come back and add links to the talk videos once they’ve been posted.

For those unfamiliar, RecSys focuses on all things related to recommender systems: models which (roughly speaking) predict what items a user will like. Not only are recommenders what I’ve been working on for the last couple of years, but it’s also a topic I find personally fascinating, and I hope you will too.

P.S. Some other folks also published their highlights of the conference. For more, check out Eugene Yan’s “Papers and Talks to Chew on”, Vicki Boykis’ RecSys 2021 Recap, and Balázs Hidasi’s Impressions and Summary.

Practical issues

These talks focus on issues surrounding recommender systems in production and, as a result, come largely from industry.

A 20th-century print of two boys and a donkey by Julie de Geluk.

Twee jongens en een ezel (20th century) by Julie de Geluk, via the Rijksmuseum.

Jointly optimize capacity, latency, and engagement in large-scale recommendation systems

Presented by Hitesh Khandelwal from Facebook. Paper at ACM and Facebook Research.

Facebook has been in the news a lot lately for failing to prevent harms caused by their platform, even when their research shows that those harms are real. A user’s experience with Facebook (and Instagram) is heavily mediated through their interaction with a number of different recommender systems, which makes Facebook’s present situation an important case study for anyone interested in ethical issues surrounding recommender systems.

Facebook did give a talk on mitigating long-term effects of recommenders during the conference (they concluded that even weak preferences for borderline content would inevitably be amplified and that the best you can do is try to slow this process down), but that’s not the paper I want to talk about.

This particular paper focuses on scale, something Facebook knows better than almost anyone. In particular, how can recommendation models behind Facebook Marketplace be optimized to perform well for their first-order task (presumably engagement, conversion, etc.) while reducing latency and the computational cost of serving over one billion users per month?

All in all, it’s eye-opening to see the amount of engineering that goes into delivering computationally-intensive recommendations at massive scale.

Recommendations at a reasonable scale in a (mostly) serverless and open stack

Presented by Jacopo Tagliabue from Coveo Labs. Paper and video at ACM, code on GitHub, a similar talk from the same author on YouTube, and a similar blog post from the same author on Towards Data Science.

A diagram of Coveo Labs' architecture for a reasonably-scaled recommender system.

We just saw how much technical muscle Facebook throws at their scale issues, but the chances are that you’re not Facebook, and you don’t need solutions which scale to billions of users.

Tagliabue defines “reasonable scale” as companies with a finite commodity cloud budget, dozens (not hundreds) of engineers, making 9 (not 10) figures per year, with terabyte (not petabyte) scale data. If this sounds like you, then the talk’s accompanying repo offers an architecture for a non-trivial recommender system, including data validation, deployment, monitoring, and orchestration, using open-source tools like dbt, Metaflow, and so on.

Regardless of whether you intend to use any of the architecture, Tagliabue’s principles are valuable. He stresses the importance of data quality and leveraging the many high-quality SaaS/PaaS offerings so you can avoid time-consuming infrastructure maintenance.

RecSysOps: Best practices for operating a large-scale recommender system

Presented by Ehsan Saberian from Netflix. Find the paper at ACM and a video of the talk on YouTube.

This talk was one of a few that stressed the importance of having a good ops backbone when building and maintaining recommender systems. “RecSysOps” is broken into issue detection, prediction, diagnosis, and resolution, all of which are easier said than done.

I particularly liked that a team could take on these suggestions in an iterative approach. For starters, implement best practices and end-to-end monitoring, with a focus on the types of issues that are important to your stakeholders. Once issue detection is (relatively) under control, you might try your hand at predicting future issues.

My favorite piece of advice was: “with every issue, make your RecSysOps better.” In other words, if detection was hard for this type of issue, try to predict it in the future. If resolution was hard, build a tool to automate the solution in the future.

Three talks on organizational issues

If “RecSysOps” is about the software practices which support recommender systems, these talks are about the communication and cultural practices which support doing data science in a large organization.

Algorithmic advances

These talks are focused more on new ideas, methods, and algorithms, as well as new perspectives on issues related to recommender systems.

A 19th-century watercolor painting of a rowboat at sea by Frans Arnold Breuhaus de Groot.

Roeiboot op zee (19th century) by Frans Arnold Breuhaus de Groot, via the Rijksmuseum.

Top-k contextual bandits with equity of exposure

Presented by Olivier Jeunen from the University of Antwerp. Paper at ACM and the author’s website, code on GitHub.

Olivier Jeunen and Bart Goethals won the Best Student Paper award for their other RecSys paper Pessimistic reward models for off-policy learning in recommendation, but I particularly liked this one on bandits with equitable exposure.

The key idea is that a small difference in the computed relevance between two items might mean a big difference in exposure if one item gets knocked out of the top k items shown in a module. You may have any number of reasons for wanting to ensure equitable exposure among your item population, which makes this a potential issue.
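To make that cliff concrete, here is a toy sketch. The item names, relevance scores, and `k` are made up for illustration, and the "proportional" function is only an idealised target for comparison, not the EARS algorithm from the paper.

```python
def top_k_exposure(relevance, k):
    """Deterministic top-k: the k highest-scoring items are always shown."""
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    shown = set(ranked[:k])
    return {item: 1.0 if item in shown else 0.0 for item in relevance}


def proportional_exposure(relevance, k):
    """Idealised target: k slots of exposure shared in proportion to relevance."""
    total = sum(relevance.values())
    return {item: k * score / total for item, score in relevance.items()}


# Hypothetical relevance scores: "b" and "c" are nearly tied.
relevance = {"a": 0.52, "b": 0.50, "c": 0.49, "d": 0.10}

hard = top_k_exposure(relevance, k=2)
soft = proportional_exposure(relevance, k=2)

for item in relevance:
    print(item, hard[item], round(soft[item], 3))
```

With the hard cut, "b" is always shown and "c" never is, even though their scores differ by only 0.01. Under the proportional target, their exposure is almost identical.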

The paper’s exposure-aware arm selection (EARS) algorithm attempts to ensure that the probability of an item being clicked is proportional to its relevance and not unduly confounded by extraneous exposure effects. The math gets a bit above my pay-grade, but the experimental results on Deezer’s carousel dataset suggested that EARS improves equitable exposure with minimal impact on expected reward. There’s even a hyperparameter to control the fairness-reward trade-off.

Cold start similar artists ranking with gravity-inspired graph autoencoders

Presented by Guillaume Salha-Galvan from Deezer. Paper at ACM and arXiv.

This paper tackles the task of generating a “similar artists” module for new artists on Deezer, who have metadata but few streams.

The authors formulate the problem as a weighted, directed graph of top-k similar artists, and they want to predict edges linking to the cold-start nodes. A graph autoencoder is used to learn graph embeddings from the topology of the graph and the artist features, but graph autoencoders are typically for undirected graphs, learning a metric space where d(x,y)=d(y,x) for any pair of items.

To adapt the method for their directed graph, they learn a vector and a mass for each node, where directed edges are modeled as accelerations due to “gravity” and popular artists end up having larger mass. This makes sense: the Beatles might be a top similar artist for many artists, but there’s a high bar to make it into the similar artists list for the Beatles. I thought this was a particularly intuitive and clever solution, and according to the paper, it beat a long list of baseline approaches.
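Here is a minimal sketch of that asymmetry. It assumes a decoder of the form sigmoid(mass of target - lam * log(squared distance)), which is my reading of the gravity-inspired approach; the exact form in the paper may differ. The embeddings, masses, and `lam` are made-up values.

```python
import math


def edge_probability(z_i, z_j, mass_j, lam=1.0):
    """Score the directed edge i -> j: attraction grows with the target's
    mass and shrinks with the squared distance between the embeddings.
    Assumes z_i != z_j, since log(0) is undefined."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    logit = mass_j - lam * math.log(sq_dist)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2-d embeddings and masses for two artists.
beatles = {"z": (0.0, 0.0), "mass": 3.0}
newcomer = {"z": (1.0, 1.0), "mass": -1.0}

# Same distance in both directions, but a different target mass.
p_to_beatles = edge_probability(newcomer["z"], beatles["z"], beatles["mass"])
p_to_newcomer = edge_probability(beatles["z"], newcomer["z"], newcomer["mass"])

print(round(p_to_beatles, 3), round(p_to_newcomer, 3))
```

The distance term is symmetric, so all of the asymmetry comes from the mass of the target node: the heavy node is a likely neighbour for the light one, but not the other way round.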

Reverse maximum inner product search

Presented by Daichi Amagata from Osaka University. Paper at ACM and arXiv.

Many recommender systems are variants of matrix factorization or other factorized models: you learn a vector for the user, a vector for the item, and the score for that pair is the dot-product of those vectors. We can take a user vector, multiply it by the entire item matrix, and rank the scores to get an ordered list of recommendations for that user.
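As a rough sketch of that scoring step (the item ids and two-dimensional vectors are invented for illustration; a real system would use learned embeddings and a vectorized library):

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_items(user_vector, item_vectors, k):
    """Score every item against one user and return the k best item ids."""
    scores = {item: dot(user_vector, vec) for item, vec in item_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical learned vectors.
item_vectors = {
    "item_a": (0.9, 0.1),
    "item_b": (0.2, 0.8),
    "item_c": (0.5, 0.5),
}
user_vector = (1.0, 0.0)

recommendations = top_k_items(user_vector, item_vectors, k=2)
print(recommendations)
```

This is the "forward" maximum inner product search: one user in, a ranked list of items out.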

But what if you’ve got a new item vector and you want to know which users would have that in their top k recommendations (to know how well a product might perform, or who to market it to)? You could generate the top recommendations for every user, but that’s expensive. This is the gist of the reverse maximum inner product search problem.

The paper proposes an algorithm for doing this quite quickly, without any approximation. I’ll leave the heavy lifting to the paper, but the trick is to avoid computing the full top-k result set by building a data structure to support computing upper and lower bounds on the kth largest inner product for a user. The end result is miles faster than any other exact solution…mostly because the alternatives all involve solving the (forward) maximum inner product search problem for every user.
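To give a flavour of how bounds can rule users out cheaply, here is a toy sketch. It is not the paper's data structure: it assumes a simple Cauchy-Schwarz upper bound on the new item's score and a lower bound on the kth-best score taken from any k existing items, and it falls back to the exact check otherwise. All vectors and names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def in_top_k(user, query, items, k):
    """Exact check: the query makes the top k if fewer than k items beat it."""
    q_score = dot(user, query)
    return sum(dot(user, item) > q_score for item in items) < k


def reverse_top_k(users, query, items, k):
    """Return (user ids with `query` in their top k, number of users skipped).
    Requires len(items) >= k."""
    probe = items[:k]  # any k items give a valid lower bound
    q_norm = norm(query)
    hits, skipped = [], 0
    for user_id, user in users.items():
        upper = norm(user) * q_norm  # Cauchy-Schwarz: u.q <= |u| * |q|
        lower = min(dot(user, item) for item in probe)  # kth-best score >= this
        if upper < lower:
            skipped += 1  # at least k items certainly beat the query
            continue
        if in_top_k(user, query, items, k):
            hits.append(user_id)
    return hits, skipped


# Hypothetical user and item vectors, plus one new item as the query.
users = {"u1": (1.0, 0.0), "u2": (0.0, 1.0), "u3": (0.4, 0.6)}
items = [(1.0, 0.2), (0.9, 0.1), (0.1, 0.3), (0.2, 0.1)]
new_item = (0.2, 0.8)

hits, skipped = reverse_top_k(users, new_item, items, k=2)
print(hits, skipped)
```

The result matches the brute-force answer, but "u1" is dismissed using only the two probe items and a norm, without scoring the whole catalog.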

Given the prevalence of factorized models for recommendation, this seems like a very promising tool to have.

Transformers4Rec: Bridging the gap between NLP and sequential / session-based recommendation

Presented by Gabriel de Souza Pereira Moreira from NVIDIA, with co-authors from NVIDIA and Facebook. Paper at ACM and Facebook Research, with a blog post on Medium and code on GitHub.

I am not an expert in NLP and I will not attempt to explain attention and transformers to you (though here’s a good walkthrough if you want one). If you’ve heard folks talking about the incredible things that BERT and GPT-3 can do, well, the T in those names stands for “transformer”.

Transformers operate on sequential data, which makes them a natural fit for language (i.e. sequences of words), and they have now been adapted for sequential recommendation contexts: given a sequence of user-item interactions, predict the next item that the user will interact with. Since transformers have quickly become the de facto backbone of the latest state of the art in language models, we shouldn’t be surprised to find that this paper boasts wins in the WSDM and SIGIR recommendation challenges.
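For a sense of what the next-item task looks like as training data, here is a small sketch. It does not use the Transformers4Rec API; the function name, the `max_len` parameter, and the session contents are all made up for illustration.

```python
def next_item_examples(session, max_len=3):
    """Turn one ordered session into (context, target) pairs, keeping at
    most `max_len` of the most recent items as context."""
    examples = []
    for t in range(1, len(session)):
        context = session[max(0, t - max_len):t]
        examples.append((context, session[t]))
    return examples


# A hypothetical browsing session, oldest interaction first.
session = ["shoes", "socks", "laces", "polish"]
examples = next_item_examples(session, max_len=2)

for context, target in examples:
    print(context, "->", target)
```

A sequential model, transformer or otherwise, is then trained to predict each target from its context, in much the same way a language model predicts the next word.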

Given the model performance, the pedigree of NVIDIA and Facebook, and the fact that they offer TensorFlow and PyTorch implementations, I suspect that we’ll be seeing a lot of Transformers4Rec in new sequential recommenders and various deep learning architectures for recommendation.

Two talks on theory

Two of my favorite talks were presented by Minmin Chen from the Google Brain team.

Other talks of note

At the risk of outlining every single talk from the conference, below I list just a few more talks that all deserve a look for one reason or another.

A slide demonstrating a module of similar items with recommendation explanations drawn from attributes of the items.

I loved getting to hear a wide variety of perspectives on issues in recommendation and I’m looking forward to attending next year, ideally in person.