Code and Literature Repository for Investigating Meta-Learning Algorithms in the Context of Recommender Systems
The main inspiration for this research is the work of the ADAPT group at Trinity College Dublin. Of special note for this project is the research outlined in the paper "One-at-a-time: A Meta-Learning Recommender-System for Recommendation-Algorithm Selection on Micro Level".
The DonorsChoose.org dataset of past donations provides a large, feature-rich corpus of users and items. Because the dataset matches donors to projects they might be interested in, it is intrinsically about recommendations. Owing to the availability of detailed item, user and transaction features, this corpus is a suitable candidate for testing meta-learning approaches. This study aims to provide an augmented corpus with which further recommender-system studies can test and evaluate meta-learning approaches. In the augmentation, metadata from collaborative and content-based filtering techniques is added to the corpus. The corpus is further extended with aggregated statistics for users and transactions and with an exemplary meta-learning experiment. The performance of the learning subsystem is measured via the recall of recommended items in a Top-N test set. The augmented dataset and the source code are released into the public domain on GitHub: BeelGroup/Augmented-DonorsChoose.org-Dataset.
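The recall-based evaluation can be made concrete with a small stdlib sketch. It assumes that each RecallAtPosition value is the zero-based rank of the held-out test item within an algorithm's Top-N list, so lower values are better. This reading is consistent with the position-based plots further below. The function names, the tie-breaking rule and the penalty of N for items missing from the list are all assumptions.

```python
from typing import Dict, List, Sequence


def recall_at_position(scores: Dict[str, float], test_item: str, top_n: int) -> int:
    """Zero-based rank of `test_item` among the top-N scored candidates.

    Hypothetical helper: if the test item is not in the top-N list,
    `top_n` is returned as a penalty (an assumption, not the repository's rule).
    """
    # Rank candidates by descending score; ties are broken by item id for determinism
    ranking: List[str] = sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
    return ranking.index(test_item) if test_item in ranking else top_n


def mean_position(positions: Sequence[int]) -> float:
    """Average position over all test transactions (lower is better)."""
    return sum(positions) / len(positions)


if __name__ == "__main__":
    predicted = {"p1": 0.9, "p2": 0.4, "p3": 0.7, "p4": 0.1}
    print(recall_at_position(predicted, "p3", top_n=3))  # ranking p1, p3, p2 -> 1
    print(recall_at_position(predicted, "p4", top_n=3))  # not in the Top-3 -> 3
```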
The following is an organized list of column names. Each entry in the itemized transaction table provides a value for every listed column. If a column name is not self-explanatory, a short description is appended.
DonationID
DonationIncludedOptionalDonation
DonationReceivedDate
DonationReceivedDateDay
DonationReceivedDateDayOfWeek
DonationReceivedDateMonth
DonationReceivedDateTimeOfDay
DonationReceivedDateYear
DonorCartSequence
DonorID
ProjectID
DonationAmount
(Zip|City|State)IsEqual
i.e. whether user and item have identical values
DonorState
DonorCity
DonorZip
DonorIsTeacher
Concat[(DonorState|DonorCity|...)+]
i.e. the concatenated value of mentioned columns
SchoolID
ProjectGradeLevelCategory
ProjectResourceCategory
ProjectSubjectCategoryTree
ProjectSubjectSubcategoryTree
SchoolCity
SchoolMetroType
SchoolPercentageFreeLunch
SchoolState
SchoolZip
- Collaborative Filtering Techniques
AbsoluteErrorSKLearn-(KNN|SVD)
SquareErrorSKLearn-(KNN|SVD)
RecallAtPositionSKLearn-(KNN|SVD)
PredictionSKLearn-(KNN|SVD)
i.e. the rating predicted from the matrix decomposition (SVD) or from the interactions of the nearest neighbors (KNN)
- Content-based Filtering Techniques
RecallAtPosition(FastText|Tfidf)
Prediction(FastText|Tfidf)
i.e. cosine similarity between the user profile and the recommended item
- Collaborative recommendations for user-groups
AbsoluteErrorGroupBy[(DonorState|DonorCity|...)+]-SKLearn-SVD
SquareErrorGroupBy[(DonorState|DonorCity|...)+]-SKLearn-SVD
RecallAtPositionGroupBy[(DonorState|DonorCity|...)+]-SKLearn-SVD
PredictionGroupByDonor[(DonorState|DonorCity|...)+]-SKLearn-SVD
- General
isTest
i.e. whether the entry was used for testing during the holdout split
- Values aggregated by User
ValueCountsDonorID
i.e. number of transactions
ValueCountsByUserProjectID
i.e. whether the user donated to popular projects
UserMean(DonationAmount|ProjectGradeLevelCategory|...)
- Meta-learning
MetaPrediction(BaggingRg|GradientBoostingRg|...)RecallAtPosition(SKLearn-SVD|FastText|...)
i.e. prediction of the error of the individual meta-learners in the error prediction step
MetaPrediction(BaggingRg|GradientBoostingRg|...)RecallAtPosition(SKLearn-SVD|FastText|...)
i.e. the prediction obtained when the suggested algorithm is selected via error prediction
SubalgorithmPrediction(BaggingRg|GradientBoostingRg|...)RecallAtPosition
i.e. the predicted class in the classification step
MetaSubalgorithmPrediction(BaggingRg|GradientBoostingRg|...)RecallAtPosition(SKLearn-SVD|FastText|...)
i.e. the prediction obtained when the suggested algorithm is selected via classification
SubalgorithmCategory
i.e. the 'category' of the transaction, namely the best-performing algorithm it is assigned to
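The following is a minimal sketch of how a few of the columns above could be derived from plain transaction records. The dict-based record layout, the helper name augment and the restriction to a single state comparison (one of the ...IsEqual columns) are illustrative assumptions. SubalgorithmCategory is taken to hold the name of the RecallAtPosition column with the lowest value, because that is how the plotting code below uses it as a column label.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def augment(transactions: List[Dict]) -> List[Dict]:
    """Add StateIsEqual, ValueCountsDonorID, UserMeanDonationAmount and
    SubalgorithmCategory to each transaction (hypothetical re-implementation)."""
    # Number of transactions per donor
    counts = Counter(t["DonorID"] for t in transactions)
    # Total donated amount per donor, used for the per-user mean
    totals: Dict[str, float] = defaultdict(float)
    for t in transactions:
        totals[t["DonorID"]] += t["DonationAmount"]

    augmented = []
    for t in transactions:
        row = dict(t)
        # Whether donor and school are located in the same state
        row["StateIsEqual"] = t["DonorState"] == t["SchoolState"]
        row["ValueCountsDonorID"] = counts[t["DonorID"]]
        row["UserMeanDonationAmount"] = totals[t["DonorID"]] / counts[t["DonorID"]]
        # Best algorithm = RecallAtPosition column with the lowest position (first one on ties)
        recall_cols = [c for c in t if c.startswith("RecallAtPosition")]
        row["SubalgorithmCategory"] = min(recall_cols, key=lambda c: t[c])
        augmented.append(row)
    return augmented
```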
This repository is the single source of truth for the whole scientific exploration of the augmentation and evaluation of the DonorsChoose.org dataset. In addition to the code needed for reproduction, the repository contains all relevant status updates. The dedicated folder for documentation is appropriately named doc. The dataset may be stored in data; changes within this folder are ignored by the version control system. The main programs are learners.py and meta-learners.py, with helper functions outsourced to recsys. The first Python script creates a dataset augmented with the results of various filtering techniques. It represents the learning subsystem and performs the computationally most expensive steps. The second Python script further augments the dataset and executes the meta-learning algorithms.
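The content-based Prediction columns are described as the cosine similarity between a user profile and a recommended item. The sketch below illustrates that idea with a hand-rolled TF-IDF weighting, using the smoothed idf that scikit-learn's TfidfVectorizer applies by default. Several parts are assumptions: the whitespace tokenisation, building the profile as the sum of the vectors of projects the donor has already funded, and all function names. The repository itself appears to rely on scikit-learn and FastText for the actual vectors.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    """TF-IDF vectors with smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def user_profile(vectors: List[Vector]) -> Vector:
    """Hypothetical profile: sum of the vectors of the projects a donor has funded."""
    profile: Vector = {}
    for vec in vectors:
        for term, weight in vec.items():
            profile[term] = profile.get(term, 0.0) + weight
    return profile


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0
```

Because cosine similarity ignores vector length, the profile does not need to be averaged or normalised before candidate projects are ranked by it.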
The learning subsystem is contained in the learners.py script and can be adapted via the configuration file config.yml. Most parameters should be self-explanatory; where deemed necessary, a short explanatory string is added. The most notable option is the dictionary algorithms_args, which specifies the algorithms of the learning subsystem to execute on the data.
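The following sketch illustrates how such a dictionary can drive the learning subsystem by mapping algorithm names to factory functions plus keyword arguments. Everything in it is hypothetical: the shape of algorithms_args, the parameter values, the REGISTRY name and the placeholder factories. Consult config.yml and learners.py for the actual format.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the parsed algorithms_args section of config.yml
algorithms_args: Dict[str, Dict[str, Any]] = {
    "SKLearn-SVD": {"n_components": 50},
    "Tfidf": {"max_features": 1000},
}

# Hypothetical registry mapping algorithm names to factories; real factories
# would return recommenders instead of descriptive strings
REGISTRY: Dict[str, Callable[..., str]] = {
    "SKLearn-KNN": lambda **kwargs: f"KNN({kwargs})",
    "SKLearn-SVD": lambda **kwargs: f"SVD({kwargs})",
    "Tfidf": lambda **kwargs: f"TF-IDF({kwargs})",
}


def build_learners(config: Dict[str, Dict[str, Any]]) -> List[str]:
    """Instantiate every configured algorithm in config order; unknown names fail loudly."""
    learners = []
    for name, kwargs in config.items():
        if name not in REGISTRY:
            raise KeyError(f"algorithm {name!r} is not registered")
        learners.append(REGISTRY[name](**kwargs))
    return learners
```

Failing on unknown names, rather than skipping them, makes a typo in the configuration surface immediately instead of silently dropping an algorithm from the augmented dataset.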
The final augmentation is performed in meta-learners.py. The program's behavior can be configured via config-meta-learners.yml. Its most important option is the dictionary describing the meta-algorithms to execute on the dataset.
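The meta-algorithms follow two formulations: error prediction (EP) and classification (CL). The sketch below contrasts them using a deliberately tiny stand-in for the Bagging, DecisionTree or GradientBoosting models, namely a lookup table keyed by a single categorical meta-feature. This is a hypothetical illustration, not the repository's meta-learner. The meta-feature values and function names are invented.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import DefaultDict, Dict, Hashable, List, Tuple

# (meta-feature value, position of each algorithm in the Top-N test set)
Row = Tuple[Hashable, Dict[str, int]]


def fit_error_predictor(rows: List[Row]) -> Dict[Hashable, Dict[str, float]]:
    """EP: learn the expected position of every algorithm per meta-feature value."""
    grouped: DefaultDict[Hashable, DefaultDict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for feature, positions in rows:
        for alg, pos in positions.items():
            grouped[feature][alg].append(pos)
    return {f: {alg: mean(p) for alg, p in algs.items()} for f, algs in grouped.items()}


def select_by_error(model: Dict[Hashable, Dict[str, float]], feature: Hashable) -> str:
    """EP selection: the algorithm with the lowest predicted position."""
    predicted = model[feature]
    return min(predicted, key=predicted.get)


def fit_classifier(rows: List[Row]) -> Dict[Hashable, str]:
    """CL: learn the most frequent best algorithm per meta-feature value."""
    votes: DefaultDict[Hashable, Counter] = defaultdict(Counter)
    for feature, positions in rows:
        votes[feature][min(positions, key=positions.get)] += 1
    return {f: c.most_common(1)[0][0] for f, c in votes.items()}
```

EP keeps the full per-algorithm estimate, so it can also rank algorithms. CL only ever learns which label wins and discards how close the runner-up was.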
- Non-interactive plotting
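import matplotlib as mpl
mpl.use('cairo')  # select a non-interactive, file-only backend before importing pyplot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd  # donations, items and meta_items are pandas DataFrames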
- Prettify plots
import seaborn as sns
sns.set_style('whitegrid')
- Suitable Aspect Ratio for Plots
mpl.rcParams['figure.figsize'] = 6.4, 3.2
- Enforce text rendering via LaTeX and mimic the font of the default matplotlib text
plt.rc('text', usetex=True)
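# mathtext.* only applies to math rendered without usetex; DejaVu Sans has been
# Matplotlib's default font since 2.0 (Bitstream Vera Sans in earlier releases)
mpl.rcParams['mathtext.fontset'] = 'custom'
mpl.rcParams['mathtext.rm'] = 'DejaVu Sans'
mpl.rcParams['mathtext.it'] = 'DejaVu Sans:italic'
mpl.rcParams['mathtext.bf'] = 'DejaVu Sans:bold'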
- Donated amount in bins on a logarithmic scale
items_orig = donations[['ProjectID', 'DonorID', 'DonationAmount']]
plt.figure()
plt.hist(items_orig['DonationAmount'], bins=np.logspace(np.log10(items_orig['DonationAmount'].min()), np.log10(items_orig['DonationAmount'].max()), num=28 + 1), histtype='step')
plt.gca().set_xscale('log')
plt.xlabel('Donated Amount')
plt.ylabel('#Occurrence')
plt.tight_layout()
plt.savefig('DonationAmount - Distribution of the donated amount on a logarithmic scale.pdf', bbox_inches='tight')
plt.close()
- Donated amount in bins on a logarithmic scale for clean subset
items_orig = donations.groupby(['DonorID', 'ProjectID'])['DonationAmount'].sum().reset_index()
# Perform preliminary data cleaning
items_orig = items_orig.drop(items_orig.query('0. <= DonationAmount <= 2.').index)
value_counts = items_orig['DonorID'].value_counts()
items_orig = items_orig[items_orig['DonorID'].isin(value_counts.index[value_counts >= 2])]
plt.figure()
plt.hist(items_orig['DonationAmount'], bins=np.logspace(np.log10(items_orig['DonationAmount'].min()), np.log10(items_orig['DonationAmount'].max()), num=13 + 1), density=True, histtype='step')
plt.gca().set_xscale('log')
plt.xlabel('Donated Amount')
plt.ylabel('Frequency')
plt.tight_layout()
plt.savefig('DonationAmount - Distribution of the donated amount on a logarithmic scale (for donors with at least 2 donations, excluding duplicates and low donations).pdf', bbox_inches='tight')
plt.close()
- Distribution of ratings
# Shrink size and enlarge font
mpl.rcParams['figure.figsize'][0] /= 1.3
plt.figure()
plt.grid(False, axis='x')  # positional argument: the 'b' keyword is deprecated since Matplotlib 3.5
plt.hist(items['DonationAmount'], bins=5, density=True, histtype='step')
plt.xticks([1.45, 2.2, 3., 3.8, 4.6], np.arange(1, 5+1))
plt.xlabel('Rating')
plt.ylabel('Frequency')
plt.tight_layout()
plt.savefig('DonationAmount - Distribution of ratings for logarithmic bins and excluded outliers.pdf', bbox_inches='tight')
plt.close()
mpl.rcParams['figure.figsize'][0] *= 1.3
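The histogram above plots DonationAmount on a 1 to 5 axis, which implies that the amounts were converted to ratings beforehand. One plausible conversion is a linear rescaling in log space, sketched here under that assumption. The function name, the [1, 5] target range and the handling of identical amounts are illustrative and not taken from the repository. Since the cleaning step drops donations between 0 and 2, the remaining amounts are strictly positive and their logarithm is defined.

```python
import math
from typing import List


def log_scale_ratings(amounts: List[float], low: float = 1.0, high: float = 5.0) -> List[float]:
    """Rescale positive amounts linearly in log10 space onto [low, high].

    Hypothetical mapping: the repository's exact transformation may differ.
    """
    if any(a <= 0 for a in amounts):
        raise ValueError("log scaling requires strictly positive amounts")
    logs = [math.log10(a) for a in amounts]
    lo, hi = min(logs), max(logs)
    if hi == lo:  # identical amounts: place everything in the middle of the scale
        return [(low + high) / 2] * len(amounts)
    return [low + (high - low) * (x - lo) / (hi - lo) for x in logs]
```

For example, amounts of 1, 10 and 100 map to ratings 1, 3 and 5.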
- Number of user donations
# Shrink size and enlarge font
mpl.rcParams['figure.figsize'][0] /= 1.3
plt.figure()
plt.grid(False, axis='x')
user_value_counts = items['DonorID'].value_counts()
# Disregard outliers
user_value_counts = user_value_counts[user_value_counts <= user_value_counts.mean() + user_value_counts.std()]
plt.hist(user_value_counts, bins=30, density=True, histtype='step')
plt.xlabel('Interactions per user')
plt.ylabel('Frequency')
plt.tight_layout()
plt.savefig('DonorID - Distribution of number of interactions per user.pdf', bbox_inches='tight')
plt.close()
mpl.rcParams['figure.figsize'][0] *= 1.3
- RMSE for collaborative filtering techniques
plt.figure()
plt.grid(False, axis='x')
algorithms_name = ['zero', 'mean', 'random', 'SKLearn-KNN', 'SKLearn-NMF', 'SKLearn-SVD', 'SciPy-SVD']
average_rmse = [np.sqrt(np.square(np.zeros(items.shape[0]) - items['DonationAmount']).mean()),
np.sqrt(np.square(np.full(items.shape[0], items['DonationAmount'].mean()) - items['DonationAmount']).mean()),
np.sqrt(np.square(np.random.uniform(low=min(items['DonationAmount']), high=max(items['DonationAmount']), size=items.shape[0]) - items['DonationAmount']).mean()),
np.sqrt(items['SquareErrorSKLearn-KNN'].mean()),
np.sqrt(items['SquareErrorSKLearn-NMF'].mean()),
np.sqrt(items['SquareErrorSKLearn-SVD'].mean()),
np.sqrt(items['SquareErrorSciPy-SVD'].mean())]
plt.errorbar(np.arange(len(average_rmse)), average_rmse, xerr=0.45, markersize=0., ls='none')
plt.xticks(np.arange(len(algorithms_name)), algorithms_name)
plt.xlabel('Algorithm')
plt.ylabel('Test RMSE')
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.savefig('Collaborative Filters - RMSE for DIY algorithms and some baselines.pdf', bbox_inches='tight')
plt.close()
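The RMSE bars compare the stored squared errors with three naive baselines. The sketch below repeats the same computation without NumPy, using RMSE = sqrt(mean((prediction - rating)^2)). Under this metric, the constant mean prediction is the best possible constant baseline. The seeded random generator and the function names are additions for reproducibility and readability; the original snippet draws unseeded values.

```python
import math
import random
from typing import Dict, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((prediction - rating) ** 2))."""
    if len(predicted) != len(actual):
        raise ValueError("predictions and ratings must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def baseline_rmses(ratings: Sequence[float], seed: int = 0) -> Dict[str, float]:
    """RMSE of the zero, mean and uniform-random baselines used in the bar chart."""
    n = len(ratings)
    mean_rating = sum(ratings) / n
    rng = random.Random(seed)  # seeded for reproducibility (the original snippet is unseeded)
    uniform = [rng.uniform(min(ratings), max(ratings)) for _ in range(n)]
    return {
        "zero": rmse([0.0] * n, ratings),
        "mean": rmse([mean_rating] * n, ratings),
        "random": rmse(uniform, ratings),
    }
```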
- Recall@N for collaborative and content-based filters
plt.figure()
plt.grid(False, axis='x')
algorithms_name = ['SKLearn-KNN', 'SKLearn-NMF', 'SKLearn-SVD', 'SciPy-SVD', 'Tfidf']
algorithms_pretty_name = ['SKLearn-KNN', 'SKLearn-NMF', 'SKLearn-SVD', 'SciPy-SVD', 'SKLearn-TF-IDF']
average_recall = [items['RecallAtPosition' + alg_name].mean() for alg_name in algorithms_name]
plt.errorbar(np.arange(len(average_recall)), average_recall, xerr=0.45, markersize=0., ls='none')
plt.xticks(np.arange(len(algorithms_pretty_name)), algorithms_pretty_name)
plt.ylim(ymin=-1)
plt.xlabel('Algorithm')
plt.ylabel('Average position in Top-N test set')
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.savefig('Collaborative and Content-based Filters - Average position in Top-N test set for various algorithms.pdf', bbox_inches='tight')
plt.close()
plt.figure()
plt.grid(False, axis='x')
algorithms_name = ['SKLearn-KNN', 'SKLearn-SVD', 'Tfidf']
algorithms_pretty_name = ['SKLearn-KNN', 'SKLearn-SVD', 'SKLearn-TF-IDF']
plt.hist([items['RecallAtPosition' + alg_name] for alg_name in algorithms_name], bins=10, density=True, label=algorithms_pretty_name, histtype='step')
plt.legend(loc=9)
plt.xlabel('Position in Top-N test set')
plt.ylabel('Frequency')
plt.tight_layout()
plt.savefig('Collaborative and Content-based Filters - Distribution of position in Top-N test set for various algorithms.pdf', bbox_inches='tight')
plt.close()
- Learning subsystem Recall@N performance
plt.figure()
plt.grid(False, axis='x')
algorithms_name = ['SKLearn-KNN', 'SKLearn-SVD', 'Tfidf', 'FastText']
recall_pos = [items['RecallAtPosition' + alg_name].values for alg_name in algorithms_name] + [items[['RecallAtPosition' + alg_name for alg_name in algorithms_name]].min(axis=1).values]
algorithms_pretty_name = ['KNN', 'SVD', 'TF-IDF', 'FastText', 'Combined']
plt.boxplot(recall_pos, positions=np.arange(len(algorithms_pretty_name)), meanline=True, showmeans=True, showfliers=False)
plt.xticks(np.arange(len(algorithms_pretty_name)), algorithms_pretty_name)
plt.ylim(ymin=-1)
plt.xlabel('Algorithm')
plt.ylabel('Position in Top-N test set')
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.savefig('Learning subsystem - Position in Top-N test set for various algorithms.pdf', bbox_inches='tight')
plt.close()
plt.figure()
plt.grid(False, axis='x')
algorithms_name = ['SKLearn-KNN', 'SKLearn-SVD', 'Tfidf', 'FastText']
algorithms_pretty_name = {'SKLearn-KNN': 'KNN', 'SKLearn-SVD': 'SVD', 'Tfidf': 'TF-IDF', 'FastText': 'FastText'}
algorithms_value_counts = items[['RecallAtPosition' + alg_name for alg_name in algorithms_name]].idxmin(axis=1).value_counts().rename(dict(zip(['RecallAtPosition' + alg_name for alg_name in algorithms_name], algorithms_name))).to_dict()
plt.hist([items['RecallAtPosition' + alg_name] for alg_name in algorithms_name], bins=10, density=True, label=['{:<s} ({:<2.2%} overall best)'.format(algorithms_pretty_name[alg_name], algorithms_value_counts[alg_name] / items.shape[0]) for alg_name in algorithms_name], histtype='step')
plt.legend(loc=9)
plt.xlabel('Position in Top-N test set')
plt.ylabel('Frequency')
plt.tight_layout()
plt.savefig('Learning subsystem - Distribution of position in Top-N test set for various algorithms.pdf', bbox_inches='tight')
plt.close()
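Both the Combined box and the "overall best" percentages come from choosing, per transaction, the algorithm with the lowest position. The stdlib sketch below reproduces that bookkeeping, with plain lists standing in for the RecallAtPosition columns; the function name is hypothetical. It also sidesteps a KeyError: in the pandas version, an algorithm that never wins is missing from value_counts, whereas a Counter returns 0 for it.

```python
from collections import Counter
from typing import Dict, List


def oracle_summary(positions: Dict[str, List[int]]) -> Dict[str, object]:
    """Compute the 'Combined' (per-transaction best) positions and each algorithm's win share.

    Ties go to the first algorithm in dict order, like pandas' idxmin(axis=1).
    """
    names = list(positions)
    n = len(positions[names[0]])
    # Winner of every transaction = algorithm with the lowest position
    winners = [min(names, key=lambda name: positions[name][i]) for i in range(n)]
    combined = [positions[w][i] for i, w in enumerate(winners)]
    wins = Counter(winners)
    return {
        "combined_mean": sum(combined) / n,
        # Counter returns 0 for algorithms that never win, so no KeyError here
        "share": {name: wins[name] / n for name in names},
    }
```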
- Meta-learner performance for classification and error prediction
meta_subset = meta_items.loc[test_idx]
plt.figure()
plt.grid(False, axis='x')
meta_algorithms_name = [('Bagging', 'Bagging'), ('DecisionTree', 'DecisionTree'), ('BalancedDecisionTree', 'BalancedDTree'), ('GradientBoosting', 'GradientBoosting'), ('NeuralNetwork', 'NeuralNetwork')]
algorithm_selection_columns = [('MetaSubalgorithmPrediction', 'CL'), ('MetaPrediction', 'EP')]
meta_algorithms_column = np.array([[pre[0] + meta_alg_name[0] for pre in algorithm_selection_columns] for meta_alg_name in meta_algorithms_name]).flatten()
meta_algorithms_pretty_name = np.array([[pre[1] + ' ' + meta_alg_name[1] for pre in algorithm_selection_columns] for meta_alg_name in meta_algorithms_name]).flatten()
average_recall = [meta_subset[c].mean() for c in meta_algorithms_column]
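# One colour per meta-learner, shared by its CL and EP bar; hlines accepts a list of colours
bar_colors = [c for c in plt.rcParams['axes.prop_cycle'].by_key()['color'][:len(meta_algorithms_name)] for _ in algorithm_selection_columns]
x = np.arange(len(average_recall))
plt.hlines(average_recall, x - 0.45, x + 0.45, colors=bar_colors)
# DataFrame.lookup was removed in pandas 2.0; read each row's best-algorithm column instead
idx, cols = pd.factorize(meta_subset['SubalgorithmCategory'])
plt.axhline(y=meta_subset.reindex(cols, axis=1).to_numpy()[np.arange(len(meta_subset)), idx].mean(), color='orange', linestyle='--')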
plt.xticks(np.arange(len(meta_algorithms_pretty_name)), meta_algorithms_pretty_name)
plt.ylim(ymin=-1)
plt.xlabel('Algorithm')
plt.ylabel('Average position in Top-N test set')
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.savefig('Meta-learner as Classifier and Error Predictor - Average position in Top-N test set for various meta-learner algorithms.pdf', bbox_inches='tight')
plt.close()
- Learning subsystem Recall@N performance with augmented filtering techniques
plt.figure()
plt.grid(False, axis='x')
algorithms_name = ['SKLearn-KNN', 'SKLearn-SVD', 'GroupByDonorStateCityZip-SKLearn-SVD', 'GroupByDonorStateCity-SKLearn-SVD', 'Tfidf', 'FastText']
recall_pos = [items['RecallAtPosition' + alg_name].values for alg_name in algorithms_name] + [items[['RecallAtPosition' + alg_name for alg_name in algorithms_name]].min(axis=1).values]
algorithms_value_counts = items[['RecallAtPosition' + alg_name for alg_name in algorithms_name]].idxmin(axis=1).value_counts().rename(dict(zip(['RecallAtPosition' + alg_name for alg_name in algorithms_name], algorithms_name))).to_dict()
algorithms_name = algorithms_name + ['Combined']
algorithms_value_counts['Combined'] = items.shape[0]
algorithms_pretty_name = {'SKLearn-KNN': 'KNN', 'SKLearn-SVD': 'SVD', 'GroupByDonorStateCityZip-SKLearn-SVD': 'SVD (State, City, Zip)', 'GroupByDonorStateCity-SKLearn-SVD': 'SVD (State, City)', 'Tfidf': 'TF-IDF', 'FastText': 'FastText', 'Combined': 'Combined'}
plt.boxplot(recall_pos, positions=np.arange(len(algorithms_pretty_name)), meanline=True, showmeans=True, showfliers=False)
# This got a little bit out of hand...
# Actually just the percentage of each algorithm's contribution in the combined best is printed in a smaller font below the algorithm's name
plt.xticks(np.arange(len(algorithms_pretty_name)), [r'{{\fontsize{{1em}}{{3em}}\selectfont{{}}{0:<s}}}{1}{{\fontsize{{0.8em}}{{3em}}\selectfont{{}}{2:<2.2f}\%}}'.format(algorithms_pretty_name[alg_name], '\n', 100 * algorithms_value_counts[alg_name] / items.shape[0]) for alg_name in algorithms_name])
plt.ylim(ymin=-1)
plt.xlabel('Algorithm')
plt.ylabel('Position in Top-N test set')
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.savefig('Learning subsystem - Position in Top-N test set for various algorithms with augmented filtering techniques.pdf', bbox_inches='tight')
plt.close()
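The nested format string in the tick labels above is hard to read. A plain-Python helper that builds the same two-line LaTeX label (the algorithm name on the first line, its share in a smaller font below) could look as follows. The helper name is hypothetical and the LaTeX commands are copied from the snippet, so the labels still require text.usetex to be enabled.

```python
def two_line_tick_label(name: str, share: float) -> str:
    """LaTeX tick label: `name`, then `share` (a fraction) as a percentage in a smaller font."""
    top = r"{\fontsize{1em}{3em}\selectfont{}" + name + "}"
    bottom = r"{\fontsize{0.8em}{3em}\selectfont{}" + f"{100 * share:.2f}" + r"\%}"
    return top + "\n" + bottom
```

Passing algorithms_value_counts[alg_name] / items.shape[0] as share yields the same string as the original format expression.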
- Learning subsystem Recall@N performance without the combined result
plt.figure()
plt.grid(False, axis='x')
algorithms_name = ['SKLearn-KNN', 'SKLearn-SVD', 'Tfidf', 'FastText']
recall_pos = [items['RecallAtPosition' + alg_name].values for alg_name in algorithms_name]
algorithms_value_counts = items[['RecallAtPosition' + alg_name for alg_name in sorted(algorithms_name)]].idxmin(axis=1).value_counts().rename(dict(zip(['RecallAtPosition' + alg_name for alg_name in algorithms_name], algorithms_name))).to_dict()
algorithms_pretty_name = {'SKLearn-KNN': 'KNN', 'SKLearn-SVD': 'SVD', 'GroupByDonorStateCityZip-SKLearn-SVD': 'SVD (State, City, Zip)', 'GroupByDonorStateCity-SKLearn-SVD': 'SVD (State, City)', 'Tfidf': 'TF-IDF', 'FastText': 'FastText', 'Combined': 'Combined'}
plt.boxplot(recall_pos, positions=np.arange(len(algorithms_name)), meanline=True, showmeans=True, showfliers=False)
# This got a little bit out of hand...
# Actually just the percentage of each algorithm's contribution in the combined best is printed in a smaller font below the algorithm's name
plt.xticks(np.arange(len(algorithms_name)), [r'{{\fontsize{{1em}}{{3em}}\selectfont{{}}{0:<s}}}{1}{{\fontsize{{0.8em}}{{3em}}\selectfont{{}}{2:<2.2f}\%}}'.format(algorithms_pretty_name[alg_name], '\n', 100 * algorithms_value_counts[alg_name] / items.shape[0]) for alg_name in algorithms_name])
plt.ylim(ymin=-1)
plt.xlabel('Algorithm')
plt.ylabel('Position in Top-N test set')
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.savefig('Learning subsystem - Position in Top-N test set.pdf', bbox_inches='tight')
plt.close()
- Meta-learner performance for classification and error prediction with augmented learning subsystem filtering techniques
meta_subset = meta_items.loc[test_idx]
plt.figure()
plt.grid(False, axis='x')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
meta_algorithms_name = [('MetaSubalgorithmPredictionBaggingCl', 'CL Bagging', colors[0]), ('MetaPredictionBaggingRg', 'EP Bagging', colors[3]), ('MetaSubalgorithmPredictionDecisionTreeCl', 'CL Decision Tree', colors[0]), ('MetaPredictionDecisionTreeRg40', 'EP Decision Tree', colors[3]), ('MetaSubalgorithmPredictionUserClusterKMeans', 'User-Clustering', colors[0]), ('MetaPredictionGradientBoostingRg', 'EP Gradient Boosting', colors[3]), ('MetaSubalgorithmPredictionStackingDecisionTree', 'Stacking DTree', colors[4])]
average_recall = [meta_subset[c].mean() for c in list(zip(*meta_algorithms_name))[0]]
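# hlines accepts one colour per bar, which errorbar's single-colour line does not
x = np.arange(len(average_recall))
plt.hlines(average_recall, x - 0.45, x + 0.45, colors=list(zip(*meta_algorithms_name))[2])
plt.axhline(y=meta_subset[meta_subset['SubalgorithmCategory'].mode()[0]].mean(), color='orange', linestyle='--')
# DataFrame.lookup was removed in pandas 2.0; read each row's best-algorithm column instead
idx, cols = pd.factorize(meta_subset['SubalgorithmCategory'])
plt.axhline(y=meta_subset.reindex(cols, axis=1).to_numpy()[np.arange(len(meta_subset)), idx].mean(), color='orange', linestyle='-')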
plt.xticks(np.arange(len(meta_algorithms_name)), list(zip(*meta_algorithms_name))[1])
plt.ylim(ymin=-1)
plt.xlabel('Algorithm')
plt.ylabel('Average position in Top-N test set')
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.savefig('Meta-learner as Classifier and Error Predictor - Average position in Top-N test set for various meta-learner algorithms with augmented learning subsystem filtering techniques.pdf', bbox_inches='tight')
plt.close()
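The plot above draws two reference lines. The dashed line always picks the algorithm that is most often the best (the mode of SubalgorithmCategory). The solid line picks the per-transaction best algorithm, which acts as an oracle. The stdlib sketch below computes both lines alongside a meta-learner's own average position. The function name and the list-based inputs are hypothetical stand-ins for the DataFrame columns.

```python
from collections import Counter
from typing import Dict, List


def selection_summary(positions: Dict[str, List[int]], best: List[str], selected: List[str]) -> Dict[str, float]:
    """Average Top-N position of a meta-learner versus two reference lines.

    positions: position per algorithm and transaction (RecallAtPosition columns)
    best:      best algorithm per transaction (like SubalgorithmCategory)
    selected:  algorithm chosen by the meta-learner per transaction
    """
    n = len(best)
    # Solid line: always pick the per-transaction best algorithm (oracle)
    oracle = sum(positions[alg][i] for i, alg in enumerate(best)) / n
    # Dashed line: always pick the algorithm that is most often the best.
    # Ties resolve to the first label encountered here, whereas pandas' mode()[0] takes the smallest label.
    single_best = Counter(best).most_common(1)[0][0]
    return {
        "meta_learner": sum(positions[alg][i] for i, alg in enumerate(selected)) / n,
        "oracle": oracle,
        "single_best": sum(positions[single_best]) / n,
    }
```

A useful meta-learner should land between the two lines: below the single-best baseline and, necessarily, at or above the oracle.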
- Meta-learner performance
meta_subset = meta_items.loc[test_idx]
plt.figure()
plt.grid(False, axis='x')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
meta_algorithms_name = [('MetaSubalgorithmPredictionDecisionTreeRg', 'Classifying DTree', colors[0]), ('MetaPredictionGradientBoostingRg', 'GB Recall Prediction', colors[1]), ('MetaSubalgorithmPredictionUserClusterKMeans', 'K-Means Clustering', colors[2]), ('MetaSubalgorithmPredictionStackingDecisionTree', 'Stacking DTree', colors[3])]
average_recall = [meta_subset[c].mean() for c in list(zip(*meta_algorithms_name))[0]]
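# hlines accepts one colour per bar, which errorbar's single-colour line does not
x = np.arange(len(average_recall))
plt.hlines(average_recall, x - 0.45, x + 0.45, colors=list(zip(*meta_algorithms_name))[2])
plt.axhline(y=meta_subset[meta_subset['SubalgorithmCategory'].mode()[0]].mean(), color='orange', linestyle='--')
plt.xticks(np.arange(len(meta_algorithms_name)), list(zip(*meta_algorithms_name))[1])
# DataFrame.lookup was removed in pandas 2.0; read each row's best-algorithm column instead
idx, cols = pd.factorize(meta_subset['SubalgorithmCategory'])
oracle_recall = meta_subset.reindex(cols, axis=1).to_numpy()[np.arange(len(meta_subset)), idx].mean()
plt.ylim(bottom=oracle_recall - 1)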
plt.xlabel('Algorithm')
plt.ylabel('Average position in Top-N test set')
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.savefig('Meta-learner Performance - Average position in Top-N test set for various meta-learner algorithms with augmented learning subsystem filtering techniques.pdf', bbox_inches='tight')
plt.close()
- Find a suitable dataset for meta-learning
  - Candidates should provide information about the user, the item and the context of each transaction
  - Adequate sources might be Kaggle, Google public datasets and previous RecSys challenges
- Evaluate existing software frameworks for their applicability as meta-feature generators
  - Meta-feature algorithms should include collaborative, content-based and possibly deep-learning-based approaches
  - Suitable frameworks might be TensorFlow, scikit-learn and higher-level libraries like Keras and scikit-surprise
- Train and compare various meta-learning models
  - Either predict the rating error or reformulate algorithm selection as a classification problem
  - Evaluate the models using appropriate metrics; possible candidates are the normalized discounted cumulative gain or the Kendall rank correlation coefficient
- Decay ratings based on the date of the donation
- Use the algorithm with the lowest average error for each cluster in the user-clustering approach
- Treat algorithm selection as a ranking task using meta-learning
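The per-cluster idea from the list above can be sketched in a few lines. Each transaction carries the cluster of its donor (for example from K-Means on user features), and every cluster is assigned the algorithm with the lowest average position over its members. Cluster assignments are taken as given, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, Dict, Hashable, List


def best_algorithm_per_cluster(clusters: List[Hashable], positions: Dict[str, List[int]]) -> Dict[Hashable, str]:
    """For each user cluster, pick the algorithm with the lowest average position."""
    # Collect the transaction indices belonging to each cluster
    members: DefaultDict[Hashable, List[int]] = defaultdict(list)
    for i, cluster in enumerate(clusters):
        members[cluster].append(i)
    return {
        cluster: min(positions, key=lambda alg: mean(positions[alg][i] for i in idx))
        for cluster, idx in members.items()
    }
```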