Abstract
Online advertising offers significantly finer targeting granularity, which state-of-the-art targeting methods such as Behavioral Targeting (BT) leverage. Such methods have been further complemented by recent work on Look-alike Modeling (LAM), which builds models customized to each advertiser’s requirements and each campaign’s characteristics, and which shows ads to the users most likely to convert on them, not just click on them. In Look-alike Modeling, given data about converters and non-converters obtained from advertisers, we would like to train models automatically for each ad campaign. Such custom models help target more users who are similar to the set of converters the advertiser provides, and advertisers gain more freedom to define the sets of users that serve as the basis for custom targeting models. In behavioral data, the number of conversions (the positive class) per campaign is very small (conversions per impression for the advertisers in our data set are much less than 10⁻⁴), giving rise to a highly skewed training dataset in which most records belong to the negative class. Campaigns with very few conversions are called tail campaigns, and those with many conversions are called head campaigns. Creating Look-alike Models for tail campaigns is challenging with popular classifiers such as Linear SVM and GBDT, because such campaigns contain very few positive-class examples. In this paper, we present an Associative Classification (AC) approach to LAM for tail campaigns. Pairs of features are used to derive rules for a rule-based associative classifier, with the rules sorted by frequency-weighted log-likelihood ratio (F-LLR). The top k rules, sorted by F-LLR, are then applied to any test record to score it. Individual features can also form rules by themselves, though the number of such rules, both in the top k rules and in the whole rule set, is very small.
Our algorithm is implemented on Hadoop and is therefore fast.
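
To make the rule-based scoring concrete, the following is a minimal single-machine sketch in Python, not the Hadoop implementation described here. The abstract does not give the F-LLR formula or state how the matching rules are combined into a score, so two assumptions are made for illustration: F-LLR is taken to be the smoothed frequency of a rule among converters multiplied by the log of the ratio of its frequencies among converters and non-converters, and a record's score is the sum of the F-LLR values of the top k rules it matches. The function names `candidate_rules`, `train_rules`, and `score_record`, and the `smoothing` parameter, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def candidate_rules(features):
    """Single features and unordered feature pairs present in one record."""
    feats = sorted(set(features))
    singles = [(f,) for f in feats]
    return singles + list(combinations(feats, 2))


def train_rules(records, labels, k, smoothing=1.0):
    """Return the top-k rules as (rule, f_llr) pairs, highest F-LLR first.

    ASSUMPTION: F-LLR is computed as
        P(rule | converter) * log(P(rule | converter) / P(rule | non-converter))
    with additive smoothing. The source text does not give the formula.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for feats, y in zip(records, labels):
        target = pos_counts if y == 1 else neg_counts
        target.update(candidate_rules(feats))

    scored = []
    for rule in set(pos_counts) | set(neg_counts):
        p_pos = (pos_counts[rule] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_counts[rule] + smoothing) / (n_neg + 2 * smoothing)
        scored.append((rule, p_pos * math.log(p_pos / p_neg)))
    # Sort by descending F-LLR; break ties by rule so the order is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


def score_record(features, top_rules):
    """ASSUMPTION: a record's score is the sum of the F-LLR values of the
    top-k rules whose features all occur in the record."""
    present = set(features)
    return sum(w for rule, w in top_rules if present.issuperset(rule))


# Toy data: each record is a list of feature names; label 1 marks a converter.
records = [["sports", "auto"], ["sports", "auto"], ["news"],
           ["news", "auto"], ["news"], ["weather"]]
labels = [1, 1, 0, 0, 0, 0]
top_rules = train_rules(records, labels, k=2)
print(top_rules)
print(score_record(["sports", "auto"], top_rules))
```

In this sketch a record that matches none of the top k rules receives a score of zero. With very few converters per campaign the smoothing term keeps the log ratio finite for rules that never occur among non-converters.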