The internet advertising industry is booming, and its importance to the global economy is expected to grow in the decades ahead. Meanwhile, the market is becoming crowded with competitors striving to optimize every profit-related aspect of their operations. Probably the most important of these is how to ‘convince’ internet users to click the displayed ads more often (at least when the pay-per-click revenue model is used).
Some science can actually be very helpful here. For example, to maximize revenue for search engines, a crucial task is to estimate in advance the so-called click-through rate, or CTR. Ideally, this parameter should be estimated for each advertisement. This would not be a problem for a simple website with no more than a few dozen ads: you could place each ad manually, observe the number of clicks it receives, and then choose the most suitable one. But in search engines, practically nothing can be done without at least some level of automation. And here comes the tricky part: how can one decide, without human supervision, how much income a particular ad is likely to bring?
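The manual approach described above can be illustrated with a small Python sketch. It assumes CTR is simply clicks divided by impressions; the ad names and counts are hypothetical and serve only as an example.

```python
def click_through_rate(clicks, impressions):
    """Observed CTR: the fraction of ad impressions that led to a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def best_ad(stats):
    """Return the ad with the highest observed CTR.

    `stats` maps an ad name to a (clicks, impressions) pair.
    """
    return max(stats, key=lambda ad: click_through_rate(*stats[ad]))


# Hypothetical observations for two manually placed ads.
observed = {"ad_a": (12, 1000), "ad_b": (30, 1500)}
winner = best_ad(observed)
```

This only works once an ad has already been shown many times, which is why a search engine needs a model that predicts CTR in advance.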
To solve this problem, algorithms based on artificial neural networks could be used to predict CTR, say the authors of a paper recently published on arXiv.org. They propose a two-stage click prediction system that combines the artificial neural network approach with the existing decision tree framework currently used at the Russian search engine Yandex.
According to the authors, this is a relatively new field of research, because most modern search engines have used other machine learning approaches for this task, including logistic regression and boosted decision trees. However, applications of artificial neural networks (ANNs) in other fields of science show very promising results compared to these techniques. The team argues that ANNs typically offer greater modeling power, can ‘capture’ non-linear relationships between the input parameters, and eliminate some of the drawbacks characteristic of the currently used algorithms.
To construct the prediction system, the scientists chose feed-forward neural networks. Since sponsored search typically uses small textual advertisements displayed directly on the search page, the task of CTR prediction may not seem very complex at first. In reality, however, quite a few parameters are in play, which makes it difficult to estimate the exact relationships between separate data inputs. The input parameters used to construct the prediction system included the user ID, keyword, search query, and advertisement parameters such as the ad ID, title, text content, and position.
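To make the idea of a feed-forward network more concrete, here is a minimal Python sketch of a forward pass with a single hidden layer. It is not the authors' architecture: the layer sizes, the tanh and sigmoid activations, the random untrained weights, and the names `init_network` and `predict_ctr` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, seed=0):
    """Create small random weights; the last weight of each unit is its bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def predict_ctr(network, features):
    """Forward pass: inputs -> tanh hidden layer -> sigmoid output."""
    hidden, output = network
    activations = []
    for weights in hidden:
        # zip stops at the shorter sequence, so the trailing bias is added separately.
        z = weights[-1] + sum(w * x for w, x in zip(weights, features))
        activations.append(math.tanh(z))
    z_out = output[-1] + sum(w * a for w, a in zip(output, activations))
    return sigmoid(z_out)


# Untrained network with 4 hypothetical numeric input features.
network = init_network(n_inputs=4, n_hidden=3)
probability = predict_ctr(network, [1.0, 0.0, 0.5, 2.0])
```

The output can be read as a predicted click probability. In practice the weights would be learned from click logs rather than drawn at random.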
The click-through logs of the Yandex search engine served as a data set of approximately 6.6 million examples, which were used to train, validate and test the ANN. The authors note that it would not be feasible to feed all the available data directly into the neural network. For this reason, the data dimensionality was reduced by removing infrequent features from the initial input parameters, and then reduced even further by applying a hash function.
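The two reduction steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the frequency threshold, the number of buckets, the use of MD5 as the hash function, and the feature strings are hypothetical, and the article does not say which values or hash the authors used.

```python
import hashlib
from collections import Counter


def prune_rare(examples, min_count):
    """Drop features that occur in fewer than `min_count` examples."""
    counts = Counter(f for example in examples for f in set(example))
    return [[f for f in example if counts[f] >= min_count]
            for example in examples]


def hash_features(features, n_buckets):
    """Map string features into a fixed-length count vector."""
    vector = [0.0] * n_buckets
    for feature in features:
        # MD5 is an assumption; it gives the same bucket on every run,
        # unlike Python's built-in hash(), which is salted per process.
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        vector[index] += 1.0
    return vector


# Hypothetical sparse features extracted from three log entries.
examples = [
    ["query=shoes", "ad=17", "pos=1"],
    ["query=shoes", "ad=42", "pos=1"],
    ["query=boots", "ad=17", "pos=2"],
]
pruned = prune_rare(examples, min_count=2)
vectors = [hash_features(example, n_buckets=8) for example in pruned]
```

Different features can collide in the same bucket. That is the price of a fixed, small input size.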
Testing of the click-prediction system showed that replacing logistic regression with an ANN considerably improves prediction performance. Prediction quality was measured using the precision/recall curve (PRC), summarized as the area under that curve. The ANN-based implementation achieved 5.57% better prediction quality. This result was further improved by using an ensemble of 6 artificial neural networks (6.72%).
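The following Python sketch shows one common way to approximate the area under a precision/recall curve, namely average precision, along with a simple way to combine an ensemble by averaging its predictions. Both choices are assumptions: the article does not say how the authors computed the area or how they combined the 6 networks, and the labels and scores below are made up.

```python
def average_precision(labels, scores):
    """Approximate the area under the precision/recall curve.

    Examples are ranked by score, and precision is averaged over the
    ranks at which a clicked (label 1) example appears.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives


def ensemble_average(predictions):
    """Average the per-example predictions of several models."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


# Made-up click labels and predicted CTRs from two hypothetical models.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.8, 0.7, 0.1]
model_b = [0.8, 0.2, 0.9, 0.3]
combined = ensemble_average([model_a, model_b])
score_single = average_precision(labels, model_a)
score_ensemble = average_precision(labels, combined)
```

Averaging tends to smooth out the errors of individual networks, which is one plausible reason an ensemble can score higher than a single model.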
The authors say that the initial development of an ANN-based CTR prediction system has demonstrated very promising results, and that future research could test the system on real-time data. Such testing would make it possible to observe the exact performance effects. Additional work is also needed to improve the performance of the ANN system when working with larger data sets and a greater number of input parameters.
Written by Alius Noreika