Showing posts with the label Soapbox.

Monday, 13 January 2014

How many more R-bloggers posts can I expect?

I noticed that the monthly number of posts on R-bloggers stopped increasing over the last year. Indeed, the last couple of months saw a decline in posts compared to the previous year. Thus, has most been said and written about R already?


Who knows? Well, I took a stab at looking into the future. However, I can tell you already that I am not convinced by my predictions. But maybe someone else will be inspired to take this work forward.

First, I have to get the data - that's easy, I can scrape the monthly post counts from the R-bloggers homepage.


Looking at the incremental and cumulative plots, and believing that eventually the number of R posts will decrease, I thought that a logistic growth function would provide a nice fit to the data and also give an asymptotic view of the total number of posts on R-bloggers.
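To make the idea concrete, here is a minimal sketch of fitting a logistic growth curve to cumulative counts with base R. The data are simulated for illustration only (they are not the R-bloggers counts), and the use of `nls()` with the self-starting `SSlogis()` model is an assumption about how such a fit could be done, not necessarily the approach taken for this post.

```r
# Simulated cumulative post counts, for illustration only
# (NOT the R-bloggers data; all numbers below are made up).
set.seed(1)
month <- 1:60
true_K <- 15000
cum_posts <- true_K / (1 + exp((40 - month) / 8)) +
  rnorm(length(month), sd = 20)
dat <- data.frame(month = month, cum_posts = cum_posts)

# SSlogis() is the self-starting logistic model in the stats package:
# K / (1 + exp((xmid - month) / scal)), where K is the asymptote.
fit <- nls(cum_posts ~ SSlogis(month, K, xmid, scal), data = dat)
K_hat <- coef(fit)[["K"]]

# Posts still to come by a future month: predicted cumulative total at
# that month minus the fitted total at the last observed month.
at_future <- predict(fit, newdata = data.frame(month = 96))
at_last <- predict(fit, newdata = data.frame(month = 60))
remaining <- as.numeric(at_future - at_last)
```

The estimate `K_hat` is the asymptotic total implied by the curve, and `remaining` is the kind of "posts still to come" figure a fit like this would produce.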

Although the fit, see below, looks reasonable at first glance, I don't believe it provides a sensible prediction of the future. The model would forecast only another 1,269 posts by the end of 2016, with not much more to expect after that. Indeed, the asymptotic total number of posts K is only 14,396. I don't believe this can be right, not even as a proxy, when the current count of monthly posts is well above 100.



I played around with data and the logistic growth function a little further, using annual instead of monthly data, changing the time horizon and fixing K, yet without much success.

Eventually I recalled a talk by Rob Hyndman about his forecast package. After all, I have a time series here. So, applying the forecast function to the incremental data provides a somewhat more realistic prediction of 2,695 posts for the next 12 months, but with an increasing trend in monthly posts for 2014, which I find hard to believe given the observations over the last year.
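A rough sketch of that step, assuming the monthly counts are held in a `ts` object: the series below is simulated (not the scraped data), and the call relies on the default automatic model selection of `forecast()` from the forecast package. The code is guarded so that it is skipped if the package is not installed.

```r
# Simulated monthly post counts, for illustration only (NOT the real data).
set.seed(42)
monthly <- ts(round(150 + 2 * (1:48) + rnorm(48, sd = 15)),
              start = c(2010, 1), frequency = 12)

next_12 <- NA_real_
if (requireNamespace("forecast", quietly = TRUE)) {
  # forecast() picks a model automatically for a ts object; h is the horizon.
  fc <- forecast::forecast(monthly, h = 12)
  # Point forecasts are in fc$mean; their sum is the 12-month total.
  next_12 <- sum(fc$mean)
}
```

Summing the point forecasts gives a single figure comparable to the 12-month total quoted above; `plot(fc)` would show the prediction intervals as well.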



Well, I presented two models here: One predicts a rapid decline in monthly posts on R-bloggers, while the other forecasts an increase. Neither feels right to me. Of course time will tell, but have you got any ideas or views?

Session Info

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] forecast_4.8 xts_0.9-7 zoo_1.7-10 XML_3.95-0.2

loaded via a namespace (and not attached):
[1] colorspace_1.2-4 fracdiff_1.4-2
[3] grid_3.0.2 lattice_0.20-23
[5] nnet_7.3-7 parallel_3.0.2
[7] quadprog_1.5-5 Rcpp_0.10.6
[9] RcppArmadillo_0.3.920.1 tools_3.0.2
[11] tseries_0.10-32

Tuesday, 7 January 2014

Whale charts - Visualising customer profitability

The Christmas and New Year's break is over, yet there is still time to return unwanted presents. Return to Santa was the title of an article in the Economist that highlighted the impact on online retailers, as return rates can be alarmingly high.

The article quotes a study by Christian Schulze of the Frankfurt School of Finance and Management, which analyses the return habits of customers who bought at least five items over a five year period from a large European online retailer. Although only a few figures are cited I will attempt to create a little model that replicates the customer behaviour and visualises the impact on overall profitability.

The study found that 5% of customers sent back more than 80% of the items they had bought, and that 1% of customers sent back at least 90% of their purchases. In other words, 95% of customers sent back less than 80% and 99% of customers sent back less than 90%. To model this behaviour an S-shaped curve, such as the logistic curve, seems appropriate, as no-one can return more than they bought or less than nothing. With location and scale parameters m and s the logistic function can be fitted to the data, see the R code below.
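With only two data points and two parameters, one way to pin down m and s is to match the two quoted quantiles exactly. The sketch below assumes the logistic distribution function F(x) = 1 / (1 + exp(-(x - m) / s)), as implemented by `plogis()` in base R; whether the original code solved the equations this way or used a numerical fit is not stated.

```r
# Quantiles quoted from the study: 95% of customers return less than 80%,
# and 99% return less than 90% of their purchases.
p <- c(0.95, 0.99)
x <- c(0.80, 0.90)

# On the logit scale the logistic CDF is linear: qlogis(p) = (x - m) / s,
# so two points determine the scale s and location m.
z <- qlogis(p)
s <- diff(x) / diff(z)
m <- x[1] - s * z[1]

# Check: the fitted curve reproduces the two quoted probabilities.
fitted_p <- plogis(x, location = m, scale = s)
```

Under these assumptions m comes out at roughly 0.62 and s at roughly 0.06, which implies that half of the customers return more than about 62% of what they buy.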


The return rates do look quite high. However, if the products were shoes rather than books then I find them believable.

Additionally, the article cites studies that suggest handling each returned item costs online sellers between $6 and $18, not to mention losses from items that are returned in unsaleable condition. Furthermore, without the cost of returns, online retailers' profits would be almost 50% higher.

Thus, to spin my toy model further, I assume 100 customers with revenues following an exponential distribution (λ=1/250), the cost ratio of sold goods to be lognormal (μ=-0.1, σ=0.1) and the cost of returns to follow a normal distribution with mean of $12 and standard deviation of $6.
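The three distributions can be sampled directly in base R. The sketch below only draws the inputs and computes a profit figure before returns; the formula `revenue * (1 - cost_ratio)` is an assumption about how the cost ratio is applied, and how the return costs are netted off per customer is not specified here, so that step is left out. The seed is arbitrary, so the numbers will not match the figures quoted in this post.

```r
set.seed(2014)  # arbitrary seed, chosen for reproducibility only
n <- 100        # number of customers

# Revenue per customer: exponential with rate 1/250 (mean of $250).
revenue <- rexp(n, rate = 1 / 250)

# Cost of goods sold as a share of revenue: lognormal on the log scale.
cost_ratio <- rlnorm(n, meanlog = -0.1, sdlog = 0.1)

# Handling cost per returned item: normal with mean $12 and sd $6.
return_cost <- rnorm(n, mean = 12, sd = 6)

# ASSUMPTION: profit before returns is revenue less cost of goods sold.
profit_before_returns <- revenue * (1 - cost_ratio)
total_before_returns <- sum(profit_before_returns)
```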

In my simulation I could have made a profit of $1,979 instead of $1,441. Clearly the customers who return many items cause a real dent to my bottom line.

This situation is best visualised in what is often called a Whale Chart. Here I plot the cumulative profit against customers, with the most profitable customer on the left and the least profitable customer on the right. This chart shows me how much profit the first x customers generated. Often this graph looks like a whale coming out of the water, hence its name.


In my little toy simulation I note that the first 20 most profitable customers would have generated more profit than the revenue of all customers. Indeed, profitability could have been 37% higher if it wasn't for loss-making customers.
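The whale chart itself takes only a few lines: sort the customers by profit, accumulate, and plot. The sketch below uses a hypothetical vector of per-customer profits (simulated, not the output of the model above), so the peak and the uplift it reports are illustrative only.

```r
# Hypothetical per-customer profits, some of them negative (illustration only).
set.seed(7)
profit <- rnorm(100, mean = 15, sd = 40)

# Most profitable customer first, then cumulative profit.
ordered_profit <- sort(profit, decreasing = TRUE)
cum_profit <- cumsum(ordered_profit)

# The whale chart: cumulative profit against number of customers.
plot(seq_along(cum_profit), cum_profit, type = "l",
     xlab = "Customers (most profitable first)",
     ylab = "Cumulative profit")
abline(h = sum(profit), lty = 2)  # total profit over all customers

# The peak is the profit without loss-making customers; the uplift is
# how much higher that is than the actual total.
peak <- max(cum_profit)
n_at_peak <- which.max(cum_profit)
uplift <- peak / sum(profit) - 1
```

The curve rises while customers are profitable, peaks at the last profitable customer, and then falls back to the overall total. The gap between the peak and the dashed line is the profit lost to loss-making customers.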

So, what shall I do? Manage my customers, know who I should reward and keep and whose loss wouldn't hurt at all. More customers are not the answer. I need more customers who return less.

R Code
