Monday 24 March 2014

Sankey diagrams with googleVis

Sankey diagrams are great for visualising flows from one set of data values to another. Although they are named after the Irish Captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 to show the energy efficiency of a steam engine, the best-known Sankey diagram is probably Charles Minard's Map of Napoleon's Russian Campaign of 1812, which he actually produced in 1869.

Thomas Rahlf: Datendesign mit R

The above example from Thomas Rahlf's book Datendesign mit R shows that Minard's plot can be reproduced with base graphics in R. Aaron Berdanier posted the SankeyR function in 2010, and Erik Andrulis published the riverplot package on CRAN, which also allows users to create static Sankey charts.

Interactive Sankey diagrams can be generated with rCharts and now also with googleVis (version >= 0.5.0). For my first example I use UK visitor data from VisitBritain.org. The following diagram visualises the flow of visitors in 2012: where they came from and which parts of the UK they visited. This example already illustrates the key concept: I need a data frame with three columns that describe the flow from a source to a target and the strength, or weight, of the connection.
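As a minimal sketch of that three-column layout: the data frame below uses made-up regions and weights (they are not the VisitBritain figures), and the column names From, To and Weight are assumptions chosen for illustration. The chart is only built when googleVis is installed.

```r
# Hypothetical flows: regions and weights are invented for illustration
# and are NOT the VisitBritain visitor numbers.
visits <- data.frame(
  From   = c("Europe", "Europe", "North America", "North America"),
  To     = c("London", "Scotland", "London", "Scotland"),
  Weight = c(8, 3, 5, 2),
  stringsAsFactors = FALSE
)

# Build the chart only if googleVis (>= 0.5.0) is available
if (requireNamespace("googleVis", quietly = TRUE)) {
  sk <- googleVis::gvisSankey(visits, from = "From", to = "To",
                              weight = "Weight")
  # plot(sk) would open the interactive chart in the default browser
}
```

Each row is one link: the width of the band drawn between From and To is proportional to Weight.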




My next example uses a graph data set that I visualise in the same way again, but here I start to play around with the various parameters of the Google API.
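To give a rough idea of how such parameters are passed, here is a small sketch. The edge list is hypothetical and stands in for the graph data set, and the colours and sizes are arbitrary choices; the sankey option is handed to the Google API as a JSON-like string.

```r
# Hypothetical edge list standing in for a graph data set
edges <- data.frame(
  from   = c("A", "A", "B", "C"),
  to     = c("X", "Y", "X", "Y"),
  weight = c(5, 7, 6, 2),
  stringsAsFactors = FALSE
)

# Chart options: arbitrary illustrative values, passed through to Google Charts
sankey_opts <- list(
  width  = 600,
  height = 300,
  sankey = "{link: {color: {fill: '#d799ae'}},
             node: {width: 4, nodePadding: 20,
                    label: {fontSize: 14, color: '#871b47'}}}"
)

if (requireNamespace("googleVis", quietly = TRUE)) {
  sk2 <- googleVis::gvisSankey(edges, from = "from", to = "to",
                               weight = "weight", options = sankey_opts)
  # plot(sk2) would display the styled chart in the browser
}
```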




As stated by Google, the Sankey chart may be undergoing substantial revisions in future Google Charts releases.

For more information and installation instructions see the googleVis project site and Google documentation.

Session Info

R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] googleVis_0.5.0-4 igraph_0.7.0

loaded via a namespace (and not attached):
[1] RJSONIO_1.0-3 tools_3.0.3

Sunday 23 March 2014

Reminder: Abstract submission for the 2014 'R in Insurance' conference will close this Friday

Don't forget, this is the final week you can submit an abstract for the second R in Insurance conference.
For more details see http://www.rininsurance.com and perhaps for inspiration review last year's programme.

Tuesday 18 March 2014

Timeline charts with googleVis

Last year at the Google I/O conference Mitchell Foley presented new developments of the Google Chart Tools API, and one of the new features he mentioned was timeline charts (about 6 min into the talk).



Timeline charts are a great way of visualising different dates/events over time and are now also supported by googleVis from version 0.5.0 onwards (currently only available from GitHub). Here is an example, showing classroom allocation in the afternoon. The exact times and durations are given when you hover over the bars.



I can swap around the bar and row labels to show the rooms by languages:
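A minimal sketch of both layouts, assuming a small illustrative timetable (the rooms, languages and times below are made up). Swapping the rowlabel and barlabel arguments of gvisTimeline is what switches between the two views.

```r
# Illustrative classroom timetable; all values are assumptions
rooms <- data.frame(
  Room     = c("Room 1", "Room 2", "Room 3"),
  Language = c("English", "German", "French"),
  start    = as.POSIXct(c("2014-03-14 14:00", "2014-03-14 15:00",
                          "2014-03-14 14:30"), tz = "UTC"),
  end      = as.POSIXct(c("2014-03-14 15:00", "2014-03-14 16:00",
                          "2014-03-14 15:30"), tz = "UTC"),
  stringsAsFactors = FALSE
)

if (requireNamespace("googleVis", quietly = TRUE)) {
  # One row per language, bars labelled with the room
  by_language <- googleVis::gvisTimeline(rooms, rowlabel = "Language",
                                         barlabel = "Room",
                                         start = "start", end = "end")
  # Swapped: one row per room, bars labelled with the language
  by_room <- googleVis::gvisTimeline(rooms, rowlabel = "Room",
                                     barlabel = "Language",
                                     start = "start", end = "end")
  # plot(by_language); plot(by_room) would show them in the browser
}
```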



Here is another example, inspired by Jason Bryer's timeline package, showing the US presidents and UK prime ministers during World War II. For gvisTimeline I have to remove the line breaks in Jason's data.



And finally a more complex example from the Google Chart Tools API documentation showing the terms of the first US presidents with various options set to change the colours and fonts:
Read more »

Monday 10 March 2014

googleVis code development moved to GitHub

After nearly 4 years of developing googleVis on Google Code with SVN we decided to move to GitHub. The main reason was that Google stopped the facility of hosting pre-CRAN builds of the package for user testing. The devtools package on the other hand makes it really easy to install packages from source hosted on GitHub. Additionally, we hope that GitHub will make collaboration with others more effective. Thus, bookmark http://github.com/mages/googleVis.

Screen shot of some of the new features in googleVis 0.5.0-1.

There are some exciting new features in the development version 0.5.0-1 of googleVis, reflecting the enhanced Google Chart Tools API:

New Features

  • New functions gvisSankey, gvisAnnotationChart, gvisHistogram, gvisCalendar and gvisTimeline to support the new Google charts of the same names (without 'gvis').
  • New demo Trendlines showing how trend-lines can be added to Scatter-, Bar-, Column-, and Line Charts.
  • New demo Roles showing how different column roles can be used in core charts to highlight data.
  • New vignettes written in R Markdown showcasing googleVis examples and how the package works with knitr.

Changes

  • The help files of gvis charts no longer show all their options; instead, a link to the online Google API documentation is given.
  • All googleVis output will be displayed in your default browser. In previous versions of googleVis output could also be displayed in the preview pane of RStudio. This feature is no longer available with the current version of RStudio, but is likely to be introduced again with the release of RStudio version 0.99 or higher.

I will post about the new features and changes in the coming weeks. Please feel free to test the development version already. Visit our GitHub project page for installation instructions and further details.

For the impatient (you will require R >= 3.0.2):
install.packages(c("devtools","RJSONIO", "knitr", "shiny", "httpuv"))
library(devtools)
install_github("mages/googleVis")

Monday 3 March 2014

Review: Kölner R Meeting 26 February 2014

Last week's Cologne R user group meeting was all about R and databases. We had three talks, ranging from a general overview of how to connect R to databases, to a specific example with kdb+, and perhaps a look at the future with ArangoDB, a NoSQL database.

Connecting R with databases

Diego de Castillo's talk focused on the use of relational databases, such as PostgreSQL, SQLite and Oracle. For all these databases dedicated R drivers exist on CRAN that can be used in a generic way via the DBI package. This allows for a consistent approach to connect, query and return data to R. A popular alternative on Windows is to connect through the ODBC (Open Database Connectivity) API via RODBC; RJDBC offers a similar route through JDBC.
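The connect, query and return pattern can be sketched with an in-memory SQLite database. This is only an illustration, not code from the talk: the table name, its columns and the helper query_demo() are hypothetical, and the sketch assumes the DBI and RSQLite packages are installed (it returns NULL otherwise).

```r
# Hypothetical helper: connect, write a small table, query it, disconnect
query_demo <- function() {
  if (!requireNamespace("DBI", quietly = TRUE) ||
      !requireNamespace("RSQLite", quietly = TRUE)) {
    return(NULL)  # packages not available, nothing to demonstrate
  }
  # The same DBI calls work with other drivers; only dbConnect() changes
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(DBI::dbDisconnect(con))

  talks <- data.frame(speaker = c("A", "B", "C"),
                      minutes = c(30, 45, 10),
                      stringsAsFactors = FALSE)
  DBI::dbWriteTable(con, "talks", talks)

  # The query result comes back as an ordinary data frame
  DBI::dbGetQuery(con, "SELECT speaker, minutes FROM talks WHERE minutes > 20")
}

res <- query_demo()
```

Swapping the driver object passed to dbConnect() is, in principle, all that is needed to point the same code at a different database.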


R and kdb+

Kim Kuen Tang gave an overview of kdb+, a proprietary database that appears to be popular for time series data. kdb+ comes with its own expressive query language, q. Kim demonstrated how he could analyse large amounts of stock market data stored in a kdb+ database using R and q, all via Sublime Text.

ArangoDB

Michael Hackstein and Claudius Weinberger introduced us to ArangoDB, a NoSQL (Not only SQL) database. ArangoDB is an open source document database. This means that data is stored as documents, which are similar to JavaScript objects, in so-called "collections". Their slides nicely presented the different concepts outside traditional relational databases, such as key-value stores, document stores and graph data. Claudius mentioned that they had received several requests from users who wanted to connect R to ArangoDB. Although a native driver does not exist for R yet, ArangoDB can be accessed from R using its HTTP API via the packages bitops, RCurl and RJSONIO.


Next Kölner R meeting

The next meeting is scheduled for 23 May 2014. This will be our 10th meeting, clearly something we need to celebrate!

Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.

Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.

Monday 24 February 2014

Next Kölner R User Meeting: 26 February 2014

The next Cologne R user group meeting is scheduled for tomorrow, 26 February 2014. We are delighted to welcome:
  • Diego de Castillo: R and databases
  • Kim Kuen Tang: Hands on using R and kdb+ together
  • Frank Celler: ArangoDB (Lightning Talk)
Further details and the agenda are available on our KölnRUG Meetup site.

Please sign up if you would like to come along. Notes from past meetings are available here.


The organisers, Bernd Weiß and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their vector programme.



Wednesday 19 February 2014

R in Insurance 2014 Conference Poster

Here is the poster for the 2nd R in Insurance conference on Monday 14 July 2014 at Cass Business School in London:

R in Insurance 2014 conference poster. Download PDF version

Important deadlines to keep in mind:
For all further information see: www.rininsurance.com.

The programme and the presentation files of the first R in Insurance conference have been published on GitHub.

Monday 17 February 2014

Adding labels within lattice panels by group

The other day I had data that showed the development of many products over time. I grouped the products into categories and visualised the data as line graphs in lattice. But instead of adding an extensive legend to the plot I wanted to add labels to each line's latest point. How do you do that? It turns out that panel.groups is there to help again.

Here is my solution:

R code
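The original listing is not reproduced in this excerpt. As a rough sketch of the idea only, and not the author's code, the example below uses simulated data (the products, categories and values are all hypothetical) and a custom panel.groups function that draws each line and then writes the group's label next to its latest point.

```r
library(lattice)

# Hypothetical data: four products in two categories over five periods
set.seed(1)
sales <- expand.grid(time = 1:5, product = c("A", "B", "C", "D"))
sales$product  <- factor(sales$product)
sales$category <- ifelse(sales$product %in% c("A", "B"),
                         "Category 1", "Category 2")
sales$value    <- 10 + 2 * as.integer(sales$product) + sales$time +
                  rnorm(nrow(sales))

product_levels <- levels(sales$product)

p <- xyplot(value ~ time | category, groups = product, data = sales,
            type = "l",
            xlim = c(0.5, 6),  # leave room on the right for the labels
            panel = panel.superpose,
            panel.groups = function(x, y, group.number, ...) {
              panel.xyplot(x, y, ...)          # draw the line for this group
              last <- which.max(x)             # index of the latest point
              panel.text(x[last], y[last],
                         labels = product_levels[group.number],
                         pos = 4, cex = 0.8)   # label to the right of it
            })
print(p)
```

panel.superpose calls panel.groups once per group within each panel and passes the group's index as group.number, which is used here to look up the label.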

Read more »

Wednesday 12 February 2014

Registration for the 2014 'R in Insurance' conference has opened


The registration for the second conference on R in Insurance on Monday 14 July 2014 at Cass Business School in London has opened.

This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered may include actuarial statistics, capital modelling, pricing, reserving, reinsurance and extreme events, portfolio allocation, advanced risk tools, high-performance computing, econometrics and more. All topics will be discussed within the context of using R as a primary tool for insurance risk management, analysis and modelling.

The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance.

Invited talks will be given by:
  • Arthur Charpentier, Département de mathématiques, Université du Québec à Montréal
  • Montserrat Guillen, Dept. Econometrics, University of Barcelona, together with Leo Guelman, Royal Bank of Canada (RBC Insurance division)
Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the Actuarial Profession.

We invite you to submit a one-page abstract for consideration. Both academic and practitioner proposals related to R are encouraged. The submission deadline for abstracts is 28 March 2014.

Details about the registration and abstract submission are given on the dedicated R in Insurance page at Cass Business School.

Sponsors

The organisers, Andreas Tsanakas and Markus Gesmann, gratefully acknowledge the sponsorship of Mango Solutions, Cybaea, PwC and RStudio.



Last year's programme, abstracts and talks are available online.

Monday 3 February 2014

Does sexual activity change with age?

Recently the Guardian's Data Blog reported on the results from the third National Survey of Sexual Attitudes and Lifestyles in the UK. One of the questions asked in the survey was whether the participants had had sex in the last four weeks. The results (a summary is available in this infographic) show that the British have their most sexually active period when they are in their 20s to 40s.

The article ended with the Guardian asking its readers to answer the same question over the course of a week. Last Friday they published some high-level numbers in a follow-up post. Of course there are many things you may criticise about their survey, e.g. it isn't randomised. However, the data provide a nice little example to get familiar with the prop.test function in R, which tests whether the proportions (probabilities of success) in several groups are the same, or whether they equal certain given values.

Here are the data and a first plot:



Although the numbers of responses vary a lot between age groups, the proportions of those who answered 'Yes' look more similar:


The function prop.test allows me to test whether the proportions of those who said 'Yes' are the same between different age groups.

Running the test across all age groups shows that, if the hypothesis of equal proportions were true, it would be very unlikely to observe these data by chance; the p-value is less than 2.2e-16. The picture changes when I compare only the age groups of 25-34 and 35-44 years old. Here the p-value is 11.4%, so I cannot reject the hypothesis that the two groups behave similarly (61% vs. 59%). Adding the next age group, on the other hand, suggests that the three groups are less likely to share the same proportion (p-value is 0.2%). Still, when I compare the groups of 35-44 and 45-54 years old, I might again accept that they have a similar level of sexual activity (p-value 8%).
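The comparisons described above follow one pattern, sketched here with hypothetical counts. The numbers below are invented for illustration and are not the Guardian's figures, so the resulting p-values will differ from those quoted.

```r
# Hypothetical counts per age group, NOT the Guardian's survey results
responses <- data.frame(
  age   = c("16-24", "25-34", "35-44", "45-54"),
  yes   = c(120, 610, 590, 300),
  total = c(250, 1000, 1000, 600),
  stringsAsFactors = FALSE
)

# Share of 'Yes' answers in each group
responses$share <- responses$yes / responses$total

# Null hypothesis: the proportion of 'Yes' is the same in all groups
all_groups <- prop.test(responses$yes, responses$total)

# The same test restricted to two neighbouring groups (rows 2 and 3)
pair <- with(responses[2:3, ], prop.test(yes, total))

all_groups$p.value
pair$p.value
```

prop.test takes the vector of successes and the vector of trials; with more than two groups it tests them all at once, so subsetting the rows is enough to compare any selection of age groups.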


What do I make of this? Well, most people will not notice a change in their sexual activity on a day-to-day basis. Only when they look back over the decades will they notice a significant change. No surprise there: ageing is a slow process.

Session Info

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] latticeExtra_0.6-26 lattice_0.20-24 RColorBrewer_1.0-5

loaded via a namespace (and not attached):
[1] grid_3.0.2 tools_3.0.2
