Let us all Welcome 2014
With a Bright Smile and Happiness
Share Good deeds to everyone
Happy New Year to all of you!
data.table package. He was joined by his collaborator Arun Srinivasan, who is based in Cologne. Their talk was followed by Thomas Rahlf on Datendesign mit R (Data design with R).

data.table

The aim of the data.table package is to reduce two times: the time to write code and the time to execute it. His talk illustrated how the syntax of data.table, not unlike SQL, can produce shorter and more readable code that at the same time provides an efficient and fast way to analyse big in-memory data sets with R. Arun presented new developments in data.table 1.8.11, which not only fixes bugs but also adds many new features, such as melt/cast, and further speed gains. data.table rocks. For more details see the data.table home page.
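To give a flavour of the SQL-like syntax mentioned above, here is a minimal sketch. It is not taken from the talk: the mtcars data set, the column names and the filter are my own choices for illustration, and the code only runs if the data.table package is installed.

```r
# Sketch only: mtcars and all column names are illustrative assumptions.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Build a data.table, keeping the car names as a regular column
  DT <- data.table(model = rownames(mtcars), mtcars)

  # DT[i, j, by] reads a bit like SQL:
  #   i  ~ WHERE    (rows with more than 100 hp)
  #   j  ~ SELECT   (mean mpg and row count)
  #   by ~ GROUP BY (number of cylinders)
  res <- DT[hp > 100, list(mean_mpg = mean(mpg), n = .N), by = cyl]
  print(res)

  # melt reshapes from wide to long format: one row per model and variable
  long <- melt(DT, id.vars = "model", measure.vars = c("mpg", "hp"))
  print(head(long))
}
```

The same grouped summary in base R would need a subset followed by aggregate or tapply, which is where the shorter data.table code comes from.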
Thomas Rahlf: Datendesign mit R

grid or any add-ons such as lattice or ggplot2.

The Luxus Schnitzel. Photo by Günter Faes
0.75*(0*0.7 + 1*0.2 + 2*0.1) + 0.25*(0*0.5 + 1*0.3 + 2*0.2) = 0.475

Now a customer of two years asks for his renewal. Suppose he had no claims in the first year and one claim last year. How many claims should I predict for next year? Or, in other words, how much credibility should I give him?

gRain package [2]. I start with the contingency probability tables for the driver type and the conditional probabilities for 0, 1 and 2 claims in years 1 and 2. As I assume independence between the years, I set the same probabilities for both. I can now review my model as a mosaic plot (above) and as a graph (below).

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Rgraphviz_2.6.0 gRain_1.2-2 gRbase_1.6-12 graph_1.40.0
loaded via a namespace (and not attached):
[1] BiocGenerics_0.8.0 igraph_0.6.6 lattice_0.20-24 Matrix_1.1-0
[5] parallel_3.0.2 RBGL_1.38.0 stats4_3.0.2 tools_3.0.2
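The renewal question above can also be worked through by hand with Bayes' rule. The following is a minimal base R sketch, not the gRain model itself. It assumes that the weights 0.75 and 0.25 in the formula are the prior probabilities of two driver types (labelled "A" and "B" here, which are hypothetical names) and that the two years are independent given the driver type.

```r
# Prior over the two driver types; the labels "A" and "B" are hypothetical
prior <- c(A = 0.75, B = 0.25)

# P(number of claims in one year | driver type), for 0, 1 and 2 claims
p_claims <- rbind(A = c(0.7, 0.2, 0.1),
                  B = c(0.5, 0.3, 0.2))
colnames(p_claims) <- c("0", "1", "2")
n_claims <- 0:2

# Expected claims per type (0.4 and 0.7) and overall, without any history
exp_by_type <- drop(p_claims %*% n_claims)
prior_mean <- sum(prior * exp_by_type)          # 0.475, as in the formula

# Observed history: no claim in year 1, one claim in year 2.
# With independent years the likelihood is the product of both years.
likelihood <- p_claims[, "0"] * p_claims[, "1"]

# Bayes' rule: posterior is proportional to prior times likelihood
posterior <- prior * likelihood / sum(prior * likelihood)

# Predicted claims for next year, given the two-year history
posterior_mean <- sum(posterior * exp_by_type)  # about 0.479

print(round(posterior, 4))
print(round(c(prior = prior_mean, posterior = posterior_mean), 4))
```

Under these assumptions the history shifts the prediction only slightly, from 0.475 to about 0.479, because a single claim in two years is almost equally plausible for both driver types.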
'googleVis.viewer' that controls the default output of the googleVis plot method. On package load it is set to getOption("viewer"), and if you use RStudio, its viewer pane will be used for displaying non-Flash and un-merged charts. You can set options("googleVis.viewer" = NULL) and the googleVis plot function will open all output in the default browser again. Thanks to J.J. from RStudio for the tip.

Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] googleVis_0.4.7 XML_3.95-0.2
loaded via a namespace (and not attached):
[1] RJSONIO_1.0-3 tools_3.0.2
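Switching the googleVis output back to the browser, as described above, might look like the following sketch. The option name 'googleVis.viewer' is the one given in the text; the table chart of mtcars is only my illustration, and the plot call is skipped unless googleVis is installed and the session is interactive.

```r
# Load the namespace first: per the text, the option is set on package load,
# so setting it before loading would be overwritten.
have_gvis <- requireNamespace("googleVis", quietly = TRUE)

# Send googleVis output to the default browser; keep the old value to restore
old <- options(googleVis.viewer = NULL)

if (have_gvis && interactive()) {
  # Illustrative chart only; any gvis object would do
  chart <- googleVis::gvisTable(head(mtcars))
  plot(chart)
}

browser_setting <- getOption("googleVis.viewer")  # NULL means: use the browser

# Restore the previous behaviour, e.g. the RStudio viewer pane
options(old)
```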
viewer. Set options("viewer"=NULL) and googleVis will plot all output in the browser again.

renderGvis help page of googleVis. For more information about the new viewer pane see the online RStudio documentation.

R Under development (unstable) (2013-10-25 r64109)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] googleVis_0.4.6
loaded via a namespace (and not attached):
[1] RJSONIO_1.0-3 tools_3.1.0
x <- rnorm(100000)
plot(x, main="100,000 points", col=adjustcolor("black", alpha.f=0.2))

Saving the plot as a PDF creates a 5.2 MB file on my computer, while the PNG output is only 62 KB. Of course, the PNG doesn't look as crisp as the PDF file.

# Default resolution (72 dpi), 400 x 400 pixels
png("100kPoints72dpi.png", units = "px", width=400, height=400)
plot(x, main="100,000 points", col=adjustcolor("black", alpha.f=0.2))
dev.off()
# Same pixel size at 150 dpi: text and symbols take up more of the image
png("100kHighRes150dpi.png", units="px", width=400, height=400, res=150)
plot(x, main="100,000 points", col=adjustcolor("black", alpha.f=0.2))
dev.off()
# 150 dpi with twice the pixel width and height
png("100kHighRes150dpi2.png", units="px", width=800, height=800, res=150)
plot(x, main="100,000 points", col=adjustcolor("black", alpha.f=0.2))
dev.off()
# 300 dpi with four times the pixel width and height
png("100kHighRes300dpi.png", units="px", width=1600, height=1600, res=300)
plot(x, main="100,000 points", col=adjustcolor("black", alpha.f=0.2))
dev.off()
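The file size comparison mentioned above can be reproduced with a short sketch like the one below. The seed, the temporary file names and the helper function draw are my own additions, and the sizes will differ from the 5.2 MB and 62 KB quoted above, depending on the device and its compression settings.

```r
set.seed(1)                       # assumption: seed added for reproducibility
x <- rnorm(100000)

# Helper (hypothetical name) so that both devices draw the same plot
draw <- function() {
  plot(x, main = "100,000 points", col = adjustcolor("black", alpha.f = 0.2))
}

pdf_file <- tempfile(fileext = ".pdf")
png_file <- tempfile(fileext = ".png")

# Vector output: every one of the 100,000 points is stored in the file
pdf(pdf_file)
draw()
invisible(dev.off())

# Raster output: only the pixels are stored; skipped if PNG is unavailable
png_ok <- isTRUE(unname(capabilities("png")))
if (png_ok) {
  png(png_file, units = "px", width = 400, height = 400)
  draw()
  invisible(dev.off())
}

# File sizes in KB (NA for a file that was not written)
sizes_kb <- c(pdf = file.size(pdf_file), png = file.size(png_file)) / 1024
print(round(sizes_kb))
```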
The apply family of functions in R is incredibly powerful, yet for newcomers often somewhat mysterious. Thus, Bernd gave an overview of the different apply functions and their cousins. The various functions differ in their object inputs, e.g. vectors, arrays, data frames or lists, and in their outputs. Other related functions are by, aggregate and ave. While functions like aggregate reduce the output size, others like ave return as many rows as the input object and repeat the results where necessary.

**ply functions of the plyr package. The function names are certainly easier to remember, but their syntax can be a little awkward (.()). Bernd's slides, in German, are already available from our Meetup site.

read.csv and write.csv in R. However, if you have a spreadsheet with multiple tabs and formatted numbers, read.csv becomes clumsy, as you would have to save each tab without any formatting in a separate file.

XLConnect as an alternative to read.csv or indeed RODBC for reading spreadsheet data. It uses the Apache POI API as the underlying interface. XLConnect requires a Java runtime environment on your computer, but no installation of Excel. That makes it a truly platform-independent solution for exchanging data between spreadsheets and R. Not only can you read defined rows and columns, or indeed named ranges, from Excel into R, but data can be written back to Excel files in the same way and, to top it all, so can graphic output from R.
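A round trip between R and a spreadsheet with XLConnect might look like the following sketch. The file name, the sheet name "cars" and the mtcars data are my own choices for illustration, and the code is skipped entirely unless XLConnect (and with it a Java runtime) is available.

```r
# Sketch only: file, sheet name and data are illustrative assumptions.
if (requireNamespace("XLConnect", quietly = TRUE)) {
  library(XLConnect)

  xlsx_file <- tempfile(fileext = ".xlsx")

  # Create a new workbook, add a sheet and write a data frame into it
  wb <- loadWorkbook(xlsx_file, create = TRUE)
  createSheet(wb, name = "cars")
  writeWorksheet(wb, head(mtcars), sheet = "cars", startRow = 1, startCol = 1)
  saveWorkbook(wb)

  # Read the sheet back into R; no Excel installation is involved
  wb_in <- loadWorkbook(xlsx_file)
  cars_back <- readWorksheet(wb_in, sheet = "cars")
  print(cars_back)
}
```

A workbook with several tabs can be read by calling readWorksheet once per sheet, which avoids saving each tab to a separate CSV file.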
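Returning to the apply family from Bernd's talk, the differences in inputs and outputs are easiest to see side by side. The following base R sketch uses a small made-up data frame; it is my own illustration and not taken from the slides.

```r
# apply works on the margins of a matrix or array
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)                    # one value per row: 9 and 12

# lapply always returns a list, sapply simplifies to a vector where it can
squares_list <- lapply(1:3, function(i) i^2)
squares_vec  <- sapply(1:3, function(i) i^2)    # 1 4 9

# Small made-up data set with two groups
df <- data.frame(group = c("a", "a", "b", "b", "b"),
                 value = c(1, 3, 2, 4, 6))

# aggregate reduces the output: one row per group
agg <- aggregate(value ~ group, data = df, FUN = mean)

# ave keeps the input length: the group mean is repeated for every row
df$group_mean <- ave(df$value, df$group, FUN = mean)

# by splits the data by group and applies the function to each piece
by_out <- by(df$value, df$group, mean)

print(agg)
print(df)
```

Here aggregate returns two rows (means 2 and 4), while ave returns five values (2, 2, 4, 4, 4), one for each row of the input.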