Applications of R at Google

At a talk I saw at the useR!2012 conference last month, Googler Karl Millar estimated that there are at least 200 active R users at Google, plus another 300+ occasional users participating in Google's internal R support list. But what are all these Google employees doing with R? A post from the Google Research team published on Google+ yesterday sheds some light:

"At Google we use Statistics every day to improve products, optimize infrastructure, and understand users. We’ve built a number of engineering systems to process and store massive amounts of data. These systems often use thousands of computers in parallel to process and manipulate the data. For many of our statisticians and data analysts, however, such systems provide only the first step of an interactive data analysis workflow that also involves filtering, classifying, modeling, visualizing, and forecasting quantitative data across all aspects of our business."

R is the main Statistics language at Google, according to Karl Millar. Here are some of the specific applications of R at Google mentioned in the post:

Large-scale parallel statistical forecasting in R is used to improve the effectiveness of online display advertising for Google's customers.

The same framework is used to study the effectiveness of search advertising at Google, revealing that search ads drive an additional 89% of web traffic (compared to organic search results alone).

Google uses R for large-scale, computationally intensive forecasting (as presented in a talk at the R/Finance 2012 conference).

Google uses an integration of R and FlumeJava to do very large-scale structured data analysis. (In his presentation at useR!2012, Karl Millar said such analyses are at the terabyte scale today and will be at the petabyte scale within two years.) This allows Googlers to do large-scale statistical analysis with code that "reads like R, and scales like Map-Reduce", and that runs at 90% of the speed of hand-coding in JavaMR directly. (Karl will be talking about Scaling R to Internet Scale Data at the JSM 2012 conference.)

Google participates in many R-related user conferences, user groups, and coding projects.

To read the full Google+ post from the Google Research team, follow the link below.

Google+: Research at Google

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.