I wrote the tutorial below because it is what I wish I had learned first about R in Azure Machine Learning (AML). I tried to use Microsoft’s tutorials on R in AML, but I found them incomprehensible, probably because they are written from a very “machine learning-centric” worldview. I’ve grown up in modeling working across disciplines: a Ph.D. in marketing, with Ph.D. minors in electrical engineering, econometrics, statistics, and evolutionary ecology. My experience is that modeling is very balkanized. Each discipline is insulated from the others. Every discipline does “modeling,” but what each discipline means by modeling is distinct and evolved separately from the others.
To be sure, there is convergent evolution, where disciplines separately evolve essentially the same basic approach. But where traditions diverge, attempts to communicate approach incomprehensibility. Think of FINDING NEMO, when the dad clownfish is listening to Squirt explain vortex-exiting technique (26 seconds into the clip) and says, “You know, you’re really cute, but I don’t know what you are saying!”
So this tutorial is an attempt to share my big discovery that Azure Machine Learning has an amazingly powerful, free R implementation, and to share that discovery step by step so that other non-machine-learning modelers can gain access to an amazing free extension of free R. Basically, AML adds hardware and packaging around R to give you access to a huge machine, for free. Did I remember to mention that AML is free?
Hope you enjoy!
How to do R in Azure Machine Learning – for humans:
- Log in to studio.azureml.net
- Create a blank experiment (create a free account first if you don’t already have one):
- Click on the + sign at the lower left of Azure Machine Learning Studio
- Click on “Blank Experiment”
- Take a good look at the Azure Machine Learning “Canvas”. As soon as you drag any model building blocks to the canvas, this nice dotted-line template will disappear.
- Start with a data set like the built-in MPG data set. Type MPG in the upper left dialog box and then drag the data set to the canvas.
- Next, backspace over “MPG” in the AML dialog box and type in “Execute”. This will bring up “Execute Python Script” and, below it, “Execute R Script”.
- Drag “Execute R Script” onto the canvas and click on it.
- When you click on the Execute R Script a dialog will open on the top right that will look like this:
- Next click in the R Script window to open up the R Script editor which will look like this:
- Then delete lines 3 through 16:
- So you see these lines:
- Now insert a few blank lines so you have some breathing space to drop in regular old R Console command lines:
- Last, before you run your script, change maml.mapOutputPort("data.set") to maml.mapOutputPort("dataset1").
- Here is the complete code for this model so you can copy and paste:
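    # Map 1-based optional input ports to variables
    dataset1 <- maml.mapInputPort(1) # class: data.frame

    # Fit a linear regression of MPG on four predictors
    model01 <- lm(data = dataset1, MPG ~ Cyl + Displacement + Horsepower + Weight)

    # Print numbers in fixed notation rather than scientific notation
    options(scipen = 999)

    # Draw the four diagnostic plots in a 2 x 2 grid
    par(mfrow = c(2, 2))
    plot(model01)

    # Print the regression table to the output log
    summary(model01)

    # Send the data frame to the output Dataset port
    maml.mapOutputPort("dataset1");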
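If you want to try the script in a regular R console before pasting it into AML, one option is to stub out the two AML helper functions. The sketch below is only that: the stubs are hypothetical stand-ins that I made up, not Microsoft's implementation, and the data comes from the built-in `mtcars` data set with its columns renamed to match the names used in this tutorial. The numbers will therefore differ from the AML MPG data.

```r
# Hypothetical local stand-ins for the AML helper functions.
# mtcars ships with base R; it is renamed here only to mimic the
# column names used in the tutorial (an assumption, not the AML data).
maml.mapInputPort <- function(port) {
  data.frame(
    MPG          = mtcars$mpg,
    Cyl          = mtcars$cyl,
    Displacement = mtcars$disp,
    Horsepower   = mtcars$hp,
    Weight       = mtcars$wt * 1000  # mtcars stores weight in 1000 lbs
  )
}

maml.mapOutputPort <- function(name) {
  # In AML this names the data frame to send to the output port;
  # locally we simply look it up and return it invisibly.
  invisible(get(name, envir = parent.frame()))
}

# The tutorial script itself
dataset1 <- maml.mapInputPort(1)
model01 <- lm(data = dataset1, MPG ~ Cyl + Displacement + Horsepower + Weight)
options(scipen = 999)
par(mfrow = c(2, 2))
plot(model01)
summary(model01)
maml.mapOutputPort("dataset1")
```

Once the script behaves locally, delete the two stub definitions and paste the rest into the Execute R Script editor.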
- Finally, click the “Run” button and watch your experiment compute itself!
- And now the step we have been waiting for... O-U-T-P-U-T!!!! Output! Output! Output! First, to get the cool visual plot of your model fit, click on the lower right dot on the “Execute R Script” box.
- Note that the par(mfrow=c(2,2)) command in the R Script allowed all 4 output plots to plot together!
The big deal about this is that R on Azure Machine Learning behaves just like R in RStudio. R is R!
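To see what `par(mfrow = c(2, 2))` is doing, here is a small sketch you can run in any R session. It assumes the built-in `mtcars` data in place of the AML MPG data, and it writes to a PDF file only so that it runs on any R install; the file name is made up for illustration.

```r
# Fit a small model on a built-in data set (mtcars is an assumption,
# standing in for the AML MPG data).
fit <- lm(mpg ~ hp + wt, data = mtcars)

# Send the plots to a file instead of the screen.
pdf_file <- file.path(tempdir(), "diagnostics.pdf")
pdf(pdf_file, width = 8, height = 8)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid; par() returns the old settings
plot(fit)                        # plot.lm draws four diagnostic plots by default
par(old_par)                     # restore the previous layout

dev.off()                        # close the file so it is written to disk
```

Without the `par(mfrow = c(2, 2))` line, each of the four diagnostic plots would land on its own page instead of sharing one.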
- Next, the R console output, where did it go?
To find your console output, close the visualization of the model plots, and then click on the “Execute R Script” box again. Then look for “View output log” and click on it.
- Copy the output from R then paste into Microsoft Word
- Select all
- Format font as Courier New
- Search for “[ModuleOutput]” and replace all with blank
- Search for ^p^p and replace all with ^p, and you have unsullied R console output. Proper output for a regression model!!!
- You can copy this block of output and paste it into your reports to get plain text regression output. It looks like this:
    Call:
    lm(formula = MPG ~ Cyl + Displacement + Horsepower + Weight, data = dataset1)

    Residuals:
         Min       1Q   Median       3Q      Max
    -11.5248  -2.7964  -0.3568   2.2577  16.3221

    Coefficients:
                   Estimate Std. Error t value             Pr(>|t|)
    (Intercept)  45.7567705  1.5200437  30.102 < 0.0000000000000002 ***
    Cyl          -0.3932854  0.4095522  -0.960             0.337513
    Displacement  0.0001389  0.0090099   0.015             0.987709
    Horsepower   -0.0428125  0.0128699  -3.327             0.000963 ***
    Weight       -0.0052772  0.0007166  -7.364     0.00000000000108 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 4.242 on 387 degrees of freedom
    Multiple R-squared:  0.7077,  Adjusted R-squared:  0.7046
    F-statistic: 234.2 on 4 and 387 DF,  p-value: < 0.00000000000000022
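If you would rather not round-trip through Word, the same clean-up can be sketched in a few lines of R. This assumes you have pasted the log text into R or saved it to a file you can read with `readLines()`. The sample lines below are typed in by hand to imitate the log; the only thing taken from the real log is the “[ModuleOutput]” prefix, and `clean_log` is a helper name I made up.

```r
# Hand-typed sample imitating the AML output log (an assumption about
# the layout; only the "[ModuleOutput]" prefix comes from the real log).
raw_log <- c(
  "[ModuleOutput] Residual standard error: 4.242 on 387 degrees of freedom",
  "",
  "[ModuleOutput] Multiple R-squared:  0.7077",
  "",
  "[ModuleOutput] F-statistic: 234.2 on 4 and 387 DF"
)

clean_log <- function(lines) {
  # Strip the prefix (and one following space, if present) from each line
  lines <- sub("^\\[ModuleOutput\\] ?", "", lines)
  # Drop lines that are now empty, like replacing ^p^p with ^p in Word.
  # Note this removes every blank line, including ones R printed on purpose.
  lines[nzchar(trimws(lines))]
}

cleaned <- clean_log(raw_log)
cat(cleaned, sep = "\n")
```

For a real log saved to disk, something like `clean_log(readLines("output_log.txt"))` should work, where the file name is whatever you saved it as.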
- But, for *bonus* points, while you are in MS Word, you can hold down the ALT key, then click and drag to select *columns* of data, then press CONTROL-C to copy the column of data for pasting into Excel.
- Then switch to Excel, right click the cell you want to paste into, and click the 2nd icon from the left
- And Excel will paste the values into your worksheet.
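An alternative to ALT-selecting columns in Word is to have R write the coefficient table to a CSV file that Excel can open directly. The sketch below runs in a local R session and assumes the built-in `mtcars` data in place of the AML MPG data; the file name is made up. Inside AML, I would expect that you could instead send a data frame like `coef_table` to the output port with `maml.mapOutputPort`, but I have not shown that here.

```r
# Fit a model on a built-in data set (mtcars is an assumption,
# standing in for the AML MPG data).
fit <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)

# coef(summary(fit)) is a matrix with one row per term and columns
# Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- as.data.frame(coef(summary(fit)))

# Move the term names from row names into a regular column
coef_table <- cbind(Term = rownames(coef_table), coef_table)

# Write a CSV that Excel can open
csv_file <- file.path(tempdir(), "coefficients.csv")
write.csv(coef_table, csv_file, row.names = FALSE)
```

This keeps full numeric precision in the spreadsheet, which the pasted console text does not.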
Azure Machine Learning is the most powerful R environment on the planet that you don’t have to buy. Just the free account in Azure Machine Learning will give you more RAM (56 GB), more disk storage (10 GB), more model components (up to 100 blocks, of which you will use 2 or 3 with R), and up to an hour of compute per model run. All for free. Did I remember to mention that AML is free?
If you have any questions, please email email@example.com and let me know your questions, and I’ll revise this blog post to cover what I’ve missed above.
Feel the data science (free) love!