data:image/s3,"s3://crabby-images/3cdc0/3cdc0d454ace1ef7c04d4a6cf336abc4f1512c8c" alt="SQL Server 2017 Machine Learning Services with R"
Installing new R packages
An R package is a container of functions that serve a particular purpose with the relevant binary source code (usually C, C++, or Fortran), documentation, and sample data. A package is a collection of these files that reside in a library folder. If you navigate to your R library folder, you will see all the packages installed for your R Engine. A package might also be called a binary package or a tarball, depending on the operating system.
A package is not equivalent to a library, nor should it be mistaken for one. In the R language, when installing a package the command install.packages is used. A library() is a function that loads functions in particular packages into your R environment. Deriving from the Windows OS, shared objects are called Dynamic-link library's (DLLs). Hence, the word library is used and refers to common and shared objects. So, to load a package into your R environment, the function library() is used, with the name of the package specified in brackets.
Referring to an R library in an R script is super easy; simply add the library or use the require() method. But in the system, the library must be, in the first place, installed.
Libraries are installed by installing packages available in common repositories, such as CRAN, Biocondutor, GitHub, and MRAN in the case of the Microsoft repository. In the R language, a library is installed by invoking the following command:
install.packages("Package_Name")
In SQL Server 2016, the installation of such packages was not possible by running an R script with an external stored procedure and the code returned an error, as follows:
--Install Package using sp_execute_external_script EXECUTE sp_execute_external_script @language = N'R' ,@script = N'install.packages("AUC")'
This T-SQL code returns an error, saying that this package is not available for my R version. However, we will later see how to install the same package:
data:image/s3,"s3://crabby-images/5847c/5847c7fa0362d3e4ab30d58cae56fbb2eae1e155" alt=""
So, we can extend the original stored procedure, as follows:
EXECUTE sp_execute_external_script @language = N'R' ,@script = N' library(Hmisc) u <- unlist(rcorr(Customers_by_invoices$InvoiceV, Customers_by_invoices$CustCat, type="spearman")) statistical_significance<-as.character(u[10]) OutputDataSet <- data.frame(statistical_significance)' ,@input_data_1 = N'SELECT SUM(il.Quantity) AS InvoiceQ ,SUM(il.ExtendedPrice) AS InvoiceV ,c.CustomerID AS Customer ,c.CustomerCategoryID AS CustCat FROM sales.InvoiceLines AS il INNER JOIN sales.Invoices AS i ON il.InvoiceID = i.InvoiceID INNER JOIN sales.Customers AS c ON c.CustomerID = i.CustomerID GROUP BY c.CustomerID ,c.CustomerCategoryID' ,@input_data_1_name = N'Customers_by_invoices' WITH RESULT SETS (( statistical_significance FLOAT(20) )); GO
If we do this, we will be able to calculate the statistical significance for the correlation between two variables. The point here is that we are referring to the R function library(Hmisc) with the appropriate package name. The following is the detailed part of the script:
-- part of R script with reference to call method library ,@script = N' library(Hmisc) u <- unlist(rcorr(Customers_by_invoices$InvoiceV, Customers_by_invoices$CustCat, type="spearman")) statistical_significance <-as.character(u[10]) OutputDataSet <- data.frame(statistical_significance)'
When referring to a library, we need to have the package already preinstalled; otherwise, you will receive an error stating that the package does not exist. The same error will be received if you misspell the package name and, since the R language is case-sensitive, in the case of the Hmisc package, an error will occur when mistyping it as hmisc (without the capital letter H):
data:image/s3,"s3://crabby-images/edd4e/edd4e687a340522bdc617fa64efd6e62da0381b8" alt=""