Microsoft expands R support and adds Python for developers who aren’t also data scientists Microsoft last week announced a wave of new features for its data platform, along with the SQL Server 2017 name and what Microsoft calls a “production quality” beta release. Other important changes include a new containerized deployment model for databases, which simplifies installation on Windows and Linux. But it was SQL Server’s new machine learning tools that grabbed my attention. Machine learning remains one of Microsoft’s big themes for 2017, and it’s an important segment of SQL Server 2017. Mixing code and data has always been part of SQL Server, first with T-SQL, then with the Azure-focused U-SQL, which extended T-SQL with C# elements. SQL Server 2016 added support for embedded R code, and SQL Server 2017 continues that evolution by improving its support for R and adding Python. (By renaming SQL Server 2016’s R Services to Machine Learning Services in SQL Server 2017, Microsoft has made clear where it’s aiming its SQL tools.) Including R and Python in SQL Server works well for both the existing SQL Server audience and for data scientists who are unlikely to have experience with T-SQL. The two languages have become important data science tools, with statistical analysis baked deep into their DNA. R remains clearly focused on statistical analysis, while Python adds statistical tools to a popular and flexible scripting language. With R aimed at statistical analysis experts, Python is perhaps the easiest on-ramp to analytical programming for the rest of us, especially with a wide choice of relevant packages that add new data analysis features to a familiar language. With Python inside SQL Server, you can bring existing data and code together. Data is accessible directly, so there’s no need to extract query data sets, moving from storage to application. It’s a useful approach, especially where there are issues of data sovereignty and compliance. Your code runs inside the SQL Server security boundaries, triggered by a single call from T-SQL stored procedures. Installing the Python options adds not only a Pythion interpreter, but also several commonly used machine learning tools, including a subset of the Anaconda Python-based data science tools and Microsoft’s own RevoScalePy package. Anaconda is an interesting choice because it comes from a big data background. Designed for working with Hadoop and Amazon S3 clusters, it includes packages for visualization and machine learning. There’s a lot in it, probably more than you’re going to initially use inside SQL Server. Because it’s one of the more popular big data tool sets for Python, its inclusion lets data scientists quickly bring their existing skills (in concert with Hadoop) and code to SQL Server. If you want or need extras, use Python’s built-in package-management tools to download more. You can even opt for SQL Server’s Python tools to work large-scale open source machine learning packages like Microsoft’s Cognitive Toolkit and Google’s TensorFlow, adding GPU compute to the mix. Running Python code inside SQL Server lets you take advantage of numerous Microsoft performance and scaling features, with direct access to its in-memory database features, speeding up OLAP queries. Because the code runs as stored procedures, database developers don’t have to be Python experts; they simply bring existing code into the database and let data scientists do what they’re best at while ensuring the data remains secure. To get started, developers can work with a set of data extracts. Once written, the same code can run locally, in on-premises databases, and in the cloud. Adding support for both languages in its main data platform is a logical move for Microsoft. Because it runs both on-premises and in the cloud (and now on Linux and MacOS), SQL Server can work with not only the traditional big data sources, but with all your data. Because it builds on existing R support that arrived with SQL Server 2016, SQL Server developers and admins aren’t starting from scratch. Another, more commercial aspect of these new capabilities is giving databases feature parity with statistical analysis tooling like SAS. Microsoft wants you to ask: Why should you install a hefty and expensive third-party system, when all you need is already in your database? Especially when you add further intelligence via templated solutions? The beta version of SQL Server 2017 is a taste of what Microsoft hopes to bring to answer that question in its favor. Related content analysis How Azure Functions is evolving Microsoft has delivered major updates to its serverless compute service to align it more closely with development trends including Kubernetes and generative AI. By Simon Bisson Jul 11, 2024 7 mins Azure Functions Microsoft Azure Serverless Computing analysis Understanding DiskANN, a foundation of the Copilot Runtime Microsoft is adding a fast vector search index to Windows. It’s tailor-made for fast, local small language models like Phi Silica. By Simon Bisson Jul 04, 2024 7 mins Software Deployment Generative AI Artificial Intelligence analysis AI development on a Copilot+ PC? Not yet Microsoft’s new AI-infused hardware shows promise for developers, but the platform still needs work to address a fragmented toolchain. By Simon Bisson Jun 27, 2024 9 mins Visual Studio Code Software Deployment Generative AI analysis Inside today’s Azure AI cloud data centers At Build, Microsoft described how Azure is supporting large AI workloads today, with an inference accelerator, high-bandwidth connections, and tools for efficiency and reliability. By Simon Bisson Jun 20, 2024 7 mins Microsoft Azure Technology Industry Artificial Intelligence Resources Videos