R is the hottest topic in SQL Server 2016. If you want to learn how to use it for advanced analytics, join my seminar at the SQL Nexus conference on May 1st in Copenhagen. Although there is still nearly a month before the seminar, fewer than half of the places are still available. You are also very welcome to visit my session Using R in SQL Server, Power BI, and Azure ML during the main conference.
For beginners, I have another session in the same week, just this time in Budapest. You can join me at the Introducing R session on May 6th at SQL Saturday #626 Budapest.
Here is the description of the seminar.
As an open source project, R is the most popular analytical engine and programming language for data scientists worldwide. The number of libraries with new analytical functions is enormous and continuously growing. However, there are also some drawbacks. R is a programming language, so you have to learn it to use it. Open source development also means less control over code. Finally, the free R engine is not scalable.
Microsoft added support for R code in SQL Server 2016, in Azure Machine Learning (Azure ML), and in Power BI. A parallelized, highly scalable execution engine is used to execute the R scripts. In addition, not every library is allowed in these environments.
Attendees of this seminar learn to program with R from scratch. Basic R code is introduced using the free R engine and the RStudio IDE. Then the seminar shows some more advanced data manipulations, matrix calculations, and statistical analysis, together with graphing options. The mathematics behind them is briefly explained as well. Then the seminar switches to more advanced data mining and machine learning analyses. Attendees also learn how to use the R code in SQL Server and Azure ML, and how to create SQL Server Reporting Services (SSRS) reports that use R.
The seminar consists of the following modules:
- Introduction to R
- Data overview and manipulation
- Basic and advanced visualizations
- Data mining and machine learning methods
- Scalable R in SQL Server
- Using R in SSRS, Power BI, and Azure ML
Hope to see you there!
I can proudly announce that it is already possible to preorder the SQL Server 2016 Developer’s Guide book (https://www.amazon.com/SQL-Server-2016-Developers-Guide-ebook/dp/B01MS5L01Q/ref=sr_1_20?ie=UTF8&qid=1488533994&sr=8-20&keywords=SQL+Server+2016).
This is the 14th book I have authored or coauthored. This time, my coauthors are Miloš Radivojević (@MilosSQL) and William Durkin (@sql_williamd). It was not an easy job, but it was very nice to work with them, so thank you both! I hope that we will work together again soon.
Here is a very brief description of the fourteen chapters of the book. Hope you will enjoy reading it!
Chapter 1: Introduction to SQL Server 2016
Many improvements have been made in SQL Server 2016. In this chapter we’ll very briefly cover the most important features and enhancements, not only those for developers. We want to show the whole picture and point out where things are heading. Although this book is for developers and covers developer-related features, it is pretty clear that in 5-10 years all developers will need to deal with some features that are currently being developed within the business intelligence scope. Therefore, it is important to show the trends so that everyone can consider embracing some of them. We will also present how Microsoft plans to deliver services and products in the future.
Chapter 2: Review of SQL Server Features for Developers
A brief recapitulation of the features available for developers in previous versions of SQL Server in this chapter serves as a foundation for explanation of the many new features in SQL Server 2016. Some best practices are covered as well.
Chapter 3: SQL Server Tools
Understanding changes in the release management of SQL Server tools and exploring small and handy enhancements in SQL Server Management Studio (SSMS). Using the fancy new Live Query Statistics feature. Exploring SQL Server Data Tools (SSDT) and its support for continuous integration and deployment automation.
Chapter 4: Transact-SQL Enhancements
Exploring Transact-SQL enhancements: new functions and syntax extensions, discovering ALTER TABLE improvements for online operations and considering new query hints for query tuning.
Chapter 5: JSON Support
Supporting JSON data was the most requested feature on the Microsoft SQL Server connect site. This feature has been finally added in SQL Server 2016. Having JSON support built into SQL Server should make it easier for applications to exchange JSON data with SQL Server.
Chapter 6: Stretch Database
Understanding how to migrate historical or less accessed data transparently and securely to Microsoft Azure by using the Stretch Database (Stretch DB) feature.
Chapter 7: Temporal Tables
SQL Server 2016 introduces support for system-versioned temporal tables based on the SQL:2011 standard. We’ll explain how this is implemented in SQL Server and demonstrate some use cases for it (a time-travel application). We’ll also discuss what is still missing for full temporal data support in SQL Server.
Chapter 8: Tightening the Security
SQL Server 2016 introduces three new security features. With Always Encrypted, SQL Server finally enables full data encryption, so that no tools or persons, regardless of their database and server permissions, can read encrypted data except the client application with an appropriate key. Row-level security, on the other hand, restricts which data in a table can be seen by a specific user. This is very useful in multi-tenant environments, where you usually want to prevent different customers from reading each other's data. Dynamic data masking is a soft feature that limits sensitive data exposure by masking it to non-privileged users.
Chapter 9: Query Store
Understanding how to use Query Store to troubleshoot and fix performance problems that are related to execution plan changes. Although this is primarily a DBA feature, it can also be very useful for developers to identify the most expensive queries and queries with regressed (or heavily changed) execution plans. It will also help them to analyze and become more familiar with the workload patterns generated by their applications and services.
Chapter 10: Columnstore Indexes
Columnar storage was first added to SQL Server in version 2012. It included nonclustered columnstore indexes (NCCI) only. Clustered columnstore indexes (CCI) were added in version 2014. In this chapter, the readers review the columnar storage and then explore the huge improvements for columnstore indexes in SQL Server 2016: updateable nonclustered columnstore indexes, columnstore indexes on in-memory tables, and many other new features for operational analytics.
Chapter 11: Introducing SQL Server In-Memory OLTP
Understanding the In-Memory database engine, introduced in SQL Server 2014 but still underused, which provides significant performance gains for OLTP workloads.
Chapter 12: In-Memory OLTP Improvements in SQL Server 2016
With the new SQL Server 2016 release, many of the issues that might block the adoption of In-Memory OLTP have been eliminated: foreign keys, check and unique constraints, and parallelism are supported; the recommended maximum size of In-Memory tables has been increased to 2 TB; tables, stored procedures, and indexes can be altered… Also, the set of Transact-SQL constructs supported by In-Memory tables and compiled stored procedures has been extended. All these improvements extend the number of potential use cases and allow implementation with less development effort and risk.
Chapter 13: Supporting R in SQL Server
SQL Server R Services combines the power and flexibility of the open source R language with enterprise-level tools for data storage and management, workflow development, and reporting and visualization. This chapter introduces the R Services and the R language.
Chapter 14: Data Exploration and Predictive Modeling with R in SQL Server
Just knowing that you can use the R language inside SQL Server does not help much. After R and R support in SQL Server were introduced in the previous chapter, this chapter shows how you can use R for advanced data exploration and manipulation and for statistical analysis and predictive modeling way beyond the possibilities when using T-SQL language only.
So it is over:-) The fourth SQL Saturday in Ljubljana, Slovenia, the last SQL Saturday in Europe this year… SQL Saturday #567. Probably still some time is needed to collect and sort the impressions. However, from my perspective, the conference was a huge success.
All together, the attendees, the speakers, the sponsors, and the organizers, there were 240 of us out of 278 registered, which means 86.33% attendance and a drop-off rate of less than 14%. The drop-off rate again makes our SQL Saturday one of the most successful in the world. For me, this is very important. I really like it when registered attendees show respect to speakers, sponsors, and organizers, and simply come. I don’t like events where the drop-off rate is 50% or even more. Although SQL Saturday is a free event, everybody should respect the fact that the speakers and the organizers are giving their free time, and the sponsors are giving the money.
The two pre-conference seminars helped us close the budget. Therefore, special thanks go to the pre-conference speakers. Of course, besides the seminars, the sponsors are the ones who enabled the event.
On Saturday, we started with a short keynote. Surprisingly, most of the attendees, and not so surprisingly, all of the organizers were there:-)
Then we continued with the regular sessions. Except for the first time slot, when we had some issues with wireless, everything went smoothly. With 30 sessions in 5 tracks, the day was quite intensive.
Of course, we provided food and drinks. So far, there have been no complaints; it seems the food was really good. No surprise for me: in Slovenia food is important, and with bad food you immediately get bad evaluations, no matter the quality of the presentations. It seems food for the body is more important than food for the soul:-)
One of the special traditions of our SQL Saturday is also wine and schnapps tasting after the raffle, at the end of the event.
And we finished with a speakers’ dinner and a party that for some lasted till the morning. Let’s skip the details here. For now, thanks to everybody involved in this great event!
Hope we all meet again next year!
Time to showcase our wonderful sponsors, who enabled the event! Please check out their companies and their great products and services.
And here is the list, with the link to the SQL Saturday #567 Sponsors page, where you can click individual links.
We would like to make an appeal to all of you who are registered for the PASS SQL Saturday #567 Slovenia event: please come. Please remember that this conference was made possible by the speakers, who give their time and come at their own expense to give you state-of-the-art presentations; by the sponsors, who finance the venue, the food, the raffle prizes, and more; and, of course, by the many volunteers who spend their free time helping with the organization. We also pay the catering company for a fixed number of meals; therefore, we would be throwing money away on those who are registered and do not come. In short: all you need to do is wake up, get out of bed, get into a good mood, and come to the event to get top presentations and good food, and to meet friends!
However, if you are registered and already know that you will not be able to attend, please unregister and make room for those who would like to attend but are on the waiting list or have not registered yet. Use the Register Now button; if you are already registered, you should get an option to unregister.
Thank you very much for understanding,
Matija, Mladen and Dejan
It’s been a while since I wrote the last blog post on the data mining / machine learning algorithms. I described the Neural Network algorithm. It is also a good time to write another post in order to remind the readers of the two upcoming seminars about the algorithms that I have in Oslo on Friday, September 2nd, 2016, and in Cambridge on Thursday, September 8th. Hope to see you at one of the seminars. Finally, to conclude this marketing part: if you are interested in the R language, I am preparing another seminar, “EmbRace R”, which will cover R from basics to advanced analytics. Stay tuned.
Now for the algorithm. If you remember the post, a neural network has an input layer, an output layer, and one or more hidden layers. The Neural Network algorithm uses the hyperbolic tangent activation function in the hidden layer and the sigmoid function in the output layer. The sigmoid function is also called the logistic function. Therefore, describing the Logistic Regression algorithm is simple after I described the Neural Network. If a neural network has only input neurons that are directly connected to the output neurons, it is a Logistic Regression. Or, to repeat the same thing in a different way: Logistic Regression is a Neural Network with zero hidden layers.
This was quick:-) To add more meat to the post, I am adding the formulas and the graphs for the hyperbolic tangent and sigmoid functions.
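For readers who prefer code to formulas, here is a minimal sketch of the two functions in Python. The language choice and the function names are assumptions made for illustration only; they are not taken from SQL Server or from the algorithm's implementation. The sigmoid maps any real number into the interval (0, 1), and the hyperbolic tangent maps it into (-1, 1).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic (sigmoid) function: 1 / (1 + e^(-x)), with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hyperbolic_tangent(x: float) -> float:
    """tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), with values in (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


# Print a few values of both S-shaped curves side by side.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(sigmoid(x), 4), round(hyperbolic_tangent(x), 4))
```

The two curves are closely related: tanh(x) equals 2 * sigmoid(2x) - 1, so the hyperbolic tangent is a stretched and shifted sigmoid.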
I am closing my plan for the second semester of this year. Before listing the events I plan to attend, just a quick comment. I have been asked many times about some specific events and why I don’t visit them, especially events in the vicinity. My answer is pretty simple. I try to plan my events six months in advance. My schedule for the year 2016 is full. I simply can’t visit events that are announced only a couple of months in advance. I prefer long-term planning.
Anyway, here is my list, pretty long again.
- SQL Grill, Lingen, Germany, August 19th: one presentation - Statistics with T-SQL
- SQLSaturday #532 - Oslo 2016, September 2nd-3rd:
- SQLSaturday #520 - Cambridge 2016, September 8th-10th:
- SQLSaturday #555 - Munich 2016, October 8th: not confirmed yet.
- SQLSaturday #538 - Sofia 2016, October 15th: not confirmed yet.
- PASS Summit 2016, October 25th-28th, Seattle, WA:
- SQLSaturday #569 - Prague 2016, December 3rd: not confirmed yet.
- SQLSaturday #567 - Slovenia 2016, December 9th-10th: since I am one of the organizers, this one is confirmed:-)
And this should be enough for this year:-)
So we are back again.
The leading event dedicated to Microsoft SQL Server in Slovenia will take place on Saturday, 10th December 2016, at the Faculty of Computer and Information Science of the University of Ljubljana, Večna pot 113, Ljubljana (http://www.fri.uni-lj.si/en/about/how_to_reach_us/).
As always, this is an English-only event. We don’t expect the speakers and the attendees to understand Slovenian. However, this way, our SQL Saturday has become quite well known, especially in the neighboring countries. Therefore, expect not only international speakers but international attendees as well. There will be 30 top sessions, two original and interesting pre-conference seminars, a small party after the conference, an organized dinner for the speakers and sponsors… But first of all, expect a lot of good vibrations, mingling with friends, smiling faces, and a great atmosphere. You might also consider visiting Ljubljana and Slovenia for a couple of additional days. Ljubljana is a very beautiful and lively city, especially in December.
In cooperation with Kompas Xnet d.o.o. we are once again organizing two pre-conference seminars by three distinguished Microsoft SQL Server experts:
The seminars will take place the day before the main event, on Friday, 9th December 2016, at Kompas Xnet d.o.o., Stegne 7, Ljubljana. The attendance fee for each seminar is 149.00 € per person; until 31st October 2016 you can register for each seminar for 119.00 € per person.
Hope we meet at the event!
This is a tip that should help with installing SQL Server 2016 Master Data Services (tested on CTP 3.3, RC2, and RC3). The documentation is pretty old and incomplete (I have already sent feedback).
The page “Web Application Requirements (Master Data Services)” (https://msdn.microsoft.com/en-us/library/ee633744.aspx) should be seriously updated.
First of all, it should also document how to use the Windows Server 2012 R2 and Windows 10 operating systems. I managed to install it on Windows Server 2012 R2. However, there is a bullet missing in the Role and Role Services part. In the Performance section, only Static Content Compression is mentioned. However, Dynamic Content Compression is needed as well.
I managed to get it up and running.
I got some questions about virtual machine / notebook setup for my Business Intelligence in SQL Server 2016 DevWeek post-conference workshop. I am writing this blog because I want to spread this information as quickly as possible.
There will be no labs during the seminar; there is no time for them. However, I will make all of the code available. Therefore, if the attendees would like to test the code, they need to prepare their own setup. I will use the following software:
Windows Server 2012 R2
SQL Server 2016 components
- Database Engine
- R Services
- SQL Server Management Studio (this is not included in SQL Server setup anymore)
- SQL Server Data Tools
- R Tools for Visual Studio
- R Studio
Excel 2016 Professional Plus with add-ins
- MDS add-in
- Power Pivot
- Power Query
- Power Map
- Power View
- Azure ML add-in
Excel 2013 Professional Plus with add-ins
- Data Mining add-in (this add-in does not work with Excel 2016 yet; a version for Excel 2016 is announced for later this year, after the SQL Server 2016 release)
Power BI Apps and Services
- Power BI Desktop
- Power BI Service (they need to create a free account at PowerBI.com)
- Azure ML (they need to create a free account at AzureML.com)
AdventureWorks demo databases version 2016, 2014 or 2012
I know the list is long:-) However, nobody needs to test everything. Just pick the parts you need and you want to learn about.
See you soon!
Traditionally, I write down a list of presentations I am giving on different events every semester.
This semester, I am already a bit late. I am still missing some info. So here is the list of the events I am planning to attend. I will add events and correct the list as needed later. Here is the updated info. Of course, more updates will come when I get the relevant information.
- Bulgarian UG meeting, Sofia, January 14th: presentation Introducing R and Azure ML
- Slovenian UG meeting, Ljubljana, February 18th: presentation Introducing R and Using R in SQL Server 2016, Power BI, and Azure ML
- SQL Server Konferenz 2016, Darmstadt, February 23rd – 25th:
- pre-conference seminar Data Mining Algorithms in SSAS, Excel, R, and Azure ML
- presentation SQL Server & Power BI Geographic and Temporal Predictions
- PASS SQL Saturday #495, Pordenone, February 27th:
- presentation SQL Server 2012-2016 Columnar Storage
- presentation Enterprise Information Management with SQL Server 2016
- DevWeek 2016, London, April 22nd – 26th:
- post-conference seminar Business Intelligence in SQL Server 2016
- presentation Using R in SQL Server 2016 Database Engine and Reporting Services
- presentation SQL Server Isolation Levels and Locking
- SQL Nexus, Copenhagen, May 2nd – 4th: presentation Identity Mapping and De-Duplicating
- SQL Bits 2016, Liverpool, May 4th – 7th: presentation Using R in SQL Server 2016 Database Engine and SSRS
- SQL Day, Wroclaw, May 16th – 18th:
- pre-conference seminar Data Mining Algorithms in SSAS, Excel, R, and Azure ML
- presentation: Statistical Analysis with T-SQL
- presentation: Anomaly Detection
- PASS SQL Saturday #508, Kyiv, May 21st: information to follow.
- PASS SQL Saturday #510, Paris, June 25th: information to follow.
- PASS SQL Saturday #520, Cambridge, September 10th: information to follow. And yes, this is already quarter 3, but I am late with this list anyway.
A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. Neural networks resemble the human brain in the following two ways:
- A neural network acquires knowledge through learning
- A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights
The Neural Network algorithm is an artificial intelligence technique that explores more possible data relationships than other algorithms. Because it is such a thorough technique, the processing of it is usually slower than the processing of other classification algorithms.
A neural network consists of basic units modeled after biological neurons. Each unit has many inputs that it combines into a single output value. These inputs are connected together, so the outputs of some units are used as inputs into other units. The network can have one or more middle layers called hidden layers. The simplest are feed-forward networks (pictured), where there is only a one-way flow through the network from the inputs to the outputs. There are no cycles in the feed-forward networks.
As mentioned, units combine inputs into a single output value. This combination is called the unit’s activation function. Consider this example: The human ear can function near a working jet engine. Yet, if it were only 10 times more sensitive, you would be able to hear a single molecule hitting the membrane in your ears! What does that mean? When you go from 0.01 to 0.02, the difference should be comparable with going from 100 to 200. In biology, there are many types of non-linear behavior.
Thus, an activation function has two parts. The first part is the combination function that merges all of the inputs into a single value (weighted sum, for example). The second part is the transfer function, which transfers the value of the combination function to the output value of the unit. A linear transfer function would give just a linear regression. The transfer functions are S-shaped, like the sigmoid function:
Sigmoid(x) = 1 / (1 + e^(-x)).
A single hidden layer is optimal, so the Neural Network algorithm always uses a maximum of one (or zero for Logistic Regression).
The Neural Network algorithm uses the hyperbolic tangent activation function in the hidden layer and the sigmoid function in the output layer. You can see a Neural Network with a single hidden layer in the following picture.
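To make the two parts of the activation function concrete, here is a minimal Python sketch of one forward pass through a feed-forward network with a single hidden layer. It assumes a weighted sum as the combination function, the hyperbolic tangent as the transfer function in the hidden layer, and the sigmoid in the output unit. The function names and all weights are hypothetical values chosen for illustration; this is not the internal implementation of the algorithm.

```python
import math


def weighted_sum(inputs, weights, bias):
    """Combination function: merges all inputs of a unit into a single value."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def sigmoid(x):
    """Transfer function of the output unit."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_layer, output_unit):
    """One-way flow: inputs -> hidden units (tanh) -> output unit (sigmoid)."""
    hidden_outputs = [
        math.tanh(weighted_sum(inputs, weights, bias))
        for weights, bias in hidden_layer
    ]
    out_weights, out_bias = output_unit
    return sigmoid(weighted_sum(hidden_outputs, out_weights, out_bias))


# Hypothetical weights: two inputs, two hidden units, one output unit.
hidden_layer = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output_unit = ([1.2, -0.7], 0.05)

print(forward([1.0, 2.0], hidden_layer, output_unit))
```

Because the output unit uses the sigmoid, the result always lies between 0 and 1 and can be read as a probability-like score for the predicted class.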
Training a neural network is the process of setting the best weights on the inputs of each of the units. This backpropagation process does the following:
- Gets a training example and calculates outputs
- Calculates the error – the difference between the calculated and the expected (known) result
- Adjusts the weights to minimize the error
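The three steps above can be sketched in a few lines of Python. To keep the example short, it trains a single sigmoid unit with no hidden layer (that is, a logistic regression), so the error does not have to be propagated back through several layers; full backpropagation repeats the same idea layer by layer. The function names, the toy training set (a logical OR), the learning rate, and the number of passes are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(inputs, weights, bias):
    """Output of a single sigmoid unit for one example."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def train(examples, epochs=2000, learning_rate=0.5):
    """Repeat the three training steps over all examples, epoch after epoch."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, expected in examples:
            # Step 1: get a training example and calculate the output.
            output = predict(inputs, weights, bias)
            # Step 2: calculate the error (expected minus calculated result).
            error = expected - output
            # Step 3: adjust the weights in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy data: the logical OR of two binary inputs.
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(examples)

for inputs, expected in examples:
    print(inputs, expected, round(predict(inputs, weights, bias), 3))
```

After training, the calculated outputs should be close to the expected results, which is exactly what minimizing the error means.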
Like the Decision Trees algorithm, you can use the Neural Network algorithm for classification and prediction. The interpretation of the Neural Network algorithm results is somewhat more complex than the interpretation of the Decision Trees algorithm results. Consequently, the Decision Trees algorithm is more popular.
Our third SQL Saturday in Ljubljana is over. Two weeks seem to be enough time to sleep on it and see things a bit from a distance. Without any further delay, I can declare that the event was a pure success.
Let me start with the numbers, comparing the total number of people, including speakers, sponsors, attendees, and organizers, with the previous two SQL Saturdays in Ljubljana:
- 2013: 135 people from 12 countries
- 2014: 215 people from 16 countries
- 2015: 253 people from 20 countries
You can clearly see the growth. Even the keynote was full, like the following picture shows.
We again experienced a very small drop-off rate; more than 90% of registered attendees showed up. That’s very nice, showing respect to the speakers and sponsors. So thank you, attendees, for being fair and respectful again!
We had more sponsors than in previous years. This was extremely important, because this time we did not get the venue for free, and therefore we needed more money than for the previous two events. Thank you, sponsors, for enabling the event!
Probably the most important part of these SQL Saturday events is the speakers. We got 125 sessions submitted by 51 speakers from 20 countries! We were really surprised. We take this as a sign of our good work in the past. 30 great sessions with two state-of-the-art precon seminars is more than we expected, yet still not enough to accommodate all the speakers who sent submissions. Thank you, all speakers, those who were selected and those who were not! I hope we see you again in Slovenia next year. You can see some of the most beautiful speakers and volunteers in the following picture (decide for yourself whether there is somebody spoiling the picture).
The next positive surprise was the volunteers. With this number of speakers and attendees, we would not have been able to handle the event without them. We realized that we have a great community, consisting of some really helpful people that we can always count on. Thank you all!
I think I can say for all three organizers, Mladen Prajdić, Matija Lah, and me, that we were more tired than in any year before. However, hosting a satisfied crowd is the best payback you can imagine. And the satisfaction level was high even among the youngest visitors, as you can see from the following picture.
Of course, we also experienced some negative things. However, just a day before New Year’s Eve, I am not going to deal with them now. Let me finish this post in a positive way.
Decision Trees is a directed technique. Your target variable is the one that holds information about a particular decision, divided into a few discrete and broad categories (yes / no; liked / partially liked / disliked, etc.). You are trying to explain this decision using other gleaned information saved in other variables (demographic data, purchasing habits, etc.). With limited statistical significance, you are going to predict the target variable for a new case using its known values of the input variables based on results of your trained model.
Recursive partitioning is used to build the tree. The data is split into partitions using a certain value of one of the explaining variables. The partitions are then split again and again. Initially the data is in one big box.
The algorithm tries all possible breaks of both input (explaining) variables for the initial split. The goal is to get purer partitions considering the classes of the target variable. You know intuitively that purity is related to the percentage of the cases in each class of the target variable. There are many better, but more complicated measures of the purity, for example entropy or information gain.
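As an illustration of these purity measures, the following Python sketch computes the entropy of a partition and the information gain of a candidate split. The function names and the class counts are invented for the example. Entropy is 0 for a perfectly pure partition and 1 for a two-class partition split exactly in half; information gain is the reduction in entropy achieved by a split.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of the class distribution; 0 means a pure partition."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two partitions."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical target values: 5 "yes" and 5 "no" cases in the starting box.
parent = ["yes"] * 5 + ["no"] * 5

# Candidate split A separates the classes well; candidate split B barely does.
gain_a = information_gain(parent, ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4)
gain_b = information_gain(parent, ["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3)

print(round(gain_a, 3), round(gain_b, 3))
```

A split with a higher information gain produces purer partitions, so under this measure the algorithm would prefer split A over split B.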
The tree continues to grow using the two new partitions as separate starting points and splitting them more. You have to stop the process somewhere. Otherwise, you could get a completely fitted tree that has only one case in each class. The class would be, of course, absolutely pure. This would not make any sense. The results could not be used for any meaningful prediction. This phenomenon is called “over-fitting”. There are two basic approaches to solve this problem: pre-pruning (bonsai) and post-pruning techniques.
The pre-pruning (bonsai) methods prevent growth of the tree in advance by applying tests at each node to determine whether a further split is going to be useful; the tests can be simple (number of cases) or complicated (complexity penalty). The post-pruning methods allow the tree to grow and then prune off the useless branches. The post-pruning methods tend to give more accurate results, but they require more computation than pre-pruning methods.
Imagine the following example. You have the answers to a simple question: Did you like the famous Woodstock movie? You also have some demographic data: age (20 to 60) and education (ranged in 7 classes from the lowest to the highest). In all, 55% of the interviewees liked the movie and 45% did not like it.
Can you discover factors that have an influence on whether they liked the movie?
Starting point: 55% of the interviewees liked the movie and 45% did not like it.
After checking all possible splits, you find the best initial split made at the age of 35.
With further splitting, you finish with a full-grown tree. Note that not all branches lead to purer classes. Some of them are not useful at all and should be pruned.
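The whole process can be sketched as a small recursive function in Python. This is only an illustration under stated assumptions: it uses entropy-based information gain to choose each split, and a simple pre-pruning test (a minimum number of cases in a node) to stop the growth. The function names, the `min_cases` parameter, and the eight (age, education) rows are invented for the example; they are not the survey data described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Impurity of a partition; 0 means every case has the same class."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels):
    """Try every break of every input variable; return (gain, column, threshold)."""
    best = None
    for column in range(len(rows[0])):
        for threshold in sorted({row[column] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[column] < threshold]
            right = [lab for row, lab in zip(rows, labels) if row[column] >= threshold]
            if not left or not right:
                continue  # everything on one side is not a real split
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = entropy(labels) - remainder
            if best is None or gain > best[0]:
                best = (gain, column, threshold)
    return best


def grow(rows, labels, min_cases=4):
    """Recursive partitioning; a leaf is the majority class of its partition."""
    majority = Counter(labels).most_common(1)[0][0]
    # Pre-pruning: stop when the node is too small or already pure.
    if len(labels) < min_cases or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return majority  # no split makes the partitions purer
    _, column, threshold = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[column] < threshold]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[column] >= threshold]
    return {
        "column": column,
        "threshold": threshold,
        "left": grow([r for r, _ in left], [lab for _, lab in left], min_cases),
        "right": grow([r for r, _ in right], [lab for _, lab in right], min_cases),
    }


# Invented rows: (age, education class 1-7) and whether the person liked the movie.
rows = [(22, 3), (28, 5), (31, 2), (34, 6), (35, 4), (45, 1), (52, 7), (58, 3)]
labels = ["disliked"] * 4 + ["liked"] * 4

tree = grow(rows, labels)
print(tree)
```

Raising `min_cases` stops the growth earlier and gives a smaller tree; a post-pruning approach would instead grow the full tree first and then remove the branches that do not improve the predictions.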
Decision trees are used for classification and prediction. Typical usage scenarios include:
- Predicting which customers will leave
- Targeting the audience for mailings and promotional campaigns
- Explaining reasons for a decision
- Answering questions such as “What movies do young female customers buy?”
Decision Trees is the most popular data mining algorithm. This is because the results are very understandable and simple to interpret, and the quality of the predictions is usually very high.