I ordered a 512 GB SSD so that I can get great performance running Windows 8.1. I’ll have all of my spinning drives intact, so I’m not worried about actual data loss. The concern is in preserving things like IE favorites for every family member. The User State Migration Tool (USMT) has ScanState.exe to capture what you want to preserve and LoadState.exe to migrate the saved state to the new installation.
Before forging ahead with USMT, it’s advisable to first check your PC for any potential issues. Download and run the Upgrade Assistant to find out if you have any compatibility issues. Go to this page to both obtain it and find out more about it.
Figure 1. Upgrade Assistant.
I didn’t have any hardware or driver issues. The only issues were with programs I don’t want on the new Windows 8.1 installation anyway.
USMT is found in the Windows Assessment and Deployment Kit (ADK) for Windows 8, which you download from here. If you accept the installation defaults, you will NOT get USMT, as the screen capture shows.
Figure 2. ADK installation defaults. Notice that USMT is not selected.
If you have particular applications that you are interested in migrating, you might want to also select the Application Compatibility Toolkit. I didn’t actually use it, but just wanted to make you aware of its existence.
Figure 3. ADK with USMT selected.
Be sure to use the right executables for your machine’s architecture. Look in the appropriate folder for your processor.
32-bit C:\Program Files (x86)\Windows Kits\8.0\Assessment and Deployment Kit\User State Migration Tool\x86
64-bit C:\Program Files (x86)\Windows Kits\8.0\Assessment and Deployment Kit\User State Migration Tool\amd64
I copied all of the files in the …\amd64 folder to C:\USMT on my Windows 7 machine. If you read the USMT documentation (which is advisable), it recommends that you take this approach. Open an administrative command prompt and navigate to that folder. Specify an output XML file. Here is my session:
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
SCANSTATE.EXE Version 6.2.9200.16384
(C) 2012 Microsoft Corporation. All rights reserved.
Log messages are being sent to 'C:\USMT\scanstate.log'
Starting the migration process
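For reference, my ScanState invocation took roughly this shape. The switch names come from the USMT documentation, but the store location and file paths shown here are my assumptions, not a transcript of the exact command:

```bat
cd /d C:\USMT
REM /i: names the migration rule files; /genmigxml: writes an XML report of what
REM would be captured; /l: names the log file (all paths here are assumptions)
scanstate.exe C:\MigStore /i:migdocs.xml /i:migapp.xml /genmigxml:C:\USMT\genMig.xml /l:C:\USMT\scanstate.log
```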
I examined the XML file and realized that I had the following users on my machine that I’d completely forgotten about:
Taking the defaults when running LoadState would migrate these users. There’s no point in doing that because I’m going to be reinstalling SQL Server after installing Windows 8.1 from scratch.
You can use the /ui switch to include only the specified users when running either ScanState or LoadState. There is also a /ue switch to exclude the specified users. Read more about the switches here. It’s a tradeoff between space and flexibility. If you take the defaults with ScanState, you grab everything. That takes more space, but you can migrate any user later. If you restrict what you capture with ScanState, you use less space, but you can’t later run LoadState for a user you didn’t capture with ScanState in the first place. My recommendation is to exclude the SQL Server users shown above from the ScanState. I definitely won’t be migrating them because my objective is to make a full backup of Windows 8.1 before installing any applications. I don’t want any clutter from superfluous user accounts in that initial system image.
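An exclusion along those lines might look like the following. The /ue account pattern and the paths are my assumptions; adjust them to match the account names reported in your XML file:

```bat
REM Exclude any account whose name starts with SQL (the SQL* pattern is hypothetical)
scanstate.exe C:\MigStore /i:migdocs.xml /i:migapp.xml /ue:*\SQL* /l:C:\USMT\scanstate.log
```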
It’s easy to find what calls a stored procedure in SQL Server. Finding where it is called outside of SQL Server is an entirely different matter. I use great free utilities for both use cases.
Of course, you can use built-in features in SSMS to find out what calls a stored procedure, for example. Select an object in the Object Explorer, right-click, select View Dependencies.
Figure 1. View Dependencies in SSMS
You can also script a solution, either in T-SQL or PowerShell. Dependencies are exposed through the sys.sysdepends compatibility view in SQL Server 2005 and later (sysdepends in SQL Server 2000); SQL Server 2008 added the more reliable sys.sql_expression_dependencies. What I really like to use is the free SQL Search utility from Red Gate. It’s powerful, intuitive, works well, and is free (just in case you missed that very important point).
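If you’d rather script it, a query along these lines against sys.sql_expression_dependencies (SQL Server 2008 and later) finds the referencing objects. The procedure name dbo.MyProc is a placeholder:

```sql
-- Objects that reference the stored procedure dbo.MyProc (placeholder name)
SELECT OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
       OBJECT_NAME(d.referencing_id) AS referencing_object
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_id = OBJECT_ID(N'dbo.MyProc');
```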
Things get more challenging when you need to know what items outside of SQL Server are referencing SQL Server objects. If you have .NET code, you can search all of your code using Visual Studio. But you might not have Visual Studio and even if you do, you could have a large collection of batch jobs (could be non-Microsoft) scattered around your filesystem as I experienced today.
My favorite free tool for searching the filesystem is FileLocator Lite. I’ve been using it for many years, but didn’t blog about it when I first discovered it because it had an awful name – Agent Ransack. Try telling your software compliance people that you want some freeware named Agent Ransack – that’s just not going to happen. Fortunately, this great tool was rebranded as FileLocator Lite.
I had to search all of the many files on a large network share for all of the stored procedures in a particular database. First I got a list of the names of all of the stored procedures.
select name from sys.objects
where type = 'P';
The list looked something like this:
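The actual procedure names aren’t important here, so imagine a hypothetical list like this:

```text
uspGetOrders
uspGetCustomers
uspGetInvoices
```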
FileLocator Lite can search for multiple strings at once, provided they are separated by the correct Boolean operator, which is OR in my case. It appeared to me that Boolean operators need to be in all uppercase to work properly. I used regular expressions with SSMS’s Find and Replace feature to quickly edit my long list in a single pass.
Figure 2. Global search and replace using regular expressions.
I substituted OR \n for \n and ended up with something like this which I pasted into FileLocator Lite:
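With hypothetical procedure names, the pasted search string would look like this:

```text
uspGetOrders OR
uspGetCustomers OR
uspGetInvoices
```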
Figure 3. FileLocator Lite search results.
FileLocator Lite will search subfolders if you check the box to do so. It also supports regular expressions. What I find particularly useful is the context sensitive pane on the right where you can preview and see if the found string really is what you want. You can save your search criteria or your results to a file.
When you are dealing with the United States federal government, accessibility isn’t a nice to have, it is the law. As I was researching the law, I came across a related item, the Bond (as in 007) rule, which was proposed by Sidney Smith in the journal Human Factors in 1977.
The federal government has an entire website, section508.gov, describing the 1998 law that requires federal agencies to make their applications and websites accessible. I encourage you to go to the site and learn more. Of particular interest to application developers is the Technology Tools page. It contains a lot of great links. You’ll find references to many things, including some I’ve blogged about before – color blindness and epilepsy.
Smith discovered that to be readable by the masses, a character needs to subtend at least 0.007 radians of visual angle. That’s why some people refer to this as the Bond Rule. Using radians to specify character height provides a measurement independent of the viewing distance. To find the exact character height needed at a specific distance, multiply the viewing distance by 0.007; the result is in the same units as the viewing distance. If you have 20/40 vision and are reading text that is 100 cm away, the character size should be at least 0.7 cm (7 mm). 20/40 is used instead of 20/20 because 20/40 is more representative of the real world than the idealized 20/20.
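The arithmetic is simple enough to do in a SELECT. This just restates the rule above; the 100 cm viewing distance is an example value:

```sql
-- Bond rule: minimum character height = viewing distance * 0.007 radians
DECLARE @viewingDistanceCm decimal(10, 2) = 100;  -- example viewing distance
SELECT @viewingDistanceCm * 0.007 AS minCharHeightCm;  -- 100 cm * 0.007 = 0.7 cm
```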
SQL Server 2012 simplifies the processing of unstructured data found in files. The FileTable enhancements to FILESTREAM allow documents to be known to SQL Server full-text search by simply copying them to a FileTable network share. This is much more convenient than it was in SQL Server 2008 R2 where the files had to be explicitly loaded into SQL Server as blobs before they could be processed by full-text search.
Although I’m a full-time IT consultant, I spend most of my spare time pursuing graduate studies in nursing. I’m always looking for ways to use technology to improve my efficiency as a student and also improve my effectiveness as a consultant. Some of the technical tips and tricks I’ve figured out for school have been useful to my clients in corporate environments. In this blog post, I’ve taken PowerPoint slides from my pharmacology course to evaluate the usefulness of using FileTable to process unstructured data found in the same kinds of files used in corporate environments.
Step 1. Access your SQL Server instance’s properties to enable FILESTREAM.
The first checkbox is to enable FILESTREAM in SQL Server. The second checkbox is to allow Windows to read and write FILESTREAM data – in other words, make your files accessible to Windows. The third checkbox is to allow remote users to access your FILESTREAM data. This is the option for real world usage. Typically a SQL Server is on a remote machine, not locally installed.
Figure 2. Check everything if you want maximum functionality.
Changing FILESTREAM settings requires a restart of the SQL Server service.
You must create a FILESTREAM enabled database or add FILESTREAM to an existing database. For this blog post, I chose to create a new database.
CREATE DATABASE School
ON PRIMARY (NAME = School, FILENAME = 'd:\school\dbFiles\school.mdf'),
FILEGROUP schoolFilestreamGroup CONTAINS FILESTREAM (NAME = schoolFilestream, FILENAME = 'd:\school\schoolFilestream')
LOG ON (NAME = SchoolLog, FILENAME = 'd:\school\dbFiles\school.ldf');
The folder for the data and log files must exist. The subfolder for the FILESTREAM must not exist and will be created when the database is created.
Figure 3. Folder structure showing that D:\school\dbFiles exists and D:\school\schoolFilestream does not exist.
After creating the database with the above T-SQL statement, the D:\school\schoolFilestream folder is created as shown below. As the dialog box shows, it is considered a system folder by Windows.
Figure 4. Click Continue to access the new folder and its contents.
Figure 5. The new D:\school\schoolFilestream folder and its contents.
Figure 6. FILESTREAM Directory Name (which must NOT be a full path) and Non-Transacted Access options.
The options specified created a folder fileContainer under a new mssqlserver share as shown below.
Figure 7. Network showing the fileContainer FILESTREAM directory name under the default share name of mssqlserver.
At this point the fileContainer folder is empty. Time to create a FileTable. Notice that the CREATE TABLE statement does not specify any columns. The default is to create a folder with the same name as the FileTable, but the example shows how to use FILETABLE_DIRECTORY to override the default and use pharmFiles for the folder name. Notice that the WITH clause is also used to create a unique constraint.
CREATE TABLE pharm AS FileTable
WITH (FILETABLE_DIRECTORY = N'pharmFiles'
,FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME = ui_file_stream);
Figure 8. The CREATE TABLE statement created a new folder pharmFiles.
Within SSMS, you can see that a FileTable has predefined columns.
Figure 9. FileTable columns are predefined.
Now it is time to copy all of the pharmacology PowerPoint files to the pharmFiles folder on the network share.
Figure 10. PowerPoint files added to the previously empty pharmFiles folder.
Let’s examine the contents of the pharm FileTable after the file copy.
Figure 11. Notice that SQL Server recognizes all of the files that were copied to the pharmFiles folder on the mssqlserver share.
Notice that the files are of type pptx. SQL Server 2012 by default does not index pptx files. Use the following query to find out what file types are indexed.
SELECT * FROM sys.fulltext_document_types
ORDER BY document_type;
It is necessary to add an IFilter, which you will probably find is more than a one-step process. As of the date of publication, the latest IFilter pack is the Microsoft Office 2010 Filter Pack (a.k.a. Filter Pack 2.0). A newer version may be available by the time you read this, so do your due diligence and check. After installing the Filter Pack, you may need to install a service pack; more than one has been released for Filter Pack 2.0. I will not provide links because I don’t want to give you outdated information. Microsoft recommends that you obtain service packs by running Windows Update, which is what I did. As a best practice, after installing something from Windows Update, check again for additional updates that may be required by what you just installed. I had three consecutive updates to do. You could have more or fewer than that.
The Office Filter Pack does not have a dependency on Microsoft Office. You don’t need to have Office installed on your SQL Server machine for the Filter Pack to work. It’s not a good idea to have Office installed on a server.
After installing and updating the Filter Pack, run the following command to load the new filters.
exec sp_fulltext_service 'load_os_resources', 1;
Update the system metadata about what filters are installed.
exec sp_fulltext_service 'update_languages';
Restart the filter daemon.
exec sp_fulltext_service 'restart_all_fdhosts';
Confirm that the new filters were installed.
SELECT * FROM sys.fulltext_document_types ORDER BY document_type;
Create a fulltext catalog and a fulltext index.
CREATE FULLTEXT CATALOG pharmCat AS DEFAULT;
CREATE FULLTEXT INDEX ON pharm (file_stream TYPE COLUMN file_type) KEY INDEX ui_file_stream;
Here’s a query to find out what files in the pharm folder contain the drug name Vancomycin.
SELECT name, file_stream.GetFileNamespacePath() AS filePath
FROM pharm
WHERE CONTAINS (file_stream, 'Vancomycin');
A single row is returned because only one file, Anti-infective.pptx, mentions that drug. A future post will address using FileTable to make sense of unstructured data.
While researching an upcoming blog post on using SQL Server 2012 FileTable, I stumbled across Project Tuva. Bill Gates bought the rights to some of Nobel Laureate Richard Feynman’s physics lectures and made them available for free at http://research.microsoft.com/apps/tools/tuva/.
Speaking of recorded lectures, I highly recommend that students listen to podcasts of lectures during their commutes. The main reason I made it through pharmacology and obstetrics was because I listened to the podcasts of my lectures several times during my normal commuting. The repetition really reinforced the topics. From repetition came familiarity with the material and from the familiarity came understanding.
The International Classification of Diseases (ICD) is a global standard administered and copyrighted by the World Health Organization (WHO). The 10th revision, ICD-10, is the current revision. Some countries adapt the WHO standard. In the United States, the National Center for Health Statistics (NCHS) has made two modifications to ICD-10 known as ICD-10-CM (Clinical Modification) and ICD-10-PCS (Procedure Coding System). ICD-10 was originally scheduled to replace ICD-9 as the U.S. standard on October 1, 2013, but the implementation date was changed to October 1, 2014.
Systems must be modified to accommodate this new standard. Schemas must be changed, mappings between old and new must take place. Much work is yet to be done, which is why implementation was delayed by a full year.
The granularity and specificity of ICD-10 codes has been ridiculed by the press. It is helpful to understand ICD-10 instead of laugh at it if you want to win ICD conversion business. Code W22.02XA is the code for “walked into a lamppost, initial encounter” and code W22.02XD is the code for “walked into a lamppost, subsequent encounter”. In clinical parlance, an initial encounter is the first time the patient seeks treatment. Followup visits about the same condition are known as subsequent encounters. The term “subsequent encounter” has nothing to do with how many times a patient has walked into a lamppost, which some people have misinterpreted.
The number of codes increased by an order of magnitude when going from ICD-9 to ICD-10. Data professionals should be able to understand and appreciate the intention behind having more specificity in diagnosis codes. If only W22 (striking against a stationary object) were used, the data would be too vague to help with prevention efforts. Just for the sake of discussion, suppose the data showed that most people who walk into something actually walk into lampposts rather than walls (W22.01XA) or furniture (W22.03XA). In this contrived example, it might be in the public interest to see if it would be cost effective to develop a lamppost injury prevention program (just kidding; I’m simply making a point about data mining).
Fine grained data lends itself to analysis. Federal payment programs (Medicaid and Medicare) and private insurance companies are already scrutinizing diagnosis codes and withholding payments in some cases. If a patient without an infection is admitted to a hospital and contracts an infection while being treated in the hospital (this is called a nosocomial infection), payment for treating the infection is likely to be denied. This gives hospitals an incentive to be more diligent in preventing infections.
Having more detail gives hospitals and practitioners an improved ability to indicate the extra complexity of a case and get reimbursed at a higher rate. This could give providers an incentive to convert to ICD-10 perhaps even ahead of the October 1, 2014 deadline. More codes means more data, which can lead to projects to expand and optimize systems to accommodate the impending data explosion.
MUMPS is a NoSQL database and programming language that does not receive widespread attention. I searched for lists of the most important or most used NoSQL databases and MUMPS did not make the lists, although it should.
The Massachusetts General Hospital Utility Multi-Programming System, often referred to as just M, is a key-value database that was developed in 1966. MUMPS became the foundation for the Decentralized Hospital Computer Program (DHCP) developed at the U.S. Veterans Health Administration (VHA). Notice that I said developed at the VHA, not by the VHA. At one time there were direct orders prohibiting further development of this system and work continued in secret. Eventually, DHCP became VistA, which is one of the most widely used Electronic Medical Records (EMR) systems in the world.
VistA is not open source software; it is public domain software. That means anybody can do anything to it, including making proprietary changes with no obligation to give the changes back to the public. Most VistA implementations use the SQL-friendly Caché version of MUMPS from InterSystems, which you can download for free. Caché is also used by the dominant commercial EMR vendor, Epic Systems, as well as TD Ameritrade and the European Space Agency.
VistA does not include a GUI. At VA hospitals, the Computerized Patient Record System (CPRS) is a separate application that provides a minimalist GUI on top of VistA. VistA supports the Health Level Seven (HL7) messaging standards used for healthcare informatics interoperability. It also supports SNOMED CT, a healthcare taxonomy supporting exchange of healthcare data. The VA has an ongoing project to enable VistA to support both ICD-10 and Meaningful Use. ICD codes are used to classify and categorize both medical diagnoses and inpatient procedures. Meaningful Use is part of the Patient Protection and Affordable Care Act (PPACA, otherwise known as Obamacare) and affects the reimbursement rates healthcare providers receive.
SNOMED CT, HL7, and ICD-10 are standards used around the world in healthcare. Meaningful Use is specific to the United States. I’ll be discussing all of these topics in greater detail in future blog posts.
Many people have asked me various questions about Personal Health Information (PHI) which is covered by the Health Insurance Portability and Accountability Act (HIPAA, not HIPPA). The process of removing data (de-identification or anonymization) that could violate someone’s privacy is complex. This is particularly true when there is unstructured data (i.e., free text). The U.S. Department of Health and Human Services has detailed guidelines on proper de-identification techniques, which are found here or at Bing’s cached copy here.
I found the guidelines to be very informative. The discussion on zip codes was interesting. Zip codes have to be truncated to their first three digits, and for a list of 17 specific three-digit prefixes covering sparsely populated areas, even the first three digits cannot be used at all.
The document says that age must be removed from a patient’s record if the patient’s age is greater than 89. Can you imagine a patient summary beginning with “The patient is a 107 year old man…”? For supercentenarians, age does provide a clue as to who they are. Changing a patient’s age or date of birth helps greatly in de-identification, but care must be taken. You don’t want to make an adult a minor or vice-versa.
Consider this statement in the medical record: “The patient became ill after eating a [insertNameOfReligiousHolidayHere] meal.” One could argue that removing the name of a religious holiday makes for a neutral record. That might be the appropriate thing to do, but there could be clinical value in knowing the religious holiday or the religion of the patient. It could be useful to know if certain things would be eaten or definitely not eaten.
There are published algorithms for processing textual data and de-identifying it. You can download Perl regular expression scripts from PhysioNet for free. The download also includes several dictionaries that the scripts use. Notice there is a dictionary of medical terms and several dictionaries of people’s names. Obviously there is value in knowing if a word is a person’s name or a medical term. DeBakey appears in the SNOMED dictionary as a medical term. There is a DeBakey clamp, a DeBakey pump, and a DeBakey graft. But what if the patient’s name was DeBakey? DeBakey does not appear in the dictionary of common names. Would the scripts recognize DeBakey as a medical term and not remove what actually is the patient’s name?
A known weakness of processing textual data against dictionaries is misspelled words and names. Some names are particularly difficult to spell and will not always be found in the dictionaries because of the inevitable misspellings.
This post is about the political issues involved with using multiple languages in a global organization and how to troubleshoot the technical details. The CHAR and VARCHAR data types are NOT suitable for global data. Some people still cling to CHAR and VARCHAR justifying their use by truthfully saying that they only take up half the space of NCHAR and NVARCHAR data types. But you’ll never be able to store Chinese, Korean, Greek, Japanese, Arabic, or many other languages unless you either use NCHAR and NVARCHAR which are double byte Unicode data types or use specific collations with CHAR and VARCHAR. Using collations is something I will address in a future post. Using CHAR and VARCHAR with collations does not save space and is trickier, which is why this post is focusing on using Unicode.
In troubleshooting a problem displaying Chinese characters, I need a test character to experiment with. I picked 坡 (decimal 22369, hexadecimal 5761) as my test character. Why? Because it looked Chinese to me! Now doesn’t that remind you of Act 1, Scene 2 of William Shakespeare’s Julius Caesar where Casca says to Cassius “but, for mine own part, it was Greek to me”? It really looked Chinese to me, but then, what do I know? That presented a problem. What if it meant something really offensive? After consulting a Chinese immigrant, I was relieved to know that my randomly chosen character was politically correct. And just to be extra safe, I double checked with fellow SQL Server MVP Greg Low. Thanks, Greg.
I wrote the following test code to show you what happens when Unicode (which is always double byte) is implicitly converted to single byte CHAR or VARCHAR.
create table #c (sqlString NVARCHAR(30), sqlChar NCHAR(1));
insert into #c values (N'insert into #c values ( ''坡'')', '坡'); -- invalid single byte syntax
insert into #c values (N'insert into #c values (N''坡'')', N'坡'); -- valid double byte Unicode syntax
select sqlString
, sqlChar
, UNICODE(sqlChar) as N'UNICODE(sqlChar)'
from #c;
drop table #c;
sqlString sqlChar UNICODE(sqlChar)
------------------------------ ------- ----------------
insert into #c values ( '坡') ? 63
insert into #c values (N'坡') 坡 22369
Here is another test you can run:
select '坡', N'坡';
The point is that if you start with a Unicode character such as 坡 but you insert it as '坡', it is no longer a double byte character. You’ve crammed it into a single byte and corrupted it. You have to insert it as N'坡' to prevent corruption and you must have a double byte destination for it. By the way, the N stands for National Language. You may also see references to National Language Setting(s) or even National Character Set (NCS).
The Character Map utility in Windows is helpful when working with Unicode characters.
Figure 1. Character Map utility in Windows.
Personally I prefer the richer functionality of BabelMap, which is available as a free download, online application, or portable application.
Figure 2. BabelMap free Unicode utility.
Microsoft Project does not allow a comma to be added to a resource name. In healthcare, the norm is to refer to people using the pattern of Name, Title which in my case is John Cook, RN. Not all commas are equal. By substituting a different comma for the one Project doesn’t like, it’s possible to add a comma to a resource name.
Figure 1. Error message after trying to add a comma to a resource name in Microsoft Project 2013.
The error message refers to “the list separator character” that is commonly known as a comma. The comma we can’t use in Project resource names is technically known as U+002C, its number in the Unicode character set. There is another character, U+201A (the single low-9 quotation mark), that looks like a comma but isn’t treated the same internally.
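If you want to see the two characters side by side, a quick query works in SQL Server (any tool that can display Unicode will do):

```sql
-- U+002C is the ordinary comma; U+201A is the look-alike single low-9 quotation mark
SELECT NCHAR(0x002C) AS listSeparator, NCHAR(0x201A) AS lookAlike;
```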
Microsoft Project doesn’t provide a way to directly enter Unicode characters, but the Character Map tool included with Windows gives you a way to copy a Unicode character to your copy/paste buffer. To invoke the Character Map utility, you can use the search box to find it.
Figure 2. Finding the Character Map utility.
Once the Character Map is visible, there are four quick and easy steps to follow:
1. Check the Advanced view checkbox.
2. Enter 201A in the Go to Unicode: textbox.
3. Click the Select button.
4. Click the Copy button.
Figure 3. Using the Character Map utility.
Now the special comma is stored in your copy/paste buffer ready to be pasted into the Resource Information dialog box.
Position your cursor where you want the comma to be and paste it using whatever technique you prefer. I used Ctrl-V.
Figure 4. Resource name field with a U+201A comma pasted after my name and before my title.
Figure 5. Assign Resources dialog box with commas and titles added to all resources.
When there is only one resource with a comma and a title, the workaround looks perfect. When there are multiple resources, the commas that are resource separators do not have a preceding space, which causes things to appear slightly jumbled up as shown in the screen capture below.
Figure 6. How the comma separated titles appear on the project chart.
Excel add-ins Data Explorer and GeoFlow work well together, mainly because they just work. Simple, fast, and powerful. I started Excel 2013, used Data Explorer to search for, examine, and then download latitude-longitude data and finally used GeoFlow to plot an interactive 3-D visualization. I didn’t use any fancy Excel commands and the entire process took less than 3 minutes.
You can download the GeoFlow preview from here. It can also be used with Office 365.
Start by clicking the DATA EXPLORER tab. Click Online Search.
Figure 1. DATA EXPLORER tab in Excel 2013 Office Professional Plus.
I entered latitude longitude mountains as a search string. Once you like the way a dataset looks in preview mode, click USE to download it into Excel.
Figure 2. Wikipedia’s list of French mountains.
While the data is downloading, the worksheet is gray.
Figure 3. Download in progress.
After the download completes, the colors are restored. Go to the INSERT tab and click Map. Select Launch GeoFlow (it is a Map menu item not shown).
Figure 4. Click Map under the GeoFlow icon on the INSERT tab.
Zoom and rotate as desired.
Figure 5. GeoFlow’s interactive globe.
I decided to plot the names of the mountains on the map, which is why Name was selected. Under GEOGRAPHY, I selected Other because I didn’t see a category name that really matched name. After making a GEOGRAPHY selection, click Map It.
Figure 6. Mapping by mountain name and coordinates.
The locations of the mountains are shown in 2D – not very exciting.
Figure 7. Initial 2-D map.
To make the map 3-D, the Height column from the spreadsheet was selected.
Figure 8. Adding mountain height provides the third dimension.
After rotating a bit, the mouse cursor was placed over the tallest peak, Mont Blanc. Notice the elevation in meters in the annotation box.
Figure 9. Mouseover showing name and height.
If you prefer a heat map, change the CHART TYPE.
Figure 10. CHART TYPE changed to HeatMap.
It’s really as simple as it looks. It just works.
The BETWEEN operator is a handy SQL construct, but it can cause unexpected results when it isn’t understood. Consider the following code:
where x between .9 and 1.10
One of the questions you should ask is this: What is x?
What if x has a float or real data type? These data types store approximations of numbers, not exact values. When you set a real to 0.9, internally it becomes something very close to, but perhaps not exactly equal to, 0.9. It could be stored as something slightly less than 0.9, in which case a value you thought was 0.9 would fail the test between .9 and 1.10.
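You can see the approximation for yourself. This snippet assumes nothing beyond SQL Server’s real type; the exact digits displayed may vary:

```sql
-- 0.9 has no exact binary representation, so real stores an approximation
DECLARE @x real = 0.9;
SELECT @x AS displayedValue,
       CAST(@x AS decimal(20, 18)) AS storedApproximation;
```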
Now consider this:
where x between 9 and 11
If x is an integer and is set to 9, it really is 9 and it will pass the test. Because of data types, never let it be said that all numbers are created equal.
Let's refactor the statement to be free of hardcoded values:
where x between y and z
What's wrong with that? I'll simplify it for you: x, y, and z are all declared with the integer data type. Once again, the question to ask is what are the values of x, y, and z? The uncertainty arises over the potential values of y and z. Do you know that y will never be more than z? After performing substitutions, what if we have the following?
where 10 between 9 and 11
The statement would be true. But what if after performing substitutions we would have obtained this?
where 10 between 11 and 9
The statement would be false even though 10 is between 9 and 11. We see that order matters - no surprise if you've ever read the documentation. Order is easy to see and deal with when the values are hardcoded. But when you have variables, what if you really don't know a priori if y <= z? You need to think of BETWEEN as a number line. Look at these next two snippets to see what happens when the order of the BETWEEN values is changed:
where -10 between -9 and -11 -- this does NOT work
where -10 between -11 and -9 -- this works
Here is a workaround to this dilemma of not knowing the relative positions on y and z on the number line:
where x between y and z or x between z and y
I've left you with a workaround and a challenge. I like interactivity with my readers, so I ask you to post your own workaround as a comment. Another workaround is to refactor the problem into checking for a +/- 10% change. Here's something you can copy and paste to get started:
declare @x int = 10;
declare @y int = 9;
declare @z int = 11;
select 'this works' where @x between @y and @z or @x between @z and @y;
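As for the +/- 10% refactoring, here is one possible shape of it, a sketch only; the function name and the way the tolerance is expressed are my own choices. Taking the absolute difference makes the check symmetric, which sidesteps the bound-ordering problem entirely (Python again, for illustration):

```python
def within_10_percent(new, old):
    # True when new is within +/- 10% of old; abs() makes the check
    # symmetric, so it no longer matters which bound comes first.
    return abs(new - old) <= 0.10 * abs(old)

print(within_10_percent(10.5, 10))  # True: a 5% change
print(within_10_percent(11.5, 10))  # False: a 15% change
```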
There is one more thing about the first code snippet shown above that caught my attention. The two numbers .9 and 1.10 violate the Joint Commission's guidelines on numbers used in prescriptions. The Joint Commission issues guidelines for healthcare institutions to prevent errors that harm or kill patients, among other things. There is a "Do Not Use" list of abbreviations and formatting styles that have been proven to cause medical errors. A number such as .9 is more likely to be misread as 9 than 0.9 is. Numbers formatted as 1.10 or 1.0 are also dangerous: 1.0 is more likely to be misinterpreted as 10 than a bare 1 is. Misinterpretations of decimal numbers have repeatedly been proven to kill patients. These formatting guidelines were originally developed for handwritten prescriptions. Some people think the guidelines should also be applied to computerized forms. I'm not suggesting you change your T-SQL code, but I thought those of you who do user interface design might find this discussion interesting.
Since Windows Vista, Windows users have had the ability to add clocks. I’m in the U.S. Central time zone and have no problem thinking about scheduling meetings with people in the Eastern and Pacific time zones. Hyderabad is a different matter because of the half hour. London is easy except right now when the U.S. is on Daylight time and the U.K. is not.
You can add two clocks using the Additional Clocks tab in Date and Time.
Figure 1. Additional Clocks tab.
Figure 2. Mouseover after adding clocks.
Figure 3. Left mouse click in System Tray after adding clocks.
While the extra clocks are interesting, I personally find Hyderabad needs more visibility. Go to Gadgets and either double-click the clock or right-click and select Add to add it to your desktop. Set the Options for the clock to assign a name to it.
Figure 4. Clock gadget.
Figure 5. Use Options to put a name on your desktop clock. Either right-click the clock and select Options or click the wrench icon.
Data Explorer will speed up and simplify your data analysis by at least an order of magnitude. It makes data just work for me. No fighting or struggling, it just works.
Step 1. Go here and download the Data Explorer Preview for Excel 2013 or Excel 2010.
Step 2. Install Data Explorer.
Step 3. Start Excel. Don’t hold back, go all out and Enable Advanced Query Editing.
Figure 1. Check Enable Advanced Query Editing.
Step 4. Open a workbook.
Step 5. Click DATA EXPLORER.
Figure 2. New DATA EXPLORER tab in Excel 2013. Click Online Search to quickly find data.
Step 6. Click Online Search. Search for something that interests you. I picked one of my research interests, infant mortality.
Figure 3. Search results showing online data sources.
Until you do this, you will not appreciate how ultra cool this really is. When you do a mouseover on any of your search results, the Preview tab shows you what the data will look like when pulled into Excel. True WYSIWYG.
Notice that the search result that has mouse focus has a USE hyperlink. Click USE to actually bring the data into Excel. Again, until you do this, it’s difficult to appreciate how well this feature works. Go take a look here at the data I selected. The text on the Wikipedia page was consumed and loaded into Excel. That is extremely powerful and useful.
Here’s how to use Data Explorer to navigate data on the web.
Figure 4. Click From Web and enter the URL of a web page containing data.
The Query Navigator appears. Notice that it provides navigation to all of the consumable data sources on the page.
Figure 5. Query Navigator.
Easy. Powerful. Quick. Once Data Explorer was added to Excel, it took me much less than 2 minutes to search for my data, preview it, select it, import it, and then examine it further with the Navigator.
There are many reasons you can have different results in your production environment than in your development environment. In database applications, two major causes are differences in database schemas and differences in the data. Application code and user differences can also cause discrepancies, but those are out of scope here; this discussion is limited to differences in the data and the database schema.
Using nursing as a paradigm, before diagnosis comes assessment. A thorough assessment must be done before an accurate diagnosis can be made. A proper assessment requires the proper tools. There are tools for comparing database schemas and tools for comparing data. Today I’m focusing primarily on database schema comparisons. Schema comparisons are usually easier to assess quickly and thoroughly than data comparisons. Because of that, I recommend starting with a schema comparison before performing a data comparison.
Ideally, your database schema is version controlled and you have the latest version of Visual Studio, TFS, and maybe Red Gate’s SQL Source Control. You may even have Red Gate’s SQL Compare, a tool I have used extensively with great benefit at multiple clients. As Peter Schott points out below, using SSDT is a really good idea, assuming you are allowed to download it, which isn’t a given in environments with extremely tight control of admin rights. If you have all of these tools, you probably already know how to compare database schemas. As a consultant, I know that most shops aren’t doing version control of their databases. Many small shops don’t have Visual Studio at all. They also may not have a budget for any software purchases, no matter what the ROI is. Rational thought does not always prevail in budgetary matters.
I’ve seen just about every workaround possible at low budget clients, not all of them good. You don’t have to have two instances of SSMS open side by side to look at the production machine’s version of a stored procedure and the development machine’s version. You can split SSMS into two vertical windows within a single instance of SSMS. Go to the SSMS menu bar and select the Window submenu to split the query window vertically. Most of you know this, but I’ve been asked “how did you do that?”. I respect anybody who asks a question in order to learn.
Figure 1. SSMS with a vertical split showing development code on the left and production code on the right.
The split window approach has limitations. You need to already know where you should be looking and which object(s) you should be comparing. Even if you are that lucky, the amount of code needs to be very small. Visually scanning from left to right and back again ad nauseam is inefficient if the number of statements to compare is large. The risk of missing a difference is significant.
If you don’t have any first class tools to compare your schemas (Visual Studio has a good facility for comparing code versions), I recommend saving the results to files. In the following example, I save each of the stored procedures above to files. Use the free WinMerge differencing tool to compare the files. It is available as a portable application in case you are prevented from installing applications on your machine.
Figure 2. WinMerge comparison of the two script files from SSMS.
It’s much easier to see differences when using a differencing tool. Your time is valuable and you need to use the right tools to be efficient and effective. Scripting an individual object to a file only works well when you have a small number of objects and you know which objects you need to compare. There is a feature to script your entire database schema from SSMS into one file. You can compare one schema’s file to another schema’s file.
I have had people tell me that they tried to script their database schema to a file but it only scripted the database file creations statements and none of the objects inside the database. That’s not the option I’m talking about here. Scripting the objects that are in the database is a different menu option altogether.
Figure 3. This is NOT the option to script all of the objects in a database.
Figure 4. How to script all of the objects in a database.
By default, when database objects are scripted with this wizard, a descriptive header appears before each object. The header is a SQL comment that includes the date and time the object was scripted. That’s a problem because you won’t be creating all of the development object scripts at the exact same time you create the production object scripts. You’ll have a false positive difference for each object, which is quite annoying. Fortunately, you can use the Advanced button on the third screen of the scripting wizard to prevent this problem from occurring in the first place.
Figure 5. Click the Advanced button to suppress the object creation comments.
Locate the Include Descriptive Headers and set it to False.
Figure 6. Set Include Descriptive Headers to False.
Once you have a script for the development schema and another for the production schema, use WinMerge to do an entire database comparison, assuming you don’t have one of the first class comparison tools mentioned earlier.
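If WinMerge isn’t available either, the standard library of just about any scripting language can produce a usable diff. Here is a minimal sketch using Python’s difflib, with two tiny stand-in scripts inlined so it runs as-is; in practice you would read the two files the Generate Scripts wizard produced, and the file names and procedure below are examples of mine, not anything from a real database:

```python
import difflib

# Inline stand-ins for the two schema script files; in real use,
# read dev_schema.sql and prod_schema.sql from disk instead.
dev = """CREATE PROCEDURE dbo.GetOrders
AS
SELECT OrderID, Total FROM dbo.Orders;""".splitlines()

prod = """CREATE PROCEDURE dbo.GetOrders
AS
SELECT OrderID FROM dbo.Orders;""".splitlines()

# Emit a unified diff, the same kind of line-by-line comparison
# WinMerge shows graphically.
for line in difflib.unified_diff(dev, prod, fromfile='dev_schema.sql',
                                 tofile='prod_schema.sql', lineterm=''):
    print(line)
```

Lines prefixed with - exist only in the development script and lines prefixed with + exist only in the production script, so unintended schema drift stands out immediately.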
Just to clarify, you should use a proper database comparison tool to troubleshoot problems that may be caused by unintended database schema differences. But if you don’t have the proper tools or you’re not allowed to download and install SSDT (there really are companies that have such highly restrictive policies), scripting the schema to a file and using WinMerge is an acceptable alternative that works sufficiently well. I highly recommend that after deploying schema changes to a production database, you should use some sort of process to compare schemas to make certain that the changes made were exactly as intended. Your primary comparison should be to compare the production schema before the change to the production schema after the change. If you find unintended differences and can’t resolve the discrepancies quickly, you can obtain a resume template from here.