THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

John Paul Cook

  • Microsoft Mathematics for Work and School

    Quite by accident, I discovered a really nice free software product from Microsoft called Microsoft Mathematics. You can download the standalone version from here. There is also a Microsoft Mathematics add-in for Microsoft Word and OneNote with the 2010 version found here and the 2013 version found here.

    image

    Figure 1. Standalone version of Microsoft Math showing the Formula and Equations dropdown menu selections.

    image

    Figure 2. Gravitational formula with Plot this equation menu option selected.

    image

    Figure 3. Graph clearly showing that the force of gravity declines with increasing distance.

    The add-in for Word and OneNote adds the Mathematics tab to the ribbon, which builds upon existing features for entering equations. The add-in provides computational and graphing features.

    image

    Figure 4. Mathematics tab selected with the mouse over the Equation menu.

    After selecting an equation, the DESIGN tab is added to the ribbon.

    image

    Figure 5. Microsoft Mathematics DESIGN tab in Word.

    image

    Figure 6. 3D graph showing how the area is a function of the square of the radius.

    The add-in provides an option to insert a graph into Word or OneNote.

    image

    Figure 7. Graph embedded into Word document.

    I found Microsoft Mathematics to be easy to use.

  • Hour Of Code

    Learn to program. It only takes an hour and it’s free. You can learn how to code at an Apple store on December 11, 2014 or the entire week at a Microsoft retail store. Learn more about this project here.

    Do you already know how to code? Teach someone who doesn’t. You can spare an hour to empower someone else.

  • Windows 8.1 Installation Notes

    If your machine doesn’t have a DVD drive, you’ll either need to do a network boot or boot from USB. Fortunately making bootable USB media from an iso file is much easier. Microsoft has a free utility shown below that does it painlessly.

    When installing Windows 8.1 or any operating system, you want to be thorough and get as much right the first time as you can. Since I wanted a clean, fresh install, I knew I’d be reinstalling all of my applications from scratch. This was a great opportunity to install an SSD. Whenever you get a new motherboard, you want the latest BIOS and whenever you get a new SSD, you want the latest firmware. That means you need to know what firmware you have. Update your firmware and BIOS before installing the operating system.

    There’s a lot of bad advice on the internet for finding the firmware version of your SSD. You can reboot your machine and watch the POST process and hopefully see the firmware version in the instant before it disappears. There’s a more relaxed way of finding the firmware version. Some people advise you to install a program that will show you the firmware version, but that’s not necessary. You should be able to get the information you need from Device Manager.

    image

    Figure 1. Open Disk drive in the Device Manager, select your SSD, right-click, select Properties.

    image

    Figure 2. Select the Details tab and then select Hardware Ids.

    image

    Figure 3. The firmware version of this SSD is 1.04

    I updated the firmware of my SSD to 1.05. I was very disappointed in Plextor’s firmware update utility. It displayed three identical messages about failing to “re-IDENTIFY”, whatever that is. I assumed that the upgrade didn’t work, so I ran the update utility again. It said I was at 1.05. Plextor should write a better utility.

    Since I have an iso file instead of media, I wanted a way to copy the iso to a USB device and boot from it. Fortunately Microsoft has a free utility for doing this here. Don’t worry about it saying Windows 7.

    image

    Figure 4. Microsoft ISO to bootable USB utility.

    image

    Figure 5. Pick your USB device.

    image

    Figure 6. Your USB device will be wiped clean.

    image

    Figure 7. Do you really have a choice if you want to do this?

    image

    Figure 8. How long this takes depends on the quality of the USB device.

    image

    Figure 9. All done!

    I did notice a significant performance difference between an old USB device and a new, high quality USB device in terms of I/O performance.

    Since Windows 8.1 is going onto a new drive, I decided to keep my Windows 7 installation. I’ll probably dual boot with a default for Windows 7. One OS will have SQL Server 2012, the other SQL Server 2014. Storage is affordable and I feel better being able to revert back to the Windows 7 system if necessary.

  • USMT for upgrade to Windows 8.1 and how it works with SQL Server

    I ordered a 512 GB SSD so that I can get great performance running Windows 8.1. I’ll have all of my spinning drives intact, so I’m not worried about actual data loss. The concern is in preserving things like IE favorites for every family member. The User State Migration Tool (USMT) has ScanState.exe to capture what you want to preserve and LoadState.exe to migrate the saved state to the new installation.

    Before forging ahead with USMT, it’s advisable to first check your PC for any potential issues. Download and run the Upgrade Assistant to find out if you have any compatibility issues. Go to this page to both obtain it and find out more about it.

    image

    Figure 1. Upgrade Assistant.

    I didn’t have any hardware or driver issues. The only issues were with programs I don’t want on the new Windows 8.1 installation anyway.

    USMT is found in the Windows Assessment and Deployment Kit (ADK) for Windows 8, which you download from here. If the installation defaults are used, you will NOT obtain the USMT as the screen capture shows.

    image

    Figure 2. ADK installation defaults. Notice that USMT is not selected.

    If you have particular applications that you are interested in migrating, you might want to also select the Application Compatibility Toolkit. I didn’t actually use it, but just wanted to make you aware of its existence.

    image

    Figure 3. ADK with USMT selected.

    Be sure to use the right executables for your machine’s architecture. Look in the appropriate folder for your processor.

    32-bit C:\Program Files (x86)\Windows Kits\8.0\Assessment and Deployment Kit\User State Migration Tool\x86
    64-bit C:\Program Files (x86)\Windows Kits\8.0\Assessment and Deployment Kit\User State Migration Tool\amd64

    I copied all of the files in the …\amd64 folder to C:\USMT on my Windows 7 machine. If you read the USMT documentation (which is advisable), it recommends that you take this approach. Open an administrative command prompt and navigate to that folder. Specify an output XML file. Here is my session:

    Microsoft Windows [Version 6.1.7601]
    Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

    C:\Windows\system32>cd ..\..

    C:\>cd USMT

    C:\USMT>scanstate /genmigxml:C:\USMT\Results\genMig.xml

    SCANSTATE.EXE Version 6.2.9200.16384
    (C) 2012 Microsoft Corporation. All rights reserved.

    Log messages are being sent to 'C:\USMT\scanstate.log'

    Starting the migration process

    I examined the XML file and realized that I had the following users on my machine that I’d completely forgotten about:

    MsDtsServer110
    MSSQLFDLauncher
    MSSQLSERVER
    MSSQLServerOLAPService
    SQLSERVERAGENT

    Taking the defaults when running LoadState would migrate these users. There’s no point in doing that because I’m going to be reinstalling SQL Server after installing Windows 8.1 from scratch.

    You can use the /ui switch to include only the specified users when running either ScanState or LoadState. There is also a /ue switch to exclude the specified users. Read more about the switches here. It’s a tradeoff between space and flexibility. If you take the defaults with ScanState, you grab everything. It takes more space, but you can migrate any user later. If you restrict what you capture with ScanState, you will use less space but can’t later LoadState for a user you didn’t capture with LoadState in the first place. My recommendation is to exclude the SQL Server users shown above from the ScanState. I definitely won’t be migrating them because my objective is to make a full backup of Windows 8.1 before installing any applications. I don’t want any clutter with superfluous user accounts in that initial system image.

  • Searching for database objects

    It’s easy to find what calls a stored procedure in SQL Server. Finding where it is called outside of SQL Server is an entirely different matter. I use great free utilities for either use case.

    Of course, you can use built-in features in SSMS to find out what calls a stored procedure, for example. Select an object in the Object Explorer, right-click, select View Dependencies.

    image

    Figure 1. View Dependencies in SSMS

    You can also script a solution, either in T-SQL or PowerShell. Dependencies are found in sys.sysdepends in SQL Server 2005 and above or sysdepends in SQL Server 2000. What I really like to use is the free SQL Search utility from Red Gate. It’s powerful, intuitive, works well, and free (just in case you missed that very important point).

    Things get more challenging when you need to know what items outside of SQL Server are referencing SQL Server objects. If you have .NET code, you can search all of your code using Visual Studio. But you might not have Visual Studio and even if you do, you could have a large collection of batch jobs (could be non-Microsoft) scattered around your filesystem as I experienced today.

    My favorite free tool for searching the filesystem is FileLocator Lite. I’ve been using it for many years, but didn’t blog about it when I first discovered it because it had an awful name – Agent Ransack. Try telling your software compliance people that you want some freeware named Agent Ransack – that’s just not going to happen. Fortunately, this great tool was rebranded as FileLocator Lite.

    I had to search all of the many files on a large network share for all of the stored procedures in a particular database. First I got a list of the names of all of the stored procedures.

    select name
    from sys.sysobjects
    where type = 'P';

    The list looked something like this:

    uspA
    uspB
    uspC
    uspD
    uspE
    uspF

    FileLocator Lite can search for multiple strings at once, provided they are separated by the correct Boolean operator, which is an OR in my case. It appeared to me that Boolean operators need to be in all uppercase to work properly. I used regular expressions with SSMS’s Search and Replace option to quickly edit my long list in a single pass.

    image

    Figure 2. Global search and replace using regular expressions.

    I substituted OR \n for \n and ended up with something like this which I pasted into FileLocator Lite:

    uspA
    OR uspB
    OR uspC
    OR uspD
    OR uspE
    OR uspF

    image

    Figure 3. FileLocator Lite search results.

    FileLocator Lite will search subfolders if you check the box to do so. It also supports regular expressions. What I find particularly useful is the context sensitive pane on the right where you can preview and see if the found string really is what you want. You can save your search criteria or your results to a file.

  • Accessibility, Section 508, and the Bond rule for character size

    When you are dealing with the United States federal government, accessibility isn’t a nice to have, it is the law. As I was researching the law, I came across a related item, the Bond (as in 007) rule, which was proposed by Sidney Smith in the journal Human Factors in 1977.

    The federal government has an entire website section508.gov describing the 1998 law that requires federal agencies to make their applications and websites accessible. I encourage you to go to the site and learn more. Of particular interest to application developers is the Technology Tools page. It contains a lot of great links. You’ll find references to many things things including some I’ve blogged about before – color blindness and epilepsy.

    Smith discovered that to be readable by the masses, a character needs to be at least .007 radians. That’s why some people refer to this as the Bond Rule. Using radians to specify character height provides a measurement independent of the viewing distance. To know the exact character height needed to be visible at a specific distance, multiply the viewing distance by 0.007 to find the character height in the same units as the viewing distance. If you have 20/40 vision and are reading text that is 100 cm away, the character size should be at least 0.7 cm (7 mm). 20/40 is used instead of 20/20 because 20/40 is more representative of the real world than the idealized 20/20.

  • Processing nonstructured data using FILESTREAM and FileTable

    SQL Server 2012 simplifies the processing of unstructured data found in files. The FileTable enhancements to FILESTREAM allow documents to be known to SQL Server full-text search by simply copying them to a FileTable network share. This is much more convenient than it was in SQL Server 2008 R2 where the files had to be explicitly loaded into SQL Server as blobs before they could be processed by full-text search.

    Although I’m a full-time IT consultant, I spend most of my spare time pursuing graduate studies in nursing. I’m always looking for ways to use technology to improve my efficiency as a student and also improve my effectiveness as a consultant. Some of the technical tips and tricks I’ve figured out for school have been useful to my clients in corporate environments. In this blog post, I’ve taken PowerPoint slides from my pharmacology course to evaluate the usefulness of using FileTable to process unstructured data found in the same kinds of files used in corporate environments.

    image

    Step 1. Access your SQL Server instance’s properties to enable FILESTREAM.

    The first checkbox is to enable FILESTREAM in SQL Server. The second checkbox is to allow Windows to read and write FILESTREAM data – in other words, make your files accessible to Windows. The third checkbox is to allow remote users to access your FILESTREAM data. This is the option for real world usage. Typically a SQL Server is on a remote machine, not locally installed.

    image

    Figure 2. Check everything if you want maximum functionality.

    Changing FILESTREAM settings requires a restart of the SQL Server service.

    You must create a FILESTREAM enabled database or add FILESTREAM to an existing database. For this blog post, I chose to create a new database.

    CREATE DATABASE School
    ON
    PRIMARY (NAME = School, FILENAME = 'd:\school\dbFiles\school.mdf')
    ,FILEGROUP FileStreamGroup
     CONTAINS FILESTREAM(NAME = schoolFilestream, FILENAME = 'd:\school\schoolFilestream')
     LOG ON (NAME = SchoolLog, FILENAME = 'd:\school\dbFiles\school.ldf');
    GO

    The folder for the data and log files must exist. The subfolder for the FILESTREAM must not exist and will be created when the database is created.

    image

    Figure 3. Folder structure showing that D:\school\dbFiles exists and D:\school\schoolFilestream does not exist.

    After creating the database with the above T-SQL statement, the D:\school\schoolFilestream folder is created as shown below. As the dialog box shows, it is considered a system folder by Windows.

    image

    Figure 4. Click Continue to access the new folder and its contents.

    image

    Figure 5. The new D:\school\schoolFilestream folder and its contents.

     

    image

    Figure 6. FILESTREAM Directory Name (which must NOT be a full path) and Non-Transacted Access options.

    The options specified created a folder fileContainer under a new mssqlserver share as shown below.

    image

    Figure 7. Network showing the fileContainer FILESTREAM directory name under the default share name of mssqlserver.

    At this point the fileContainer folder is empty. Time to create a FileTable. Notice that the CREATE TABLE statement does not specify any columns. The default is to create a folder with the same name as the FileTable, but the example shows how to use FILETABLE_DIRECTORY to override the default and use pharmFiles for the folder name. Notice that the WITH clause is also used to create a unique constraint.

    CREATE TABLE pharm AS FileTable
    WITH (FILETABLE_DIRECTORY = N'pharmFiles'
             ,FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME = ui_file_stream);

     

    image

    Figure 8. The CREATE TABLE statement created a new folder pharmFiles.

    Within SSMS, you can see that a FileTable has predefined columns.

    image

    Figure 9. FileTable columns are predefined.

    Now it is time to copy all of the pharmacology PowerPoint files to the pharmFiles folder on the network share.

    image

    Figure 10. PowerPoint files added to the previously empty pharmFiles folder.

    Let’s examine the contents of the pharm FileTable after the file copy.

    image

    Figure 11. Notice that SQL Server recognizes all of the files that were copied to the pharmFiles folder on the mssqlserver share.

    Notice that the files are of type pptx. SQL Server 2012 by default does not index pptx files. Use the following query to find out what file types are indexed.

    SELECT * FROM sys.fulltext_document_types
    ORDER BY document_type;

    It is necessary to add an IFilter, which you will probably find is more than a one step process. As of the date of publication, the latest and greatest IFilter pack is the Microsoft Office 2010 Filter Pack (a.k.a. Filter Pack 2.0). It’s possible that when you read this, a newer version may be available, so do your due diligence and check. After installing the Filter Pack, you may need to install a service pack. There’s more than one service pack that has been released for Filter Pack 2.0. I will not provide any links because I don’t want to provide you with outdated information. Microsoft recommends that you obtain service packs by running Windows Update, which is what I did. As a best practice, after installing something from Windows Update, you should check for additional updates that could be required after what you just finished installing. I had three consecutive updates to do. You could have more or less than that.

    The Office Filter Pack does not have a dependency on Microsoft Office. You don’t need to have Office installed on your SQL Server machine for the Filter Pack to work. It’s not a good idea to have Office installed on a server.

    After installing and updating the Filter Pack, run the following command to load the new filters.

    exec sp_fulltext_service 'load_os_resources', 1;

    Update the system metadata about what filters are installed.

    exec sp_fulltext_service 'update_languages';

    Restart the filter daemon.

    exec sp_fulltext_service 'restart_all_fdhosts’;

    Confirm that the new filters were installed.

    SELECT * FROM sys.fulltext_document_types
    ORDER BY document_type;

    Create a fulltext catalog and a fulltext index.

    CREATE FULLTEXT CATALOG pharmCat AS DEFAULT;

    CREATE FULLTEXT INDEX ON pharm (file_stream TYPE COLUMN file_type) KEY INDEX ui_file_stream;

    Here’s a query to find out what files in the pharm folder contain the drug name Vancomycin.

    SELECT *
    FROM dbo.pharm
    WHERE CONTAINS (file_stream, 'Vancomycin');

    A single row is returned because only one file, Anti-infective.pptx mentions that drug. A future post will address using FileTable to make sense of unstructured data.

  • [OT] Project Tuva

    While researching an upcoming blog post on using SQL Server 2012 FileTable, I stumbled across Project Tuva. Bill Gates bought the rights to some of Nobel Laureate Richard Feynman’s physics lectures and made them available for free at http://research.microsoft.com/apps/tools/tuva/.

    Speaking of recorded lectures, I highly recommend that students listen to podcasts of lectures during their commutes. The main reason I made it through pharmacology and obstetrics was because I listened to the podcasts of my lectures several times during my normal commuting. The repetition really reinforced the topics. From repetition came familiarity with the material and from the familiarity came understanding.

  • Why ICD codes matter to data professionals

    The International Classifications of Diseases (ICD) is a global standard administered and copyrighted by the World Health Organization (WHO). The 10th revision, ICD-10 is the current revision. Some countries adapt the WHO standard. In the United States, the National Center for Health Statistics (NCHS) has made two modifications to ICD-10 known as ICD-10-CM (Coordination and Maintenance) and ICD-10-PCS (Procedure Coding System). ICD-10 was originally scheduled to replace ICD-9 as the U.S. standard on October 1, 2013, but the implementation date was changed to October 1, 2014.

    Systems must be modified to accommodate this new standard. Schemas must be changed, mappings between old and new must take place. Much work is yet to be done, which is why implementation was delayed by a full year.

    The granularity and specificity of ICD-10 codes has been ridiculed by the press. It is helpful to understand ICD-10 instead of laugh at it if you want to win ICD conversion business. Code W22.02XA is the code for “walked into a lamppost, initial encounter” and code W22.02XD is the code for “walked into a lamppost, subsequent encounter”. In clinical parlance, an initial encounter is the first time the patient seeks treatment. Followup visits about the same condition are known as subsequent encounters. The term “subsequent encounter” has nothing to do with how many times a patient has walked into a lamppost, which some people have misinterpreted.

    The number of codes increased by an order of magnitude when going from ICD-9 to ICD-10. Data professionals should be able to understand and appreciate the intention behind having more specificity to diagnosis codes. If only W22 (striking against a stationary object) was used, the data would be too vague to help with prevention efforts. Just for sake of discussion, let’s consider what if the data showed that most people who walk into something actually walk into lampposts instead of walls (W22.01XA) and furniture (W22.03XA). In this contrived example, it might be in the public interest to see if it would be cost effective to develop a lamppost injury prevention program (just kidding trying to make a point about data mining).

    Fine grained data lends itself to analysis. Federal payment programs (Medicaid and Medicare) and private insurance companies are already scrutinizing diagnosis codes and withholding payments in some cases. If a patient without an infection is admitted to a hospital and contracts an infection while being treated in the hospital (this is called a nosocomial infection), payment for treating the infection is likely to be denied. This gives hospitals an incentive to be more diligent in preventing infections.

    Having more detail gives hospitals and practitioners an improved ability to indicate the extra complexity of a case and get reimbursed at a higher rate. This could give providers an incentive to convert to ICD-10 perhaps even ahead of the October 1, 2014 deadline. More codes means more data, which can lead to projects to expand and optimize systems to accommodate the impending data explosion.

  • NoSQL, MUMPS, HL7, SNOMED, Meaningful Use and Electronic Medical Records

    MUMPS is a NoSQL database and programming language that does not receive widespread attention. I searched for lists of the most important or most used NoSQL databases and MUMPS did not make the lists, although it should.

    The Massachusetts General Hospital Utility Multi-Programming System, often referred to as just M, is a key-value database that was developed in 1966. MUMPS became the foundation for the Decentralized Hospital Computer Program (DHCP) developed at the U.S. Veterans Health Administration (VHA). Notice that I said developed at the VHA, not by the VHA. At one time there were direct orders prohibiting further development of this system and work continued in secret. Eventually, DHCP became VistA, which is one of the most widely used Electronic Medical Records (EMR) systems in the world.

    VistA is not open source software, it is public domain software. That means that anybody can do anything to it, including making proprietary changes under no obligation to give the changes back to the public. Most VistA implementations use the SQL friendly Caché version of MUMPS from InterSystems, which you can download for free. Caché is also used by the dominant commercial EMR vendor, Epic Systems, as well as TD Ameritrade and the European Space Agency.

    VistA does not include a gui. At VA Hospitals, the Computerized Patient Record System (CPRS) is a separate application that provides a minimalist gui on top of VistA. VistA supports the Health Level Seven (HL7) messaging standards used for healthcare informatics interoperability. It also supports SNOMED CT, a healthcare taxonomy supporting exchange of healthcare data. The VA has an ongoing project to enable VistA to support both ICD-10 and Meaningful Use. ICD codes are used to classify and categorize both medical diagnoses and inpatient procedures. Meaningful Use is part of the Patient Protection and Affordable Care Act (PPACA, otherwise known as Obamacare) and affects the reimbursement rates healthcare providers receive.

    SNOMED CT, HL7, and ICD-10 are standards used around the world in healthcare. Meaningful Use is specific to the United States. I’ll be discussing all of these topics in greater detail in future blog posts.

  • De-identification of Personal Health Information

    Many people have asked me various questions about Personal Health Information (PHI) which is covered by the Health Insurance Portability and Accountability Act (HIPAA, not HIPPA). The process of removing data (de-identification or anonymization) that could violate someone’s privacy is  complex. This is particularly true when there is unstructured data (i.e., free text). The U.S. Department of Health and Human Services has detailed guidelines on proper de-identification techniques, which are found here or at Bing’s cached copy here.

    I found the guidelines to be very informative. The discussion on zip codes was interesting. Zip codes, particularly in areas that aren’t densely populated, have to be abbreviated to the first three digits. Even when you restrict a zip code to the first three digits, there is a list of 17 specific three digit zip codes that you cannot use at all.

    The document says that age must be removed from a patient’s record if the patient’s age is greater than 89. Can you imagine a patient summary beginning with “The patient is a 107 year old man…”? For supercentenarians, age does provide a clue as to who they are. Changing a patient’s age or date of birth helps greatly in de-identification, but care must be taken. You don’t want to make an adult a minor or vice-versa.

    Consider this statement in the medical record: “The patient became ill after eating a [insertNameOfReligiousHolidayHere] meal.” One could argue that removing the name of a religious holiday makes for a neutral record. That might be the appropriate thing to do, but there could be clinical value in knowing the religious holiday or the religion of the patient. It could be useful to know if certain things would be eaten or definitely not eaten.

    There are published algorithms for processing textual data and de-identifying it. You can download Perl regular expression scripts from PhysioNet for free. The download also includes several dictionaries that the scripts use. Notice there is a dictionary of medical terms and several dictionaries of people’s names. Obviously there is value in knowing if a word is a person’s name or a medical term. DeBakey appears in the SNOMED dictionary as a medical term. There is a DeBakey clamp, a DeBakey pump, and a DeBakey graft. But what if the patient’s name was DeBakey? DeBakey does not appear in the dictionary of common names. Would the scripts recognize DeBakey as a medical term and not remove what actually is the patient’s name?

    A known weakness of processing textual data against dictionaries is misspelled words and names. Some names are particularly difficult to spell and will not always be found in the dictionaries because of the inevitable misspellings.

  • Shakespeare and storing Unicode characters

    This post is about the political issues involved with using multiple languages in a global organization and how to troubleshoot the technical details. The CHAR and VARCHAR data types are NOT suitable for global data. Some people still cling to CHAR and VARCHAR justifying their use by truthfully saying that they only take up half the space of NCHAR and NVARCHAR data types. But you’ll never be able to store Chinese, Korean, Greek, Japanese, Arabic, or many other languages unless you either use NCHAR and NVARCHAR which are double byte Unicode data types or use specific collations with CHAR and VARCHAR. Using collations is something I will address in a future post. Using CHAR and VARCHAR with collations does not save space and is trickier, which is why this post is focusing on using Unicode.

    In troubleshooting a problem displaying Chinese characters, I need a test character to experiment with. I picked 坡 (decimal 22369, hexadecimal 5761) as my test character. Why? Because it looked Chinese to me! Now doesn’t that remind you of Act 1, Scene 2 of William Shakespeare’s Julius Caesar where Casca says to Cassius “but, for mine own part, it was Greek to me”? It really looked Chinese to me, but then, what do I know? That presented a problem. What if it meant something really offensive? After consulting a Chinese immigrant, I was relieved to know that my randomly chosen character was politically correct. And just to be extra safe, I double checked with fellow SQL Server MVP Greg Low. Thanks, Greg.

    I wrote the following test code to show you what happens when Unicode (which is always double byte) is implicitly converted to single byte CHAR or VARCHAR.

    create table #c (sqlString NVARCHAR(30), sqlChar NCHAR(1));

    insert into #c values (N'insert into #c values ( ''坡'')', '坡'); -- invalid single byte syntax

    insert into #c values (N'insert into #c values (N''坡'')', N'坡'); -- valid double byte Unicode syntax

    select
      sqlString
    , sqlChar
    , UNICODE(sqlChar) as N'UNICODE(sqlChar)'
    from #c;

    drop table #c;

    sqlString                      sqlChar UNICODE(sqlChar)
    ------------------------------ ------- ----------------
    insert into #c values ( '坡')       ?     63
    insert into #c values (N'坡')       坡 22369

    Here is another test you can run:

    select'坡' , N'坡'

    The point is that if you start with a Unicode character such as 坡 but you insert it as '坡', it is no longer a double byte character. You’ve crammed it into a single byte and corrupted it. You have to insert it as N'坡' to prevent corruption and you must have a double byte destination for it. By the way, the N stands for National Language. You may also see references to National Language Setting(s) or even National Character Set (NCS).

    The Character Map utility in Windows is helpful when working with Unicode characters.

    image

    Figure 1. Character Map utility in Windows.

    Personally I prefer the richer functionality of BabelMap, which is available as a free download, online application, or portable application.

    image

    Figure 2. BabelMap free Unicode utility.

  • Adding a comma to a resource name in Microsoft Project

    Microsoft Project does not allow a comma to be added to a resource name. In healthcare, the norm is to refer to people using the pattern of Name, Title which in my case is John Cook, RN. Not all commas are equal. By substituting a different comma for the one Project doesn’t like, it’s possible to add a comma to a resource name.

    image

    Figure 1. Error message after trying to add a comma to a resource name in Microsoft Project 2013.

    The error message refers to “the list separator character” that is commonly known as a comma. The comma that we can’t use in Project resource names is technically known as U+002C, which is its number in the Unicode character set. There is another comma known as U+201A. They look the same, but they aren’t treated the same internally.

    Microsoft Project doesn’t provide a way to directly enter Unicode characters, but the Character Map tool included with Windows gives you a way to copy a Unicode character to your copy/paste buffer. To invoke the Character Map utility, you can use the search box to find it.

    image

    Figure 2. Finding the Character Map utility.

    Once the Character Map is visible, there are four quick and easy steps to follow:

    1. Check the Advanced view checkbox.

    2. Enter 201A in the Go to Unicode: textbox.

    3. Click the Select button.

    4. Click the Copy button.

    image

    Figure 3. Using the Character Map utility.

    Now the special comma is stored in your copy/paste buffer ready to be pasted into the Resource Information dialog box.

    Position your cursor where you want the comma to be and paste it using whatever technique you prefer. I used Ctrl-V.

    image

    Figure 4. Resource name field with a U+201A comma pasted after my name and before my title.

    image

    Figure 5. Assign Resources dialog box with commas and titles added to all resources.

    When there is only one resource with a comma and a title, the workaround looks perfect. When there are multiple resources, the commas that are resource separators do not have a preceding space, which causes things to appear slightly jumbled up as shown in the screen capture below.

    image

    Figure 6. How the comma separated titles appear on the project chart.

  • Excel 2013 Data Explorer and GeoFlow make 3-D maps quick and easy

    Excel add-ins Data Explorer and GeoFlow work well together, mainly because they just work. Simple, fast, and powerful. I started Excel 2013, used Data Explorer to search for, examine, and then download latitude-longitude data and finally used GeoFlow to plot an interactive 3-D visualization. I didn’t use any fancy Excel commands and the entire process took less than 3 minutes.

    You can download the GeoFlow preview from here. It can also be used with Office 365.

    Start by clicking the DATA EXPLORER tab. Click Online Search.

    image

    Figure 1. DATA EXPLORER tab in Excel 2013 Office Professional Plus.

    I entered latitude longitude mountains as a search string. Once you like the way a dataset looks in preview mode, click USE to download it into Excel.

    image

    Figure 2. Wikipedia’s list of French mountains.

    While the data is downloading, the worksheet is gray.

    image

    Figure 3. Download in progress.

    After the download completes, the colors are restored. Go to the INSERT tab and click Map. Select Launch GeoFlow (it is a Map menu item not shown).

    image

    Figure 4. Click Map under the GeoFlow icon on the INSERT tab.

    Zoom and rotate as desired.

    image

    Figure 5. GeoFlow’s interactive globe.

    I decided to plot the names of the mountains on the map, which is why Name was selected. Under GEOGRAPHY, I selected Other because I didn’t see a category name that really matched name. After making a GEOGRAPHY selection, click Map It.

    image

    Figure 6. Mapping by mountain name and coordinates.

    The locations of the mountains are shown in 2D – not very exciting.

    image

    Figure 7. Initial 2-D map.

    To make the map 3-D, the Height column from the spreadsheet was selected.

    image

    Figure 8. Adding mountain height provides the third dimension.

    After rotating a bit, the mouse cursor was placed over the tallest peak, Mont Blanc. Notice the elevation in meters in the annotation box.

    image

    Figure 9. Mouseover showing name and height.

    If you prefer a heat map, change the CHART TYPE.

    image

    Figure 10. CHART TYPE changed to HeatMap.

    It’s really as simple as it looks. It just works.

  • Problems using BETWEEN

    The BETWEEN operator is a handy SQL construct, but it can cause unexpected results when it isn’t understood. Consider the following code snippet:

    where x between .9 and 1.10

    One of the questions you should ask is this: What is x?

    What if x has a float, real, or double data type? These data types do not store exact representations of numbers, only approximations. When 0.9 is stored in a real column or variable, it may be between 0.9 and 1.1. Or it may not. When you set a real to 0.9, internally it becomes something very close to but maybe not quite equal to 0.9. Even though you entered 0.9, it could be stored as something slightly less than 0.9.

    Now consider this:

    where x between 9 and 11

    If x is an integer and is set to 9, it really is 9 and it will pass the test. Because of data types, never let it be said that all numbers are created equal.

    Let's refactor the statement to be free of hardcoded values:

    where x between y and z

    What's wrong with that? I'll simplify it for you. Everything is an integer, x, y, and z are all declared with the integer data type. Once again, the question to ask is what are x, y, and z? Or to be a little clearer, ask what are the values of x, y, and z. The uncertainty arises over the potential values of y and z. Do you know that y will never be more than z? After performing substitutions, what if we have the following?

    where 10 between 9 and 11

    The statement would be true. But what if after performing substitutions we would have obtained this?

    where 10 between 11 and 9

    The statement would be false even though 10 is between 9 and 11. We see that order matters - no surprise if you've ever read the documentation. Order is easy to see and deal with when the values are hardcoded. But when you have variables, what if you really don't know a priori if y <= z? You need to think of BETWEEN as a number line. Look at these next two snippets to see what happens when the order of the BETWEEN values is changed: 

    where -10 between  -9 and -11 -- this does NOT work
    where -10 between -11 and  -9 -- this works

    Here is a workaround to this dilemma of not knowing the relative positions on y and z on the number line:

    where x between y and z or x between z and y

    I've left you with a workaround and a challenge. I like interactivity with my readers, so I ask you to post your workaround as a comment. Another workaround is to refactor the problem into checking for a +/- 10% change. Here's something you can copy and paste to get started:

    declare @x int = 10;
    declare @y int =  9;
    declare @z int = 11;

    select 'this works' where @x between @y and @z or @x between @z and @y;

    There is one more thing that about the first code snippet shown above that caught my attention. The two numbers .9 and 1.10 violate the Joint Commission's guidelines on numbers used in prescriptions. The Joint Commission issues guidelines for healthcare institutions to prevent errors and not kill patients, among other things. There is a "Do Not Use" list of abbreviations and formatting styles that have been proven to cause medical errors. A number such as .9 is more likely to be seen as 9 than 0.9 is. Numbers formatted as 1.10 or 1.0 are also dangerous. The 1.0 is more likely to be misinterpreted as 10 than 1. is. Misinterpretations of decimal numbers have repeatedly been proven to kill patients. These formatting guidelines were originally developed for handwritten prescriptions. Some people think the guidelines should also be applied to computerized forms. I'm not suggesting you change your T-SQL code, but I thought those of you who do user interface design might find this discussion interesting.

     

     

     

     

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement