Last February I wrote an article for Simple-Talk called Let PowerShell do an Inventory of your Servers. This script was useful to me then, and it continues to be so, but it recently ran into a glitch.
Here is the code in the script to gather data on each file used by a database:
$fgs = $db.FileGroups
foreach ($fg in $fgs) {
$files = $fg.Files
$outnm = ".\" + $svr + "\" + $instnm + $dbtype + "_" + $dbname + "_DataFiles.csv"
$files | select $db.Name, Name, FileName, Size, UsedSpace | export-csv -path $outnm -noType
}
It's nice and succinct, and as "set-based" as it can be, given that it has to iterate through the file groups. The problem is that underneath the hood, SMO sends a set of queries to SQL Server to create a temp table, then run "DBCC ShowFileStats", then join that temp table with sys.filegroups and sys.master_files catalog views to return the results in the Files collection.
The problem is that the underlying code contains a WHERE clause filtering the results FOR EACH FILE.
When this command runs the Reads counter (on my system) in Profiler averages around 78000, the CPU between 300 and 500, and Duration between 700 and 900, and this is run for each file. At a client site one database has over 10,000 files spread across almost 400 filegroups. I created a similarly configured database on my test system and the above code took about 2 hours to run. I replaced it with the following code:
$df = new-object "System.Data.DataSet" "dsConfigData"
$cn = new-object system.data.SqlClient.SqlConnection("Data Source=$inst;Integrated Security=SSPI;Initial Catalog=$dbname");
$q = "create table #tmpspc"
$q = $q + " ("
$q = $q + " Fileid int,"
$q = $q + " FileGroup int,"
$q = $q + " TotalExtents int,"
$q = $q + " UsedExtents int,"
$q = $q + " Name sysname,"
$q = $q + " FileName nchar(520)"
$q = $q + " ) "
$q = $q + "insert #tmpspc EXEC ('dbcc showfilestats')
"
$q = $q + " SELECT s.name AS [Name],"
$q = $q + " s.file_id AS [ID],"
$q = $q + " s.physical_name AS [FileName],"
$q = $q + " s.size * CONVERT(float, 8) AS [Size],"
$q = $q + " CASE when s.max_size = -1 then -1"
$q = $q + " else s.max_size * CONVERT(float, 8)"
$q = $q + " END AS [MaxSize],"
$q = $q + " CAST(CASE s.is_percent_growth"
$q = $q + " WHEN 1 THEN s.growth"
$q = $q + " ELSE s.growth * 8"
$q = $q + " END AS float) AS [Growth],"
$q = $q + " s.is_media_read_only AS [IsReadOnlyMedia],"
$q = $q + " s.is_read_only AS [IsReadOnly],"
$q = $q + " CAST(case s.state"
$q = $q + " when 6 then 1"
$q = $q + " else 0"
$q = $q + " end AS bit) AS [IsOffline],"
$q = $q + " s.is_sparse AS [IsSparse],"
$q = $q + " CAST(CASE when s.growth = 0 THEN 99"
$q = $q + " ELSE s.is_percent_growth"
$q = $q + " END AS int) AS [GrowthType],"
$q = $q + " CAST(CASE s.file_id"
$q = $q + " WHEN 1 THEN 1"
$q = $q + " ELSE 0"
$q = $q + " END AS bit) AS [IsPrimaryFile],"
$q = $q + " CAST(tspc.UsedExtents * convert(float, 64) AS float) AS [UsedSpace],"
$q = $q + " ( tspc.TotalExtents - tspc.UsedExtents ) * convert(float, 64) AS [AvailableSpace]"
$q = $q + " FROM sys.filegroups AS g"
$q = $q + " INNER JOIN sys.master_files AS s ON ( s.type = 0"
$q = $q + " and s.database_id = db_id()"
$q = $q + " and ( s.drop_lsn IS NULL )"
$q = $q + " )"
$q = $q + " AND ( s.data_space_id = g.data_space_id )"
$q = $q + " LEFT OUTER JOIN #tmpspc tspc ON tspc.Fileid = s.file_id
"
$q = $q + " drop table #tmpspc"
$daf = new-object "System.Data.SqlClient.SqlDataAdapter" ($q, $cn)
$daf.Fill($df)
$dtFiles = new-object "System.Data.DataTable" "dtFileData"
$dtFiles = $df.Tables[0]
$outnm = ".\" + $svr + "\" + $instnm + $dbtype + "_" + $dbname + "_DataFiles.csv"
$dtFiles | select $db.Name, Name, FileName, Size, UsedSpace | export-csv -path $outnm -noType
Once this code was in place I ran it and the process that took 2 hours originally finished in 9 seconds.
I've filed a Connect item for this here.
Just remember the important thing is that there's not ever just one tool to use. Understand when it's better to use T-SQL than PowerShell, or vice versa.
Allen