Category Archives: Infrastructure

Windows Server 2019, SharePoint Server 2019 and a broken Blob Cache

I am one of those SharePoint early adopters. My first MOSS 2007, SP2010 and SP2013 commercial projects all went live before the main product hit General Availability, so I’m very much used to those niggling issues you get before a product hits the shelves.

However, in this case I believe I have found a configuration bug relating to commercial GA products.

The builds where this error is known to occur (and I have replicated this on a brand new VM build just to double check) are:

  • Windows Server 2019 Standard (Version: 1809 | Build: 17763.168)
  • SharePoint Server 2019 (Version: 16.0.10337.12109 – RTM Build)

Update: I have also today replicated the same issue on a fully-patched server setup (updated on 22nd January 2019):

  • Windows Server 2019 Standard (Version: 1809 | Build: 17763.253)
  • SharePoint Server 2019 (Version: 16.0.10339.12102 – December 2018 CU)

Disclaimer: My findings below are purely what I have observed with my own installs. Your own mileage may vary, but if you DO see this happening, then I have a fix described below which should get you rolling again!

For those who aren’t sure whether their BlobCache settings are actually correct – you should double-check your web.config values against the Microsoft Docs article: Configure cache settings for a web application in SharePoint Server
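
For reference, the BlobCache settings live in each web application’s web.config as a single BlobCache element. A typical (and deliberately abbreviated) example is shown below – the path regex here is heavily trimmed and the values are purely illustrative, so check them against the Docs article above rather than copying this verbatim:

<BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|jpeg|png|css|js)$" maxSize="10" max-age="86400" enabled="true" />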

So what is the problem with BlobCache, exactly?

In a nutshell what I have observed is that the BlobCache will not initialise. The designated blobcache folder is never populated and features which rely on it do not work (i.e. SharePoint Publishing Cache and Image Renditions).

I should note that this is not a permission problem with the actual blobcache folder itself (the default location being C:\BlobCache\14). I triple checked the permissions and they were all correct. In fact if you delete that folder and perform an IISRESET you’ll even find it gets re-created .. it just won’t have any contents (and “blob caching” as a function will not work).

I could not find any related issues in the browser console, ULS logs or IIS logs. The only error I found was in the Windows Event Viewer (under “Web Content Management”, Event ID 5538).

An error occured in the blob cache.  The exception message was ‘Retrieving the COM class factory for remote component with CLSID {2B72133B-3F5B-4602-8952-803546CE3344} from machine <machinename> failed due to the following error: 80070005 <machinename>.’.

Anyone with a history of COM errors will probably get an instant shiver and feelings of dread about going through DCOM configuration! And you would be exactly right .. this is a DCOM error. The error code (80070005) also tells us that this is an “Access Denied” error .. so we know that this relates to permissions.

In order to test this premise I did something very simple:

  • Add the application pool accounts to Local Administrators

I performed the usual IISRESET /noforce .. and voila! BlobCache started working .. hurrah!

But before you cheer, this really is NOT an ideal scenario to be in (app pool accounts should NOT be in the local admin group). So I removed those accounts from the administrators group .. and kept on digging ..

DCOM and Registry Permissions .. what decade is it again?

So yes .. this feels like the “good old days” when you wasted days tracking down erroneous GUIDs and DCOM config weirdisms .. and it seems that in 2019 this is going to pester us all once again.

However, as is always the case with COM components, things are not as straight forward as they might at first seem.

Having done some searching around for that CLSID (2B72133B-3F5B-4602-8952-803546CE3344) it appears to be a reference to a remote configuration component for IIS. This is NOT a DCOM component. However, much google-fu later (after reading a whole bunch of posts about automating Azure deployments .. yeh it seems the error above is fairly common when dealing with IIS config) I found that this is actually a COM class which tries to call a DCOM component. Specifically the Application Host Admin API for IIS 7.0 (also known as “ahadmin”).

The error code we are seeing (80070005 = access denied) means the application pool account does not have permissions to launch or execute this DCOM component.

Now .. this is the point where, if you haven’t had to deal with DCOM permissions before, you might get a little stuck. You see, if you open up Component Services, find the “ahadmin” DCOM config entry .. you will notice that the “Security” options are all greyed out. You can’t modify them.

So now .. we need to ninja up our Registry skills.

We can see from the Component Services window that the “Application ID” of “ahadmin” is {9fa5c497-f46d-447f-8011-05d03d7d7ddc}. This allows us to work out the Registry Key where this particular application is defined.

So launch RegEdit and look for the following path:

HKLM\SOFTWARE\Classes\AppID\{9fa5c497-f46d-447f-8011-05d03d7d7ddc}

(you will know it is the right one because the default value will read “ahadmin”). Right click on the node and select “Permissions”, and go to “Advanced” settings (you might need to wait a while for it to resolve all of the security principals before “Advanced” becomes available).
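
If you would rather confirm you are looking at the right key from code before you start changing ownership, a minimal C# sketch along the lines below (using the standard Microsoft.Win32 registry API) reads the default value of that AppID key – it should come back as “ahadmin”:

using System;
using Microsoft.Win32;

class AhadminKeyCheck
{
    static void Main()
    {
        // The Application ID shown in Component Services for "ahadmin"
        const string appIdKeyPath = @"SOFTWARE\Classes\AppID\{9fa5c497-f46d-447f-8011-05d03d7d7ddc}";

        // Open the 64-bit view of HKLM explicitly so a 32-bit build doesn't get redirected to Wow6432Node
        using (RegistryKey hklm = RegistryKey.OpenBaseKey(RegistryHive.LocalMachine, RegistryView.Registry64))
        using (RegistryKey appIdKey = hklm.OpenSubKey(appIdKeyPath))
        {
            if (appIdKey == null)
            {
                Console.WriteLine("AppID key not found");
                return;
            }

            // The (Default) value should read "ahadmin" if this is the right key
            Console.WriteLine("Default value: {0}", appIdKey.GetValue(string.Empty, "<not set>"));
        }
    }
}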

What you will need to do is two things:

  • Change the “Owner” (by default this will be “TrustedInstaller” – I changed this to the local “Administrators”)
  • Update permissions for “Administrators” so that they have “Full Control” (in addition to “Read”).

Having done this, reload “Component Services” and you should find that you can now edit the “ahadmin” DCOM permissions.

Change the following permissions for “Launch and Activation” as well as “Access Permissions”

  • Grant the local WSS_WPG group “Local” and “Remote” permissions (i.e. all of them)

Note – The WSS_WPG (Windows SharePoint Services – Worker Process Group) is a special group which SharePoint maintains to include all of the designated “application pool accounts” for the SharePoint Web Applications. So it is much better to use this group rather than permission each account individually. 
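
If you want to sanity-check that your application pool accounts really are members of that group, a quick C# sketch like the one below (using System.DirectoryServices.AccountManagement – you’ll need a reference to that assembly) lists the members of the local WSS_WPG group. Treat it as an illustrative check rather than anything official:

using System;
using System.DirectoryServices.AccountManagement;

class WssWpgMembers
{
    static void Main()
    {
        // Query the local machine's account database (not the domain)
        using (var machineContext = new PrincipalContext(ContextType.Machine))
        using (var group = GroupPrincipal.FindByIdentity(machineContext, "WSS_WPG"))
        {
            if (group == null)
            {
                Console.WriteLine("WSS_WPG group not found - is this definitely a SharePoint server?");
                return;
            }

            // Each member should be one of your SharePoint application pool accounts
            foreach (Principal member in group.GetMembers())
            {
                Console.WriteLine(member.SamAccountName);
            }
        }
    }
}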

Now you just need a trusty IISRESET and we will have taken care of that nasty COM permissions error …

What do you mean .. more?

Yeh .. sorry .. you will probably find that, having done all that, your BlobCache STILL doesn’t work.

This time however, the error in Windows Event Viewer will have changed (same “Web Content Management” category, and Event ID 5538)

An error occured in the blob cache.  The exception message was ‘Filename: redirection.config’

Error: Cannot read configuration file due to insufficient permissions

This one is a little bit easier to resolve, and should be a fairly quick fix.

It is referring to the “redirection.config” in the IIS config folder.

C:\Windows\System32\inetsrv\Config

Change the permissions on that folder and grant the same “WSS_WPG” local group “read” access to that folder (it doesn’t need more than “read” as it won’t be making changes to the IIS config files).
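
You can do that through the folder’s Security tab in the normal way, but if you would rather script it, a rough C# sketch like the one below (standard System.Security.AccessControl APIs, run from an elevated process) adds an inheritable read ACE for WSS_WPG. Again, treat this as an illustrative sketch rather than gospel:

using System.IO;
using System.Security.AccessControl;

class GrantWssWpgRead
{
    static void Main()
    {
        // The IIS configuration folder referenced by the redirection.config error
        var configFolder = new DirectoryInfo(@"C:\Windows\System32\inetsrv\Config");

        DirectorySecurity security = configFolder.GetAccessControl();

        // Read access only - the worker processes never need to write to the IIS config files.
        // ContainerInherit + ObjectInherit so the ACE flows down to the files and sub-folders.
        security.AddAccessRule(new FileSystemAccessRule(
            "WSS_WPG",
            FileSystemRights.Read,
            InheritanceFlags.ContainerInherit | InheritanceFlags.ObjectInherit,
            PropagationFlags.None,
            AccessControlType.Allow));

        configFolder.SetAccessControl(security);
    }
}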

One more IISRESET and you should be done!

Voila … BlobCache joy once more!

BlobCache should now be correctly populating, and your web applications in SharePoint should be creating Blob Cache entries specific to their web.config entries.

The Windows Event Viewer should now show that it is creating the Blob Cache folders correctly (this time a slightly different Event ID : 7358)

Creating new cache folder ‘C:\BlobCache\14\1381895183\iXPJhtlC+ECvfzPRK3EHMA\’.

And you can see this folder created happily in the file system

And Image Rendition editing should also be working…

(and serving the correct images)

Summary

So for a quick summary .. we had to do some DCOM Config changes, Registry Key permissions and IIS config permissions.

Ironically this error had nothing to do with file system permissions on the Blob Cache folder itself (which is typically the only thing which has needed troubleshooting).

So .. quick crib-sheet:

  • Take ownership of the “ahadmin” Registry Key, and grant admins Full Control
  • Grant WSS_WPG {Launch | Activation | Access} permissions for the “ahadmin” DCOM component, accessed from “Component Services”
  • Grant WSS_WPG “read” permissions to the IIS “Config” folder in the file system

Hopefully this article was helpful! Chime in the comments if you have any questions (or if you’ve seen this issue before yourself!)

Update – 23/01/19

One of my colleagues at Content and Code reached out to Microsoft to report this issue and find out if they were aware of it.

Suffice to say Microsoft are indeed aware of this issue and are currently working on a fix.

SharePoint Search {User.Property} Query Variables and Scalability

This was something I stumbled into when working on a large global Intranet (>100k user platform) being built on SharePoint 2013. This is a WCM publishing site using “Search Driven” content leveraging Managed Metadata tagging combined with {User.Property} tokens to deliver “personalised” content. Now .. if there were 2 “buzz words” to market SharePoint 2013 they would be “search driven content” and “personalised results”, so I was surprised at what I found.

The Problem

So we basically found that page load times were >20 seconds and our SharePoint Web Servers were maxed out at 100% CPU usage. The load tests showed that performance was very good under light load, but once we started ramping the load up, CPU usage climbed extremely quickly and the platform rapidly became almost unusable.

It is worth bearing in mind that this is a completely “cloud friendly” solution, so zero server-side components, using almost exclusively “out of the box” web parts (mostly “Search Result Web Parts” – they would have been “Content by Search” but this was a Standard SKU install). We also use output caching and blob caching, as well as minified and cached assets, to slim down the site as much as possible.

Also worth noting that we have 10 (ten) WFE servers, each with 4 CPU cores and 32GB RAM (not including a whole battery of search query servers, index servers, and other general “back-end” servers). So we weren’t exactly light on oomph in the hardware department.

We eventually found it was the search result web parts (we have several on the home page) which were flattening the web servers. This could be easily proved by removing those web parts from the page and re-running our Load Tests (at which point CPU load dropped to ~60% and page load times dropped to 0.2 sec per page even above our “maximum capacity” tests).

What was particularly weird is that the web servers were the ones maxing out their CPU. The Search Query Component servers (dedicated hardware) were not too heavily stressed at all!

Query Variables anyone?

So the next thing we considered is that we make quite liberal use of “Query Variables” and in particular the {User.Property} ones. This allows you to use a “variable” in your Search Query which is swapped out “on the fly” for the values in that user’s SharePoint User Profile.

In our example we had “Location” and “Function” in both content and the User Profile Database, all mapped to the same MMS term sets. The crux of it is that it allows you to “tag” a news article with a specific location (region, country, city, building) and a specific function (e.g. a business unit, department or team), and when users hit the home page they only see content “targeted” at them.

To me this is what defines a “personalised” intranet .. and is the holy grail of most comms teams.

However, when we took these personalisation values out (i.e. replacing {User.Location} with some actual Term ID GUID values) performance got remarkably better! We also saw a significant uplift in the CPU usage on our Query Servers (so they were now approaching 100% too).

So it would appear that SOMETHING in the use of Query Variables was causing a lot of additional CPU load on the Web Servers!

It does what??

So, now we get technical. I used the JetBrains “dotPeek” tool to decompile some of the SharePoint Server DLLs to find out what on earth happens when a Query Variable is passed in.

I was surprised at what I found!

I ended up delving down into the Microsoft.Office.Server.Search.Query.SearchExecutor class as this was where most of the “search” based activity went on, in particular in the PreExecuteQuery() method. This in turn referred to the Microsoft.SharePoint.Publishing.SearchTokenExpansion class and its GetTokenValue() method.

It then hits a fairly large switch statement with any {User.Property} tokens being passed over to a static GetUserProperty() method, which in turn calls GetUserPropertyInner(). This is where the fun begins!

The first thing it does is call UserProfileManager.GetUserProfile() to load up the current user’s SharePoint profile. There doesn’t appear to be any caching here, so this happens PER TOKEN instance: if you have 5 {user.property} declarations in a single query, it happens 5 times!

The next thing that happens is that it uses profile.GetProfileValueCollection() to load the property values from the UPA database, and (if it has the IsTaxonomic flag set) calls GetTaxonomyTerms() to retrieve the term values. These are full-blown “Term” objects which get created from calls to either TaxonomySession.GetTerms() or TermStore.GetTerms(). Either way, this results in a service/database roundtrip to the Managed Metadata Service.

Finally it ends up at GetTermProperty() which is just a simple bit of logic to build out the Keyword Query Syntax for Taxonomy fields (the “#0” thing) for each Term in your value collection.

So the call stack goes something like this:

SearchExecutor::PreExecuteQuery()
=> SearchTokenExpansion::GetTokenValue()
=> GetUserProperty()
=> GetUserPropertyInner()
=> UserProfileManager::GetUserProfile()
=> UserProfile::Properties.GetPropertyByName().CoreProperty.IsTaxonomic
If it is (which ours always are) then …
=> UserProfile::GetProfileValueCollection()::GetTaxonomyTerms()
=> TermStore::GetTerms()
Then for each term in the collection
=> SearchTokenExpansion::GetTermProperty()
This just builds out the “#0” + term.Id.ToString() query value

So what does this really mean?

Well, let’s put a simple example here.

Let’s say you want to include a simple “personalised” search query to bring back targeted News content.

{|NewsFunction:{User.Function}} AND {|NewsLocation:{User.Location}}

This looks for two Search Managed Properties (NewsFunction and NewsLocation) and queries those two fields using the User Profile properties “Function” and “Location” respectively. Note – This supports multiple values (and will concatenate the query with “NewsFunction: OR NewsFunction:” as required)
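
For illustration, once those tokens have been expanded (assuming, say, the user has two “Function” terms and one “Location” term on their profile) the query that actually gets executed ends up looking something like the below – the GUID placeholders stand in for the real term IDs:

(NewsFunction:#0<function-term-guid-1> OR NewsFunction:#0<function-term-guid-2>) AND NewsLocation:#0<location-term-guid>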

On the Web Server this results in:

  • 2x “GetUserProfile” calls to retrieve the user’s profile
  • 2x “GetPropertyByName” calls to retrieve the attributes of the UPA property
  • 2x “GetTerms” queries to retrieve the term values bound to that profile

And this is happening PER PAGE REFRESH, PER USER.
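
To put some rough numbers on it: assume (purely for the sake of argument) three search web parts on the home page, each using both of those tokens. That is 3 x (2 + 2 + 2) = 18 extra server-side lookups against the User Profile and Managed Metadata services for every single home page view, before the actual search query has even left the web server.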

So … now it suddenly became clear.

With 100k users hitting the home page it was bottlenecking the Web Servers because every home page hit resulted in double the amount of server-side lookups to the User Profile Service and Managed Metadata Service (on top of all of the other standard processing).

So how to get round this?

The solution we are gunning for is to throw away the Search Web Parts and build our own using REST calls to the Search API and KnockoutJS for the data binding.

This allows us to use client-side caching of the query (including any “expanded” query variables, and caching of their profile data) and we can even cache the entire search query result if needed so “repeat visits” to the page don’t result in additional server load.

Finally…
This was a fairly high profile investigation, including Microsoft coming in for a bit of a chat about some of the problems we were facing. After some investigation they did confirm another option (which didn’t work for us, but is useful to know), which is this:

  • Query Variables in the Search Web Part are processed by the Web Server before being passed to the Query Component
  • The same query variables in a Result Source or Query Rule will be processed on the Query Server directly!

So if you have a requirement which you can compartmentalise into a Query Rule or Result Source, you might want to look at that approach instead to reduce the WFE processing load.

Cheers! And good luck!

Windows 8, Hyper-V, BitLocker and “Cannot connect to virtual machine configuration storage”

So I am now working at a new professional services company in South East England (Ballard Chalmers) who use Hyper-V throughout their DEV / TEST environments. I have previously been a VMware Workstation person myself (and I still think the simplicity and ease of its user interface is unmatched) but for the foreseeable future I will be running Windows 8.1 Pro on my laptop as a Hyper-V host.

Before we get started it is worth describing my setup:

  • Windows 8.1 Pro
  • 3rd Gen Intel i7-3820QM CPU
  • 32GB DDR3 RAM
  • Two physical disk drives
    • C:\ SYSTEM – 512GB SSD (for Operating System, Files and Applications)
    • D:\ DATA – 512GB SSD (for Hyper-V Images and MSDN ISOs) (running in an “Ultra-Bay” where the Optical Drive used to be)

Now like most modern laptops I have a TPM (Trusted Platform Module) on my machine so I also have BitLocker encryption running on both my C: and D: drives (for those who are interested I barely notice any performance drop at all .. and I can still get 550 MB/s sustained read even with BitLocker enabled).

Saved-Critical – Cannot connect to virtual machine configuration storage

Now I noticed from time to time that my Virtual Machines were showing error messages when my computer started up. I noticed it here and there until Thomas Vochten (@ThomasVochten) also mentioned he was getting it every time he started his machine up.

Hyper-V Error

Note – You can get this error for all sorts of reasons, particularly if you have recently changed the Drive Letters, re-partitioned your hard disks or moved a VM. In this case I was getting the error without doing anything other than turning my laptop on!

Read more »

64GB of RAM in a Laptop, and why I want it …

Well, the rumour mill has been well and truly turning recently about the potential for high capacity DRAM chips which could allow laptops to have up to 64GB of memory. I was recently directed to this article (https://www.anandtech.com/show/7742/im-intelligent-memory-to-release-16gb-unregistered-ddr3-modules) from the ArsTechnica forums.

This article basically describes a new method of DRAM stacking (as opposed to the standard method of NAND stacking) which allows the production of 16GB SODIMM modules. My current laptop has four SODIMM slots (like pretty much every other high-end laptop on the market) so with the current maximum of 8GB SODIMMs my laptop supports 32GB RAM. If I could use 16GB SODIMMs then I could theoretically swap those modules out for a straight 4x 16GB SODIMMs (i.e. 64GB of RAM).

The best news is that these chips could be on the market this year!

“Mass production is set to begin in March and April, with initial pricing per 16GB module in the $320-$350 range for both DIMM and SO-DIMM, ECC being on the higher end of that range.” (source: Anandtech article linked above)

Read more »