I spent a bunch of time helping out one of my devs yesterday with a problem that arose from an absolutely trivial change. Isn’t that how all debugging sessions start? “It was a simple change, I don’t understand why these tests don’t work!” In his defense, the change really was trivial (adding a bool property to a service contract). So what happened? Let’s start with NuGet.
At Mimeo, we have a large infrastructure of backend WCF services to power our Photo business. The services are generally very well componentized, allowing us to push out new builds of components without (or with very little) impact to other dependent services. All of this is managed with NuGet. We use NuGet extensively to create and consume packages for our service contract dependencies. Stable components have packages built out of their Main branch and are considered normal “Release” packages. When certain sprints require breaking changes across several components, we create “pre-release” packages for the other components within that branch to consume. NuGet works pretty damn well. And when it works, it’s magical. When it doesn’t, we have problems.
Yesterday, my developer made a trivial change to one of our components. He simply added a new bool to the service’s data contract. Let’s call this service OIS. The change was checked in, the continuous integration system built and published a new pre-release NuGet package and all looked good. He then updated the NuGet dependency in an upstream service to pick up this newly built NuGet package with his data contract change. It was a simple NuGet package update (it was already using a pre-release version of OIS from last week). The update succeeded, the local build was successful. Like a good dev, he attempted to run the smoke and BVT tests on his box to verify nothing broke. And it broke. The tests simply would not run.
The tests bombed out with the error, “Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly ‘OIS, Version=126.96.36.199, Culture=en-US, PublicKeyToken=null’ or one of its dependencies.” The reference was there in the project, and the solution built successfully, but the DLL could not be loaded? So we looked in the TestResults folder to see what binaries got deployed with the tests. Sure enough, the assembly was not there. Back to the References – Copy Local was set to true. WTF?
Let’s Start Over
Step 1 in debugging: go back to the baseline and make sure the original results are expected. We undid all of the changes and reverted back to last week’s OIS NuGet package. Build succeeded, tests…passed! Interesting. Now we know there’s a problem with the new NuGet package. We did a side by side of the old and new OIS NuGet packages in the NuGet Package Explorer. Everything looked *exactly* the same except for the timestamps, of course. We checked the ‘packages’ folder in the solution to make sure it expanded correctly, and it did. The package was fine?! The assembly is there to build against, but for some reason it’s simply not getting copied as a dependency during test execution.
Isolate the Problem
Step 2 in debugging: isolate the problem. I created a new console app solution, pulled in the newly built NuGet package, and just instantiated one of the data contract classes. Build succeeded, execution…failed! Nice, now we’re getting somewhere. The failure was exactly the same – FileNotFoundException. The assembly was not in the console app’s bin\debug folder at all. For shits and giggles, we said screw it – let’s copy the damn assembly there and see what happens. We ran the console app and execution…failed! WTF? The assembly is right there, but it can’t find it? We then verified that the version and public key token of the assembly matched the exception message. Of course they did. This is clearly some sort of assembly binding issue. Time for the big guns.
Reach Into Your Tool Belt
The .NET Assembly Binding Log Viewer (FusLogVw) is a tool that traces assembly loads for .NET applications. It shows rich debugging information: where the runtime searches for a particular dependent assembly, where it finally loads it from, and what errors it encounters along the way. The same logging can be turned on by flipping a regkey, which is what the tool really does. But the tool shows the events in a simple WinForms app so that you don’t have to look at them in Explorer. So anyway, back to the issue – why isn’t OIS.dll loading? I turned on the option to Log All Binds and executed our test console app. Here’s what we saw for the attempted load of OIS.dll:
Well that’s interesting, why is it trying to pick it up from en-US? Why does it think it’s a satellite resource assembly? Again, for shits and giggles, we said screw it – let’s copy the damn assembly to an ‘en-us’ subfolder and see what happens. We ran the console app and execution…passed!! Whoa.
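Why the ‘en-us’ subfolder works comes down to how the .NET Framework probes for a reference that carries a culture: it looks under a culture-named subfolder of the application base instead of the application base itself. Here is a rough sketch of that rule. ProbeSketch and CandidatePaths are names made up for illustration, and the sketch deliberately ignores the GAC, codeBase hints, and any privatePath configured in app.config, so treat it as a simplified model rather than what the loader literally does.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: a simplified model of default probing, not the real loader.
public static class ProbeSketch
{
    // Lists the files the runtime would look for under the application base.
    // An empty or null culture means a normal (culture-neutral) reference.
    public static IEnumerable<string> CandidatePaths(string appBase, string name, string culture)
    {
        // A reference with a culture is probed under <appBase>\<culture>\ only.
        string root = string.IsNullOrEmpty(culture) ? appBase : Path.Combine(appBase, culture);

        foreach (string ext in new[] { ".dll", ".exe" })
        {
            yield return Path.Combine(root, name + ext);        // <root>\OIS.dll
            yield return Path.Combine(root, name, name + ext);  // <root>\OIS\OIS.dll
        }
    }
}
```

Calling CandidatePaths with the console app’s bin\debug folder, "OIS" and "en-US" never yields bin\debug\OIS.dll itself, which matches what we saw: the copy sitting next to the exe was simply never considered.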
Slam Head Firmly Against Wall. Repeat.
How did adding a simple bool cause the assembly to be considered a satellite assembly? Let’s check this out in ILDasm and compare the manifest metadata between last week’s working OIS.dll and today’s new OIS.dll. ILDasm showed us the following new addition to the latest OIS.dll that wasn’t there in the old one:
.hash algorithm 0x00008004
.locale = (65 00 6E 00 2D 00 55 00 53 00 00 00) // e.n.-.U.S…
So the assembly *is* being built differently now. This made no sense. Something must have changed outside of my dev’s simple bool. After looking at OIS’s change history, we found another seemingly minor addition to its AssemblyInfo.cs, made Friday evening by another developer:
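Going by the .locale value in the manifest, the line would have looked something like this (a reconstruction from the ILDasm output, not a copy of the actual changeset):

```csharp
using System.Reflection;

// Assumed form of the added line: stamps the en-US culture into the assembly manifest.
[assembly: AssemblyCulture("en-US")]
```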
Blargh!! This innocent addition completely changes everything. The AssemblyCulture attribute marks the assembly as a satellite assembly. These are assemblies that should only contain localized resources that the main assembly would read from during program execution. For more information consult the documentation on MSDN. The fix was simple – remove the attribute from OIS’s AssemblyInfo.cs, rebuild the NuGet package, consume, enjoy.
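If you want to catch this sort of thing before it ships in a package, a tiny check against the built DLL would do it. The following is a minimal sketch, not something we actually had in our build. CultureCheck and its method names are invented for illustration. It relies on AssemblyName.GetAssemblyName, which reads the manifest from disk without loading the assembly into the process, and on the CultureName property (available from .NET 4.5 on).

```csharp
using System.Reflection;

// Hypothetical guard: flags a code assembly that carries a culture in its manifest.
public static class CultureCheck
{
    // Returns the culture recorded in the manifest; an empty string means neutral.
    public static string GetManifestCulture(string assemblyPath)
    {
        AssemblyName name = AssemblyName.GetAssemblyName(assemblyPath);
        return name.CultureName ?? string.Empty;
    }

    // True when the runtime would treat the file as a satellite assembly.
    public static bool LooksLikeSatellite(string assemblyPath)
    {
        return GetManifestCulture(assemblyPath).Length > 0;
    }
}
```

Pointing LooksLikeSatellite at the freshly built OIS.dll in a post-build step or a unit test, and failing when it returns true, would have flagged the problem as soon as the attribute was checked in.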