Data Execution Prevention
Recently we were investigating why any of the Sage 300 ERP Financial Reporter dialogs would crash when launched from within Excel 2013. It turned out that they were running afoul of Window’s Data Execution Prevention (DEP). DEP is a security feature that has been added to newer operating systems, basically to stop malware programs from figuring out a way download code into a data area and then somehow causing it to execute, usually by overwriting the stack by taking advantage of a memory overrun bug.
OK but Sage 300 ERP would certainly never try to do anything like that, so why would it crash with this sort of exception?
The Sage 300 ERP VB screens are built out of a number of ActiveX controls that provide data binding from Sage 300 Business Objects to the UI elements, so that we don’t have to write any code for most data fields, we just need to wire them up in the screen editor.
When we created these controls as part of creating version 5.0A, there were a number of ways of doing this and the one we chose was Microsoft’s Active Template Library (ATL) where you wrote the controls in C++ in an object oriented manner. And it turns out that ATL puts code into the data segment and then executes it.
So why does ATL do this? The basic problem with object oriented frameworks on Windows is that the core Windows kernel is not object oriented. Basically Windows sends a notification for a Window where the Window is specified by its Windows handle. So how do you know which Window object in your framework should get this notification message? Microsoft’s MFC framework solved this problem by keeping a table of Windows handles to Windows objects, and then when each message comes in, it looks up which object it’s for and then calls that object. This then gave MFC a reputation for being slow, since there are a lot of such messages and MFC then spends all its time looking up objects. But on the good side this is quite a safe and sure method of doing things and has never broken. ATL decided to get tricky. For each Window you can add a custom 32 bit value, so ATL made this a memory pointer to the object code for the object to call. Then when the message comes in ATL would create data for an assembler jump instruction and append this 32 bit address and then pass control to the jump instruction to call the object. Notice that this is done very quickly with no table lookup. But it does mean building a bit of code in data memory and then executing it. Generally this is referred to as “thunking”.
So basically ATL (and early versions of the .Net framework) are executing a design pattern utilized by modern viruses. This is a very clever and fast way to do things, but unfortunately needed to be blocked.
Newer versions of ATL (version 8 and above) now allocate a small block of memory from the operating system with the correct security attributes so that they can still do the same trick, but now the program has let Windows know that this is desired and correct behavior.
Current versions of Sage 300 ERP have their controls compiled using ATL 3.0 which came with the Visual C++ 98 compiler. The correct way to fix the problem is to compile with a later version of the compiler namely we chose Visual Studio 2005 because most other things in our system are compiled with this and it uses ATL 8.1 which then works fine with DEP.
Sound simple. But there are twenty controls or so in the system and there are quite a few differences introduced with newer versions of the C/C++ compiler and with ATL. Generally moving to these newer versions is a good thing, but it introduced a few problems and we needed to ensure the system still worked correctly.
One good thing is that the newer C/C++ compiler has better warnings for detecting things like variables used before they are assigned, bad conversions and mismatched pointers. The compiler detected a few of these and they needed to be fixed. Generally this is a good thing since it makes the overall program more stable and reliable.
Another things with the newer ATL is that it fixed a few bugs in the older ATL. For instance the older ATL didn’t set the background color of controls in all cases, so suddenly if a background color was set and wrong then it would show up, so a few UIs needed to be fixed to set background colors correctly. Generally these are good things, but take a bit of work to correct. They also help with another project we have going to modernize the look of all our UIs.
Then we just have to make sure that our normally supported features like translation to double byte character languages, keyboard shortcuts, design time dialogs and such all still work as expected. This is a bit of a challenge with controls like the field edit control which have a lot of modes of operation.
There is always a lot of debate when we change the build to use a new version of the compiler. Will older programs still work? Will customers with older hardware still work? Is it worth the work and risk in changing things rather than sticking with the trusted and true?
I take the view that we have to allocate time in our releases to address technical debt in our releases. We need to upgrade various compilers, frameworks and bundled libraries. Otherwise we start having problems with newer versions of Windows, with newer hardware and generally operating in modern environments. I think we need to take advantage of bug fixes, security fixes and performance fixes in the tools we are using.
Visual Studio 2012
Once we figured this out, we realized this explained why some ISVs were having trouble integrating to our system from Visual Studio 2012. DEP is now turned on by default for all new projects, which means you will GPF if you use any of our ActiveX visual controls. We then confirmed this was the problem. So when this fix is GA, it should also simplify integration work for our ISVs using modern tools. In the meantime you can set /NXCOMPAT:NO in your project to turn off DEP for your program. Obviously this isn’t ideal, but it is a workaround.
Usually in Windows DEP is only turned on for Windows system processes, but Windows can be configured to turn it on for all processes. However individual programs can be configured for having DEP on or off when they are built. How the program is built will take precedence over the Windows settings. This is why we ran into problems with Excel 2013, since it is compiled with DEP turned on. However Office 2013 is also a development platform, so turning on DEP for Office, also means anything integrated into Office has to be DEP compliant as well. This then eliminates using anything built with older versions of ATL and the .Net framework.
When Will This Be Fixed?
We have fixed this for our upcoming Sage 300 ERP 2014 release (which will be released in 2013). We are currently testing as part of that project, but once we are confident we’ve fixed any minor glitches that are still present then we’ll bundle these updated controls together as a hotfix for Sage 300 ERP 2012.
Finding and solving the problem with our Financial Reporter and Excel 2013, was a bit of a relief since it also explained a number of other problems that had been hanging around unsolved. It’s good to figure out when something has gone wrong and to fix it. It’s also good to know why some developers were having trouble integrating to Sage 300 ERP from VS2012.