Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘Windows

Playing with CUDA on my Gaming Laptop

leave a comment »

Playing with CUDA on my Gaming Laptop


Last year, I blogged on playing with CUDA on my nVidia Jetson Nano. I recently bought a new laptop which contains an nVidia GTX1650 graphics card with 4Gig of RAM. This is more powerful than the coprocessor built into the Jetson Nano.  I took advantage of the release of newer Intel 10th generation processors along with the wider availability of newer nVidia RTX graphics cards to get a good deal on a gaming laptop with an Intel 9th generation processor and nVidia GTX graphics. This is still a very fast laptop with 16Gig of RAM and runs the couple of video games I’ve tried fine. It also compiles and handles my normal projects easily. In this blog post, I’ll repeat a lot of my previous article on the nVidia Jetson, but in the context of running on Windows 10 with an Intel CPU.

I wanted an nVidia graphics card because these have the best software support for graphics, gaming, AI, machine learning and parallel programming. If you use Tensorflow for AI, then it uses the nVidia graphics card automatically. All the versions of DirectX support nVidia and if you are doing general parallel programming then you can use a system like OpenCL. I find nVidia leads AMD in software support and Intel is going to have a lot of trouble with their new Xe graphics cards reaching this same level of software support.


On Windows, most developers use Visual Studio. I could do this all with GCC, but this is more difficult, since when you install the SDK for CUDA, you get all the samples and documentation for Visual Studio. The good news is that you can use Visual Studio Community Edition which is free and actually quite good. Installing Visual Studio is straightforward, just time consuming since it is large.

Next up, you need to install nVidia’s CUDA toolkit. Again, this is straightforward, just large. Although the install is large, you likely have all the drivers already installed, so you are mostly getting the developer tools and samples out of this.

Performing these installs and then dealing with the program upgrades, really makes me miss Linux’s package managers. On Linux, you can upgrade all the software on your computer with one command on a regular basis. On Windows, each program checks for upgrades when it starts and usually wants to upgrade itself before you do any work. I find that this is a real productivity killer on Windows. Microsoft is starting work on a package manager for Windows, but at this point it does little.

Compiling the deviceQuery sample produced the following output on my gaming laptop:

CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1650 with Max-Q Design"
  CUDA Driver Version / Runtime Version          11.0 / 11.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 4096 MBytes (4294967296 bytes)
  (16) Multiprocessors, ( 64) CUDA Cores/MP:     1024 CUDA Cores
  GPU Max Clock rate:                            1245 MHz (1.25 GHz)
  Memory Clock rate:                             3501 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 6 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.0, CUDA Runtime Version = 11.0, NumDevs = 1
Result = PASS

If we compare this to the nVidia Jetson Nano, we see everything is better. The GTX 1650 is based on the newer Turing architecture and the memory is local to the graphics card and not shared with the CPU. The big difference is that we have 1024 CUDA cores, rather than the Jetson’s 128. This means we can perform 1024 operations in parallel for SIMD operations.

CUDA Samples

The CUDA toolkit includes a large selection of sample programs, in the Jetson Nano article we listed the vector addition sample. Compiling and running this on Windows is easy in Visual Studio. These samples are a great source of starting points for your own projects. 

Programming for Portability

If you are writing a specialized program and want the maximum performance on specialized hardware, it makes sense to write directly to nVidia’s CUDA API. However, most software developers want to have their programs to run on as many computers out in the world as possible. The solution is to write to a higher level API that then has drivers for different popular hardware.

For instance, if you are creating a video game, you could write to the DirectX interface and then your program can run on newer versions of Windows on a wide variety of GPUs from different vendors. If you don’t want to be limited to Windows, you could use a portable graphics API like OpenGL. You can also go higher level and create your game in a system like UnReal Engine or Unity. These then have different drivers to run on DirectX, MacOS, Linux, mobile devices or even in web browsers.

If you are creating an AI or Machine Learning application, you can use a library like Tensorflow or PyTorch which have drivers for all sorts of different hardware. You just need to ensure their support is as broad as the market you are trying to reach.

If you are doing something more general or completely new, you can consider a general parallel processing library like OpenCL which has support for all sorts of devices, including the limited SIMD coprocessors included with most modern CPUs. A good example of a program that uses OpenCL is Folding@Home which I blogged on here.


Modern GPUs are powerful and flexible computing devices. They have high speed memory and often thousands of processing cores to work on your task. Libraries to make use of this computing power are getting better and better allowing you to leverage this horsepower in your applications, whether they are graphics related or not. Today’s programmers need to have the tools to harness these powerful devices, so the applications they are working on can reach their true potential.

Written by smist08

June 20, 2020 at 1:43 pm

Windows Bit-Rot

with 9 comments


In investigating some performance problems being reported on some systems running Sage 300 ERP, it lead down the road to investigating Windows Bit-Rot. Generally Bit-Rot refers to the general degradation of a system over time. Windows has a very bad reputation for Bit-Rot, but what is it? And what can we do about it? Some people go so far as to reformat their hard disk and re-install the operating system every year as a rather severe answer to Bit-Rot.

Windows Bit-Rot is the tendency for a Windows system to get slower and slower over time. Becoming slower to boot, taking longer to log-in, and taking longer to start programs. Along with other symptoms like excessive and continuous hard disk activity when nothing is running.

This blog posting is going to look at a few things that I’ve run into as well as some other background from around the web.


I needed to investigate why on some systems printing Crystal reports was quite slow. This involved software we have written as well as a lot of software from third parties. On my laptop Crystal would print quite slowly the first time and then would print quickly on subsequent times. My computer is used for development and is full of development tools, so the things I found here, might be relevant to myself more than real customers. So how to see what is going on? A really useful program for seeing what is going on is Process Monitor (procmon) from Microsoft (from their SysInternals acquisition). This program will show you every access of the registry, the file system and the network. You can filter the display, in particular you can filter to monitor only a single program to see what it’s doing.


ProcMon yielded some very interesting results.

The Registry

My first surprise was to see that every entry in HKEY_CLASSES_ROOT was read. On my computer which has had many pieces of software installed, including several versions of Visual Studio, several versions of Crystal Reports and several versions of Sage 300 ERP, the number of classes registered here was huge. OK, but did it take much time? Well the first time something that’s run that does this it seems to take several seconds, then after this its fast probably because the registry ends up cached in memory. It appears that several .Net programs I tried do this. Not sure why, perhaps just .Net wants to know all the classes in the system.

But this does mean that as your system gets older and you install more and more programs (after all why bother un-installing when you have a multi-terabyte hard drive?), starting these programs will get slightly slower and slower. So to me this counts as Bit-Rot.

So what can we do about this? Un-installing unused programs should help, especially if they use a lot of COM classes. Visual Studio being the big one on my system, followed by Crystal and Sage 300. This helps a bit. But there are still a lot of classes there.

Generally I think uninstall programs leave a lots of bits and pieces in the registry. So what to do? Fortunately this is a good stomping ground for utility programs. Microsoft used to have RegClean.exe, Microsoft discontinued support for this program, but you can still find it around the web. A newer and better utility is Ccleaner from Piriform. Fortunately the free version includes a registry cleaner. I ran RegClean.exe first which helped a bit, but then ran Ccleaner and it found quite a bit more to clean up.

Of course there is danger in cleaning your registry, so it’s a use at your own risk type thing (backing up the registry first is a good bet).

At the end of the day all this reduced the first time startup time of a number of program by about 10 seconds.

Group Policy

My second surprise was the number of calls to check Windows Group Policy settings. Group Policy is a rather ad-hoc mechanism added to Windows to allow administrators to control networked computers on their domain. Each group policy is stored in a registry key, and when Windows goes to do an operation controlled by group policy, it reads that registry key to see what it should do. I was surprised at the amount of registry activity that goes on reading and checking group policy settings. Besides annoying users by restricting what they can do on their computer, it appears group policy causes a general high overhead of excessive registry reading in almost every aspect of Windows operation. There is nothing you can do about this, but it appears as Windows goes from version to version, that more and more gets added to this and the overhead gets higher and higher.


You may not think that you install that many programs on your computer, so you shouldn’t have these sort of problems but remember many programs including Windows/Microsoft Update, Adobe Updater and such are regularly installing new programs on your computer. Chances are these programs are leaving behind unused bits of older versions that are cluttering up your file system and your registry.

Auto-Run Crap

Related to auto-updates, it appears that so many programs now run as icons in the task bar, install Windows services or install programs to run when you log-in. All of these slow down the time it takes you to boot Windows and to sign-in. Further many of these programs, say like Dropbox, will keep frequently polling their server to see if there are any updates. Microsoft has a good tool Autoruns for Windows which helps you see all the things that are automatically run and help you remove them. Again this can be a bit dangerous as some of them are necessary (perhaps like a trackpad utility).

Similarly it seems that everyone and their mother wants to install browser toolbars. Each one of these will slow down the startup of your browser and use up memory and possibly keep polling a server. Removing/disabling these isn’t hard, but it is a nuisance to have to keep doing this.

Hard Disk Fragmentation

Another common problem is hard drive fragmentation. As your system operates the hard disk becomes more and more fragmented. Windows has a de-frag program that is either scheduled to run when your computer is turned off or you never bother to run it by hand. It is worth de-fragging your hard drive from time to time to speed up access. There are third party de-frag programs, but generally I just use the one that comes built into Windows.

Related to the above problems, often un-installation programs leave odds and ends files around and sometimes it’s worth going into explorer (or a cmd prompt) and deleting folders for un-installed programs. Generally it reduces clutter and speeds up operations like reading all the folders under program files.

Dying Hard Drives

Another common cause of slowness is that as hard drives age, rather than just out right failing, often they will start having to retry reading sectors more. Windows can mark sectors bad and move things around.  Hard drives seem to be able to limp along for a while this way before completely failing. I tend to think that if you hear your hard drive resetting itself fairly often then you should replace it. Or when you defrag if you see the number of bad sectors growing, then replace it.


After going through this, I wonder if the people that just reformat their hard drive each year have the right idea? Does the time spent un-installing, registry cleaning, de-fragging just add up to too much? Are you better off just starting clean each year and not worrying about all these maintenance tasks? Especially now that it seems like we replace our computers far less frequently, is Bit-Rot becoming a much worse problem?

Written by smist08

May 4, 2013 at 2:31 pm

Operating System Competition Heats Up

leave a comment »

Google announced they are going to release their own PC operating system called the ChromeOS. Basically the Chrome browser with a Linux kernel. They haven’t really said what else it will include. Many people have heralded this as a beginning of the end for Microsoft and their Windows monopoly. But I think the beginning of the end was MS’s failure to capture any meaningful piece of the mobile smart phone market. This has basically gone to Apple, Nokia and RIM. Google Android is a small but growing competitor as well. Basically small mobile devices are where the action is these days.

With Vista and Windows 7 being such large resource hogs, Google (and others) see a huge opportunity for small low cost notebook computers that they are calling netbooks. Basically notebooks that are under $500, where it doesn’t make sense to add another $100 for the price of Windows. Another development is that there is about to be a flood of even cheaper ARM processor based notebooks. These ARM notebooks will probably shave another $200 off the price. Windows only runs on Intel and Intel clone processors. It doesn’t run on ARM, nor can Windows easily be made to run on ARM. There is a huge opportunity here.

Microsoft is now feeling squeezed between Apple which owns the higher end of the market with easy to use and stylish Mac notebooks, and the low end soon to be dominated by mobile phone OS and Linux based OS notebooks. As Macs become cheaper and the low end notebooks and smart phones become more powerful, MS is being very strongly squeezed.

Google hopes to succeed where Linux has failed by offering a full suite of applications to go with the OS, namely all the Google web based productivity applications which include all the usual office type things along with some quite innovative new offerings.

Meanwhile what does that mean to us as business application developers? We are now faced with a plethora of platforms to support. Its not just Windows on Intel/AMD anymore. The only way to survive in this brave new world will be to write truely portable platform neutral standards based applications. Writing Windows desktop applications or fake web based applications based on ActiveX, Java Applets, Silverlight or Flash will no longer cut it. With the forthcoming new HTML 5 standard we have the opportunity to write truly platform independent applications that will run in any decent browser on any hardware/operating system. If we can be successful here and all these pieces keep falling into place, there is a really great opportunity to really provide much higher customer value.

Customers will be able to run their business applications on any variety of low cost devices with very high mobility (always connected via the cell phone network), better screens that current smart phones and batteries that will last days between charging. Ease of use will become much better as everyone standardizes on the HTML 5 standards and best practices. TCO will really become lower. You just need one of these low cost netbooks and the URL to connect to your application. No more workstation setups or program installations. Everyone will be up and running and productive very quickly. Definitely things to look forwards to.

Written by smist08

July 14, 2009 at 2:26 am