Stephen Smith's Blog

Musings on Machine Learning…

Archive for the ‘graphics’ Category

Playing with CUDA on my Gaming Laptop

Introduction

Last year, I blogged about playing with CUDA on my nVidia Jetson Nano. I recently bought a new laptop that contains an nVidia GTX 1650 graphics card with 4GB of RAM, which is more powerful than the coprocessor built into the Jetson Nano. I took advantage of the release of the newer Intel 10th generation processors, along with the wider availability of the newer nVidia RTX graphics cards, to get a good deal on a gaming laptop with an Intel 9th generation processor and nVidia GTX graphics. This is still a very fast laptop with 16GB of RAM, and it runs the couple of video games I’ve tried just fine. It also compiles and handles my normal projects easily. In this blog post, I’ll repeat a lot of my previous article on the nVidia Jetson, but in the context of running on Windows 10 with an Intel CPU.

I wanted an nVidia graphics card because these have the best software support for graphics, gaming, AI, machine learning and parallel programming. If you use TensorFlow for AI, it uses the nVidia graphics card automatically. All the versions of DirectX support nVidia, and if you are doing general parallel programming, you can use a system like OpenCL. I find nVidia leads AMD in software support, and Intel is going to have a lot of trouble getting their new Xe graphics cards to this same level of software support.

Setup

On Windows, most developers use Visual Studio. I could do all of this with GCC, but that is more difficult, since when you install the CUDA SDK, you get all the samples and documentation for Visual Studio. The good news is that you can use Visual Studio Community Edition, which is free and actually quite good. Installing Visual Studio is straightforward, just time-consuming since it is large.

Next up, you need to install nVidia’s CUDA toolkit. Again, this is straightforward, just a large download. You likely have all the drivers installed already, so you are mostly getting the developer tools and samples out of this.

Performing these installs and then dealing with the program upgrades really makes me miss Linux’s package managers. On Linux, you can upgrade all the software on your computer with one command on a regular basis. On Windows, each program checks for upgrades when it starts and usually wants to upgrade itself before you do any work. I find this is a real productivity killer on Windows. Microsoft is starting work on a package manager for Windows, but at this point it does little.

Compiling the deviceQuery sample produced the following output on my gaming laptop:

CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1650 with Max-Q Design"
  CUDA Driver Version / Runtime Version          11.0 / 11.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 4096 MBytes (4294967296 bytes)
  (16) Multiprocessors, ( 64) CUDA Cores/MP:     1024 CUDA Cores
  GPU Max Clock rate:                            1245 MHz (1.25 GHz)
  Memory Clock rate:                             3501 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 6 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.0, CUDA Runtime Version = 11.0, NumDevs = 1
Result = PASS

If we compare this to the nVidia Jetson Nano, we see everything is better. The GTX 1650 is based on the newer Turing architecture, and the memory is local to the graphics card rather than shared with the CPU. The big difference is that we have 1024 CUDA cores, rather than the Jetson’s 128. This means we can perform 1024 operations in parallel when doing SIMD style work.
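
If you are curious how deviceQuery collects this information, most of it comes from a single CUDA runtime call, cudaGetDeviceProperties(). Here is a minimal sketch, not the actual sample, that prints a few of the same fields; the property names come from the CUDA runtime API, but the selection and formatting are my own.

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Ask the CUDA runtime how many CUDA capable GPUs are present.
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0)
    {
        printf("No CUDA capable devices found\n");
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev)
    {
        // All of a device's static properties come back in one struct.
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        printf("Device %d: \"%s\"\n", dev, prop.name);
        printf("  Compute capability:    %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:         %zu MBytes\n", prop.totalGlobalMem / (1024 * 1024));
        printf("  Multiprocessors:       %d\n", prop.multiProcessorCount);
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("  Warp size:             %d\n", prop.warpSize);
    }
    return 0;
}

One thing the properties struct doesn’t give you directly is the CUDA cores per multiprocessor figure; the deviceQuery sample looks that up in a small table keyed by the compute capability version.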

CUDA Samples

The CUDA toolkit includes a large selection of sample programs; in the Jetson Nano article we listed the vector addition sample. Compiling and running this on Windows is easy in Visual Studio. These samples are a great source of starting points for your own projects.
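
To give a flavour of what that vector addition sample does, here is a stripped-down sketch of the same idea. It isn’t the sample itself; to keep it short I’ve used managed (unified) memory, which the deviceQuery output above shows this card supports, whereas the real sample uses separate host and device buffers with explicit cudaMemcpy calls and checks every call for errors.

#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 50000;
    const size_t size = n * sizeof(float);

    // Managed memory is visible to both the CPU and the GPU.
    float *a, *b, *c;
    cudaMallocManaged(&a, size);
    cudaMallocManaged(&b, size);
    cudaMallocManaged(&c, size);

    for (int i = 0; i < n; ++i)
    {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    // Launch enough 256-thread blocks to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[100] = %f (expected %f)\n", c[100], a[100] + b[100]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}

Each of the GPU’s 1024 CUDA cores works on its own elements, which is exactly the kind of SIMD style parallelism discussed above.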

Programming for Portability

If you are writing a specialized program and want the maximum performance on specialized hardware, it makes sense to write directly to nVidia’s CUDA API. However, most software developers want their programs to run on as many computers out in the world as possible. The solution is to write to a higher level API that then has drivers for different popular hardware.

For instance, if you are creating a video game, you could write to the DirectX interface, and then your program can run on newer versions of Windows on a wide variety of GPUs from different vendors. If you don’t want to be limited to Windows, you could use a portable graphics API like OpenGL. You can also go higher level and create your game in a system like Unreal Engine or Unity. These then have different drivers to run on DirectX, MacOS, Linux, mobile devices or even in web browsers.

If you are creating an AI or machine learning application, you can use a library like TensorFlow or PyTorch, which have drivers for all sorts of different hardware. You just need to ensure their support is as broad as the market you are trying to reach.

If you are doing something more general or completely new, you can consider a general parallel processing library like OpenCL, which has support for all sorts of devices, including the limited SIMD coprocessors included with most modern CPUs. A good example of a program that uses OpenCL is Folding@Home, which I blogged about here.

Summary

Modern GPUs are powerful and flexible computing devices. They have high speed memory and often thousands of processing cores to work on your task. Libraries to make use of this computing power are getting better and better, allowing you to leverage this horsepower in your applications, whether they are graphics related or not. Today’s programmers need the tools to harness these powerful devices, so the applications they are working on can reach their true potential.

Written by smist08

June 20, 2020 at 1:43 pm

Amazing Google Street View

I remember when SQL Server 7 was nearly ready for release, Microsoft Research had a project to make a 1 terabyte database. Their project was to serve up satellite images of anywhere on Earth. Basically they had 1 terabyte of images, so that was their terabyte database. I didn’t think this was a realistic terabyte database since there weren’t that many records; it was just that each record was an image and quite large. Anyway, they put the thing up on the web as a beta, and as soon as a few people tried it, the whole thing collapsed under the load.

A year or two later Google launched Google Earth, which basically did the same thing, only more detailed. But Google Earth can easily handle the load of all the people around the world accessing it. Why the difference? Why could Google do this when Microsoft couldn’t? I think the main difference is that Microsoft hosted it on one single SQL Server and had no way to scale it besides beefing up the hardware at great expense, whereas Google uses a massively distributed database running on many, many servers, all coordinated and all sharing and balancing the load. Google uses many low-cost Linux-based servers, keeping costs down and performance high.

This week Google released Google Street View for major Canadian cities, including Vancouver, where I live. So I can virtually cruise around Vancouver streets at very good resolution, including seeing my house and neighborhood. This is really amazing technology. Rather than just panning around a patchwork of satellite images, we are actually navigating in 3D around the world. Suddenly Google has produced a virtual model of the entire world at quite good photographic quality.

Think about the size of this distributed database with all these photos, plus all the data needed to stitch them together into 3D views that you can navigate through. This is so far beyond Google Earth, it’s really amazing. Is this the first step to having a completely virtual alternate Earth? If you are wearing 3D goggles, will you be able to tell whether they are transparent or viewing these images?

I think we are just seeing the first applications of what is possible with these giant distributed databases. I’m really looking forward to seeing some really amazing and mind blowing applications in the future. The neat thing is that Google is starting to open source this database technology so others can use it. Are SQL databases just dinosaurs waiting to be replaced? What will we be able to accomplish in our business/enterprise databases and data warehouses once we start applying and using this technology?

Written by smist08

October 11, 2009 at 12:24 am

Amazing CGI

I saw District 9 this weekend. Quite a good movie, with memorable CGI-generated aliens, “the prawns”. Quite an improvement over aliens that are clearly people with lots of makeup and fur suits, and over using muppets and other robots/puppets. The aliens appear to move naturally; they have insect-like mouth parts that are constantly moving and bodies that an actor clearly couldn’t fit into. Good work by the Vancouver company Image Engine that created them. The aliens fit right into the film and are completely realistic.

Then I was blown away by the trailer for Cameron’s new movie Avatar, which is coming in December. Again, huge amounts of CGI create a truly alien but beautiful planet and creatures. It’s really amazing how realistic these imagined worlds are becoming.

Even on standard PCs today, with relatively inexpensive graphics cards from nVidia or AMD, it’s amazing the level of realistic graphics you can get in modern computer games. Each frame in the movies mentioned above might take hours to render to get the desired quality, but computer games today aren’t far behind, rendering 30 frames a second on current graphics co-processor cards. These cards often have a gigabyte of their own memory and hundreds of parallel processors doing all the 3D calculations.

It looks like with modern movie-making technology, truly whatever can be imagined can be created. Currently it might be limited to big-budget productions costing $100 million to make, but prices keep coming down and techniques keep getting cheaper. Next we’ll see less expensive movies produced this way, and we’ll see this technology incorporated into video games. It should be amazing to see the crop of movies that start appearing over the next few years.

Written by smist08

August 24, 2009 at 1:55 am

Posted in graphics
