Stephen Smith's Blog

Musings on Machine Learning…

Browsing MSDOS and GW-Basic Source Code

leave a comment »


Introduction

These days I mostly play around with ARM Assembly Language and have written two books on it:

But long ago, my first job out of University involved some Intel 80186 Assembly Language programming, so I was interested when Microsoft recently posted the source code to GW-Basic which is entirely written in 8086 Assembly Language. Microsoft posted the source code to MS-DOS versions 1 and 2 a few years ago, which again is also entirely written in 8086 Assembly Language.

This takes us back to the days when C compilers weren’t as good at optimizing code as they are today, processors weren’t nearly as fast and memory was at a far greater premium. If you wanted your program to be useful, you had to write it entirely in Assembly Language. It’s interesting to scroll through this classic code and observe the level of documentation (low) and the programming styles used by the various programmers.

Nowadays, programs are almost entirely written in high-level programming languages and any Assembly Language is contained in a small set of routines that provide some sort of highly optimized functionality usually involving a coprocessor. But not too long ago, often the bulk of many programs consisted entirely of Assembly Language.

Why Release the Source Code?

Why did Microsoft release the source code for these? One reason is that they are a part of computer history now and there are historians that want to study this code. It provides insight into why the computer industry progressed in the manner it did. It is educational for programmers to learn from. It is a nice gesture and offering from Microsoft to the DIY and open source communities as well.

The other people who greatly benefit from this are those that are working on the emulators that are used in systems like RetroPie. Here they have emulators for dozens of old computer systems that allow vintage games and programs to be run on modern hardware. Having the source code for the original is a great way to ensure their emulations are accurate and a great help to fixing bugs correctly.

Example

Here is an example routine from find.asm in MS-DOS 2.0 to convert a binary number into an ASCII string. The code in this routine is typical of the code throughout MS-DOS. Remember that back then MS-DOS was 16-bits so AX is 16-bits wide. Memory addresses are built using two 16-bit registers, one that provides a segment and the other that gives an offset into that 64K segment. Remember that MS-DOS can only address memory upto 640K (ten such segments).

;——————————————————————–
;       Binary to Ascii conversion routine                
;                                                                 
; Entry:                                                          
;       DI      Points to one past the last char in the             
;       AX      Binary number                                       
;             result buffer.                                        
;                                                                   
; Exit:                                                             
;       Result in the buffer MSD first                            
;       CX      Digit count                                         
;                                                                   
; Modifies:                                                         
;       AX,BX,CX,DX and DI                                          
;                                                                   
;——————————————————————–
bin2asc:
        mov     bx,0ah
        xor     cx,cx
go_div:
        inc     cx
        cmp     ax,bx
        jb      div_done
        xor     dx,dx
        div     bx
        add     dl,’0′          ;convert to ASCII
        push    dx
        jmp     short go_div
div_done:
        add     al,’0′
        push    ax
        mov     bx,cx
deposit:
        pop     ax
        stosb
        loop    deposit
        mov     cx,bx
        ret

For an 8086 Assembly Language programmer of the day, this will be fairly self evident code and they would laugh at us if we complained there wasn’t enough documentation. But we’re 40 or so years on, so I’ll give the code again but with an explanation of what is going on added in comments.

bin2asc:
        mov     bx,0ah ; we will divide by 0ah = 10 to get each digit
        xor     cx,cx ; cx will be the length of the string, initialize it to 0
go_div:
        inc     cx ; increment the count for the current digit
        cmp     ax,bx ; Is the number < 10 (last digit)?
        jb      div_done   ; If so goto div_done to process the last digit
        xor     dx,dx ; DX = 0
        div     bx ; AX = AX/BX  DX=remainder
        add     dl,’0′          ;convert to ASCII. Know remainder is <10 so can use DL
        push    dx ; push the digit onto the stack
        jmp     short go_div ; Loop for the next digit
div_done:
        add     al,’0′ ; Convert last digit to ASCII
        push    ax ; Push it on the stack
        mov     bx,cx ; Move string length to BX
deposit:
        pop     ax ; get the next significant digit off the stack.
        stosb ; Store AX at ES:DI and increment DI
       ; Loop decrements CX and branches if CX not zero.
; Falls through when CX=0
        loop    deposit
        mov     cx,bx ; Put the count back in CX
        ret ; return from routine.

A bit different than a C routine. The routine assumes the DF flag is set, so the stosb increments the memory address, perhaps this is a standard across MS-DOS or perhaps it’s just local to this module. I think the comment is incorrect and that the start of the output buffer is passed in. The routine uses the stack to reverse the digits, since the dividing by 10 algorithm peels off the least significant digit first and we want the most significant digit first in the buffer. The resulting string isn’t NULL terminated so perhaps MS-DOS treats strings as a length and buffer everywhere.

Comparison to ARM

This code is representative of CISC type processors. The 8086 has few registers and their usage is predetermined. For instance the DIV instruction is only passed one parameter, the divisor. The dividend, quotient and remainder are set in hard-wired registers. RISC type processors like the ARM have a larger set of registers and tend to have three operands per instruction, namely two input registers and an output register.

This code could be assembled for a modern Intel 64-bit processor with little alteration, since Intel has worked hard to maintain a good level of compatibility as it has gone from 16-bits to 32-bits to 64-bits. Whereas ARM redesigned their instruction set when they went from 32-bits to 64-bits. This was a great improvement for ARM and only possible now that the amount of Assembly Language code in use is so much smaller.

Summary

Kudos to Microsoft for releasing this 8086 Assembly Language source code. It is interesting to read and gives insight into how programming was done in the early 80s. I hope more classic programs have their source code released for educational and historical purposes.

Written by smist08

May 25, 2020 at 6:56 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: