For this example we will be looking at the C programming language.

High-level language code is written in a form that humans can read. Computers, however, only execute machine code - they cannot run high-level language code directly.

Any high-level programming language code therefore has to be converted to executable code. Executable code, also known as machine code, is a sequence of binary digits (0s and 1s) that the processor can execute.

Let’s look at how high-level code is converted to machine code.

Preprocessor

This is the first phase, and it runs before compilation proper. The preprocessor is also known as a macro processor. This phase removes comments, expands macros, expands the included files, and handles conditional compilation. The preprocessed output is then stored in a temporary file with the “.i” extension.
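As a rough illustration of what this phase works on, the sketch below uses a macro, a conditional block, and comments. The names SQUARE, LOG, VERBOSE, and square_of_four are made up for this example. With GCC, running only the preprocessor (gcc -E) should show the macros expanded, the comments gone, and just one branch of the conditional left.

```c
#include <stdio.h>

/* Hypothetical macro for this sketch: the preprocessor replaces every
   SQUARE(x) with ((x) * (x)) before the compiler sees the code. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of the two LOG definitions
   survives preprocessing, depending on whether VERBOSE is defined. */
#ifdef VERBOSE
#define LOG(msg) puts(msg)
#else
#define LOG(msg) ((void)0)
#endif

int square_of_four(void)
{
    LOG("computing SQUARE(4)");  /* expands to ((void)0) unless VERBOSE is defined */
    return SQUARE(4);            /* becomes: return ((4) * (4)); */
}
```

The extra parentheses in SQUARE matter because expansion is purely textual: SQUARE(1 + 2) becomes ((1 + 2) * (1 + 2)), not 1 + 2 * 1 + 2.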

Compilation of preprocessed file

In the compilation phase the compiler comes into action. It takes the temporary “filename.i” file generated by the preprocessor, checks it for syntax errors, and translates it into assembly language. The resulting intermediate assembly code is saved as a “filename.s” file. Below is an example of a simple Hello World program in C and its assembly.

#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}

At compilation, the compiler converts the above code into assembly code similar to the following (x86-64, Intel syntax; the exact output depends on the compiler and its options).

.LC0:
        .string "Hello World!"
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        mov     eax, 0
        pop     rbp
        ret

Assembly language

Assembly language is a low-level programming language that is specific to a processor architecture. It can be produced by compiling a high-level language, but programmers can also write assembly directly. Instructions differ from one architecture to another, although many basic operations have close equivalents across processors. Below are some x86 processor assembly instructions.

MOV - move data from one location to another
ADD - add two values
SUB - subtract a value from another value
PUSH - push data onto a stack
POP - pop data from a stack
JMP - jump to another location
INT - raise a software interrupt

Each processor has its own set of instructions, which is documented in the datasheet or instruction set manual for that processor.
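To give a feel for what instructions like these do, here is a minimal sketch of a toy machine in C. The instruction set, the struct layout, and the names run and run_example are invented for this illustration; it borrows the mnemonics listed above but is not real x86 behaviour or encoding, it leaves out INT, and it does no bounds checking.

```c
#include <stddef.h>

/* A made-up miniature instruction set, loosely modelled on the
   mnemonics listed above. */
enum op { OP_MOV, OP_ADD, OP_SUB, OP_PUSH, OP_POP, OP_JMP, OP_HALT };

struct instr {
    enum op op;
    int a;   /* destination register, or jump target for OP_JMP */
    int b;   /* immediate value (MOV) or source register (ADD/SUB) */
};

/* Runs a program on a toy machine with 4 registers and a small stack,
   then returns the value left in register 0. */
int run(const struct instr *prog)
{
    int reg[4] = {0};
    int stack[16];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction */

    for (;;) {
        struct instr in = prog[pc++];
        switch (in.op) {
        case OP_MOV:  reg[in.a] = in.b;        break;
        case OP_ADD:  reg[in.a] += reg[in.b];  break;
        case OP_SUB:  reg[in.a] -= reg[in.b];  break;
        case OP_PUSH: stack[sp++] = reg[in.a]; break;
        case OP_POP:  reg[in.a] = stack[--sp]; break;
        case OP_JMP:  pc = (size_t)in.a;       break;
        case OP_HALT: return reg[0];
        }
    }
}

/* Example program: computes 7 - 2, saves the result on the stack, and
   jumps over an instruction that is therefore never executed. */
int run_example(void)
{
    static const struct instr prog[] = {
        { OP_MOV,  0, 7 },   /* r0 = 7           */
        { OP_MOV,  1, 2 },   /* r1 = 2           */
        { OP_SUB,  0, 1 },   /* r0 = r0 - r1 = 5 */
        { OP_PUSH, 0, 0 },   /* push r0          */
        { OP_JMP,  6, 0 },   /* skip index 5     */
        { OP_MOV,  2, 99 },  /* never executed   */
        { OP_POP,  2, 0 },   /* r2 = 5           */
        { OP_ADD,  0, 2 },   /* r0 = 5 + 5 = 10  */
        { OP_HALT, 0, 0 },
    };
    return run(prog);
}
```

Calling run_example() returns 10. A real processor does the same kind of fetch, decode, and execute loop in hardware, with far more instructions and operand forms.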

Mnemonic

In assembly language, a mnemonic is a short name for an operation code (often called an opcode), which identifies a machine language instruction. The assembler translates mnemonics into opcodes when it generates the object code. For example, the SUB mnemonic is used in assembly to subtract one operand from another.

Assembler

In this stage, the assembler translates the assembly code produced by the compiler into object code. Object code is machine code: the 0s and 1s that a processor can recognise and execute.
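The translation step can be sketched as a simple table lookup. The example below is only an illustration: the struct and the helper names assemble_line and byte_to_bits are hypothetical, and a real assembler computes encodings from the operands rather than matching whole lines. The byte values are the usual x86-64 encodings for four of the instructions in the Hello World listing.

```c
#include <stddef.h>
#include <string.h>

/* One table entry: an assembly line and the x86-64 bytes it encodes to.
   Only a few fixed instructions are covered in this sketch. */
struct encoding {
    const char *text;
    unsigned char bytes[3];
    size_t len;
};

static const struct encoding table[] = {
    { "push rbp",     { 0x55 },             1 },
    { "mov rbp, rsp", { 0x48, 0x89, 0xE5 }, 3 },
    { "pop rbp",      { 0x5D },             1 },
    { "ret",          { 0xC3 },             1 },
};

/* Hypothetical helper: copies the machine code for `line` into `out`
   (which must have room for 3 bytes) and returns the byte count,
   or 0 if the line is not in the table. */
size_t assemble_line(const char *line, unsigned char *out)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(line, table[i].text) == 0) {
            memcpy(out, table[i].bytes, table[i].len);
            return table[i].len;
        }
    }
    return 0;
}

/* Writes one byte as eight '0'/'1' characters plus a terminator. */
void byte_to_bits(unsigned char byte, char bits[9])
{
    for (int i = 0; i < 8; i++)
        bits[i] = (byte & (0x80 >> i)) ? '1' : '0';
    bits[8] = '\0';
}
```

For instance, assemble_line("ret", buf) stores the single byte 0xC3, which byte_to_bits renders as 11000011.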

Linker

The linker combines all the object files, along with the code implementing library functions (e.g., printf), and generates the final executable file, which can then be run. Linking mainly involves resolving references to external symbols.
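A small sketch may help show what an external symbol reference looks like from the C side. The function names add_numbers and compute_total are invented for this example, and both functions are kept in one file only so that the sketch builds on its own.

```c
/* Declaration only: this tells the compiler that add_numbers exists
   somewhere, without saying where. In a real project the definition
   would typically live in another source file or in a library. */
int add_numbers(int a, int b);

/* If add_numbers were defined in a different file, the call below
   would be recorded in this file's object code as a reference to an
   external symbol, left for the linker to resolve. */
int compute_total(void)
{
    return add_numbers(2, 3);
}

/* The definition. Moved to a separate file, it would be compiled into
   its own object file and matched to the call above at link time. */
int add_numbers(int a, int b)
{
    return a + b;
}
```

The call to printf in the Hello World program works the same way: the declaration comes from stdio.h, and the linker supplies the definition from the C library.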

Machine code is not human friendly: because it consists only of 0s and 1s, it is very difficult for humans to read or write. Programmers who write programs for hardware and processors therefore usually use assembly language. Assembly is a human-readable form of machine code in which each instruction stands for a pattern of 0s and 1s.

About the Author

Sameer Humayun

Software Developer

Sameer is a software developer at Optima Systems. He recently completed a BSc in Computer Science and joined Optima Systems after graduating from university.

