Code written in a high-level language is human-readable. Computers, however, only understand machine code; they cannot execute high-level code directly.
Any high-level program therefore has to be converted to executable code. Executable code is also known as machine code, which is a sequence of binary 0s and 1s.
Let’s look at how high-level code is converted to machine code.
Preprocessor
This is the first phase, before compilation. The preprocessor is also known as a macro processor. This phase includes the removal of comments, the expansion of macros, the expansion of included files, and conditional compilation. The preprocessed temporary output is then stored in a file with the “.i” extension.
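As a rough illustration of what this phase handles, the sketch below puts an include, a comment, a macro, and conditional compilation into one small C file. The names SQUARE, VERBOSE_BUILD, area_of_square, verbose_enabled, and print_report are made up for this example. Assuming GCC, running `gcc -E example.c -o example.i` should stop after preprocessing so that the expanded result can be inspected.

```c
#include <stdio.h>   /* the preprocessor pastes the contents of stdio.h here */

/* Function-like macro: every SQUARE(x) is replaced by ((x) * (x)). */
#define SQUARE(x) ((x) * (x))

/* This comment is removed during preprocessing. */
int area_of_square(int side)
{
    return SQUARE(side);   /* becomes: return ((side) * (side)); */
}

/* Conditional compilation: only one branch reaches the compiler. */
int verbose_enabled(void)
{
#ifdef VERBOSE_BUILD
    return 1;   /* kept only when VERBOSE_BUILD is defined, e.g. with -DVERBOSE_BUILD */
#else
    return 0;   /* kept otherwise */
#endif
}

/* Prints both results so the effect of the macro and the #ifdef is visible. */
void print_report(int side)
{
    printf("area=%d verbose=%d\n", area_of_square(side), verbose_enabled());
}
```

Calling print_report(4) from a main function prints the area 16, and a verbose value that depends on whether VERBOSE_BUILD was defined at build time.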
Compilation of preprocessed file
In the compilation phase, the compiler comes into action. It accepts the temporary preprocessed “filename.i” file generated by the preprocessor, checks it for syntax errors, and translates it into assembly language. After compilation, it saves this intermediate assembly code as a “filename.s” file. Below is an example of a simple Hello World program in C and in assembly.
#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}
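At compilation, the compiler converts the above code into assembly code such as the following (x86-64; the exact output depends on the compiler and its options). Note that the compiler has replaced the call to printf with a call to puts.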
.LC0:
.string "Hello World!"
main:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
mov eax, 0
pop rbp
ret
Assembly language
Assembly language is a low-level programming language that is specific to a processor or hardware architecture. It can be produced by compiling a high-level language, but a programmer can also write it directly. Assembly instructions differ from one architecture to another, although many architectures offer similar basic operations. Below are some x86 assembly instructions.
- MOV - move data from one location to another
- ADD - add two values
- SUB - subtract a value from another value
- PUSH - push data onto a stack
- POP - pop data from a stack
- JMP - jump to another location
- INT - trigger a software interrupt
Each processor has its own set of instructions, which is documented in the processor's datasheet or instruction set manual.
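To make the effect of these instructions more concrete, here is a minimal sketch of a toy machine in C. It is not real x86: the ToyCpu type, its two registers a and b, and the op_* helpers are hypothetical, and details such as flags, memory operands, and register sizes are ignored.

```c
#include <stddef.h>

/* Toy CPU: two registers and a small stack. Not real x86 state. */
enum { STACK_SIZE = 16 };

typedef struct {
    int a;                   /* hypothetical register a */
    int b;                   /* hypothetical register b */
    int stack[STACK_SIZE];
    size_t sp;               /* number of values currently on the stack */
} ToyCpu;

void cpu_init(ToyCpu *cpu) { cpu->a = 0; cpu->b = 0; cpu->sp = 0; }

void op_mov(int *dst, int src) { *dst = src; }    /* MOV: copy a value */
void op_add(int *dst, int src) { *dst += src; }   /* ADD: dst = dst + src */
void op_sub(int *dst, int src) { *dst -= src; }   /* SUB: dst = dst - src */

/* PUSH: store a value on top of the stack; returns -1 if the stack is full. */
int op_push(ToyCpu *cpu, int value)
{
    if (cpu->sp == STACK_SIZE) return -1;
    cpu->stack[cpu->sp++] = value;
    return 0;
}

/* POP: remove the top value into *dst; returns -1 if the stack is empty. */
int op_pop(ToyCpu *cpu, int *dst)
{
    if (cpu->sp == 0) return -1;
    *dst = cpu->stack[--cpu->sp];
    return 0;
}

/* Runs a short instruction sequence and returns the final value of a. */
int run_demo(ToyCpu *cpu)
{
    cpu_init(cpu);
    op_mov(&cpu->a, 7);        /* MOV a, 7 */
    op_mov(&cpu->b, 5);        /* MOV b, 5 */
    op_push(cpu, cpu->a);      /* PUSH a     stack: [7] */
    op_add(&cpu->a, cpu->b);   /* ADD a, b   a = 12 */
    op_sub(&cpu->a, 2);        /* SUB a, 2   a = 10 */
    op_pop(cpu, &cpu->b);      /* POP b      b = 7, stack empty */
    return cpu->a;
}
```

After run_demo, register a holds 10 and register b holds the 7 that was saved on the stack before a was modified.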
Mnemonic
In assembly language, a mnemonic is a short name for an operation code (often called an opcode), which identifies a machine language instruction. The assembler translates mnemonics into opcodes when it generates the object code. For example, the SUB mnemonic is used in assembly to subtract one operand from another.
Assembler
In this stage, the assembler translates the assembly code into object code. The assembler is a tool that converts the assembly code produced during compilation into machine code: the 0s and 1s that a processor can execute.
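The translation step can be sketched in C with a small lookup table. This is a deliberate simplification: a real assembler parses operands and computes each encoding, whereas this sketch only matches a few fixed instruction strings from the Hello World listing. The byte values are the standard x86-64 encodings of those instructions; the names Encoding, TABLE, assemble_line, assemble_program, and byte_to_bits are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *text;         /* exact instruction text to match */
    unsigned char bytes[5];   /* its x86-64 machine code */
    size_t length;            /* number of valid bytes */
} Encoding;

static const Encoding TABLE[] = {
    { "push rbp",     { 0x55 },                         1 },
    { "mov rbp, rsp", { 0x48, 0x89, 0xE5 },             3 },
    { "mov eax, 0",   { 0xB8, 0x00, 0x00, 0x00, 0x00 }, 5 },
    { "pop rbp",      { 0x5D },                         1 },
    { "ret",          { 0xC3 },                         1 },
};

/* Writes the bytes for one instruction to out.
   Returns the number of bytes written, or 0 if the line is unknown
   or does not fit in the remaining capacity. */
size_t assemble_line(const char *line, unsigned char *out, size_t capacity)
{
    for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++) {
        if (strcmp(line, TABLE[i].text) == 0) {
            if (TABLE[i].length > capacity) return 0;
            memcpy(out, TABLE[i].bytes, TABLE[i].length);
            return TABLE[i].length;
        }
    }
    return 0;
}

/* Assembles several lines back to back; returns total bytes, or 0 on failure. */
size_t assemble_program(const char *const lines[], size_t count,
                        unsigned char *out, size_t capacity)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        size_t n = assemble_line(lines[i], out + used, capacity - used);
        if (n == 0) return 0;
        used += n;
    }
    return used;
}

/* Formats one byte as eight '0'/'1' characters, most significant bit first. */
void byte_to_bits(unsigned char byte, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (byte & (0x80 >> i)) ? '1' : '0';
    out[8] = '\0';
}
```

For instance, assembling "ret" yields the single byte 0xC3, which byte_to_bits renders as 11000011.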
Linker
The linker combines all the object files, along with the code implementing library functions (e.g., printf), into the final executable file, which can then be run. Linking mainly involves resolving references to external symbols.
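Symbol resolution can be sketched in C as a search across object files. This is only a model of the idea, not how a real linker is implemented: the Symbol and ObjectFile types and the function names are hypothetical, and related checks such as duplicate definitions or address relocation are left out.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;   /* symbol name, e.g. "main" or "puts" */
    int defined;        /* 1 if this file provides the symbol, 0 if it only refers to it */
} Symbol;

typedef struct {
    const char *file;        /* object file name, for illustration only */
    const Symbol *symbols;
    size_t count;
} ObjectFile;

/* Returns 1 if any of the n object files defines the named symbol. */
static int is_defined(const ObjectFile *objs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < objs[i].count; j++)
            if (objs[i].symbols[j].defined &&
                strcmp(objs[i].symbols[j].name, name) == 0)
                return 1;
    return 0;
}

/* Counts references that no object file defines.
   If any are found, *missing is set to the name of the first one. */
size_t count_unresolved(const ObjectFile *objs, size_t n, const char **missing)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < objs[i].count; j++) {
            const Symbol *s = &objs[i].symbols[j];
            if (!s->defined && !is_defined(objs, n, s->name)) {
                if (unresolved == 0) *missing = s->name;
                unresolved++;
            }
        }
    }
    return unresolved;
}
```

In this model, an object file for the Hello World program would define main and refer to puts. Linking it alone leaves puts unresolved, while adding an object file that defines puts (standing in for the C library) resolves every reference.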
Machine code is not human-friendly: it is very difficult for humans to read or write because it consists only of 0s and 1s. Programmers who write programs for hardware and processors usually use assembly language instead. Assembly is a human-readable form of machine code in which each instruction stands for a pattern of 0s and 1s.
Software Developer
Sameer is a software developer at Optima Systems. He recently completed a BSc in Computer Science and joined Optima Systems after graduating.