some of your computer's RAM is data (like cat pictures) and some of it is binary code for your CPU to run.
code always needs to be loaded from disk into RAM to run.
your CPU can only execute binary "machine code", which looks a bit like this:
11000011
(that's 0xc3, an instruction in x86 that means "return"!)
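to see that 11000011 and 0xc3 really are the same byte, here's a minimal Python sketch. the helper names `to_bits` and `from_bits` are made up for this example.

```python
# 0xc3 is the single-byte x86 "ret" instruction mentioned above.
RET = 0xC3


def to_bits(byte):
    """Format one byte (0-255) as an 8-character string of 0s and 1s."""
    return format(byte, "08b")


def from_bits(bits):
    """Parse a string of 0s and 1s back into an integer."""
    return int(bits, 2)


print(to_bits(RET))                 # 11000011
print(hex(from_bits("11000011")))   # 0xc3
```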
binary executables are generally not only machine code -- usually they have some other data in them as well, like information about which libraries they need to dynamically link to. But a lot of what's in a binary executable is machine code.
is any string of bits (like 01110101011) valid machine code?
every CPU has an "instruction set" which defines which operations are valid and what they mean.
For example, one of the machine language instructions in the x86 instruction set is 0xb8, or 10111000 in binary.
if it's invalid code, the CPU will trigger an interrupt that gets translated into the SIGILL signal on Unix.
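here's a rough sketch of what SIGILL looks like from a parent process's point of view. it's an imitation: the child sends SIGILL to itself with `os.kill` instead of actually running an invalid instruction, and it assumes a Unix system.

```python
import signal
import subprocess
import sys

# Child program: deliver SIGILL to itself. This only imitates what the kernel
# does after an invalid instruction; no invalid machine code is executed here.
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGILL)"


def run_child():
    """Run the child and return its exit status as seen by the parent."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


status = run_child()
# On Unix, subprocess reports "killed by signal N" as the return code -N.
if status < 0:
    print("child was killed by", signal.Signals(-status).name)
```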
the OS actually sets permissions (read/write/execute) on different parts of a process's memory. if the memory doesn't have execute permissions, you can't run the instruction there.
these permissions are called "memory protection" and on Linux, you can see a process's memory permissions with $ cat /proc/$PID/maps
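each line of that file describes one region of memory, and the second field holds the permissions (like r-xp). here's a small sketch of pulling them out. the sample line is made up for illustration, and `parse_maps_line` is a hypothetical helper, not a standard API.

```python
# A made-up line in the format used by /proc/<pid>/maps:
# address-range  perms  offset  device  inode  pathname
SAMPLE = "55d0c9a4b000-55d0c9a4f000 r-xp 00002000 08:01 1835042 /usr/bin/cat"


def parse_maps_line(line):
    """Split one maps line into its address range, permissions and path."""
    fields = line.strip().split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    perms = fields[1]
    return {
        "start": start,
        "end": end,
        "readable": perms[0] == "r",
        "writable": perms[1] == "w",
        "executable": perms[2] == "x",
        # anonymous regions (like the heap of some allocators) have no path
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


info = parse_maps_line(SAMPLE)
print(info["path"], "executable:", info["executable"], "writable:", info["writable"])
```

on Linux you could feed it real lines by looping over `open("/proc/self/maps")`.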
there's a special register called the "instruction pointer". it holds the address of the next instruction to be executed.
most instructions have arguments. For example, the opcode 0xb8 loads a constant into a register, and takes one 32-bit argument (which is the constant to load).
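to make the instruction pointer and arguments concrete, here's a toy sketch of a CPU loop in Python. it's nothing like real hardware: it only knows two opcodes (0xb8, which on x86 loads its 32-bit argument into the eax register, and 0xc3), and it treats "return" as "stop and hand back eax". `run` and `InvalidOpcode` are made-up names.

```python
import struct


class InvalidOpcode(Exception):
    """Toy stand-in for what a real CPU does on an invalid instruction."""


def run(code):
    ip = 0    # toy instruction pointer: offset of the next instruction
    eax = 0   # toy register
    while True:
        opcode = code[ip]
        if opcode == 0xB8:
            # mov eax, imm32: the 4 bytes after the opcode are the argument,
            # stored least significant byte first (little-endian)
            (eax,) = struct.unpack_from("<I", code, ip + 1)
            ip += 5   # skip 1 opcode byte + 4 argument bytes
        elif opcode == 0xC3:
            # ret: in this toy, just stop and report what's in eax
            return eax
        else:
            raise InvalidOpcode(f"unknown opcode {opcode:#04x} at offset {ip}")


program = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])
print(run(program))   # 42
```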
assembly is a slightly more human-readable programming language that we use to make it easier for humans to write machine code. For example, mov (%rax),%rdx is 0x488b10 in machine code.
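here's a sketch of how those three bytes break down, as I understand the x86-64 encoding: 0x48 is a prefix saying "64-bit operands", 0x8b is the opcode for this kind of mov, and 0x10 packs the two registers into bit fields. the register table lookup below is only valid for simple cases like this one (some field values mean other addressing modes), and `split_modrm` is a made-up helper name.

```python
CODE = bytes.fromhex("488b10")

# Register numbers 0-7 as used in x86-64 instruction encodings.
REGISTERS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_modrm(byte):
    """Split the third byte into its three bit fields: 2 bits, 3 bits, 3 bits."""
    mod = byte >> 6             # addressing mode (0 here: plain memory access)
    reg = (byte >> 3) & 0b111   # register operand
    rm = byte & 0b111           # register holding the memory address
    return mod, reg, rm


prefix, opcode, modrm = CODE
mod, reg, rm = split_modrm(modrm)

print(f"prefix {prefix:#x}: 64-bit flag = {(prefix >> 3) & 1}")
print(f"opcode {opcode:#x}")
# for 0x10 this gives mod=0, reg=rdx, rm=rax, matching mov (%rax),%rdx
print(f"fields: mod={mod}, reg={REGISTERS[reg]}, rm={REGISTERS[rm]}")
```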
assembly is still challenging to read but at least it's not just a bunch of numbers :)
you can easily translate machine code to more human-readable assembly! The program you use to do this is called a disassembler, like objdump. Example:
$ objdump -d /bin/cat
similarly, you can translate assembly to machine code with an assembler like nasm
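real disassemblers and assemblers know thousands of instruction encodings, but the core idea is a lookup in both directions. here's a toy sketch that only understands "mov eax, <number>" and "ret", written nasm-style. `assemble` and `disassemble` are made-up names, and this is not how objdump or nasm are actually implemented.

```python
import struct


def assemble(lines):
    """Turn a list of assembly lines into machine code bytes."""
    out = bytearray()
    for line in lines:
        parts = line.replace(",", " ").split()
        if parts == ["ret"]:
            out.append(0xC3)
        elif len(parts) == 3 and parts[:2] == ["mov", "eax"]:
            out.append(0xB8)
            # base 0 lets int() accept both "42" and "0x2a"
            out += struct.pack("<I", int(parts[2], 0))
        else:
            raise ValueError(f"unsupported instruction: {line!r}")
    return bytes(out)


def disassemble(code):
    """Turn machine code bytes back into a list of assembly lines."""
    lines, offset = [], 0
    while offset < len(code):
        opcode = code[offset]
        if opcode == 0xC3:
            lines.append("ret")
            offset += 1
        elif opcode == 0xB8:
            (value,) = struct.unpack_from("<I", code, offset + 1)
            lines.append(f"mov eax, {value:#x}")
            offset += 5
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {offset}")
    return lines


machine_code = assemble(["mov eax, 42", "ret"])
print(machine_code.hex())          # b82a000000c3
print(disassemble(machine_code))   # ['mov eax, 0x2a', 'ret']
```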
C code is translated to machine code by your compiler, like gcc or clang or MSVC
if I compile a Java program with javac, does it get translated to machine code?
the machine code that's running when you run a Java program is the JVM interpreter (usually /usr/bin/java or something)
Java programs are compiled, but they're compiled to JVM bytecode, not machine code.
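one way to see the difference is to look at the first 4 bytes of a file. compiled Java .class files start with the bytes CA FE BA BE, while native executables on Linux are ELF files and start with 7F 45 4C 46 (0x7f followed by "ELF"). here's a minimal sketch. `classify` is a made-up helper, and it only knows these two formats (a few other file formats also use CA FE BA BE, so treat the answer as a hint).

```python
def classify(header):
    """Guess a file's type from its first few bytes."""
    if header[:4] == b"\xca\xfe\xba\xbe":
        return "JVM class file (bytecode)"
    if header[:4] == b"\x7fELF":
        return "ELF file (native machine code on Linux)"
    return "unknown"


def classify_path(path):
    """Read the first 4 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return classify(f.read(4))


print(classify(b"\xca\xfe\xba\xbe"))
print(classify(b"\x7fELF"))
```

for example, `classify_path("/bin/cat")` should report an ELF file on a typical Linux system.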
the JVM is an example of a program that generates new machine code while it's running -- its JIT will compile frequently called bits of JVM bytecode into machine code.
this is also how some software exploits work -- they try to insert new machine code into your memory and run it.
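here's a hedged sketch of the basic trick: put some bytes in memory that has execute permission, then jump to them. it assumes Linux on x86-64 and returns None anywhere else (or if the system refuses to hand out memory that is both writable and executable). the 6 bytes are the machine code for "mov eax, 42" followed by "ret", and `run_machine_code` is a made-up name.

```python
import ctypes
import mmap
import platform
import sys

# machine code for: mov eax, 42 ; ret
CODE = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])


def run_machine_code(code):
    """Copy `code` into executable memory and call it like a C function.

    Returns the function's int result, or None if this platform isn't supported.
    """
    if not sys.platform.startswith("linux") or platform.machine() != "x86_64":
        return None
    try:
        # ask the OS for one page with read + write + execute permissions
        buf = mmap.mmap(
            -1,
            mmap.PAGESIZE,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        return None   # some hardened systems refuse this request
    buf.write(code)
    # find the address of the page and treat it as a C function returning int
    address = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    func = ctypes.CFUNCTYPE(ctypes.c_int)(address)
    return func()


print(run_machine_code(CODE))
```

if everything is supported, the call should come back with 42, since the return value of a C function on x86-64 is whatever is left in eax.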