questions about assembly

Hello! Here are some questions & answers. The goal isn't to get all the questions "right". Instead, the goal is to learn something! If you find a topic you're interested in learning more about, I'd encourage you to look it up and learn more.

when your laptop's CPU runs a program, does the code have to be loaded into RAM for it to run?

yes!

some of your computer's RAM is data (like cat pictures) and some of it is binary code for your CPU to run.

code needs to be in RAM for the CPU to run it. usually that means it gets loaded from disk into RAM first.

can your CPU execute human-readable code like Python?

nope!

your CPU can only execute binary "machine code", which looks a bit like this: 11000011 (that's 0xc3, an instruction in x86 that means "return"!)
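if you want to check that 11000011 and 0xc3 really are the same byte, here's a tiny sketch. (Python is only being used as a calculator here, and the variable names are made up for illustration.)

```python
# 0xc3 is the x86 "return" instruction mentioned above
ret = 0xC3

# the same byte written two ways: 8 binary digits, and hex
as_binary = format(ret, "08b")
as_hex = format(ret, "#04x")

print(as_binary, as_hex)
```

binary, hex, and decimal are just different ways of writing down the same number. the CPU only ever sees the bits.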

do binary executables contain machine code?

yes!

they're generally not only machine code -- usually they have some other data in them as well, like information about which libraries they need to dynamically link to. But a lot of what's in a binary executable is machine code.
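one way to see the "other data" for yourself: on Linux, executables usually use the ELF format, and every ELF file starts with the same 4 "magic" bytes before any machine code shows up. here's a minimal sketch that checks for them. the function names are hypothetical, and it assumes ELF (macOS and Windows use different formats).

```python
ELF_MAGIC = b"\x7fELF"  # the first 4 bytes of every ELF file


def read_header(path, size=4):
    """Read the first few bytes of a file."""
    with open(path, "rb") as f:
        return f.read(size)


def looks_like_elf(header):
    """Return True if these bytes start with the ELF magic number."""
    return header[:4] == ELF_MAGIC


# hand-written sample bytes, so this runs anywhere
print(looks_like_elf(b"\x7fELF\x02\x01\x01\x00"))
print(looks_like_elf(b"#!/bin/sh\n"))
```

on a Linux machine you could try looks_like_elf(read_header("/bin/cat")).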

is any binary sequence (like 01110101011) valid machine code?

nope!

every CPU has an "instruction set" which defines which operations are valid and what they mean.

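For example, one of the machine language instructions in the x86 instruction set starts with the byte 0xb8, or 10111000 in binary. A sequence of bytes that doesn't match any instruction in the set isn't valid machine code.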

is it possible to ask the CPU to run an invalid instruction?

yes!

if it's invalid code, the CPU will raise an exception (a kind of interrupt), which the OS translates into the SIGILL signal on Unix.

is it possible to ask the CPU to run any binary data in your RAM?

nope!

the OS actually sets permissions (read/write/execute) on different parts of a process's memory. if a region of memory doesn't have execute permission, the CPU will refuse to run instructions from it, and the process will usually get a segfault.

these permissions are called "memory protection" and on Linux, you can see a process's memory permissions with:
$ cat /proc/$PID/maps
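
each line of that file describes one region of memory: an address range, the permissions (like r-xp), and usually the file it came from. here's a rough sketch of picking out the executable regions. the sample lines and their addresses are made up for illustration, and the function names are hypothetical.

```python
def parse_maps_line(line):
    """Split one /proc/PID/maps line into (start, end, perms, path)."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    perms = fields[1]
    # anonymous regions have no path at the end of the line
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, perms, path


def executable_regions(lines):
    """Keep only the regions whose permissions include 'x'."""
    return [r for r in map(parse_maps_line, lines) if "x" in r[2]]


# made-up sample lines in the /proc/PID/maps format
SAMPLE = [
    "00400000-00401000 r--p 00000000 08:01 131 /usr/bin/cat",
    "00401000-00405000 r-xp 00001000 08:01 131 /usr/bin/cat",
    "00608000-00609000 rw-p 00008000 08:01 131 /usr/bin/cat",
    "7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0 [stack]",
]

for start, end, perms, path in executable_regions(SAMPLE):
    print(hex(start), hex(end), perms, path)
```

on Linux you could pass it open("/proc/self/maps") instead of SAMPLE to look at the Python process itself.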

how does the CPU know which part of RAM it should run code from?

the instruction pointer!

there's a special register called the "instruction pointer". it holds the address of the next instruction to be executed.

do CPU instructions ever have arguments?

yes!

most instructions have arguments. For example, the opcode 0xb8 loads a constant into a register (eax), and takes one 32-bit argument (which is the constant to load).
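
to make that concrete, here's a toy sketch that "runs" bytes using only the two opcodes mentioned so far, 0xb8 and 0xc3. it's an illustration and not a real CPU emulator: the run function is hypothetical, it treats every other byte as invalid (real x86 has many more instructions), and returning eax on 0xc3 is a simplification.

```python
import struct


def run(code):
    """Toy interpreter that understands only 0xb8 and 0xc3."""
    ip = 0   # instruction pointer: offset of the next instruction
    eax = 0  # the one register this toy knows about
    while True:
        if ip >= len(code):
            raise ValueError("ran off the end of the code")
        opcode = code[ip]
        if opcode == 0xB8:
            # 1 opcode byte, then a 32-bit little-endian constant
            (eax,) = struct.unpack_from("<I", code, ip + 1)
            ip += 5
        elif opcode == 0xC3:
            # "return": stop and hand back whatever is in eax
            return eax
        else:
            raise ValueError(f"invalid instruction {opcode:#04x} at offset {ip}")


# b8 2a 00 00 00 = load the constant 42, then c3 = return
print(run(bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])))
```

notice that the instruction pointer moves forward by 5 bytes for 0xb8 (the opcode plus its 4-byte argument), so the argument bytes never get treated as opcodes.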

are assembly and machine code the same thing?

nope!

assembly is a slightly more human-readable language that makes it easier for people to write machine code. For example, mov (%rax),%rdx is the 3 bytes 48 8b 10 (in hex) in machine code.

assembly is still challenging to read but at least it's not just a bunch of numbers :)
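
here's a minimal sketch of how those 3 bytes map back to mov (%rax),%rdx. it handles only this one simple form of the instruction (a prefix byte 0x48, the opcode 0x8b, and one byte that names the two registers) and raises on everything else. the function name is hypothetical.

```python
# x86-64 register numbers 0..7, in encoding order
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code):
    """Decode only the simplest 3-byte `mov (%reg),%reg` form."""
    prefix, opcode, modrm = code  # must be exactly 3 bytes
    if prefix != 0x48 or opcode != 0x8B:
        raise ValueError("this sketch only handles 48 8b xx")
    mod = modrm >> 6            # top 2 bits: addressing mode
    reg = (modrm >> 3) & 0b111  # middle 3 bits: destination register
    rm = modrm & 0b111          # low 3 bits: register holding the address
    if mod != 0 or rm in (4, 5):
        raise ValueError("addressing mode not handled in this sketch")
    return f"mov (%{REGS64[rm]}),%{REGS64[reg]}"


print(decode_mov_load(bytes.fromhex("488b10")))
```

the last byte, 0x10, is 00 010 000 in binary: mode 0, register 2 (rdx), register 0 (rax).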

is it possible to translate machine code to assembly?

yes!

you can easily translate machine code to more human-readable assembly! The program you use to do this is called a disassembler, like objdump. Example:

$ objdump -d /bin/cat

similarly, you can translate assembly to machine code with an assembler like nasm or GNU as

when you compile a C program, does it get translated to machine code?

yes!

C code is translated to machine code by your compiler, like gcc or clang or MSVC

when you compile a Java program with javac, does it get translated to machine code?

no!

the machine code that's running when you run a Java program is the JVM interpreter (usually /usr/bin/java or something)

Java programs are compiled, but they're compiled to JVM bytecode, not machine code.
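
one way to see that javac's output isn't machine code: a .class file starts with the 4 bytes ca fe ba be, followed by version numbers for the class file format. here's a small sketch that reads that header. the sample bytes are hand-built for illustration (they're not from a real .class file) and the function name is hypothetical.

```python
import struct


def parse_class_header(data):
    """Return (major, minor) class file version, or raise if the magic is wrong."""
    # big-endian: 4-byte magic, 2-byte minor version, 2-byte major version
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


# hand-built header: magic, minor version 0, major version 52
sample = bytes.fromhex("cafebabe00000034")
print(parse_class_header(sample))
```

major version 52 is what Java 8 produces. the bytecode that follows this header is instructions for the JVM, not for your CPU.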

is it possible for a program to modify its machine code while it's running?

yes!

the JVM is an example of a program that does this -- its JIT will compile frequently called bits of JVM bytecode into machine code.

this is also how some software exploits work -- they try to insert new machine code into your memory and run it.
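
here's a rough sketch of the harmless version of this idea, which is roughly what a JIT does: put some machine code bytes into memory that has execute permission, then jump to it. this is a sketch under big assumptions. it only tries to run on Linux on an x86-64 CPU and returns None anywhere else, run_machine_code is a hypothetical name, and some locked-down systems refuse to hand out memory that's writable and executable at the same time.

```python
import ctypes
import mmap
import platform
import sys

# x86-64 machine code: b8 2a 00 00 00 (load 42 into eax), then c3 (return)
CODE = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])


def run_machine_code(code):
    """Copy code into executable memory and call it. Returns None if unsupported."""
    on_linux = sys.platform.startswith("linux")
    on_x86_64 = platform.machine() in ("x86_64", "AMD64")
    if not (on_linux and on_x86_64):
        return None
    try:
        # ask the OS for one page with read + write + execute permission
        buf = mmap.mmap(
            -1,
            mmap.PAGESIZE,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        return None  # the OS refused to give us executable memory
    buf.write(code)
    # find the address of the page, then treat it as a function returning an int
    address = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    func = ctypes.CFUNCTYPE(ctypes.c_int)(address)
    return func()


print(run_machine_code(CODE))
```

on Linux x86-64 this is expected to print 42, and None elsewhere. only ever do this with bytes you wrote yourself: the CPU will run whatever is there.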