The 8051's code (ROM or Flash ROM) bus is 8 bits wide, so every instruction must be a multiple of 8 bits in size. For simplicity, the designers decided that every instruction would start with an 8-bit opcode, followed by 0, 1, or 2 bytes of operand data. The opcode is passed to the instruction decoder, which determines the type of instruction and the number of operand bytes that follow. This gives a nice clean design where the first byte of an instruction fully defines that instruction, but the instruction only occupies as many bytes as are required to completely specify what the instruction should do.
The clean split between opcode and operand is not entirely true, though. For many instructions, the 8-bit opcode contains bits that are logically part of the operand information. This is done so that commonly used operations can be reduced in size, at the expense of using more of the 256 possible opcode values to express them. See the INC Rn and AJMP instructions below.
Here are some examples.
NOP - encoding 00000000
INC A - encoding 00000100
These are single-byte instructions. They have no operand bytes. The total instruction size is 1 byte (8 bits).
INC Rn - encoding 00001rrr
MOV A,Rn - encoding 11101rrr
MOV A,@Ri - encoding 1110011i
These are single-byte instructions where the operand is specified as part of the opcode. In the first two instructions, three bits in the opcode byte are used to specify which of the eight working registers is to be used. This means that INC Rn actually uses eight of the 256 possible opcode values. This was done because it's a very commonly used instruction and it is important to keep it short (one byte), at the expense of using eight opcode values instead of just one.
The MOV A,@Ri instruction has only one opcode bit used to specify the register number, so only two registers can be used as pointers for indirection. This is a reasonable compromise because most operations that are performed using indirect access only need two independent pointers.
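As a rough illustration of how the register bits can be pulled back out of the opcode byte, here is a minimal Python sketch covering only the three encodings listed above. The function name decode_register_operand and the output format are my own invention for this example, not part of any 8051 tool.

```python
def decode_register_operand(opcode):
    """Decode the three register-in-opcode examples from the text.

    Returns a (mnemonic, operand) tuple, or None if the opcode is not
    one of these three patterns. Hypothetical helper for illustration.
    """
    if (opcode & 0xF8) == 0x08:        # 00001rrr: INC Rn
        return ("INC", "R%d" % (opcode & 0x07))
    if (opcode & 0xF8) == 0xE8:        # 11101rrr: MOV A,Rn
        return ("MOV A,", "R%d" % (opcode & 0x07))
    if (opcode & 0xFE) == 0xE6:        # 1110011i: MOV A,@Ri
        return ("MOV A,", "@R%d" % (opcode & 0x01))
    return None


if __name__ == "__main__":
    for op in (0x0B, 0xEF, 0xE7, 0x00):
        print("%02X" % op, decode_register_operand(op))
```

The masks make the cost visible: 0xF8 ignores three bits, so each Rn instruction matches eight opcode values, while 0xFE ignores one bit, so the @Ri form matches only two.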
AJMP addr11 - encoding aaa00001 aaaaaaaa
This instruction occupies two bytes, and the operand data is split between the opcode byte and the operand byte. The operand byte contains the bottom eight bits of the target address, and the opcode byte contains the top three bits of it. Together this allows an 11-bit-wide target address to be specified, at the expense of using eight of the 256 possible opcode values.
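The split described above can be sketched in Python as follows. The function names are hypothetical. The ajmp_target helper goes slightly beyond the text: it assumes the standard 8051 behaviour that the upper five bits of the destination are taken from the address of the instruction following the AJMP, so the jump stays within one 2 KB block.

```python
def encode_ajmp(addr11):
    """Pack an 11-bit address into the two AJMP bytes (aaa00001 aaaaaaaa)."""
    if not 0 <= addr11 <= 0x7FF:
        raise ValueError("addr11 must fit in 11 bits")
    opcode = ((addr11 >> 8) << 5) | 0x01   # top three bits go into the opcode
    operand = addr11 & 0xFF                # bottom eight bits follow
    return bytes([opcode, operand])


def decode_ajmp(code):
    """Recover the 11-bit address from two AJMP bytes."""
    opcode, operand = code[0], code[1]
    if (opcode & 0x1F) != 0x01:
        raise ValueError("not an AJMP opcode")
    return ((opcode >> 5) << 8) | operand


def ajmp_target(instr_addr, code):
    """Full 16-bit destination, assuming the top five bits come from the
    address of the instruction after the two-byte AJMP."""
    return ((instr_addr + 2) & 0xF800) | decode_ajmp(code)


if __name__ == "__main__":
    encoded = encode_ajmp(0x345)
    print(encoded.hex(), hex(decode_ajmp(encoded)))
```

Because the low five bits of the opcode are fixed at 00001 and the top three vary, the eight AJMP opcodes are 01h, 21h, 41h, and so on up to E1h.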
ANL A,#data - encoding 01010100 dddddddd
ANL A,direct - encoding 01010101 aaaaaaaa
INC direct - encoding 00000101 aaaaaaaa
JC relative - encoding 01000000 rrrrrrrr
SETB bit - encoding 11010010 bbbbbbbb
These are traditional two-byte instructions. The opcode fully defines the instruction, and contains none of the operand information. The second byte contains all of the operand information. These instructions use four different types of operand data.
dddddddd is an immediate data value, i.e. a constant in the code.
aaaaaaaa is an absolute address that specifies a memory location within the first 256 addressable data locations (registers, RAM and SFRs).
rrrrrrrr is an 8-bit signed offset for a conditional branch instruction that specifies an address from -128 to +127 relative to the start of the following instruction.
bbbbbbbb is a bit-address specification that specifies an individually addressable bit.
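The relative offset is the operand type that most often trips people up, so here is a small Python sketch of how a branch destination follows from the rrrrrrrr byte. The function name and parameters are made up for this example. The final wrap to 16 bits is my assumption about how to model the code address space.

```python
def branch_target(instr_addr, instr_len, rel_byte):
    """Compute the destination of a relative branch.

    instr_addr: address of the branch instruction's opcode byte
    instr_len:  total size of the instruction in bytes (2 for JC, 3 for JB)
    rel_byte:   the raw rrrrrrrr operand byte, 0..255
    """
    # Interpret the byte as a two's-complement signed value (-128..+127).
    offset = rel_byte - 0x100 if rel_byte & 0x80 else rel_byte
    # The offset is relative to the start of the FOLLOWING instruction.
    return (instr_addr + instr_len + offset) & 0xFFFF


if __name__ == "__main__":
    # JC at 0100h with operand FEh (-2) branches back to itself.
    print(hex(branch_target(0x0100, 2, 0xFE)))
```

Note that the same offset byte reaches a different distance from the opcode depending on the instruction length, because the base is always the next instruction.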
CJNE A,#data,relative - encoding 10110100 dddddddd rrrrrrrr
JB bit,relative - encoding 00100000 bbbbbbbb rrrrrrrr
LJMP addr16 - encoding 00000010 hhhhhhhh aaaaaaaa
MOV direct,#data - encoding 01110101 aaaaaaaa dddddddd
ORL direct,#data - encoding 01000011 aaaaaaaa dddddddd
These are assorted three-byte instructions. Each one consists of an 8-bit opcode that fully defines the instruction type, followed by two 8-bit operands of various types. The only exception is LJMP addr16, where the two operand bytes are the high and low halves (hhhhhhhh, then aaaaaaaa) of a single 16-bit address, which can specify any location in the 16-bit-wide code address space as the destination of the jump.
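To tie the examples together, here is a toy Python disassembler that handles only the fixed-opcode instructions listed in this description (not the full instruction set, and not the INC Rn, MOV A,Rn, MOV A,@Ri or AJMP forms with operand bits in the opcode). The table name, function name and output format are my own choices for illustration. It shows how the first byte alone tells the decoder how many operand bytes to fetch.

```python
# opcode -> (total length in bytes, output template)
# Only the example instructions from the text are included.
EXAMPLES = {
    0x00: (1, "NOP"),
    0x04: (1, "INC A"),
    0x54: (2, "ANL A,#{b1:02X}h"),
    0x55: (2, "ANL A,{b1:02X}h"),
    0x05: (2, "INC {b1:02X}h"),
    0x40: (2, "JC rel={b1:02X}h"),
    0xD2: (2, "SETB bit {b1:02X}h"),
    0xB4: (3, "CJNE A,#{b1:02X}h,rel={b2:02X}h"),
    0x20: (3, "JB bit {b1:02X}h,rel={b2:02X}h"),
    0x02: (3, "LJMP {addr16:04X}h"),
    0x75: (3, "MOV {b1:02X}h,#{b2:02X}h"),
    0x43: (3, "ORL {b1:02X}h,#{b2:02X}h"),
}


def disassemble(code):
    """Return a list of text lines for a bytes object of known opcodes."""
    lines = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode not in EXAMPLES:
            raise ValueError("opcode %02Xh is not in the example table" % opcode)
        length, template = EXAMPLES[opcode]
        operands = code[pc + 1:pc + length]
        if len(operands) != length - 1:
            raise ValueError("instruction at offset %d is truncated" % pc)
        b1 = operands[0] if length > 1 else 0
        b2 = operands[1] if length > 2 else 0
        addr16 = (b1 << 8) | b2          # LJMP: high byte first, then low
        lines.append(template.format(b1=b1, b2=b2, addr16=addr16))
        pc += length                     # the opcode alone fixed the length
    return lines


if __name__ == "__main__":
    program = bytes([0x00, 0x75, 0x90, 0xFF, 0x02, 0x12, 0x34])
    for line in disassemble(program):
        print(line)
```

A real disassembler would need all 256 opcode values in its table, including the groups of eight and two that carry register or address bits.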
The 8051's instruction set is actually pretty thorough, well-thought-out, mature, and powerful (for an 8-bit device), especially considering its age. It has instructions to perform nearly every commonly needed operation. If you want to see an instruction set that's simpler to understand, look at the 6502. Apart from its zero-page handling, it is a much more straightforward design. Every instruction consists of a single opcode byte that fully defines the instruction and contains no operand information, followed by 0, 1 or 2 operand bytes.
Please read this description thoroughly, study the instruction encodings, and make sure you understand every sentence. There is a lot of information here in a small space. Study the 8051 instruction set to reinforce your understanding of the general ideas I've described here.