Lack of bit field instructions in x86 instruction set because of patents ?

Nobody · Mar 11, 2009

ISTM that most of the arguments against the X86 come down to
"aesthetics". While I agree that it is ugly, it has shown itself to be
capable of very high performance and extensibility (new instructions,
wider addressability, new addressing modes, etc). You can do low power,
perhaps not as low as a 64-bit chip of a different architecture could
manage, but I suspect the difference would be modest.

So other than aesthetics, what is wrong with X86?

Aesthetics is the wrong term, as it implies something without any
impact upon functionality.

The x86's architectural ugliness means that a great deal of inefficiency
is involved in getting the current levels of performance. A RISC chip with
comparable performance would require far less silicon and far less power.

Robert Redelmeier · Mar 11, 2009

In alt.lang.asm Nobody said:
That makes naive code generation easy, but it also makes optimisation
really hard. Optimisation means using all of the registers, not just the
"right" ones. Highly optimised code often uses -fomit-frame-pointer, to
allow EBP to be used as a general-purpose register. Needless to say, that
makes accessing parameters and local variables rather ugly.

Not much. ESP is then used as a frame pointer. The instructions
get a couple of bytes longer and the offsets slightly less readable.

"Premature optimization is the root of all evil" [Knuth].
Also, optimization is not what it used to be. The cost of register
spills has gone down while the cost of mispredicted branches has
gone _way_ up. Processors have not gotten uniformly faster.
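
To make the branch-cost point concrete, here is a minimal sketch in C of the same loop written with and without a data-dependent branch in the source. The function names are made up for illustration, and nothing here is measured: a compiler may well emit identical code for both, and whether the second form is faster depends on the data and the CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: one conditional branch per element, which is hard to
   predict when the data is effectively random. */
size_t count_at_least_branchy(const int32_t *v, size_t n, int32_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold)
            count++;
    }
    return count;
}

/* Branch-free version: the comparison yields 0 or 1, which is added
   directly, so the source has no data-dependent branch in the loop body. */
size_t count_at_least_branchless(const int32_t *v, size_t n, int32_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] >= threshold);
    return count;
}
```

Both functions return the same result for any input; the only difference is whether the per-element decision is expressed as control flow or as arithmetic.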

-- Robert

Vladimir Vassilevsky · Mar 11, 2009

Joel said:
...hence the reason you don't see traditional x86 CPUs in cell phones, PDAs,
etc...

...yet the newer processors don't offer any significant breakthrough in
the computing performance compared to the x86s.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Vladimir Vassilevsky · Mar 11, 2009

Robert said:
The cost of register
spills has gone down while the cost of mispredicted branches has
gone _way_ up. Processors have not gotten uniformly faster.

And the mispredicted branches are so expensive because of the huge
pipeline required to process the x86 instructions.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Stephen Fuld · Mar 11, 2009

Nobody said:
Aesthetics is the wrong term, as it implies something without any
impact upon functionality.

The x86's architectural ugliness means that a great deal of inefficiency
is involved in getting the current levels of performance. A RISC chip with
comparable performance would require far less silicon and far less power.

I question the use of "far". Others here have said the overhead of
decoding the X86 instructions is a few percent of the total logic.
Besides, on a current desktop or server chip, the overwhelming part of
the silicon is taken up with cache, not CPU logic. So while I suspect
that there would be some savings in logic and power, I don't think it
would be "far". And there is some countervailing effect of the smaller
instructions meaning more instructions in an I-cache of a given size, so
perhaps a higher hit rate. I suspect this effect is small, but it is
something.

krw · Mar 11, 2009

And the mispredicted branches are so expensive because of the huge
pipeline required to process the x86 instructions.

Nonsense. All modern bleeding edge processors have long pipes.
x86 has little to do with it.

nmm1@cam.ac.uk · Mar 11, 2009

I question the use of "far". Others here have said the overhead of
decoding the X86 instructions is a few percent of the total logic.

Actually, "nobody" has a point. Architectural ugliness has very little
to do with the instruction set and a great deal to do with the basic
computational model. However, in this respect, many "RISC" designs
are as ugly as the x86 :-(

Take, for example, floating-point and page table management (TLBs).
A well-designed architecture ensures that functionally separate
instructions can be executed independently. But almost all of them
fail to carry that through to interrupt handling, so the first TLB
miss or floating-point exception/fixup causes the pipeline to glitch!
Or doesn't, and causes the FLIH to have the most disgusting hacks to
cover that up, and that STILL leaves a race condition that can cause
serious problems!

My understanding is that a lot of the logic is concerned with trying
to combine aggressive pipelining/parallelism, while still ensuring
that such problems don't cause chaos. Interrupts are just an extreme
case, and there are a zillion others in most architectures, often
at a much lower level.

Besides, on a current desktop or server chip, the overwhelming part of
the silicon is taken up with cache, not CPU logic. ...

Yes. But the x86 is bad there - not as bad as most "RISC" systems,
true. A good design could probably cut the cache requirement very
considerably - or make the current amounts more effective.

Regards,
Nick Maclaren.

hanukas · Mar 11, 2009

...yet the newer processors don't offer any significant breakthrough in
the computing performance compared to the x86s.

They do; per unit of power consumed. Another roadblock: power
consumption at high clock frequencies. Put these two together, go back
a few years and make a roadmap: multi-core architectures emerge.

James Arthur · Mar 11, 2009

John said:
In the US alone, several gigawatt-level power plants are working 24/7
to overcome Intel's and Microsoft's crappy designs.

More people would use "suspend" if it worked. More people would turn
off computers if they booted up quicker.

John

"Suspend" used to work on my machine. Now it works if I turn off
the DSL modem.

Grrrrins,
James Arthur

Phil Carmody · Mar 11, 2009

Joel Koltner said:
...hence the reason you don't see traditional x86 CPUs in cell phones, PDAs,
etc...

Well, there were the Nokia 9000 and 9110 Communicators.

Phil

Nobody · Mar 11, 2009

I question the use of "far". Others here have said the overhead of
decoding the X86 instructions is a few percent of the total logic.
Besides, on a current desktop or server chip, the overwhelming part of
the silicon is taken up with cache, not CPU logic.

Cache doesn't consume anywhere near as much power as the "active" parts of
the CPU.

Vladimir Vassilevsky · Mar 11, 2009

Nobody said:
Cache doesn't consume anywhere near as much power as the "active" parts of
the CPU.

On this topic, I see many statements like "as much", "far less",
"overwhelming" and so on. Those adjectives mean nothing.

Can anyone back up his point with the particular facts, figures and
quotations of the sources of the information?

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

krw · Mar 12, 2009

On this topic, I see many statements like "as much", "far less",
"overwhelming" and so on. Those adjectives mean nothing.

Nevertheless, he's right. Caches draw next to nothing, per unit area.
Remember it's STATIC. Nothing is switching, other than the line
currently being accessed. Leakage is far more.

Can anyone back up his point with the particular facts, figures and
quotations of the sources of the information?

No one is going to give specifics publicly, but he's right. Just
think about it.

MooseFET · Mar 12, 2009

In the US alone, several gigawatt-level power plants are working 24/7
to overcome Intel's and Microsoft's crappy designs.

More people would use "suspend" if it worked. More people would turn
off computers if they booted up quicker.

My EEPC boots quickly :>

MitchAlsup · Mar 12, 2009

The CISC-to-RISC decoder consumes a negligible fraction of the silicon
and power in modern x86 chips. It's the register renaming, out-of-order
execution, and on-die cache that consume the majority of the silicon and
power these days on all high (single-threaded) performance chips --
regardless of the ISA.

Actually, the decoders in Opteron (pre Barcelona) (all 7 of them) are
smaller than a single 4KByte chunk of SRAM. 4 are one-byte decoders
used when the predecode information is not present, and the other 3
are the multi-byte instruction at a time superscalar decoders.

The out-of-order stuff (reservation-stations/reorder-buffer/future-
file/LS1/2) are several times larger than all the computation circuits
put together (like 5X)

The branch predictor and associated circuitry is larger than all the
computational circuitry put together.

Take the pipeline flip-flops out of all the computational circuitry
(int, mem, float), and the total area of the computational circuits is
smaller than 4KB of SRAM. Leave the pipeline flip-flops in and the
computation circuitry is still less than 8KBytes of SRAM.

x86 isn't the liability that you think it is.

New instruction idiom recognition decoders are even converting MOV + 2-
op instructions into 3-op instructions so as to execute them in a
single cycle; compare+branch is done similarly, and a few others.

x86 (the instruction set) is not as hard to decode as is SPARC V9+VIS
(and whatever they may have done to it over the last 9 years).

x86 is not any liability whatsoever (excepting perhaps the legal
challenges that might be brought forth).

Mitch

Eric Northup · Mar 12, 2009

The out-of-order stuff (reservation-stations/reorder-buffer/future-
file/LS1/2) are several times larger than all the computation circuits
put together (like 5X) [...]
Take the pipeline flip-flops out of all the computational circuitry
(int, mem, float), and the total area of the computational circuits is
smaller than 4KB of SRAM. Leave the pipeline flip-flops in and the
computation circuitry is still less than 8KBytes of SRAM. [...]
x86 is not any liability whatsoever (excepting perhaps the legal
challenges that might be brought forth).

What about the semantic complexity of the x86 ISA?

I suppose all of the trickier instructions get delegated to microcode,
but isn't there a cost imposed by segmentation or the density of loads
and stores?

I know the CPUs can optimize for the "flat" segment model, and bypass
the logic for bounds checking and adding base addresses. But just
because you can bypass the complex logic doesn't mean it was free.
Perhaps I overestimate how those costs add up, but it seems like
circuitry that must be located ~1 clock cycle's wire delay from the
load/store units occupies some prime real estate. Especially
considering that you're hoping to never use it!
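
As a rough illustration of the work being discussed, here is a minimal sketch in C of a segment translation with a flat-model bypass. It is a software model with hypothetical names (`segment_t`, `seg_translate`), not how any real CPU implements it; real x86 limit checks also account for access size, granularity and expand-down segments, all of which are ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified segment descriptor: just a base and a limit. */
typedef struct {
    uint32_t base;   /* added to every offset */
    uint32_t limit;  /* highest valid offset within the segment */
} segment_t;

/* Translate (segment, offset) to a linear address.
   Returns false to model a segment-limit violation fault. */
bool seg_translate(const segment_t *seg, uint32_t offset, uint32_t *linear)
{
    /* Flat-model fast path: base 0 and maximum limit, so both the add
       and the bounds check can be skipped. */
    if (seg->base == 0 && seg->limit == UINT32_MAX) {
        *linear = offset;
        return true;
    }
    if (offset > seg->limit)
        return false;              /* the extra source of faults */
    *linear = seg->base + offset;  /* unsigned add, wraps modulo 2^32 */
    return true;
}
```

Even in this toy form, the slow path (a compare plus an add) still has to exist next to the fast path, which is the point being made about occupying space close to the load/store units.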

Similarly, I thought having an extra source of faults (segment limit
check violation) contributes to the complexity of the out-of-order
stuff. Those issues would be compounded by the load/store density of
x86/x64 code - those have to be fast paths.

Out of curiosity, are the 4/8KB SRAMs you mentioned for size comparison
the vanilla single-ported variety? Would that be including the vector
unit?

-Eric

nmm1@cam.ac.uk · Mar 12, 2009

The out-of-order stuff (reservation-stations/reorder-buffer/future-
file/LS1/2) are several times larger than all the computation circuits
put together (like 5X)

One of the things that the IA64 got right in principle and wrong in
practice was to try to simplify that area by making it more explicit.
I still think that could be done - but not that way!

The branch predictor and associated circuitry is larger than all the
computational circuitry put together.

Indeed? It is, of course, a computationally intractable (in the CS
sense) task.

Take the pipeline flip-flops out of all the computational circuitry
(int, mem, float), and the total area of the computational circuits is
smaller than 4KB of SRAM. Leave the pipeline flip-flops in and the
computation circuitry is still less than 8KBytes of SRAM.

Including a full, glorious, optimised IEEE 754 unit? Boggle. If one
adds full support for denormalised numbers, exceptional results and
(heaven help us) decimal floating-point, that will clearly go up,
but not by a huge factor.

x86 (the instruction set) is not as hard to decode as is SPARC V9+VIS
(and whatever they may have done to it over the last 9 years).

That fails to surprise me! I have always been a supporter of RISC,
the principle, and very unimpressed with RISC, the dogma.

x86 is not any liability whatsoever (excepting perhaps the legal
challenges that might be brought forth).

Grrk. Now, THERE I disagree. It's extremely unclear how to extend
it to allow for scalable parallelism, except by the tried (and not
very successful) heavyweight threading approach. Of course, the
same remark applies to all of the current 'RISCs' ....

Regards,
Nick Maclaren.

Ken Hagan · Mar 12, 2009

In the US alone, several gigawatt-level power plants are working 24/7
to overcome Intel's and Microsoft's crappy designs.

More people would use "suspend" if it worked. More people would turn
off computers if they booted up quicker.

Maybe the electricity is cheaper than redesigning the kit? Maybe it's the
customer who pays for the electricity and the vendor who pays for the
redesign? Maybe if that electricity came from a nice green, sustainable
nuclear plant it wouldn't actually matter? Maybe it doesn't matter anyway?

Steve · Mar 12, 2009

Phil Carmody said:
Well, there were the Nokia 9000 and 9110 Communicators.

And the good old HP 200LX I play with. 80186, with excellent
battery life. And mine has a FORTRAN compiler to boot.

Cheers,

Steve N.

Martin Brown · Mar 12, 2009

John said:
In the US alone, several gigawatt-level power plants are working 24/7
to overcome Intel's and Microsoft's crappy designs.

More people would use "suspend" if it worked. More people would turn
off computers if they booted up quicker.

Suspend *does* work, at least on properly configured computers. There are
a few rogue hardware designs, typically with USB peripherals or other
drivers that do not handle suspend gracefully, but that isn't the
CPU's fault. Most of the problems reside in buggy 'Doze device drivers.

I have a year-old Toshiba portable supplied with Vista that is barely
usable - it regularly disables its own keyboard and its on-off switch
(whatever power-save settings are used). Works OK on XP or even
Win98SE so it is a Vista fault with too-clever-by-half hardware drivers.

Regards,
Martin Brown
