Lack of bit field instructions in x86 instruction set because of patents?

Tom · Mar 13, 2009

Actually, no, they don't. You may not realise it, but a significant
proportion of the increasing unreliability of computer applications
(and it IS increasing) is due to that usage. How much, I am not sure,
but I have seen the signature fairly often.

Could you give a little detail of what constitutes the tell-tales in such
signatures?

thanks

Nobody · Mar 13, 2009

Returning to the speculation about the power consumption of the
cache: the cache accesses all cache lines on every read or write
operation (and the tags, of course), so it is not an idle circuit.

The SRAM cells are idle unless they're actually being read or written.

Tags are active, but:

1. Most of the cache is L2 (e.g. 8KiB L1 vs 256KiB L2 for the original
P4), and that is only accessed if L1 misses.

2. N-way set-associative caches (4-way for L1, 8-way for L2 for P4) mean
that only N tags need to be compared for any given access.

3. Relatively large cache lines (e.g. 64 bytes for P4) in combination
with #2 limit the proportion of the transistors which are used for tags
(active) versus SRAM cells (idle).
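To put rough numbers on point 3, here is a minimal sketch that estimates what share of the stored bits are tag bits for the cache sizes quoted above. The 32-bit physical address is an assumption made for illustration, and valid/dirty/LRU bits are ignored; `tag_fraction` is a made-up helper, not anything from a real tool.

```python
def tag_fraction(cache_bytes, ways, line_bytes, addr_bits=32):
    """Return (tag bits per line, fraction of stored bits that are tag bits).

    addr_bits=32 is an assumption for illustration only; status bits
    (valid, dirty, LRU) are left out of the count.
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    # Sizes are powers of two, so bit_length() - 1 gives an exact log2.
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    data_bits = line_bytes * 8
    return tag_bits, tag_bits / (tag_bits + data_bits)

# Figures from the post: 8KiB 4-way L1, 256KiB 8-way L2, 64-byte lines.
l1 = tag_fraction(8 * 1024, ways=4, line_bytes=64)
l2 = tag_fraction(256 * 1024, ways=8, line_bytes=64)
print(f"L1: {l1[0]} tag bits per line, {l1[1]:.1%} of stored bits")
print(f"L2: {l2[0]} tag bits per line, {l2[1]:.1%} of stored bits")
```

Under those assumptions the tags come out at a few percent of the array in both cases, which is the sense in which large lines keep the tag share small.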

dynamic losses ~= F x C x U^2 / 2

The cache occupies a large area, with many transistors, long wires, many
inputs and outputs, large capacitance and big transistors to drive heavy
loads, so it has high dynamic losses and high static losses as well. To
me, it is not obvious how the power consumption of the cache compares to
the other parts of the CPU.

Looking at it from an engineering perspective, if most of the silicon is
used for cache, anything you can do to reduce the power consumption of the
cache will have a greater impact on overall power consumption than a
similar reduction elsewhere. Furthermore, a cache consists of a few
specific building blocks, each replicated a large number of times, so
the impact of any specific design change will be magnified.

Coupled with the fact that there is no inherent *need* for the cache to be
continually transitioning (unlike, e.g. the ALU and register file), I
can't see how the cache *wouldn't* be using far (!) less energy per
transistor than the "core".

Nobody · Mar 13, 2009

If the VM itself is written in C/C++, then I am assuming that the
language that the VM is interpreting is subject to the same limitations
as C/C++.

Not at all. Any interpreter can be converted to assembler, but that
doesn't mean that the language has the same limitations as assembler.

Clearly, anything that can be implemented in any language which runs on a
particular architecture can be implemented in assembler. But different
languages make different things simple and different things complex.

In that regard, a language's "limitations" aren't about what the language
absolutely prohibits (you can write an emulator in almost any language),
but what it makes impractical.

Also, there doesn't have to be a VM, e.g. many ARM chips can execute
Java bytecode natively (Jazelle).

Haskell was mentioned; I don't know the language, but if it's truly
revolutionary to the point it enables new chip designs that can provide
ten times the performance at a quarter the power consumption, I doubt
that it's "incrementally different" from the C/C++/C#/Java/etc. that
we're using today.

No, Haskell is radically different. It's a pure functional language, not
an imperative language. The language itself doesn't have any concept of
mutable state, although there are specific modules (IO and ST) to support
this.

FWIW, the most popular Haskell compiler (GHC) is written in Haskell.
It does have the option to compile to C, although the resulting C code
will look nothing like the original program; in this situation, it's using
C as "portable assembler".

If it were, by now (a) everyone would have switched,
or (b) whatever it is that makes Haskell special would have been added
to the more common languages.

Judging from the most common criticisms levelled at Haskell by people
who had to (superficially) learn it in academia, the biggest stumbling
block seems to be psychological.

Programmers who have become fluent in imperative programming seem to have
a significant aversion to discovering that there are actually other
programming paradigms, and that they are really only fluent in one
specific paradigm rather than in programming generally.

nmm1@cam.ac.uk · Mar 13, 2009

No, Haskell is radically different. It's a pure functional language, not
an imperative language. The language itself doesn't have any concept of
mutable state, although there are specific modules (IO and ST) to support
this.

Not all that radically. In The Great Scheme Of Things, imperative
and functional languages aren't all that different. There are some
MUCH more radical designs! Even Prolog is more different from both
Haskell and Fortran than the latter two are from each other.

Regards,
Nick Maclaren.

Andrew Reilly · Mar 14, 2009

All of MSOffice, Outlook, etc. are written in C.

MSOffice (and large chunks of Windows itself) was originally written in
Pascal, as was quite a bit of Mac software of the same era. No doubt
some or all of it has been re-written in C or C++ in the meantime, but
it's not necessarily the case.

All those SQL servers
are written in C.

Yeah, probably, but while they're used by lots of people, I doubt that
you could say that a very large fraction of the programming community are
actively involved in their production or maintenance.

All of those web servers and web browsers are written
in C

There are some fairly widely-deployed web servers in Java (Tomcat,
Glassfish). If anyone was starting to write a new SQL server today
(rather than tweak one of the existing ones), I'd be very surprised if
they chose to write it in C.

Firefox is to JavaScript a bit like emacs is to emacs-lisp: most of the
user-facing code, and all of the GUI, is written in the "scripting"
language, which is rendered by the underlying rendering engine, which is,
indeed, in C. Similarly for Adobe Lightroom vs Lua, I believe.

, and the Perl/PHP/Python/ASP/etc. scripts that the servers are
running are being interpreted by a program written in C.

Sure, but there's only one Larry Wall and one Guido van Rossum (and they
each have teams of maintenance helpers, of course). Most of the code
that is written *in* those languages doesn't care what the language
implementation is written in, and indeed there are versions of Perl and
Python that run on top of Java and .NET: no C involved (apart from some
system-interface shim libraries, probably.) There's a Python compiler
written in Python.

The OSes that
they're running on are written in C.

Of course. As I said, that's what it's for. Maybe that will always be
the case, but maybe not. In any case, if someone were to give you a new
Linux distribution that had been re-written in Java or something else,
why would you care, if it offered the same system call API?

Most people's computers spend the vast majority of their time running
code that was, at one time or another, output by a C compiler -- to the
point that virtually nothing else matters except in very specialized
applications.

Yes, but quite a lot of that C has now been generated by a compiler for a
different language, and so does not necessarily have the same code
profile or idiom (or, in particular, propensity for buffer overflow bugs)
as hand-written C.

HPCC is almost all Fortran or C, in my experience, because the libraries
for the clustering code are only available in Fortran and C. Matlab is
fine for small-scale stuff, but I bet that was written in C itself.

Matlab was originally an interpretive wrapper around Fortran BLAS and
LAPACK. I suspect that it is a very complicated beast underneath, these
days. Much of it seems to have been re-written in Java (you can load and
run Java objects almost transparently), and it claims to have FFTW3 in
it, and that's C code that was written by an OCaml program.

The most popular ones are still in C or C++; only the newest stuff is
being written in C# and the like, and that's still running in a VM
that's written in C calling libraries written in C on top of an OS
written in C.

Sun's JVM and libraries seem to be Java almost all the way down. MS's VM
might be based on C, or it might not. I don't know. In any case, both
VMs translate straight from byte-codes to machine language: there's no C
idiom involved in most of that code execution. Same with the new
JavaScript JIT compilers, and many (most) of the LISP compilers.

If the VM itself is written in C/C++, then I am assuming that the
language that the VM is interpreting is subject to the same limitations
as C/C++.

No, not at all. If a language has strong typing and checked array
accesses, for example, then the generated C code will have those features
too, even though C as a language doesn't. The checks are just more code
that humans typically don't bother to write for themselves.

Same goes for parallel processing/threading/sharing models: if a language
has a useful definition of those, then it can emit whatever is necessary
in C or whatever to get the job done. Might not be as fast as a hand-
tuned C or C++ thread application, where the programmer can make implicit
a lot of the sharing assumptions, but it won't break because of threading
model mistakes, either. Well, that depends on the model of course: it's
pretty easy to break a multi-threaded application written in a wide
variety of languages.

And, of course, that VM is calling libraries written in C and
running on top of an OS that's written in C.

Sure, some of them.

Haskell was mentioned; I don't know the language, but if it's truly
revolutionary to the point it enables new chip designs that can provide
ten times the performance at a quarter the power consumption, I doubt
that it's "incrementally different" from the C/C++/C#/Java/etc.

Sure. Same goes for Scheme+Termite (or other actor-model
implementations), or Clojure, or Erlang. That's why I said that there's
a lot more to be gained by those who *are* prepared to reexamine their
programming preconceptions and fundamentals.

that
we're using today. If it were, by now (a) everyone would have switched,
or (b) whatever it is that makes Haskell special would have been added
to the more common languages.

Indeed, that's what's happening. For example, Clojure is more-or-less a
LISP, but it comes with a bunch of pure-functional (immutable) data
structure libraries and other features that are there to make large-scale
parallelism more reliable and easier to program. I don't imagine that
there will ever be a visible "switch-over", but there may well eventually
be a "tipping point" where people start to wonder whether they need to
dare go with "C", or stick with the safer, more parallel language that
they're familiar with...

I've not seen much progress in that direction; most libraries are still
implemented in C/C++, though they may have bindings for "cooler"
languages.

Well, certainly many libraries are like that, but I've started to notice
that large chunks of the libraries in the Perl and Python distributions,
and certainly all of those in the Java collections are no longer wrappers
around the equivalent C library, but are new or re-writes in the native
language. Mostly that's because they can be easier to use when they use
the native language's object model and idioms. Probably simplifies the
library portability and build requirements significantly, too.

That's because they're written for the largest possible
audience, and virtually every language has _some_ way to call C
functions, while the reverse is rarely true. Good luck getting your
Java program to use a library written in Perl or vice versa -- but they
can both use a library written in C.

Sure, but you'll find that most of the useful libraries are already
native in Java, Perl, whatever. Indeed, there's a whole pile of web-
development stuff that more-or-less *only* exists in Perl, Python and
Ruby, and the only way to use that stuff from C, should you want to, is
to import the relevant language subsystem as a library.

Cheers,

Nicholas King · Mar 14, 2009

MitchAlsup said:
The out-of-order stuff (reservation-stations/reorder-buffer/future-
file/LS1/2) are several times larger than all the computation circuits
put together (like 5X)

The branch predictor and associated circuitry is larger than all the
computational circuitry put together.

The thing that worries me about basing the cost on the decoder size is
that perhaps an alternative architecture wouldn't need the same level of
branch prediction/O-o-O/register renaming that x86 needs.

Ultimately the question that we face is what overhead does the x86
architecture place on the whole thing from the compiler right through
the chips. Would a different architecture be able to provide more
semantics to the chip allowing a simpler design? That could mean that
the x86 overhead question has a different answer than one based solely
upon decoder size.

Cheers,
Nicholas King

nmm1@cam.ac.uk · Mar 14, 2009

Could you give a little detail of what constitutes the tell-tales in such
signatures?

With difficulty :-( Almost all of the time, the signatures could
indicate many possible causes. In most cases, you need to know the
logic of the application fairly well to make an educated guess. The
following is a very high-level description of the signatures, but
the details vary immensely with the application.

The phenomena can occur in serial codes, too, but are typically
more predictable. Not always, unfortunately, so assuming threading
is always a matter of guesswork until you have positively identified
the cause.

Also, note that exactly the same effects can occur from interrupt
handling, which is a comparably broken area. There is really no
programmatic difference between asynchronous interrupt handling and
threading.

The classic one is when you know that an action A was performed, but
a later action B behaves as if A had not taken place. And I mean
"know that it was performed" - e.g. a recheck shows that it was.

Another is when you perform an action A followed by one B and the
latter behaves partly as if A had been performed and partly as if
it had not.

Another is when an action A definitely takes place, but later is
wiped from history. However, this can be due to many causes, and
is common in serial codes, too.
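The "wiped from history" signature can be reproduced deterministically with a toy model. This is only a sketch: the two "threads" are Python generators stepped by hand (no real threads are involved), and `increment`, `run` and the schedules are made-up names for illustration. Each thread does a read followed by a write, and the scheduler decides where the switch lands.

```python
def increment(shared, log):
    """One 'thread': read the counter, then write back counter + 1.

    Written as a generator so the scheduler can pause it between the
    read and the write, which is where a real preemption could land.
    """
    value = shared["counter"]       # step 1: read
    log.append(("read", value))
    yield                           # possible context switch here
    shared["counter"] = value + 1   # step 2: write back
    log.append(("write", value + 1))
    yield

def run(schedule):
    """Run two increments, stepping them in the order given by schedule."""
    shared = {"counter": 0}
    log = []
    threads = [increment(shared, log), increment(shared, log)]
    for tid in schedule:
        next(threads[tid])
    return shared["counter"], log

serial, _ = run([0, 0, 1, 1])          # thread 0 finishes before thread 1 starts
interleaved, log = run([0, 1, 0, 1])   # thread 1 reads before thread 0 writes
print("serial:", serial)
print("interleaved:", interleaved, log)
```

In the interleaved run both writes appear in the log, so a recheck would confirm that each action was performed, yet the final counter reflects only one of them.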

When the above occurs with file updates or GUIs, then it is probable
that the cause is NOT threading, as there are far worse misdesigns
in the POSIX file model and the X Windows event model (which have been
copied, lock, stock and barrel, by Microsoft). The GUI event model
is particularly broken, and is why people with fast reactions often
see a button flash to indicate a click has been received, but the
action that should be taken never happens.

At the source/debugger level, you can see it in dumps and diagnostic
output, when you work through the logic leading to the location, and
there is simply no path that can possibly cause that combination of
values. But, typically, you see that only after you have identified
where the problem must lie.

Regards,
Nick Maclaren.

Nobody · Mar 14, 2009

Not all that radically. In The Great Scheme Of Things, imperative
and functional languages aren't all that different. There are some
MUCH more radical designs!

Such as?

Even Prolog is more different from both
Haskell and Fortran than the latter two are from each other.

I disagree. I consider Haskell to be much closer to Prolog than it
is to Fortran, both being declarative languages.

Steve · Mar 14, 2009

JosephKK said:
I don't think so. Part of the issues involved in the difference was a
cascaded (2nd) programmable interrupt controller, which was necessary
to use the 286.

The XT286 had fewer and smaller expansion slots, a smaller motherboard,
and a slightly different keyboard than the AT. There is no requirement for
a second PIC built into the 286.

Steve N.

Nobody · Mar 14, 2009

That is not a problem with x86 per se; modern RISC chips are just as bad
or, in some cases, even worse. Extracting instruction-level parallelism
from a stream of sequential instructions and hiding memory latency (the
two purposes of the OoO engines) and predicting branches (to hide the
long pipeline which is a result of the OoO engine) are difficult
problems -- not decoding or actually executing the instructions.

Remember, modern x86 chips are really RISC cores with an x86 decoder
slapped on the front; the _only_ burden that x86 imposes compared to a
native RISC chip is the cost of that decoder -- and it's a very, very
tiny fraction of the chip's total cost.

It isn't just about syntax, but semantics.

OoO on the x86 is complicated by having one of the strictest memory
consistency models of any modern CPU, limiting the ability to re-order
memory accesses and requiring a larger store of pending instructions.

It also means that multi-threaded code developed and tested on x86 often
needs additional effort (adding memory barriers) to get it working on
other architectures.

nmm1@cam.ac.uk · Mar 14, 2009

Such as?

Try some of the hardware design languages, for a start!

I disagree. I consider Haskell to be much closer to Prolog than it
is to Fortran, both being declarative languages.

You would be amazed at how much more a declarative language Fortran
is than languages like C and C++! To call it an imperative language
is over-categorising.

Also, when I was involved with Haskell, it assuredly had a control
flow model, in a way that (much of) Prolog didn't. I don't regard
the functional/imperative difference as being as fundamental as the
proponents of the former claim.

But, what the hell? This is a value judgement, and I have always
regarded semantics as more important than syntax - unlike most of
the people involved with language design.

Regards,
Nick Maclaren.

krw · Mar 14, 2009

I can't document it very well, but in gigascale chips leakage current
losses (heat) are about equal to switching energy losses. At least so
says my cow-orker who actually did work for Intel in the Pentium sales
support part of the business.

This is true for the gates that are pushed to the max (transistors
with thin-oxide gates, low Vt). Not as much for caches. The
switching power term still applies, though.

Tom · Mar 14, 2009

Thanks, your comments are at a useful level and have useful detail.

Regrettably I don't think you need fast reaction speeds to see
the GUI button push phenomenon you mention.

If you want yet another area to worry about, consider the
ways that many people seem to be implementing systems
with "web-services" technology. Typically, if pushed, they'll
dimly remember one or two lectures on "distributed systems"
and their inherent problems - but they don't connect those with
their "web services". If pushed harder so they make the
connection, they'll still manage to avoid thinking about what
they're building, and will "justify" that by presuming that
"the frameworks take care of everything, so that it just works".

But then I'm also old enough to remember when people refused to
believe in hardware synchroniser failures due to metastability.
It's deja vu all over again

Nobody · Mar 15, 2009

Try some of the hardware design languages, for a start!

Okay; I'll concede that those are radically different to most programming
languages. I'm unsure if it's even fair to consider them as programming
languages; they're more like data languages. Although I suppose you can
consider an emulator to be an interpreter.

To the extent that they are programming languages, they would probably
fall into the declarative family.

You would be amazed at how much more a declarative language Fortran
is than languages like C and C++! To call it an imperative language
is over-categorising.

Also, when I was involved with Haskell, it assuredly had a control
flow model, in a way that (much of) Prolog didn't.

What control-flow model?

Monads aside, Haskell evaluates expressions; there isn't any control to
flow. Issues such as ordering and short-circuit evaluation are
meaningless in the absence of mutable state.

The evaluation strategy is an implementation detail, and only matters
insofar as it affects time/space complexity, plus some corner cases such
as unsafePerformIO.
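The point about ordering can be illustrated outside Haskell. The sketch below is a Python analogy, not a model of Haskell's evaluation; `eval_sum`, `double`, `square` and `make_impure` are made-up names. It evaluates the two operands of a sum in either order, first with pure functions and then with functions that share mutable state.

```python
def eval_sum(f, g, x, left_first):
    """Evaluate f(x) + g(x), choosing which operand is evaluated first."""
    if left_first:
        a = f(x)
        b = g(x)
    else:
        b = g(x)
        a = f(x)
    return a + b

# Pure functions: the result depends only on the argument.
def double(x):
    return 2 * x

def square(x):
    return x * x

def make_impure():
    """Return two functions that share a hidden mutable counter."""
    state = {"n": 0}
    def tick(x):
        state["n"] += 1
        return x * state["n"]
    def tock(x):
        state["n"] += 1
        return x + state["n"]
    return tick, tock

# Collect the results for both evaluation orders.
pure_results = {eval_sum(double, square, 10, order) for order in (True, False)}
impure_results = {eval_sum(*make_impure(), 10, order) for order in (True, False)}
print(pure_results)    # a single value: order is irrelevant
print(impure_results)  # two values: order is observable
```

With the pure pair the implementation is free to pick either order (or run both in parallel); with the stateful pair the order becomes part of the program's meaning.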

I don't regard the functional/imperative difference as being as
fundamental as the proponents of the former claim.

I'd suggest that you haven't used functional languages enough, then. The
absence of mutable state makes a significant difference to the way that
programs are written, and an even more significant difference to the
amount of freedom the implementation has regarding evaluation strategy.

nmm1@cam.ac.uk · Mar 15, 2009

Thanks, your comments are at a useful level and have useful detail.

Well, I struck lucky! It's not an easy thing to explain.

Regrettably I don't think you need fast reaction speeds to see
the GUI button push phenomenon you mention.

Interesting. Most of the people I speak to claim that they never
see them, even when I have just seen them trip over the effect.

If you want yet another area to worry about, consider the
ways that many people seem to be implementing systems
with "web-services" technology. Typically, if pushed, they'll
dimly remember one or two lectures on "distributed systems"
and their inherent problems - but they don't connect those with
their "web services". If pushed harder so they make the
connection, they'll still manage to avoid thinking about what
they're building, and will "justify" that by presuming that
"the frameworks take care of everything, so that it just works".

God help me, yes! I was at a "Grid" workshop, and I tried to get
one (ANY!) of the project leaders to accept error detection as a
work item - I said that I was happy to do much of the work, but
wasn't prepared to negotiate for a new project. They all said
the above, and I explained that there was a serious, fundamental
flaw in their design, where no response was indistinguishable
from refusal.

I said that an ATM or business might try to validate a card, go
to a broker, which failed to get through. The broker then replies
"no", so the ATM or business reports the card to the "bad debt"
line. I was told that the banks' protocols prevented that.
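The flaw described here (no response being indistinguishable from refusal) comes down to collapsing three outcomes into two. A minimal sketch of keeping them apart follows; `Verdict`, `broker_check`, `terminal_action` and `unreachable_issuer` are hypothetical names for illustration and do not describe any real banking or Grid protocol.

```python
from enum import Enum

class Verdict(Enum):
    APPROVED = "approved"
    DECLINED = "declined"
    UNKNOWN = "unknown"   # no answer: timeout, broker unreachable, etc.

def broker_check(card, issuer):
    """Hypothetical broker: ask the issuer and never turn silence into 'no'."""
    try:
        ok = issuer(card)
    except TimeoutError:
        return Verdict.UNKNOWN
    return Verdict.APPROVED if ok else Verdict.DECLINED

def terminal_action(verdict):
    """Only an explicit refusal may lead to the card being reported."""
    if verdict is Verdict.APPROVED:
        return "proceed"
    if verdict is Verdict.DECLINED:
        return "refuse and report"
    return "refuse, retry later, do not report"

def unreachable_issuer(card):
    """Stand-in for an issuer that cannot be reached."""
    raise TimeoutError("issuer did not answer")

print(terminal_action(broker_check("1234", unreachable_issuer)))
print(terminal_action(broker_check("1234", lambda card: False)))
```

The design choice is simply that the "unknown" case has its own value all the way to the terminal, so a communication failure cannot be mistaken for a bad card.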

3 months after returning to the UK, it happened to me. My bank
rang me before I got onto them, apologising for the failure of
their database and saying that they had deleted me from the bad
card register.

This is an example of what I have just said to Stephen Sprunk.
The system is designed to cater for small numbers of such events
but, if there were a million at once, it would fail (not enough
staff). And, if another million happened before the first million
were cleared, the whole system would collapse. No more use of
cards to pay debts or extract cash ....

But then I'm also old enough to remember when people refused to
believe in hardware synchroniser failures due to metastability.
It's deja vu all over again

And again and again and again and again ....

The only thing that people learn from history is that nobody ever
learnt anything from history.

Regards,
Nick Maclaren.

Jasen Betts · Mar 15, 2009

Even their marketing campaigns, while sometimes completely absurd ("the Internet
was designed to run on Intel processors" -- say what!?

Indeed... the Internet puts the bytes in integers in the opposite order to Intel
x86 processors.
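For anyone who wants to see the difference, here is a small sketch using Python's standard `struct` module. The value 0x0A0B0C0D is an arbitrary example; `"!I"` packs a 32-bit unsigned integer in network (big-endian) order and `"<I"` packs it little-endian, as x86 stores it in memory.

```python
import struct

value = 0x0A0B0C0D

network = struct.pack("!I", value)  # network byte order (big-endian)
x86 = struct.pack("<I", value)      # little-endian, as on x86

print(network.hex())  # 0a0b0c0d
print(x86.hex())      # 0d0c0b0a
```

The two byte strings are exact reverses of each other, which is why x86 code has to swap bytes when it reads or writes multi-byte protocol fields.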

Vladimir Vassilevsky · Mar 15, 2009

Nobody said:
The SRAM cells are idle unless they're actually being read or written.

Tags are active, but:

Not just tags but the cache SRAM is being accessed on every read/write.

BTW, I know a person who actually designs superscalar MIPS processors.
He is a professor in the local university. One of his previous works was
about the power consumption; specifically predicting and identifying the
"hot" spots. I will probably see him within the week; hopefully he can
shed some light on the problem.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

krw · Mar 15, 2009

Not just tags but the cache SRAM is being accessed on every read/write.

You're nuts.

BTW, I know a person who actually designs superscalar MIPS processors.

I "know" someone who developed high-performance OoO superscalar
processors too. Nothing as mundane as MIPS, though.

He is a professor in the local university. One of his previous works was
about the power consumption; specifically predicting and identifying the
"hot" spots. I will probably see him within the week; hopefully he can
shed some light on the problem.

Trolling in person can get you into trouble.

krw · Mar 17, 2009

Actually, he's kind of right. It depends on the cache organization, of
course but in most high-performance designs, for an instruction fetch or
a load, one accesses:
1. the tags and
2. all the ways of the set
then selects the appropriate way (if there is a cache hit).

That's not what the moron said.

Joe Pfeiffer · Mar 17, 2009

Mayan Moudgill said:
Actually, he's kind of right. It depends on the cache organization, of
course but in most high-performance designs, for an instruction fetch
or a load, one accesses:
1. the tags and
2. all the ways of the set
then selects the appropriate way (if there is a cache hit).

No, somebody tried to cut him some slack and tell him that's how it
works in an earlier post (which I'm not going to try to track down).
He was adamant that the *whole* SRAM is active.

E.g. if the cache is a 4-way cache with 16-byte lines and the processor
is trying to access a 4-byte word, it would read the 4 tags and
_at_least_ 4 bytes from each of the 4 lines. Depending on the RAMs
used, one might read 4x8 bytes or even 4x16 bytes.

One could design the processor to access the tags first, then based on
the hit resolution, access only one of the ways. However, this adds
extra cycles on the load/ifetch path, so a high-performance design
would not do this.

I've seen that called a "phased cache".
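The trade-off between the two organizations can be put into a toy model. This is a sketch only: the 3-byte tag width and the single extra cycle for the phased lookup are assumptions chosen for illustration (the post above only says "extra cycles"), and `lookup_cost` is a made-up helper.

```python
def lookup_cost(ways, tag_bytes, data_bytes_per_way, phased, hit=True):
    """Return (SRAM bytes read, extra cycles) for one lookup.

    Parallel: read all tags and all ways of the set, then select.
    Phased:   read all tags first, then only the way that hit.
    The one-cycle penalty for the phased lookup is an assumption.
    """
    tag_traffic = ways * tag_bytes
    if phased:
        data_traffic = data_bytes_per_way if hit else 0
        extra_cycles = 1
    else:
        data_traffic = ways * data_bytes_per_way
        extra_cycles = 0
    return tag_traffic + data_traffic, extra_cycles

# 4-way cache, 4-byte word read from each way, assumed 3-byte tags.
parallel = lookup_cost(ways=4, tag_bytes=3, data_bytes_per_way=4, phased=False)
phased = lookup_cost(ways=4, tag_bytes=3, data_bytes_per_way=4, phased=True)
print("parallel:", parallel)
print("phased:  ", phased)
```

Either way, only one set is touched per access; the model says nothing in support of the claim that the whole SRAM is active.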
