Lack of bit field instructions in x86 instruction set because of patents ?

Nobody · Mar 12, 2009

On this topic, I see many statements like "as much", "far less",
"overwhelming" and so on. Those adjectives mean nothing.

Can anyone back up his point with the particular facts, figures and
quotations of the sources of the information?

I doubt it; if this information is available at all, it would almost
certainly require an NDA.

But it's no secret that the power consumption for digital logic is
dominated by energy-per-transition rather than quiescent current.

krw · Mar 12, 2009

To-Email- said:
Careful. Don't spread it around. The AGW crowd will want to tax each
transition :-(

Please! The last one will be taxing enough.

http://townhall.com/Columnists/DanKennedy/2009/03/11/but_what_if_the_rich_refuse_to_be_eaten

Bumper sticker: "Don't buy until he's gone"

Vladimir Vassilevsky · Mar 12, 2009

Nobody said:
I doubt it; if this information is available at all, it would almost
certainly require an NDA.

))))))

Nobody knows anything but everybody has the invaluable opinion. That's
the essence of the leftism - weenism.

But it's no secret that the power consumption for digital logic is
dominated by energy-per-transition rather than quiescent current.

That's true for 3.3V, it depends for 1.5V, and it is pretty much not
true for below one volt high speed logic. Consider the fanout and the
stray capacitance as well.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

krw · Mar 12, 2009

))))))

Nobody knows anything but everybody has the invaluable opinion. That's
the essence of the leftism - weenism.

I *know*. You won't listen. *You* are the essence of weenieism.

That's true for 3.3V, it depends for 1.5V, and it is pretty much not
true for below one volt high speed logic. Consider the fanout and the
stray capacitance as well.

Absolute horseshit.

krw · Mar 12, 2009

To-Email- said:
[email protected] says...> [snip]

Nobody knows anything but everybody has the invaluable opinion. That's
the essence of the leftism - weenism.

I *know*. You won't listen. *You* are the essence of weenieism.

But it's no secret that the power consumption for digital logic is
dominated by energy-per-transition rather than quiescent current.

That's true for 3.3V, it depends for 1.5V, and it is pretty much not
true for below one volt high speed logic. Consider the fanout and the
stray capacitance as well.

Absolute horseshit.

Don't you just love all our resident "experts"?

Expert == "Has-been drip under pressure"

Vlad has never been.

But I notice our OP is "hotmail", so I would have never noticed,
except for you feeding the troll ;-)

Prince Vlad? Troll? No!?

Robert Myers · Mar 12, 2009

One of the things that the IA64 got right in principle and wrong in
practice was to try to simplify that area by making it more explicit.
I still think that could be done - but not that way!

Maybe not. Maybe, as someone said to me that he had observed very
early in the game, the compiler is just too far from the action.

Zillions of transistors committed to OoO, branch prediction,
speculative execution, etc. are plenty close to the action, but, now
we have to worry about the fact that they eat power.

It was the failure of Dynamo-RIO and of Transmeta that puzzled me.
Why is this so fundamental? Why is it either Terje (or equivalent),
zillions of transistors burning watts, or live with it?

Surely it must be possible to have *something* scope what's actually
happening and respond appropriately... or is it that the computational
task is roughly the same as building a machine to pass the Turing
test?

Robert.

nmm1@cam.ac.uk · Mar 12, 2009

Maybe not. Maybe, as someone said to me that he had observed very
early in the game, the compiler is just too far from the action.

That's why I said what I said. They forgot that the architecture
is a protocol to be used by the compiler to communicate the semantics
of the program to the hardware, and not a set of laws for the compiler
to fit itself into.

I have posted in the past what I think of better approaches, and
most of the hardware people seem to agree that they would be easy
to implement. They wouldn't be hard to compile, either - from the
right sort of language (e.g. including Haskell, the better class of
Fortran program, but definitely not C and C++)! Whether they are
as effective as I think they might be is less clear.

Unrealistic? Perhaps. But what if you could get 10 times the
performance for 1/4 the power consumption? Wouldn't that be worth
a revolution?

Regards,
Nick Maclaren.

Richard The Dreaded Libertarian · Mar 12, 2009

Nobody knows anything but everybody has the invaluable opinion. That's
the essence of the leftism - weenism.

That's the essence of any disciple, leftist or "right"-ist. "The Poobah
says it, I believe it, that settles it!"

Unfortunately, they then vote for the poobah of their choice, tweedledumb
or tweedleduh.

Sigh.
Rich

Nobody · Mar 13, 2009

That's true for 3.3V, it depends for 1.5V, and it is pretty much not
true for below one volt high speed logic. Consider the fanout and the
stray capacitance as well.

How does stray capacitance increase the current drawn by a stable circuit?
If anything, it's going to increase the energy required for transitions
(I=C.dV/dt, increase C => increase I => increase I^2.R).
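
Editorial note: the energy-per-transition point can be made concrete with the standard textbook RC charging derivation. This is a sketch under assumptions that are not in the post: R stands for the driver's on-resistance, C for the total load including stray capacitance, and V for the supply voltage.

```latex
\begin{align}
  i(t) &= \frac{V}{R}\, e^{-t/RC}
       && \text{current while charging } C \text{ from } 0 \text{ to } V \\
  E_{\text{supply}} &= \int_0^\infty V\, i(t)\, dt = C V^2
       && \text{energy drawn from the supply} \\
  E_R &= \int_0^\infty i(t)^2 R\, dt = \tfrac{1}{2} C V^2
       && \text{energy dissipated in } R \\
  E_C &= E_{\text{supply}} - E_R = \tfrac{1}{2} C V^2
       && \text{energy left stored on } C
\end{align}
```

Under these assumptions the energy dissipated per transition scales with C and does not depend on R, and a node that is not switching draws no current through its stray capacitance. Leakage is a separate mechanism.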

Vladimir Vassilevsky · Mar 13, 2009

Nobody said:
How does stray capacitance increase the current drawn by a stable circuit?
If anything, it's going to increase the energy required for transitions
(I=C.dV/dt, increase C => increase I => increase I^2.R).

Returning back to the speculations on the power consumption of the
cache. Cache performs the access to all cache lines at every read or
write operation (and the tags, of course), so it is not the idle circuit.

dynamic losses = ~ F x C x U^2/2

Cache occupies large area, many transistors, long wires, many inputs and
outputs, big capacitance, big transistors to drive heavy loads, high
dynamic losses and high static losses as well. To me, it is not obvious
how the power consumption of the cache compares to the other parts of
the CPU.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
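
Editorial note: a minimal sketch of the formula above, reading F as the number of transitions per second, C as the switched capacitance and U as the supply voltage. The function names and any figures passed to them are hypothetical, chosen only to show how the estimate scales. They are not measurements of any real cache or CPU.

```cpp
#include <cassert>

// Dynamic (switching) power estimate from the post: P ~ F * C * U^2 / 2.
// transitions_per_sec: F, capacitance_f: C in farads, voltage: U in volts.
// Names and units are illustrative assumptions, not from the original post.
double dynamic_power_watts(double transitions_per_sec,
                           double capacitance_f,
                           double voltage) {
    assert(transitions_per_sec >= 0.0 && capacitance_f >= 0.0);
    return transitions_per_sec * capacitance_f * voltage * voltage / 2.0;
}

// Ratio of dynamic power at two supply voltages with F and C held fixed.
// Only the U^2 term changes, so the ratio is (u_new / u_old)^2.
double voltage_scaling_ratio(double u_new, double u_old) {
    assert(u_old != 0.0);
    return (u_new * u_new) / (u_old * u_old);
}
```

With F and C fixed, `voltage_scaling_ratio(1.0, 3.3)` comes out at roughly 0.09. That quadratic drop in switching power is one reason static losses take up a larger share of the total at low supply voltages.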

krw · Mar 13, 2009

Returning back to the speculations on the power consumption of the
cache. Cache performs the access to all cache lines at every read or
write operation (and the tags, of course), so it is not the idle circuit.

I hope you aren't an engineer. So far you're doing as well as
DimBulb.

dynamic losses = ~ F x C x U^2/2

Cache occupies large area, many transistors, long wires, many inputs and
outputs, big capacitance, big transistors to drive heavy loads, high
dynamic losses and high static losses as well. To me, it is not obvious
how the power consumption of the cache compares to the other parts of
the CPU.

Of course it's not obvious to you. You're dumb as a stump.

nmm1@cam.ac.uk · Mar 13, 2009

And what percentage of software in use today is written in a language
other than C or C++ (or a language written on top of one of those)? How
many professional programmers (i.e. not academics) are learning and
using those languages? You'd have to have something amazingly
revolutionary to throw away the collective knowledge of an entire industry.

Didn't I imply that?

But think of it the other way round - no radical change will be
accepted - no significant progress is possible without radical
change - ergo?

Worse, I'm not even sure that "10 times the performance for 1/4 the
power consumption" is enough to motivate most people to switch; that's
only a few years of Moore's Law -- probably less time than it'd take the
industry to learn your new system, buy and deploy the machines, etc. Do
you have a roadmap to how you'd continue to improve the performance of
your chips after the first release? ...

There are three points there, and I regretfully agree with the
first :-(

And how does Moore's Law help? C and C++ are inherently serial - and,
yes, I know about the current activities to add threading of a POSIX
style. After 30 years of experience with that, we KNOW that (a) it
doesn't scale and (b) virtually nobody can use it correctly except in
the simplest cases. I am giving a seminar shortly where I point out
that the CPUs of 2015 will be the same speed as ones of today, but
with 32 cores. That's not just me saying that, as you know.

The whole point about such a radical redesign is that it would lead
to a roadmap for scalable, usable parallelism. My assertion is that
(today) any proposal for radical change that doesn't do BOTH of those
is pointless.

But I don't expect to see that happen in any form before I retire!

Regards,
Nick Maclaren.

Andrew Reilly · Mar 13, 2009

And what percentage of software in use today is written in a language
other than C or C++

Really: most of it, I suspect. Yes, C (and perhaps C++) is liked by
embedded systems and operating systems people, and that's fair: that's
what it's for. I think that you'll find that most of the web-deployed
applications are being written in perl/php/python/ruby/maybe-java/
something-proprietary. All of the "Web2.0" applications are being
written in those plus a large dollop of JavaScript or ActionScript.
Most of the in-house corporate one-offs are probably still being written
in Visual Basic or Excel or some reporting layer on top of SQL.

Most exploratory numerical code is probably being written in Matlab or R
or S, or IDL or the like. Maybe a few die-hard Fortran hands, and maybe
a few bleeding-edge NumPy/Fortress/whatever...

Yes, there's clearly quite a few folk beating GUI applications out of
C++, but surely it can't take that much longer for them to realize that
that's a losing game.

(or a language written on top of one of those)?

Why would that matter? Although a lot of the newer ones are being built
on JVM or .NET or the like, and that's only tenuously C-related these
days.

How many professional programmers (i.e. not academics) are learning and
using those languages? You'd have to have something amazingly
revolutionary to throw away the collective knowledge of an entire
industry.

Most of the newer languages that could be interesting are incrementally
different from the languages professional programmers already know.
There's no need to throw everything away and start from scratch (although
there are some interesting things to be learned by those who do.)

I suspect that by the time that the everyman-programmer really has no
alternative than to change to a parallel model they will most probably
already be programming in something other than C or C++ for completely
different reasons (safety, productivity, where the cool libraries and
toolkits are, etc.)

Cheers,

Chris M. Thomasson · Mar 13, 2009

Didn't I imply that?

But think of it the other way round - no radical change will be
accepted - no significant progress is possible without radical
change - ergo?

There are three points there, and I regretfully agree with the
first :-(

And how does Moore's Law help? C and C++ are inherently serial - and,
yes, I know about the current activities to add threading of a POSIX
style. After 30 years of experience with that, we KNOW that (a) it
doesn't scale and (b) virtually nobody can use it correctly except in
the simplest cases. I am giving a seminar shortly where I point out
that the CPUs of 2015 will be the same speed as ones of today, but
with 32 cores. That's not just me saying that, as you know.

POSIX threading model aside for a moment... C++ will finally allow an expert
to create highly efficient portable non-blocking algorithms that can indeed
scale up to 32 cores and beyond. C/C++ are very versatile. You can use C/C++
and highly platform-specific techniques to create user-space RCU today. As
you know, RCU can scale to a boatload of processors and is NUMA friendly.

I am all for NUMA models that have _very_ weak cache coherency mechanism;
AFAICT, it's basically the only way to scale. Luckily, for me anyway, C/C++
can address these architectures quite nicely.
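
Editorial note: the standard under discussion here later shipped as C++11, which added `std::atomic`. The following is a minimal sketch of a non-blocking update built on a compare-exchange retry loop. The helper names `atomic_store_max` and `run_max_demo` are hypothetical and are not taken from any code by the posters.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Lock-free "keep the maximum": retry a compare-exchange until either the
// stored value is already >= candidate or our candidate has been installed.
// Returns the value observed just before the loop ended.
int atomic_store_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the current value back into
    // `observed`, so the loop condition is re-evaluated against fresh data.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_relaxed)) {
    }
    return observed;
}

// Hypothetical demo: several threads race to raise a shared maximum.
// No thread ever blocks on a lock held by another thread.
int run_max_demo(int num_threads) {
    assert(num_threads > 0);
    std::atomic<int> max_seen{0};
    std::vector<std::thread> workers;
    for (int t = 1; t <= num_threads; ++t) {
        workers.emplace_back([&max_seen, t] {
            for (int i = 0; i < 1000; ++i) {
                atomic_store_max(max_seen, t * 1000 + i);
            }
        });
    }
    for (auto& w : workers) w.join();
    return max_seen.load();
}
```

Relaxed ordering is enough here only because the integer is the whole of the shared state. Anything that publishes other data through the atomic needs stronger ordering.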

The whole point about such a radical redesign is that it would lead
to a roadmap for scalable, usable parallelism. My assertion is that
(today) any proposal for radical change that doesn't do BOTH of those
is pointless.

What type of threading model do you have in mind?

But I don't expect to see that happen in any form before I retire!

:^o

Chris M. Thomasson · Mar 13, 2009

Chris M. Thomasson said:
Didn't I imply that?

But think of it the other way round - no radical change will be
accepted - no significant progress is possible without radical
change - ergo?

There are three points there, and I regretfully agree with the
first :-(

And how does Moore's Law help? C and C++ are inherently serial - and,
yes, I know about the current activities to add threading of a POSIX
style. After 30 years of experience with that, we KNOW that (a) it
doesn't scale and (b) virtually nobody can use it correctly except in
the simplest cases. I am giving a seminar shortly where I point out
that the CPUs of 2015 will be the same speed as ones of today, but
with 32 cores. That's not just me saying that, as you know.

POSIX threading model aside for a moment... C++ will finally allow an
expert to create highly efficient portable non-blocking algorithms that
can indeed scale up to 32 cores and beyond. [...]

However, I agree that there are very few experts that can actually create
these types of exotic algorithms. I personally don't have a problem, and
have been creating and implementing scalable synchronization techniques for
years, but that puts me in a fairly narrow minority. Oh well.

;^(...

nmm1@cam.ac.uk · Mar 13, 2009

POSIX threading model aside for a moment... C++ will finally allow an
expert to create highly efficient portable non-blocking algorithms that
can indeed scale up to 32 cores and beyond. [...]

However, I agree that there are very few experts that can actually create
these types of exotic algorithms. I personally don't have a problem, and
have been creating and implementing scalable synchronization techniques for
years, but that puts me in a fairly narrow minority. Oh well.

Yes and no. The answer to your other remark is the killer:

I am all for NUMA models that have _very_ weak cache coherency mechanism;
AFAICT, it's basically the only way to scale. Luckily, for me anyway, C/C++
can address these architectures quite nicely.

Oh, no, they can't! That's precisely the problem. The new C++ standard
will move some way towards that - but ONLY if you use none of the C
features in C++ (including cstring and, even worse, some C++ features
that inherit their semantics from C).

The point is that there is no consistent consistency model for either
C or POSIX, and the C++ addresses only the pure C++ aspects. Unless
it has been vastly extended since I tried to get that aspect addressed.

Regards,
Nick Maclaren.

Chris M. Thomasson · Mar 13, 2009

POSIX threading model aside for a moment... C++ will finally allow an
expert to create highly efficient portable non-blocking algorithms that
can indeed scale up to 32 cores and beyond. [...]

However, I agree that there are very few experts that can actually create
these types of exotic algorithms. I personally don't have a problem, and
have been creating and implementing scalable synchronization techniques for
years, but that puts me in a fairly narrow minority. Oh well.

Yes and no. The answer to your other remark is the killer:

I am all for NUMA models that have _very_ weak cache coherency mechanism;
AFAICT, it's basically the only way to scale. Luckily, for me anyway, C/C++
can address these architectures quite nicely.

Oh, no, they can't! That's precisely the problem.

This is a _major_ cop-out, but I do indeed make heavy use of compiler- and
architecture-specific techniques/guarantees to get the job done. For
instance, I create most of my sensitive synchronization algorithms in
externally assembled libraries and link them into a C program, with
link-time optimizations turned off, of course. So, I should have really said
that assembly language, and some specific C/C++ compilers (e.g., GCC), can be
used to address NUMA models with weak CC. You can get some degree of
portability this way, but it's definitely not fully portable in any way,
shape or form.
It can be a pain to port synchronization algorithms to new architectures
because I have to rewrite all of the damn assembly language files, and then
_hope_ a C compiler that gives me the guarantees I need will be available.
Basically, if you like to juggle running chainsaws, and you have patience,
you can use C/C++ and ASM to bring great scalability, throughput and
performance characteristics to concurrent programs.

The new C++ standard
will move some way towards that - but ONLY if you use none of the C
features in C++ (including cstring and, even worse, some C++ features
that inherit their semantics from C).

Yeah. I am mostly interested in the fairly fine-grain memory barriers,
specifically the relaxed barriers and data-dependent loads, that should be
incorporated into the standard. It pleases me to know that Paul E. McKenney
is giving his advice in the development process...
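
Editorial note: the fine-grain orderings mentioned here later appeared in C++11 as `std::memory_order` values, with `std::memory_order_consume` as the one aimed at data-dependent loads. The following is a minimal sketch of pointer publication using the release/acquire pair, which is the conservative choice. The `Config` type and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Config { int value; };  // hypothetical payload

std::atomic<Config*> g_published{nullptr};

// Writer: fill in the payload first, then publish the pointer with release
// ordering so the payload write cannot be reordered after the store.
void publish(Config* c, int v) {
    assert(c != nullptr);
    c->value = v;
    g_published.store(c, std::memory_order_release);
}

// Reader: spin until a pointer appears. The acquire load pairs with the
// release store, so the payload is guaranteed to be visible once the
// pointer is seen. memory_order_consume would request only the weaker
// data-dependency ordering on the loaded pointer.
int read_when_ready() {
    Config* c;
    while ((c = g_published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return c->value;
}

// Hypothetical demo: one reader thread, one writer (the calling thread).
int run_publish_demo() {
    static Config cfg{0};
    g_published.store(nullptr, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&seen] { seen = read_when_ready(); });
    publish(&cfg, 42);
    reader.join();  // join makes the reader's write to `seen` visible here
    return seen;
}
```

On architectures that preserve dependency ordering in hardware, a dependency-ordered load can avoid the barrier that an acquire load may need. That saving is why data-dependent loads matter for RCU-style readers.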

The point is that there is no consistent consistency model for either
C or POSIX,

POSIX guarantees absolutely nothing if you don't use locks to guard any
access to shared data. So, if you follow the standard, it can be extremely
difficult to scale. There are some things you can do, but they have their
limitations:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a23aeb712e8dbdf9

This seems to scale better than most native POSIX rw-locks, however, the
overhead in the write access is increased. Or:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/776f6842784072f2

This allows for concurrent mutations, however it has limitations wrt
traversals:

http://groups.google.com/group/comp.programming.threads/msg/356576741aed8f06

http://groups.google.com/group/comp.programming.threads/msg/3cd613e8b72e5ace

I could easily use RCU to manage the traversal, but then I lose all sense of
portability. Therefore, I conclude that it's very difficult, if not
impossible, to scale using 100% pure PThreads.

and the C++ addresses only the pure C++ aspects.

Yes; you're right.

Unless it has been vastly extended since I tried to get that aspect
addressed.

I haven't been following the cpp-threads group lately. I have made some
comments on that list. Sadly, it does seem like there are a few people on
there that don't see a need for fine-grain membars; they seem to think that
sequential consistency is all that is needed because the only programmers
that would ever use fine-grain barriers are hard core thread monkeys that
are few and far between. It's good that Paul E. McKenney seems to have
successfully convinced them that data-dependent load barriers are an
essential tool.

nmm1@cam.ac.uk · Mar 13, 2009

This is a _major_ cop-out, but I do indeed make heavy use of compiler- and
architecture-specific techniques/guarantees to get the job done. For
instance, I create most of my sensitive synchronization algorithms in
externally assembled libraries and link them into a C program, with
link-time optimizations turned off, of course. So, I should have really said
that assembly language, and some specific C/C++ compilers (e.g., GCC), can be
used to address NUMA models with weak CC. You can get some degree of
portability this way, but it's definitely not fully portable in any way,
shape or form.
It can be a pain to port synchronization algorithms to new architectures
because I have to rewrite all of the damn assembly language files, and then
_hope_ a C compiler that gives me the guarantees I need will be available.
Basically, if you like to juggle running chainsaws, and you have patience,
you can use C/C++ and ASM to bring great scalability, throughput and
performance characteristics to concurrent programs.

Yes, that describes the situation. But what you are really doing is
using C/C++ as a syntactic harness for some wholly implementation-dependent
semantics. That's where C started, after all.

That can also be done in any other language, and used to be done very
extensively in Fortran. But there's no way that it will come back to
the mainstream - damn few people can handle that sort of thing (and,
yes, juggling running chainsaws is the right analogy).

Regards,
Nick Maclaren.

nmm1@cam.ac.uk · Mar 13, 2009

Haskell was mentioned; I don't know the language, but if it's truly
revolutionary to the point it enables new chip designs that can provide
ten times the performance at a quarter the power consumption, I doubt
that it's "incrementally different" from the C/C++/C#/Java/etc. that
we're using today. If it were, by now (a) everyone would have switched,
or (b) whatever it is that makes Haskell special would have been added
to the more common languages.

It's very conventional - just a different convention! Functional
languages.

Regards,
Nick Maclaren.

nmm1@cam.ac.uk · Mar 13, 2009

If your processor is 10 times faster than today's x86 chips, then x86
chips in ~5 years will have caught up with it. Of course, if you can
increase just as rapidly (which nobody has ever managed to do for long),
you might get some converts, but it's still only a single order of
magnitude faster.

We are at cross-purposes. I am NOT talking about running any faster
(serially) - I am talking about getting the performance out of the
multiple cores. Currently, that ain't happening, except in HPC,
video rendering and a few other specialist, embarrassingly parallel
applications.

From all appearances, virtually nobody can manage to write C code that
doesn't have potential buffer overruns all over the place. That's a far
bigger problem for most projects, but it still doesn't stop people from
using C.

God help us, yes. But the first spectacular accident that is blamed
on that will change things. And I don't mean a diddy little thing
like an airliner crashing, killing 300 people - I mean a chemical
plant going up near a population centre in the USA, total failure of
air traffic control systems for 6 months, complete collapse of the
banking system for a month (not the current loss of confidence, no
transactions), and so on. It will happen - but when?

Lots of folks use POSIX threads and it mostly works; even more people
use Windows threads, which are roughly the same, and those mostly work,
too. As the saying goes, "good enough" is the enemy of "great" -- and
you're trying to sell "great" in a world that has already bought several
"good enough" solutions.

Actually, no, they don't. You may not realise it, but a significant
proportion of the increasing unreliability of computer applications
(and it IS increasing) is due to that usage. How much, I am not sure,
but I have seen the signature fairly often.

That seemed rather obvious several years ago. What is not obvious is
how to take advantage of all those cores...

Yup. And tackling that problem is precisely my point!

You'd need two or three orders of magnitude in performance gains before
people will accept a radical change. ...

Not a problem. But just not before I retire. By 2025, certainly.

Regards,
Nick Maclaren.
