Skybuck said:
Hello,
It seems CPUs nowadays have prediction logic, to try and predict which
branch will be taken, and prepare/execute those instructions (pipelining
etc), if it was mispredicted the performance suffers.
Why not execute both branches in two separate logic units etc... and once
the real outcome is known continue with one of the two prepared branches ?
Bye,
Skybuck.
Hi all,
While many of the arguments here are correct (power, complexity, etc.),
the idea is also flawed at the theoretical level. It "may" give some
performance, but for a non-trivial reason.
Everyone who proposes it makes the mistake of comparing two different
machines for the cases with and without eager execution (that is what we
called it in the past). In the post above, we have the term "separate"
logic unit. If you are willing to add a logic unit, why not use it
for the standard code as well? (I will come back to this later, so don't
kill me yet.)
So the merits of this idea must be compared on a given machine back
end. Make it twice as big (of what?) if you want, but then use it
for both cases.
The two cases, for low confidence branches, are:
1) I predict the branch and run the predicted flow alone. It runs at the
full CPU speed.
2) I run both flows. Each has half the resources, so each runs at half
speed (yes, I know, just wait).
Now, if my probability teacher was right, I expect case one to run, on
average, at full speed x the probability that the prediction is correct.
If the confidence predictor tells me that a low-confidence prediction is
right ~70% of the time (this is usually the case, I was told), then the
performance is ~70% of full speed.
For the second case, I always run at 50% of full speed.
Hence, eager execution is a statistical loss for branches with more than
50% confidence.
Can we use it only for branches with less than 50% confidence? Well, if
you think the confidence is less than 50%, just flip the prediction.
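
As a rough sketch of the arithmetic above (not a model of any real CPU), the comparison can be written out in a few lines of Python. The function names are made up for illustration. The sketch assumes, as the argument does, that a correct prediction gives full speed, a misprediction gives no useful work, and eager execution gives each flow exactly half the back end.

```python
def predicted_speed(p_correct: float) -> float:
    """Expected relative speed when only the predicted flow runs.

    Assumption: a correct prediction runs at full speed (1.0) and a
    misprediction contributes no useful work (0.0).
    """
    return p_correct * 1.0


def eager_speed(share: float = 0.5) -> float:
    """Relative speed of the useful flow when both flows split the back end."""
    return share


def accuracy_after_flip(p_correct: float) -> float:
    """If a prediction is right less than half the time, flip it."""
    return max(p_correct, 1.0 - p_correct)


if __name__ == "__main__":
    # Hypothetical prediction accuracies, including two below 50%.
    for p in (0.3, 0.45, 0.5, 0.7, 0.9):
        acc = accuracy_after_flip(p)
        print(f"raw={p:.2f} after_flip={acc:.2f} "
              f"predict={predicted_speed(acc):.2f} eager={eager_speed():.2f}")
```

Under these assumptions the predicted flow never does worse than the eager one, because flipping keeps the accuracy at or above 50%.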
OK, but I made two assumptions: 1) I can make a bigger (double) machine
and get twice the speed for one flow. 2) Each flow can use 100% of what
it's given, so when I run two flows, each gets 50%.
In fact, these two assumptions mean that the amount of ILP in the CPU
is the same in the two cases. But the two flows are by definition
independent, so if I use eager execution, I raise the instruction-level
parallelism. This is in fact the principle behind SMT (see Intel
Hyper-Threading). If, for example, the two flows each run at 60% of full
speed, then the aggregate performance is 120%.
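
To make the 60% example concrete, here is a small Python sketch with the second assumption relaxed. It is my own reading of the model, not a measurement. The names are hypothetical, and it assumes that only one of the two eager flows turns out to be useful, so the useful speed is the per-flow rate even though the aggregate is twice that.

```python
def aggregate_speed(per_flow: float) -> float:
    """Total work per unit time across both eager flows."""
    return 2.0 * per_flow


def eager_useful_speed(per_flow: float) -> float:
    """Useful work only: the flow on the wrong side of the branch is discarded."""
    return per_flow


def eager_wins(per_flow: float, p_correct: float) -> bool:
    """Eager execution beats prediction only if its useful speed exceeds
    the expected speed of the predicted flow (full speed x accuracy)."""
    return eager_useful_speed(per_flow) > p_correct


if __name__ == "__main__":
    per_flow = 0.6  # each flow at 60% of full speed, as in the example
    print("aggregate:", aggregate_speed(per_flow))
    for p in (0.55, 0.7, 0.9):  # hypothetical prediction accuracies
        print(f"accuracy={p:.2f} eager_wins={eager_wins(per_flow, p)}")
```

With these numbers, eager execution comes out ahead only when the predictor is right less than 60% of the time.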
But this is a weak argument, since the extra parallelism doesn't give
much on real machines, and the predictors have a pretty good level of
confidence, even for weird or unknown branches.
Eli
Disclaimer: I work for Intel, but this has NOTHING to do with it, and
NOTHING must be inferred from it.