The current crop of source ports is a bit weird - they still don't quite have the full code, but are based on a mixture of code from the alpha version (originally stolen/leaked by a Monolith employee), some reverse-engineering, and some clever eyeballing and guesstimating.
It all seems to have worked out, though. BloodGDX isn't perfect yet, but it's probably far better than trying to play it through DOSBox, and the snippets of footage I've seen of Blood EX look even better, with some slick new UI upgrades.
However we got here, though, Blood is back. It'll never be as big as Duke or Shadow Warrior, but it's a great piece of design.
Technically, most x86 CPUs, with or without SMT, already do process multiple instructions simultaneously.
It's down to out-of-order execution, plus the fact that they can process more than one instruction per clock cycle (which is called being superscalar).
Each core has multiple sets of execution units, and the scheduler tries to keep them all busy by executing multiple instructions from the one thread.
As you can imagine, this causes some issues: some instructions need data from previous instructions, and you also run into branches in the code.
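To make the dependency problem concrete, here's a toy sketch of a scheduler that issues up to `width` instructions per cycle, but only once everything they depend on has finished. The instruction names, the `schedule` function and the "every instruction takes exactly one cycle" rule are all assumptions for illustration; real hardware is far more involved.

```python
def schedule(instrs, width):
    """Toy out-of-order issue model (illustrative only).

    instrs: list of (name, set_of_names_it_depends_on) in program order.
    Assumes every instruction takes one cycle and every dependency
    names an instruction that is actually in the list.
    Returns {name: cycle_in_which_it_issues}.
    """
    issued = {}
    pending = list(instrs)
    cycle = 0
    while pending:
        # An instruction is ready once all of its inputs were produced
        # in an earlier cycle.
        ready = [name for name, deps in pending
                 if all(dep in issued and issued[dep] < cycle for dep in deps)]
        # Only `width` execution units are available per cycle.
        for name in ready[:width]:
            issued[name] = cycle
        pending = [(n, d) for n, d in pending if n not in issued]
        cycle += 1
    return issued


# b needs a's result and c needs b's, but d and e are independent.
program = [("a", set()), ("b", {"a"}), ("c", {"b"}), ("d", set()), ("e", set())]
print(schedule(program, width=2))
```

With two execution units, `d` gets issued alongside `a` in the first cycle, ahead of `b` and `c` even though it comes later in program order, and the whole thing finishes in three cycles instead of five.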
One of the things CPUs do is guess which branch is most likely to be taken and start speculatively executing instructions from it. If the guess is right, execution has simply been sped up. If it's wrong, the CPU has to throw the results away and start again from the other branch.
So it pays to guess right as often as possible. To do that, CPUs have branch prediction algorithms, which take a bunch of factors into account, such as a recorded history of how often that branch has been taken before.
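As a rough illustration of the "history" idea, here's a minimal sketch of a classic textbook scheme, the 2-bit saturating counter. It is not a claim about what any particular x86 chip does (real predictors are much more elaborate), and the class name, the starting counter value and the branch address are made up for the example.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address (simplified)."""

    def __init__(self):
        self.counters = {}  # branch address -> counter in the range 0..3

    def predict(self, addr):
        # 0 or 1 means "predict not taken", 2 or 3 means "predict taken".
        # Branches we haven't seen start at 1 (an arbitrary choice here).
        return self.counters.get(addr, 1) >= 2

    def update(self, addr, taken):
        # Nudge the counter towards the real outcome, saturating at 0 and 3.
        c = self.counters.get(addr, 1)
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(outcomes, addr=0x400):
    """Fraction of correct predictions for one branch's outcome history."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        if predictor.predict(addr) == taken:
            hits += 1
        predictor.update(addr, taken)
    return hits / len(outcomes)


# A loop branch: taken 9 times, then falls through, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
# A worst case for this scheme: strictly alternating outcomes.
alternating = [True, False] * 50

print(accuracy(loop_branch), accuracy(alternating))
```

In this toy model the loop branch is predicted correctly 89 times out of 100, while the alternating branch is mispredicted every single time, which is the sort of pattern that pushes real designs towards using more history than a single counter.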
But anyway, the point is that with SMT, a core can process multiple instructions simultaneously, but it can also do that without SMT. SMT just lets it fill more otherwise idle execution units than it could with only one thread executing.
Most of the complication from SMT is keeping track of which instructions belong to which thread.
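Here's a toy model of that slot-filling idea, with each issued slot tagged by the thread it belongs to. The numbers of ready instructions per cycle are invented, the function names are hypothetical, and anything that doesn't issue in a cycle is simply dropped rather than carried over, so treat it as a sketch rather than how a real scheduler works.

```python
def fill_slots(ready, width):
    """Toy SMT issue model (illustrative only).

    ready: {thread_name: [ready-instruction count for each cycle]}.
    Returns one list per cycle with `width` entries, each either the
    name of the thread that owns the slot or None for an idle unit.
    """
    n_cycles = len(next(iter(ready.values())))
    result = []
    for cycle in range(n_cycles):
        slots = []
        for thread, counts in ready.items():
            take = min(counts[cycle], width - len(slots))
            slots.extend([thread] * take)  # tag each slot with its thread
        slots.extend([None] * (width - len(slots)))  # idle execution units
        result.append(slots)
    return result


def utilisation(slots_per_cycle):
    """Fraction of execution slots that did useful work."""
    busy = sum(s is not None for slots in slots_per_cycle for s in slots)
    return busy / sum(len(slots) for slots in slots_per_cycle)


thread_a = [1, 3, 0, 2]  # made-up counts of ready instructions per cycle
thread_b = [2, 1, 2, 0]

one_thread = fill_slots({"A": thread_a}, width=4)
with_smt = fill_slots({"A": thread_a, "B": thread_b}, width=4)
print(utilisation(one_thread), utilisation(with_smt))
```

With these made-up numbers, one thread keeps 6 of the 16 slots busy, and adding the second thread brings that up to 11 of 16. The per-slot thread tag is the bookkeeping part: the core has to remember whose instruction is sitting in each unit.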
Thanks for the additional information. Why don't CPUs have enough hardware to pre-emptively execute both paths of a branch, eliminating the misprediction downside altogether?
Also, my comment was based on a GIF animation Intel had on their website back when Nehalem CPUs were the bee's knees. It showed pretty much what I described. There were 3 horizontal lines, with the center one being actively processed and the top and bottom lines being different threads, and information (represented as big dots) from those threads moved into the center line to fill it more effectively. The GIF compared that animation to another showing just a single horizontal line, representing both the data being actively processed and a single thread.