pouët.net


Extended Instruction Set

category: code [glöplog]
Quote:
Instruction that applies a mask to some data so that the result is all the masked bits gathered on the right (or left, I don't care).


I think the Intercal language has something like that.

Anyway, all of this should be done by hardware.
trace_ray
added on the 2013-01-11 22:54:12 by xeron
Quote:
Come up with any super useful, crazy or just silly stuff, but keep it sort of realistic.

Here is an example: an instruction that applies a mask to some data so that the result is all the masked bits gathered on the right (or left, I don't care).

that instruction was added in the haswell new instructions, coming soon to an x86 processor near you: http://software.intel.com/sites/default/files/m/f/7/c/36945
"PEXT - Parallel Bit Extract" (in chapter 7.2)
added on the 2013-01-11 23:20:48 by ryg
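For reference, here is what PEXT computes, written as a minimal portable C model. It is an illustrative sketch and not Intel's reference pseudocode. The name `pext32_soft` is hypothetical. On Haswell and later, GCC and Clang expose the real instruction as `_pext_u32` from `<immintrin.h>` when you compile with `-mbmi2`.

```c
#include <stdint.h>

/* Software model of PEXT (parallel bit extract, BMI2).
   Walk the mask from LSB to MSB. Each set mask bit selects the
   source bit at that position and packs it into the next free
   low bit of the result, so the selected bits end up "on the right". */
static uint32_t pext32_soft(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    uint32_t out_bit = 1;                        /* next destination bit */
    while (mask != 0) {
        uint32_t lowest = mask & (~mask + 1u);   /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1u;                       /* clear lowest set mask bit */
    }
    return result;
}
```

For example, `pext32_soft(42, 0x38)` picks bits 3..5 of 0b101010 and returns 0b101.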
Quote:

that instruction was added in the haswell new instructions, coming soon to an x86 processor near you: http://software.intel.com/sites/default/files/m/f/7/c/36945
"PEXT - Parallel Bit Extract" (in chapter 7.2)


Nice. PEXT and its counterpart, PDEP, can be used to do Morton-order conversion of 2D coordinates.

Back in the day when I was writing software texture mappers for x86, I would have given away an arm for that.

added on the 2013-01-12 08:37:50 by torus
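The Morton-order conversion torus mentions can be sketched as below. The sketch assumes x goes to the even bits and y to the odd bits of a 32-bit index; that is a common convention, but the thread doesn't specify it.

- PDEP scatters each coordinate into its lanes.
- PEXT gathers it back out.

Portable software models stand in for the instructions so the sketch runs anywhere. All names (`pdep32_soft`, `pext32_soft`, `morton_encode`, `morton_decode`) are hypothetical. With BMI2 you would call `_pdep_u32`/`_pext_u32` instead.

```c
#include <stdint.h>

#define MORTON_X_MASK 0x55555555u  /* assumed layout: x in even bits */
#define MORTON_Y_MASK 0xAAAAAAAAu  /* assumed layout: y in odd bits  */

/* Software PDEP: deposit the low bits of src, in order,
   at the positions of the set bits of mask. */
static uint32_t pdep32_soft(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    uint32_t in_bit = 1;                         /* next source bit to consume */
    while (mask != 0) {
        uint32_t lowest = mask & (~mask + 1u);   /* lowest set mask bit */
        if (src & in_bit)
            result |= lowest;
        in_bit <<= 1;
        mask &= mask - 1u;
    }
    return result;
}

/* Software PEXT: gather the bits selected by mask into the low bits. */
static uint32_t pext32_soft(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    uint32_t out_bit = 1;
    while (mask != 0) {
        uint32_t lowest = mask & (~mask + 1u);
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1u;
    }
    return result;
}

/* Interleave x and y (each < 65536) into one Morton index. */
static uint32_t morton_encode(uint32_t x, uint32_t y)
{
    return pdep32_soft(x, MORTON_X_MASK) | pdep32_soft(y, MORTON_Y_MASK);
}

/* Split a Morton index back into x and y. */
static void morton_decode(uint32_t m, uint32_t *x, uint32_t *y)
{
    *x = pext32_soft(m, MORTON_X_MASK);
    *y = pext32_soft(m, MORTON_Y_MASK);
}
```

For example, `morton_encode(2, 1)` yields 6 (0b110), and `morton_decode(6, &x, &y)` gives back x = 2, y = 1.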
re: hardware GC
http://en.wikipedia.org/wiki/Intel_i432
added on the 2013-01-12 09:30:09 by yzi
SSE2 integer horizontal adds. Proper ones that sum up the entire register (16x8 → 1x64, 8x16 → 1x64, or 4x32 → 1x64), not like the useless pairwise PHADD*/HADDP* stuff that nobody would ever want to use. (I mean, seriously… what's the point of such instructions if it's faster to shuffle-and-add?)

The closest you can get currently seems to be either abusing the multiply-adds (PMADDWD) or shuffle-and-add, depending on the input size.
added on the 2013-01-12 23:20:22 by Sesse
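The workarounds Sesse mentions can be sketched with SSE2 intrinsics from `<emmintrin.h>`:

- 8x16: PMADDWD (`_mm_madd_epi16`) against a vector of ones.
- 16x8: PSADBW (`_mm_sad_epu8`) against zero.
- 4x32: plain shuffle-and-add.

This is a minimal sketch, and the function names are hypothetical. The signed 8x16 and 4x32 helpers return 32-bit results. That cannot overflow for 8x16, but it can wrap for 4x32. Only the unsigned 4x32 helper widens to 64 bits.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* 4x32 -> 1x32 by shuffle-and-add (wraps on overflow). v = [a, b, c, d]. */
static int32_t hsum_epi32(__m128i v)
{
    /* swap the 64-bit halves and add: [a+c, b+d, c+a, d+b] */
    __m128i s = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
    /* swap adjacent 32-bit lanes and add: lane 0 = a+b+c+d */
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
}

/* 8x16 signed -> 1x32: PMADDWD with ones yields four 32-bit pair sums. */
static int32_t hsum_epi16(__m128i v)
{
    return hsum_epi32(_mm_madd_epi16(v, _mm_set1_epi16(1)));
}

/* 16x8 unsigned -> 1x32: PSADBW against zero sums each 8-byte half. */
static uint32_t hsum_epu8(__m128i v)
{
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    __m128i s = _mm_add_epi64(sad, _mm_unpackhi_epi64(sad, sad));
    return (uint32_t)_mm_cvtsi128_si32(s);  /* at most 16 * 255 = 4080 */
}

/* 4x32 unsigned -> 1x64: zero-extend to 64-bit lanes, then add. */
static uint64_t hsum_epu32_u64(__m128i v)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i s = _mm_add_epi64(_mm_unpacklo_epi32(v, zero),   /* lanes 0, 1 */
                              _mm_unpackhi_epi32(v, zero));  /* lanes 2, 3 */
    s = _mm_add_epi64(s, _mm_unpackhi_epi64(s, s));
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, s);
    return out[0];
}
```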
A set of instructions backed by a new memory-controller design, where operations could be done very efficiently in parallel on different addresses in memory. Maybe by splitting memory into small units, each with a tiny ALU of its own. If every 64 KB of my PC had a 6510 attached to it, 2 GB of RAM would give me 32768 CPUs.
added on the 2013-01-13 03:46:54 by F-Cycles
torus: way more general than morton actually. you can do almost the whole family of patterns described here with pdep/pext. then again, you can step those directly in your interpolation too, it's not that expensive :)
added on the 2013-01-13 05:00:24 by ryg
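One way to step swizzled coordinates directly, without decoding them, is sketched below. It uses the same assumed layout as before: x in the even bits and y in the odd bits.

The trick works in three moves:

1. Force the other coordinate's bits to 1.
2. Add; the carry ripples straight through those forced bits.
3. Mask the result and merge the untouched coordinate back in.

The helper names are hypothetical. The step argument must already be in swizzled form: 1 is one texel in x, 2 is one texel in y.

```c
#include <stdint.h>

#define MORTON_X_MASK 0x55555555u  /* assumed layout: x in even bits */
#define MORTON_Y_MASK 0xAAAAAAAAu  /* assumed layout: y in odd bits  */

/* Add a swizzled x step to Morton index m, leaving y untouched. */
static uint32_t morton_add_x(uint32_t m, uint32_t dx_swizzled)
{
    uint32_t x = ((m | MORTON_Y_MASK) + dx_swizzled) & MORTON_X_MASK;
    return x | (m & MORTON_Y_MASK);
}

/* Add a swizzled y step to Morton index m, leaving x untouched. */
static uint32_t morton_add_y(uint32_t m, uint32_t dy_swizzled)
{
    uint32_t y = ((m | MORTON_X_MASK) + dy_swizzled) & MORTON_Y_MASK;
    return y | (m & MORTON_X_MASK);
}
```

For example, `morton_add_x(5, 1)` moves from (x=3, y=0) to (x=4, y=0) and returns 16. No decode is needed.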
xenon: Come on, that's practically just an overly specialized version of what I suggested!
added on the 2013-01-14 11:17:16 by kusma
There are many ways to make instruction sets nicer; a fundamental one is removing result-use stalls for condition codes. ARM has a sort of solution, but it can be improved. Other improvements remove the instruction decode stage, which means fixed 4-8 byte instructions like a few 1970s supercomputers had. Usually it's enough to overlap that stage with the others, and since longer instructions eat more memory bandwidth, it's not really a great idea. But it could be a nice instruction set :)

For performance, I'd say it's about time we got smart memory. Instead of changing a chunk of memory with a loop, you send its built-in little ALUs one instruction and the address range. This could be done at the cache-line level.

For CPUs with caches, I think many would drool over actual detailed control over what's in the caches. ;) Some instructions for that wouldn't hurt.

A CPU with a smart cache coupled to a bunch of stack registers could actually do without conventional registers. Only the number of bits needed to encode stack#+offset would limit how many registers a function could use, and you could have symbols for the variables on the "stack" instead of fixed register numbers or cumbersome aliases. The variables could also be of varying size, making byte-juggling effortless. You'd get massive freedom from register management on an 8-bit "accumulator"-type CPU, and on others too.

A binomial instruction could have many uses and would remove the need for many tables. But only if it was faster than a LUT read ;)

added on the 2013-01-14 18:06:21 by Photon
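This sketch assumes "binomial" means the binomial coefficient C(n, k); Photon doesn't say. Under that assumption, the usual table-free software route is the multiplicative formula below. That loop, like a LUT read, is what such an instruction would have to beat. `binomial` is a hypothetical name, and the 64-bit intermediate product overflows for large n.

```c
#include <stdint.h>

/* C(n, k) via the multiplicative formula. After step i, r == C(n-k+i, i),
   so every division is exact. Intermediate r * (n-k+i) must fit in 64 bits. */
static uint64_t binomial(unsigned n, unsigned k)
{
    if (k > n)
        return 0;
    if (k > n - k)
        k = n - k;                     /* use symmetry to shorten the loop */
    uint64_t r = 1;
    for (unsigned i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;
    return r;
}
```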
@kusma
OK, how about:

print_real_edible_warm_pizza_out_of_the_printer
added on the 2013-01-14 23:11:28 by xeron
hund.

as in,

    hund r1, 42
added on the 2013-01-15 04:27:56 by ryg
ryg, Photon: Whoa, that was informative.

Quote:
hund.

but will it fetch?
added on the 2013-01-15 14:18:03 by a13X_B
in the future, we will only need photons, muons, bosons and gluons
added on the 2013-01-15 15:24:46 by Tigrou
I just thought that it would be really nice to have all the blending operations in hardware, but there are a couple of problems: the cool ones are really ugly (in terms of complexity), and I'd rather have them on my GPU.
added on the 2013-01-24 16:45:48 by a13X_B
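For the simple blend modes, the main software cost is the divide by 255. Below is a minimal sketch of multiply and screen on 8-bit channels. It uses the well-known divide-free rounding trick round(x/255) = (t + (t >> 8)) >> 8 with t = x + 128, which is exact when x is the product of two 8-bit values. The function names are hypothetical, and the "cool" modes are not covered.

```c
#include <stdint.h>

/* round(a * b / 255) for 8-bit a, b, without a divide. */
static uint8_t mul_div255(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Multiply blend: darkens; white (255) is the identity. */
static uint8_t blend_multiply(uint8_t dst, uint8_t src)
{
    return mul_div255(dst, src);
}

/* Screen blend: 1 - (1 - dst)(1 - src); lightens; black (0) is the identity. */
static uint8_t blend_screen(uint8_t dst, uint8_t src)
{
    return (uint8_t)(255u - mul_div255((uint8_t)(255u - dst), (uint8_t)(255u - src)));
}
```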
swizzling is nothing. none of you have figured out yet that this is gonna revolutionize intercal!
added on the 2013-01-24 17:05:26 by 216
except of course pulkomandy. i could try reading the whole shit first
added on the 2013-01-24 17:07:23 by 216
