JS performance and global eval() (attn: p01 and cb/adinpsz)
category: code [glöplog]
tl;dr: If eval(blah) makes your JS code slow down to a crawl, try using (1,eval)(blah) instead.
This exchange in the comments on Fabrik caught my eye:
Quote:
Any particular reason to prefer setTimeout(T,1) over eval(T) ?
added on the 2012-04-10 16:34:09 by p01
Quote:
p01>cb tells me: "in the rush for the deadline, it's the only way I found to execute the code outside the JS context of the <img> tag. Calculation were too slow if not (even with local variables)". He will dig the thing. Feel free to contact him
added on the 2012-04-11 10:26:04 by wullon
This sounded exactly like the problem I had on Fake Plastic Cubes - the performance was dramatically worse when the demo was run from inside the packer - so I was really excited to hear about a possible workaround (and especially happy that cb found it just as we were 5 minutes away from cancelling the browser 4K compo at Revision :-) ) After a lot of hunting on the web, I found this blog post - in particular, see the comments about the (1,eval) trick, which would seem to be the most concise way to perform an indirect eval.
I don't completely follow the logic of why code optimisers have a problem dealing with indirect eval, or why the local scope cripples performance so much... but boy, does it work. I tried changing the setTimeout(T,1) call in Fabrik to eval(T), and sure enough, it took about three minutes (and about 10 'script unresponsive' messages) for the intro to start in Chrome. I then changed it to (1,eval)(T)... instant fix. I tried the same trick on Fake Plastic Cubes, and it stopped the audio from skipping on my macbook. (The visuals still have that crap jumpy framerate due to the single threading, but... meh.)
I suspect this will also fix the performance issue I was having with JSSpeccy v2, which generates and eval()s an enormous switch statement to use as the Z80 core. Another project I'll have to resurrect, then...
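The scoping difference behind the trick can be seen in a few lines. This is only a sketch: the function names and the `secret` variable are made up for illustration, and it shows the visible behaviour, not what any engine's optimiser does internally. The usual explanation (an assumption here, not something measured in this thread) is that a direct eval can read and declare variables in the enclosing function, so the engine has to keep that whole scope reachable by name, whereas code run through an indirect eval only sees globals.

```javascript
// Direct vs. indirect eval: only the direct call can see the caller's locals.
function viaDirectEval() {
  var secret = 42;                   // local to this function
  return eval("typeof secret");      // direct eval: runs in this function's scope
}

function viaIndirectEval() {
  var secret = 42;                   // invisible to the eval'd code below
  return (1, eval)("typeof secret"); // indirect eval: runs in the global scope
}

console.log(viaDirectEval());   // "number"
console.log(viaIndirectEval()); // "undefined"
```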
reading this i'm somewhat glad i dont do "creative" browser coding :)
Not to derail this entirely, but can everyone please stop doing this:
:)
:)
werent all eval()s mandated into deprecation a couple years ago? or did i miss a random memo of useful uses?
[offtopic]
Gasman saves the compo, again!
I think I never told you how glad, flattered and thankful I was for your amazing bugfixing skills, even if they're only a startup flag. Thank you! :)
[/offtopic]
ps: It's more like "avoid eval() unless you really know what you're doing". It's generally decent advice, since there are plenty of situations in JS where a novice programmer might try to use eval when they really shouldn't (like eval("myObject."+someProperty), and passing strings to setTimeout/setInterval) - but if you are actually generating and executing code at runtime (as a depacker does), and you're aware of the reasons not to do it *inside* performance-critical code, then it's a useful tool.
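For the two novice cases mentioned above, a small sketch of the eval-free equivalents. The object contents and the timer body are placeholders chosen for illustration.

```javascript
var myObject = { width: 320, height: 200 }; // hypothetical example data
var someProperty = "height";

// What a novice might write: builds and compiles source text just to read a property.
var viaEval = eval("myObject." + someProperty);

// Equivalent without eval: bracket notation takes the property name as a string.
var viaBrackets = myObject[someProperty];

// setTimeout/setInterval likewise accept a function instead of a string of code.
var timer = setTimeout(function () { /* work would go here */ }, 1);
clearTimeout(timer); // cancelled straight away; it only illustrates the call shape

console.log(viaEval, viaBrackets); // 200 200
```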
mog: You're welcome! I can't take any credit for saving this year's compo, though - I was mostly running around and panicking while everyone around me magically got stuff working :-) (I believe the other unsung hero of the compo was Bero, who along with Chaos tracked down the --use-gl=desktop switch to make Laser work on the compo machine...)
Gasman, you made my day :)
The (1,eval) trick is not only a clever solution to the global eval problem, it also saves two bytes in the unpacker:
"setTimeout(E,1)" which may already be reduced to "setTimeout(E)" (I found it out right after the party!) can be replaced by "(1,eval)(E)".
Maybe I'm not saying anything new but here is a very detailed explanation of how to do indirect eval :
http://perfectionkills.com/global-eval-what-are-the-options/
Pretty interesting to see how ECMAScript 5 defines direct eval. Here is a sample list of indirect eval calls :
Code:
(1, eval)('...')
(eval, eval)('...')
(1 ? eval : 0)('...')
(__ = eval)('...')
e = eval; e('...') // personal note : as concise as the first one, but less elegant :)
(function(e) { e('...') })(eval)
(function(e) { return e })(eval)('...')
(function() { arguments[0]('...') })(eval)
this.eval('...')
this['eval']('...')
[eval][0]('...')
eval.call(this, '...')
eval('eval')('...')
Now it does not tell us why indirect eval is faster in our case. I can understand that execution in global scope could be more efficient if we use global variables, since the scripting engine does not have to walk through the context's scope chain to find them (the further along the chain, the slower the variable resolution seems to be), but why is the speed so different if we only do mathematical computation with defined variables? (like during the music generation in Fabrik)
And yes Gasman, you did save the compo ! :)
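A quick way to check which of the forms listed above really are indirect is to give each call site its own local variable and ask the evaluated code whether it can see it. This is a sketch only: the `forms` table, the `probe` variable and the labels are invented for the test, and the `this.eval` variants are left out because they rely on `this` being the global object, which is not the case everywhere (for example inside a Node.js module).

```javascript
// Each entry calls eval in a different way from inside a function that has a local `probe`.
var forms = {
  "eval(s)":            function (s) { var probe = 1; return eval(s); },
  "(1, eval)(s)":       function (s) { var probe = 1; return (1, eval)(s); },
  "(eval, eval)(s)":    function (s) { var probe = 1; return (eval, eval)(s); },
  "(1 ? eval : 0)(s)":  function (s) { var probe = 1; return (1 ? eval : 0)(s); },
  "e = eval; e(s)":     function (s) { var probe = 1; var e = eval; return e(s); },
  "[eval][0](s)":       function (s) { var probe = 1; return [eval][0](s); },
  "eval.call(null, s)": function (s) { var probe = 1; return eval.call(null, s); }
};

// true means the evaluated code could see the local, i.e. the call was a direct eval.
var seesLocals = {};
for (var label in forms) {
  seesLocals[label] = forms[label]("typeof probe") === "number";
}

console.log(seesLocals); // only "eval(s)" is true
```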
Thanks for digging this.
I'll try to do a few test at work.
Shifting the topic slightly... how the hell did I miss Daeken and his awesome PNG-as-HTML trick until now?! This. Changes. Everything.
http://pouet.net/prod.php?which=57308
http://demoseen.com/windowpane/fl0wer.png.html
I never thought I'd get a Keanu Reeves 'whoa.' moment from five ASCII characters, but
Code:
src=#
has just done exactly that.
I don't know if it saves any bytes over having an external PNG, but a) it's more aesthetically pleasing / in tune with traditional-platform 4Ks to have everything packaged as a single file, and b) it gets around Chrome's security restrictions and the need for the --allow-file-access-from-files switch when running it as a local file. Sweet. Is there a js-to-self-extracting-png utility around yet? If not, we need one.
That's basically a CAB-dropper for JS, right? :)
gasman+Gargaj: Daeken's PNGxHTML trick combined with src=# is the thing of a beauty. JSYK I got it to work cross browser and with a smaller bootstrap ;)
I did a couple of tests, the difference between global and local eval is ~15% in Opera 12 and ~30% in Chrome Canary. That's probably why I never really noticed it :p
In the "Magister" demo the trick seems to add 8 bytes to the total size, because the HTML code is included in a custom PNG chunk (called "jawh" in this demo) whose header takes 4+4 bytes. But I think we can reduce the excess to only 4 bytes by using the beginning of the HTML code as the chunk's 4-letter name ("<img").
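For reference, a sketch of how a spec-conforming PNG chunk is laid out: a 4-byte big-endian length, a 4-byte type, the data, then a 4-byte CRC over type and data. The length and type fields are the 4+4 bytes discussed above. The `makeChunk` and `crc32` helpers and the placeholder payload are written for this illustration and are not taken from any packer in this thread.

```javascript
// CRC-32 lookup table (reflected polynomial 0xEDB88320), as used by PNG.
var CRC_TABLE = new Uint32Array(256);
for (var n = 0; n < 256; n++) {
  var c = n;
  for (var k = 0; k < 8; k++) c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  CRC_TABLE[n] = c >>> 0;
}

function crc32(bytes) {
  var crc = 0xFFFFFFFF;
  for (var i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Builds one chunk: length (4) + type (4) + data + CRC (4).
function makeChunk(type, data) {
  var out = new Uint8Array(12 + data.length);
  var view = new DataView(out.buffer);
  view.setUint32(0, data.length);                      // big-endian by default
  for (var i = 0; i < 4; i++) out[4 + i] = type.charCodeAt(i);
  out.set(data, 8);
  view.setUint32(8 + data.length, crc32(out.subarray(4, 8 + data.length)));
  return out;
}

function toHex(bytes) {
  return Array.from(bytes, function (b) { return b.toString(16).padStart(2, "0"); }).join("");
}

// Placeholder payload standing in for the HTML bootstrap.
var payload = new Uint8Array([0x3C, 0x69, 0x6D, 0x67]); // "<img"
var jawh = makeChunk("jawh", payload);
console.log(jawh.length - payload.length);              // 12 bytes of framing per chunk

var iend = makeChunk("IEND", new Uint8Array(0));
console.log(toHex(iend));                               // 0000000049454e44ae426082
```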
Just been reading his blog post about the technique, and finding just how deep this rabbit-hole goes... :-) Apparently "<img" doesn't work, even though it should according to the PNG spec. It doesn't say whether he tried "x<im", which would seem to be the next logical step...
FWIW I did not experiment much with the header. I took an existing PNG, fiddled in an HEX editor and got the thing to work in Opera, Chrome, Firefox and Safari.
I don't have a proper hex editor at the moment to double check, but IIRC the chunks in my experiments were 4 bytes bigger than Daeken's. OTOH my HTML+JS bootstrap is 150-160 bytes, so all in all it's 30+ bytes smaller.
NB: Your usual JS depacker takes 70-90 bytes but with a much more simple compression scheme. Nothing prevents you from combining both approaches: regular JS packer and HTMLxPNG bootstrap.
Finally neither "<img" nor "x<im" work because libpng strictly follows PNG spec :
pngrutil.c :
Code:
if (c < 65 || c > 122 || (c > 90 && c < 97))
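The same range test can be applied to candidate chunk names to see why they are rejected. This is a sketch that mirrors only the condition quoted above (each byte must be an ASCII letter, A-Z or a-z); the function names are invented here and the rest of libpng's chunk handling is not modelled.

```javascript
// True when the byte passes the quoted check, i.e. it is in A-Z (65-90) or a-z (97-122).
function isChunkNameByteOk(c) {
  return !(c < 65 || c > 122 || (c > 90 && c < 97));
}

function isChunkNameOk(name) {
  if (name.length !== 4) return false;
  for (var i = 0; i < 4; i++) {
    if (!isChunkNameByteOk(name.charCodeAt(i))) return false;
  }
  return true;
}

console.log(isChunkNameOk("jawh")); // true
console.log(isChunkNameOk("<img")); // false: "<" is byte 60
console.log(isChunkNameOk("x<im")); // false: same byte, second position
```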
OK, first pass at a self-extracting-png packer: https://gist.github.com/2560551
There's some semi-working code there for splitting the code onto multiple image rows (because certain browsers can't handle images wider than 10000px or so) but Daeken's bootstrap code doesn't support that as it stands. Care to share yours, p01? :-)
On the subject of compatibility - it turns out that Safari doesn't let you hack the length field of the IDAT chunk, although you can still abuse the checksum of the 'jawh' chunk (and omit the IDAT checksum and IEND chunk entirely), which accounts for the 4 byte difference p01 arrived at. Next job is to squirt the png through PNGOUT before applying the file format hacks - apparently it should do a better job at deflate compression than zlib.
(Oh, and there's another slight disadvantage compared with an external .png: your JS code has to clear away the excess document junk when the demo starts...)
Hey guys, figure I should chime in with some code and background. Both Magister and Fl0wer were built with my Windowpane framework, which is available at https://github.com/daeken/windowpane and handles everything from running a local webserver for testing to packing everything into PNGs. Provide the shader and it does the rest.
I've recently been working to get my bootstrap and compression down to the point where things like Fl0wer can be 512b demos rather than 1k. To that end, I've been trying just about everything. I no longer have the jawh chunk, but rather put the bootstrap code after the IDAT (which of course has no checksum -- nobody checks it anyway). I didn't do this in the first place because any less than symbol in the IDAT would cause the HTML parser to break my code, but detecting that and throwing in a beginning greater than symbol is trivial.
As for using another zlib implementation, I used deflate from 7zip and got a ~10b drop in some cases, but that was about the top end. I did some work in making it more compressible, e.g. applying a BWT to the code or applying delta coding (or both, in either order) and every single thing I tried increased the size, no matter what.
I'm pretty confident that the bootstrap can't be reduced any further from what's in the Windowpane repo as it stands; 4 or 5 of us worked for a couple days at it and got absolutely nowhere. However, I have this feeling that there's a way to use the bootstrap code to 'seed' the compression, thus reducing the size of the compressed code greatly. I don't know if it'll pan out, but we'll see -- hopefully I'll get to use that for my submissions to Solskogen!
If anyone has any questions about this stuff, feel free to ask -- most of the code-golf on this has been done in #stackoverflow on Freenode if you want to talk in real time.
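One possible reading of the "less than" workaround described above, as a sketch: scan the compressed IDAT bytes for a "<" and, if one is found, start the trailing bootstrap with a ">" so that whatever tag the HTML parser thinks it has opened is closed first. The helper names and the sample bytes are hypothetical, and Windowpane's actual packer may do this differently.

```javascript
// True when the compressed data contains a "<" (byte 0x3C) that the HTML parser could trip over.
function containsLessThan(idatBytes) {
  return idatBytes.indexOf(0x3C) !== -1;
}

// Returns the bootstrap, guarded with a leading ">" only when needed.
function guardBootstrap(idatBytes, bootstrap) {
  return (containsLessThan(idatBytes) ? ">" : "") + bootstrap;
}

var clean = Uint8Array.of(0x78, 0x9C, 0x01, 0x02); // made-up bytes without "<"
var risky = Uint8Array.of(0x78, 0x9C, 0x3C, 0x61); // made-up bytes containing "<a"

console.log(guardBootstrap(clean, "<img src=#>")); // <img src=#>
console.log(guardBootstrap(risky, "<img src=#>")); // ><img src=#>
```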
Nice to see you here Daeken ;)
First of all, hats off for abusing the PNG format this way, and some more for the latest tricks: IDAT and >
About the multiple rows thingy, it's really simple. I think you made things way slower and more complicated than necessary in Magister. See the code below, I put in bold the "crazy" parts:
Quote:
196 bytes. It creates a canvas element, loops backwards, and gets a chunk of ImageData for every single character.
<img onload=with(document.createElement('canvas'))p=width=4968,(c=getContext('2d')).drawImage(this,e='',0);
while(p)e+=String.fromCharCode(c.getImageData(0,0,p,1).data[p-=4]);
(t=top).eval(e) src=#>
Your new bootstrap is smaller but still complicated:
Quote:
166 bytes. Still a backward loop on a single strip and tons of getImageData.
<canvas id=q><img onload=with(q.getContext('2d'))for(p=q.width=9999,drawImage(this,0,e=0);p;)e+=String.fromCharCode(getImageData(--p,0,1,1).data[0]);eval(e) src=#>
My bootstrap is a bit simpler, faster, smaller and does not have this long strip restriction:
Quote:
159 bytes. The canvas element is already in the markup; it loops forwards over a single ImageData, which you can read as an LFB. Fiddle with the 99 width and height as you see fit for your project.
<canvas id=c><img onload=for(a=c.getContext('2d'),d=a.getImageData(a.drawImage(this,p=0,e=''),0,99,99).data;t=d[p+=4];)e+=String.fromCharCode(t);eval(e) src=#>
Gasman: About cleaning the junk, you can get away with that by setting the CSS of the canvas to fill the viewport ;)
JSYK I'll talk about HTMLxPNG bootstraping, among other things, at WebRebels.
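The forward-looping idea can be tried without a browser by simulating the RGBA array that getImageData() would return for a greyscale image. This is a DOM-free sketch, not any of the quoted bootstraps: `toImageData` and `decode` are invented names, the payload is assumed to contain only non-zero byte values below 256, and details such as where the first read starts are simplified.

```javascript
// Simulates getImageData().data for a greyscale image holding one code byte per pixel.
function toImageData(code) {
  var data = new Uint8ClampedArray((code.length + 1) * 4); // last pixel stays 0 as terminator
  for (var i = 0; i < code.length; i++) {
    var v = code.charCodeAt(i);
    data[i * 4] = data[i * 4 + 1] = data[i * 4 + 2] = v;   // R = G = B = byte value
    data[i * 4 + 3] = 255;                                 // opaque alpha
  }
  return data;
}

// Reads one channel per pixel (stride 4) in a single forward pass, stopping at the first zero.
function decode(data) {
  var e = "";
  for (var p = 0, t; (t = data[p]); p += 4) e += String.fromCharCode(t);
  return e;
}

var source = "alert(1)";
console.log(decode(toImageData(source)) === source); // true
```

Reading the whole buffer once and stepping through it by index is what lets the loop treat the image as a linear frame buffer, instead of calling getImageData once per character.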
Nice work on the new bootstrap -- I may have to pull that into Windowpane.
Btw, I'm currently talking with some other Mozilla people about fixing the bug where images drawn to a canvas come out premultiplied from getImageData. Everyone is in violation of the spec right now, and fixing that will make the bootstrap shorter by not having to loop over the data to get just one element ;)
Now I'm wondering... could we eliminate the loop entirely by switching to a 32-bit PNG, and slurping the whole thing into a string in a single fromCharCode call?
Code:
String.fromCharCode.apply(0,q.getImageData(0,0,w,h).data)
...or is that the "not having to loop over the data" trick you're referring to?
(ah, String.fromCharCode.apply is not such a good idea for 64Ks, though...)
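Assuming the concern with larger payloads is the engine-specific limit on how many arguments a single call can take, one workaround is to apply fromCharCode to the data in slices. A sketch only: `bytesToString` and the 8192 slice size are arbitrary choices for illustration, and the input is assumed to be a typed array (as getImageData().data is).

```javascript
// Converts a typed array of byte values to a string, a slice at a time,
// so that no single call receives more than `chunkSize` arguments.
function bytesToString(bytes, chunkSize) {
  chunkSize = chunkSize || 8192;
  var out = "";
  for (var i = 0; i < bytes.length; i += chunkSize) {
    out += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return out;
}

var big = new Uint8Array(70000).fill(65); // 70000 copies of "A", more than 64K
var text = bytesToString(big);
console.log(text.length, text.slice(0, 4)); // 70000 "AAAA"
```

This costs more bytes than the one-liner, so it only makes sense when the payload is large enough for the limit to matter.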
p01: Tell me if I'm wrong but your bootstrap saves few bytes against Daeken's one because you don't initialize the canvas width. The fact is, it is possible only if your PNG have small width & height, otherwise the browser resizes the PNG to a smaller size and the unpacking becomes impossible. Since PNG compression is far better in Nx1 format than in NxN square (because each PNG row consumes an additional byte), you need a very large PNG, so a canvas width initialization.
In addition, according to my tests, calling getImageData multiple times isn't a problem at all if we read one byte each time, like in Daeken's bootstrap.
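The per-row cost mentioned above comes from the filter-type byte that PNG puts in front of every scanline. A small sketch of the arithmetic, assuming an 8-bit greyscale image with one payload byte per pixel; `rawSize` is a made-up helper, and the figures are sizes before deflate, so the effect on the final file will differ.

```javascript
// Bytes fed to deflate for an 8-bit greyscale image:
// each row is 1 filter-type byte followed by `width` pixel bytes.
function rawSize(width, height) {
  return height * (width + 1);
}

console.log(rawSize(4096, 1)); // 4097: a single strip pays for one filter byte
console.log(rawSize(64, 64));  // 4160: a square pays for 64 of them
console.log(rawSize(300, 14)); // 4214: 300px-wide rows, 14 rows to hold 4096 bytes
```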
Daeken: Thanks. Ouch! about the premultiplied alpha. I didn't realize it forces to read more data to "un-multiply". I hope the read is cached!!!
cb: I didn't know that each row of PNG "cost" one extra byte. As Daeken said, calling getImageData multiple times can be a problem if the original data are premultiplied. Also, the default resolution of a canvas is 300x150. All browsers comply to that. These 300x150 pixels represent ~44Kb worth of data using a single component. I'd recommend to make the PNG 300px wide. OTOH you can make a slow, forward looping, bootstrap that takes a single strip PNG image without having to set the width of the canvas:
Quote:
158 bytes. /!\ I have NOT tested that code, but you get the idea.
<canvas id=c><img onload=with(c.getContext('2d'))for(p=0,e='';t=getImageData(drawImage(this,p--,e=''),0,1,1).data[0];)e+=String.fromCharCode(t);eval(e) src=#>
my bad
Quote:
155 bytes
<canvas id=c><img onload=with(c.getContext('2d'))for(p=0,e='';t=getImageData(drawImage(this,p--,0),0,1,1).data[0];)e+=String.fromCharCode(t);eval(e) src=#>