Learning Android Application Development

These last months I’ve been very busy writing a book about Android development. It covers from the very basics to application testing and using third party libraries to make your life as a developer easier.

It’s available both in ebook and printed format, if you’re interested, grab a copy!

IMG6117

PacktPub
https://www.packtpub.com/application-development/learning-android-application-development

Amazon
https://www.amazon.es/Learning-Android-Application-Development-Montane/dp/1785286110/

O’Reilly
http://shop.oreilly.com/product/9781785286117.do

Rakuten
http://books.rakuten.co.jp/rk/a635fb0c500f374a997c977adbe0d71c/

Kobo
https://store.kobobooks.com/en-us/ebook/learning-android-application-development

Booktopia
http://www.booktopia.com.au/ebooks/learning-android-application-development-raimon-rafols-montane/prod9781783553846.html

Voxeling – js1k 2016 postmortem

CeAcHlAUYAAP-ua (1)

This year, as usual, I got the inspiration from somewhere 🙂
Checking twitter while commuting I saw a re-tweet with a voxel image, I digged a bit and, to be honest, I was really impressed by the amazing work of @Sir_carma:

More work of Sir_carma

More specifically in this voxel image:
mm2WNK7 copy

Source: https://twitter.com/Sir_carma/status/651500940822974464

So I did my best to recreate it in 1k and here is the result:

or go to the js1k page for bigger preview: http://js1k.com/2016-elemental/demo/2497

and feel free to leave a message in the reddit thread:
https://www.reddit.com/r/js1k/comments/48ndh9/demo_2497_voxeling_by_raimon_r%C3%A0fols_canvas_1024/

and, of course, do not hesitate to check all the other amazing entries this year: http://js1k.com/2016-elemental/demos

Some details of the source code, scroll down to see the whole thing:

First I created a height map of the plane, but to optmize it a bit, I made it a bit smaller & symmetric:

  1. //full plane
  2. p = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
  3.      0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  4.      0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  5.      0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  6.      0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  7.      0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
  8.      1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  9.      1, 1, 0, 0, 1, 2, 2, 2, 2, 2, 2, 1, 0,
  10.      8, 4, 3, 2, 2, 3, 3, 5, 5, 4, 3, 2, 1,
  11.      1, 1, 0, 0, 1, 2, 2, 2, 2, 2, 2, 1, 0,
  12.      1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  13.      0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  14.      0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
  15.      0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  16.      0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  17.      0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  18.      0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0];
  1. //reduced & symmetric plane
  2.   p = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
  3.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  4.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  5.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  6.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  7.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  8.        0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 1, 0,
  9.        5, 2, 2, 2, 3, 3, 5, 5, 4, 3, 2, 1
  10.    ]

I tried to optimize it by using a string and then splitting the array into numbers but after executing the code compressor, final size was bigger, so at the end I left it this way.

Terrain and clouds are a interpolated noise function. First we need to generate some random noise:

  1. V = 50
  2.     H = 100
  3.     D = 1000
  4.     // …
  5.     n = new Uint8Array(D)
  6.     w.crypto.getRandomValues(n)

V & H are the number of cells we will render (Vertical and Horizontal). We just reuse the variable D instead of calculating the right amount of noise values we need… so, we actually generate too many noise values, but memory is infinite.. right?

Then, to generate a smooth terrain, we interpolate values using a cosine interpolator. To generate a field of 50×1000 points, we need 5×100 and interpolate each single value for 10×10 new generated values.

To get the integer values of the coordinate we divide by 10 (B in our case) and we do a binary OR logic operation by 0 to get rid of the floating point part. It is one of the javascript tricks that save me few precious bytes.

Another approximation I used in the code below is to get rid of PI. As we have to calculate (1.f – Math.cos(value * Math.PI)) / 2.f and value has to be between 0 and 1 I originally did the following:

  1. x = (j / 10) | 0
  2.    y = (i / 10) | 0
  3.    F = (i % 10) / 10
  4.    E = (j % 10) / 10
  5.    E = (1 Math.cos(E * Math.PI)) / 2
  6.    F = (1 Math.cos(F * Math.PI)) / 2

but, as turns out, both i % 10 and j % 10 values will be between 0 & and 9, so dividing them by 3, gives us a result between 0 and 3 which is “good enough” as an approximation of a value between 0..1 multiplied by PI, hence the division by 3 and no PI mentioned anywhere in the code:

  1. B = 10
  2.    o = []
  3.    T = S = s = 0
  4.    i = V
  5.    while (i) {
  6.        j = D
  7.        while (j) {
  8.            y = (i / B) | 0
  9.            x = (j / B) | 0
  10.            k = y * H + x
  11.            F = (i % B) / 3
  12.            E = (j % B) / 3
  13.            E = (1 Math.cos(E)) / 2
  14.            u = 1 E
  15.            F = (1 Math.cos(F)) / 2
  16.            v = 1 F
  17.            v = n[k] * u * v + n[k + 1] * E * v + n[k + H] * u * F + n[k + H + 1] * E * F
  18.            o[s++] = v < 180 ? 0 : (v 180) / 7
  19.        }
  20.    }

Last part is to “normalize” the value. If the interpolated value is smaller than 180 we assume is water or 0. Otherwise we scale it down dividing by 7 (7 felt good, not any big reason why it is 7 and not 8 or at least, not any I could remember right now, as far as I remember was trial & error)

Another optimization I had to do, although not really proud of this one is to get rid of one gradient color. Initially voxels had 3 colors (top, front, left) but at the end, for size constraints, front and left shared the same color value. At the time I did this optimization it felt acceptable enough, but definitely looks way nicer with 3 colors, you can judge yourself in the screenshot below (left – current version, right – old version with 3 colors)

screenshots_combined

To render the voxels and avoid any depth checks or z-fight between voxels, they are always drawn from the far end (top of screen) to the bottom of the screen and from right to left, so all overdrawn voxels are correct. In the initial version I introduced checks to avoid drawing left & front faces of the voxel if they were occluded, but I had to get rid of the optimizations because of size constraints.

Clouds are rendered vertically-centered, and small mountains are just a height function from 0.

clouds_terrain

Finally all the voxels are rendered (water, mountains, clouds & plane) in a single loop with the help of a precalc table:

  1. A = [0, 6, 1,
  2.        9, 7, 1,
  3.        7, 11, 1,
  4.       2, 10, 1,
  5.        0, 0, 0,
  6.        9, 1, 0,
  7.        9, 7, 1,
  8.        0, 6, 1,
  9.        0, 0, 0,
  10.       2, 4, 0,
  11.       2, 10, 1,
  12.        0, 6, 1
  13.    ]

Each value indicates the X, Y and height multiplier of the rectangle to draw. There are 3 groups of 4 coordinates, rendering 3 distorted (perpective) rectangles for each voxel.

  1. C = [
  2.        "#A87", "#754", //0-mountains
  3.        "#f96", "#C53", //3-plane
  4.        "#fff", "#ccc", //1-clouds
  5.        "#799" //2-water
  6.     ]
  7.  
  8.     // number of faces to draw (on water this will be set to 1)
  9.     r = 3
  10.  
  11.     // h is the value of the height map for that voxel, if height multiplier is 1
  12.     // in the 3rd coordinate of the precalc table it will be added to the y
  13.     // coordinate of the rectangle
  14.     G = h * 4
  15.  
  16.     v = 0
  17.  
  18.     // s contains the value of the clouds above only when rendering water &
  19.     // mountains, "shadows" are only an alpha function
  20.     // color is computed depending on which layer are we rendering and
  21.     // which face
  22.     while (r) {
  23.         c.fillStyle = C[f * 2 + (!v ? 0 : 1)]
  24.         c.globalAlpha = 1 s / 2
  25.         c.beginPath()
  26.         c.lineTo(A[v++] + E, A[v++] + F + A[v++] * G)
  27.         c.lineTo(A[v++] + E, A[v++] + F + A[v++] * G)
  28.         c.lineTo(A[v++] + E, A[v++] + F + A[v++] * G)
  29.         c.lineTo(A[v++] + E, A[v++] + F + A[v++] * G)
  30.         c.fill()
  31.     }

This code is executed three times, each with a different translate (see the while(–z) loop below) and some logic depending on which phase it is. Initially there were 3 different loops which was way more clear what was being drawn each time, but took too many space and the whole thing didn’t fit in 1k.

I had plans to actually move the plane using the keyboard and even fire bullets, but I was, maybe, a bit too optimistic 😉

Below you can find the whole source code (works using the js1k shim file). Code is quite unreadable due to being highly optimized for space. Compressed file under 1k is achieved by using: uglify and RegPack with 2, 1, 1 score weight values..

  1. p = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
  2.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  3.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  4.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  5.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  6.        0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  7.        0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 1, 0,
  8.        5, 2, 2, 2, 3, 3, 5, 5, 4, 3, 2, 1
  9.    ]
  10.  
  11.    V = 50
  12.    H = 100
  13.    B = 10
  14.    D = 1000
  15.    w = window
  16.    n = new Uint8Array(D)
  17.    w.crypto.getRandomValues(n)
  18.    w = w.innerWidth
  19.  
  20.    A = [0, 6, 1,
  21.        9, 7, 1,
  22.        7, 11, 1, 2, 10, 1,
  23.        0, 0, 0,
  24.        9, 1, 0,
  25.        9, 7, 1,
  26.        0, 6, 1,
  27.        0, 0, 0, 2, 4, 0, 2, 10, 1,
  28.        0, 6, 1
  29.    ]
  30.  
  31.    C = [
  32.        "#A87", "#754", //0-mountains
  33.        "#f96", "#C53", //3-plane
  34.        "#fff", "#ccc", //1-clouds
  35.        "#799" //2-water
  36.    ]
  37.  
  38.    o = []
  39.    T = S = s = 0
  40.    i = V
  41.    while (i) {
  42.        j = D
  43.        while (j) {
  44.            y = (i / B) | 0
  45.            x = (j / B) | 0
  46.            k = y * H + x
  47.                //F&E will be less than 3.333 – good approximation for PI
  48.            F = (i % B) / 3
  49.            E = (j % B) / 3
  50.            E = (1 Math.cos(E)) / 2
  51.            u = 1 E
  52.            F = (1 Math.cos(F)) / 2
  53.            v = 1 F
  54.            v = n[k] * u * v + n[k + 1] * E * v + n[k + H] * u * F + n[k + H + 1] * E * F
  55.            o[s++] = v < 180 ? 0 : (v 180) / 7
  56.        }
  57.    }
  58.  
  59.    setInterval(function() {
  60.        s = w / (D + V)
  61.        t = T % 9
  62.        a.width = w
  63.        b.style.background = ‘#635’
  64.        c.save()
  65.        c.scale(s, s)
  66.        c.translate(9, H * 3)
  67.  
  68.        c.fillStyle = "#568"
  69.        c.beginPath()
  70.        c.lineTo(0, H * 2)
  71.        c.lineTo(H * 9, H * 3)
  72.        c.lineTo(D, H)
  73.        c.lineTo(D, H)
  74.        c.lineTo(0, 0)
  75.        c.fill()
  76.  
  77.        c.save()
  78.        c.translate(H * 9 t, t / 9 H)
  79.  
  80.        c.save()
  81.        z = 4
  82.        while (z) {
  83.            if (z == 2) c.translate(t, t / 9)
  84.            if (z == 1) c.translate(V + B, B)
  85.  
  86.            i = 0
  87.            while (i++ < V) {
  88.                j = z == 1 ? 12 : H
  89.                while (j) {
  90.                    k = i * D + j + (z == 3 ? S : S * 2)
  91.  
  92.                    if (z == 2) k += H
  93.                    f = (z == 3 && o[k]) ? 0 : z
  94.                    h = z == 1 ? p[(i <= 6 ? i : 15 i) * 12 + j] : o[k]
  95.                    s = o[z == 3 ? i * D + H + S * 2 + j : k]
  96.  
  97.                    r = 1
  98.                    if (s > 1) s = 1
  99.                    v = G = E = F = 0
  100.  
  101.                    if (f < 3) {
  102.                        r = 3
  103.                        G = h * 4
  104.  
  105.                        if (f) {
  106.                            E = 40
  107.                            F = 160 + h * 2
  108.                            s = 0
  109.                            if (!h) r = 0
  110.                        }
  111.                    }
  112.  
  113.                    while (r) {
  114.                        c.fillStyle = C[f * 2 + (!v ? 0 : 1)]
  115.                        c.globalAlpha = 1 s / 2
  116.                        c.beginPath()
  117.                        c.lineTo(A[v++] + E, A[v++] + F + A[v++] * G)
  118.                        c.lineTo(A[v++] + E, A[v++] + F + A[v++] * G)
  119.                        c.lineTo(A[v++] + E, A[v++] + F + A[v++] * G)
  120.                        c.lineTo(A[v++] + E, A[v++] + F + A[v++] * G)
  121.                        c.fill()
  122.                    }
  123.                    c.translate(9, 1)
  124.                }
  125.                c.translate(z == 1 ? H + B : H * 9 + 2, z == 1 ? 9 : H + 4)
  126.            }
  127.            c.restore()
  128.        }
  129.  
  130.        T++
  131.        S = (T / 9) | 0
  132.    }, B)

III Jornades Emprenedoria i Tècniques Institut Poblenou

Few weeks ago I was invited as a speaker to the “Jornades Emprenedoria i Tècniques Institut Poblenou” and yesterday I went there and gave a talk about product vs project, lean approach, ux and, at the end I explained the AngelHack experience, what are hackathons, why is useful to attend, …

There was a good attendance (~80-90), room was packed and there were people standing at the end of the room. I really enjoyed being able to talk to that audience, to introduce them to the world of hackathons and giving them feedback about their projects and ideas.

Here are the slides (sorry, only available in catalan)

And some pictures of the event:

IMG_7491

IMG_7492

IMG_7493

IMG_7490

Speaking in more conferences ;)

There has been a lack of updates on this blog.. yes… I’m fully aware. But I’ve been speaking in some more conferences (like if that was a good excuse…). For example, last weekend, I spoke at the first edition of JBCNConf here in Barcelona about Java performance and basically what’s going on at compiler level when we compile java code.

Click here or the image below for the slides.

Screen Shot 2015-06-28 at 21.45.54

Speaking at mobile development conferences

These last months I’ve been speaking in some mobile conferences:

Mobiconf (Krakow – Poland)
Droidcon UK (London)
Droidcon NL (Amsterdam)

Here is a video recording of my presentation at Droidcon UK: (you can create a free account with skills matter if you don’t have one)
https://skillsmatter.com/skillscasts/5628-how-to-improve-android-app-performance-with-the-new-art-runtime-and-dalvik-vm-perfmatters

And here is a link to the slides, with a nice summary made by the Droidcon NL organisers:

droidconNL-002

Big thanks to Eric Lafortune for the mention in their blog! https://www.saikoa.com/blog/Mobiconf2014

BlackBerry 10 – Native algorithm optimization

When looking for performance it’s important to understand what is going on at the compiler level and how the compiler is optimising our code. For example, enabling the assembly output will allow us to see what is exactly generating and have a greater understanding of why some parts of the code execute faster or the reason why it’s not performing as it should.

BlackBerry 10 uses the QCC compiler (more information here: QCC). Enabling the right flags will generate a .s file of the generated assembly:

qcc_flags2

qcc_flags3

Once we got the flags enabled we can start optimising the code. Let’s take as example a YUV2RGB converter and optimise it for BlackBerry 10 devices (used in this app http://appworld.blackberry.com/webstore/content/21198027/) . This converter can be used to read from camera preview buffers and convert it to RGB to save it as an image or show it on the screen. YUV is a format the encodes luma & chroma independently:

yuv_format

To make this example simpler, assume the following values (hardcoded from a camera preview buffer size)

  1. int src_height = 768;
  2. int src_width = 1024;
  3. int src_stride = 1024;
  4.  
  5. int uvStart = 1024 * 768; // offset to the UV data (after all the Y data)

This is our first approach:

  1. void inline renderFrame(unsigned char *src, char *dst, int rect[4], int frame, int dst_stride) {
  2.     int i, j;
  3.     for (i = 0; i < src_height; i++) {
  4.         for(j = 0; j < src_width; j++) {
  5.             int y_rpos = i * src_stride + j;
  6.             int uv_rpos = uvStart + ((i/2) * src_stride) + (j/2) * 2;
  7.             int wpos = i * dst_stride + j * 4;
  8.  
  9.             float y = src[y_rpos     ];
  10.             float u = src[uv_rpos    ];
  11.             float v = src[uv_rpos + 1];
  12.  
  13.             int r = clip((int) ((y 16) * 1.164                     + 1.596 * (v 128)));
  14.             int g = clip((int) ((y 16) * 1.164 0.391 * (u 128) 0.813 * (v 128)));
  15.             int b = clip((int) ((y 16) * 1.164 + 2.018 * (u 128)));
  16.  
  17.             dst[wpos    ] = b;
  18.             dst[wpos + 1] = g;
  19.             dst[wpos + 2] = r;
  20.             dst[wpos + 3] = 0xff;
  21.         }
  22.     }
  23. }

As with this YUV format a pair of Y values share the same UV values we can generate 2 pixels with just 2 Y and 1 single UV pair, generating 2 pixels per iteration.

  1. float y0 = src[y_rpos    ];
  2. float y1 = src[y_rpos + 1];
  3. float u = src[uv_rpos    ];
  4. float v = src[uv_rpos + 1];
  5.  
  6. int r0 = clip((int) ((y0 16) * 1.164                     + 1.596 * (v 128)));
  7. int g0 = clip((int) ((y0 16) * 1.164 0.391 * (u 128) 0.813 * (v 128)));
  8. int b0 = clip((int) ((y0 16) * 1.164 + 2.018 * (u 128)));
  9.  
  10. int r1 = clip((int) ((y1 16) * 1.164                     + 1.596 * (v 128)));
  11. int g1 = clip((int) ((y1 16) * 1.164 0.391 * (u 128) 0.813 * (v 128)));
  12. int b1 = clip((int) ((y1 16) * 1.164 + 2.018 * (u 128)));
  13.  
  14. dst[wpos    ] = b0;
  15. dst[wpos + 1] = g0;
  16. dst[wpos + 2] = r0;
  17. dst[wpos + 3] = 0xff;
  18.  
  19. dst[wpos + 4] = b1;
  20. dst[wpos + 5] = g1;
  21. dst[wpos + 6] = r1;
  22. dst[wpos + 7] = 0xff;

So far so good, but there are many arithmetic operations that can be shared and there is no need to do them twice

  1. float y0 = src[y_rpos    ] 16;
  2. float y1 = src[y_rpos + 1] 16;
  3. float u = src[uv_rpos    ] 128;
  4. float v = src[uv_rpos + 1] 128;
  5.  
  6. float y0v = y0 * 1.164;
  7. float y1v = y1 * 1.164;
  8.  
  9. float chromaR = 1.596 * v;
  10. float chromaG = 0.391 * u 0.813 * v;
  11. float chromaB = 2.018 * u;
  12.  
  13. int r0 = clip((int) (y0v + chromaR));
  14. int g0 = clip((int) (y0v + chromaG));
  15. int b0 = clip((int) (y0v + chromaB));
  16.  
  17. int r1 = clip((int) (y1v + chromaR));
  18. int g1 = clip((int) (y1v + chromaG));
  19. int b1 = clip((int) (y1v + chromaB));

and we’ve improved 14 multiplications and 20 additions/subtractions to 6 multiplications and 12 additions/substractions. Pretty sure our CPU would like that although we’re far from done. There are many values that doesn’t change in every pixel so we can move those to the external loop.

  1. void inline renderFrame(unsigned char *src, char *dst, int rect[4], int frame, int dst_stride) {
  2.     int i, j;
  3.     for (i = 0; i < src_height; i++) {
  4.         int y_rpos = i * src_stride;
  5.         int uv_rpos = uvStart + ((i/2) * src_stride);
  6.         int wpos = i * dst_stride;
  7.  
  8.         for(j = 0; j < src_width; j++) {
  9.             float y0 = src[y_rpos    ] 16;
  10.             float y1 = src[y_rpos + 1] 16;
  11.             float u = src[uv_rpos    ] 128;
  12.             float v = src[uv_rpos + 1] 128;
  13.  
  14.             float y0v = y0 * 1.164;
  15.             float y1v = y1 * 1.164;
  16.  
  17.             float chromaR = 1.596 * v;
  18.             float chromaG = 0.391 * u 0.813 * v;
  19.             float chromaB = 2.018 * u;
  20.  
  21.             int r0 = clip((int) (y0v + chromaR));
  22.             int g0 = clip((int) (y0v + chromaG));
  23.             int b0 = clip((int) (y0v + chromaB));
  24.  
  25.             int r1 = clip((int) (y1v + chromaR));
  26.             int g1 = clip((int) (y1v + chromaG));
  27.             int b1 = clip((int) (y1v + chromaB));
  28.  
  29.             dst[wpos    ] = b0;
  30.             dst[wpos + 1] = g0;
  31.             dst[wpos + 2] = r0;
  32.             dst[wpos + 3] = 0xff;
  33.  
  34.             dst[wpos + 4] = b1;
  35.             dst[wpos + 5] = g1;
  36.             dst[wpos + 6] = r1;
  37.             dst[wpos + 7] = 0xff;
  38.  
  39.             wpos += 8;
  40.             y_rpos += 2;
  41.             uv_rpos += 2;
  42.         }
  43.     }
  44. }

As some CPU doesn’t contain hardware accelerated floating point processors, floating point operations are emulated in software. In others, plain integer operations are usually faster. We can simulate floating point arithmetics with plain integers using Fixed point arithmetics

  1. int y0v = y0 * 298;   // 1.164 * 256
  2. int y1v = y1 * 298;   // 1.164 * 256
  3.  
  4. int chromaR = 408 * v;                // 1.596 * 256
  5. int chromaG = 100 * u 208 * v;     // – 0.391 * 256, 0.813 * 256
  6. int chromaB = 517 * u;                // 2.018 * 256
  7.  
  8. int r0 = clip((y0v + chromaR) >> 8);  // >> 8 is equivalent to dividing by 256
  9. int g0 = clip((y0v + chromaG) >> 8);
  10. int b0 = clip((y0v + chromaB) >> 8);
  11.  
  12. int r1 = clip((y1v + chromaR) >> 8);
  13. int g1 = clip((y1v + chromaG) >> 8);
  14. int b1 = clip((y1v + chromaB) >> 8);

As YUV values are always 0 <= YUV <= 255 some operations can be already recalculated in small arrays

  1. void precalc() {
  2.     int i;
  3.     for(i = 0; i < 256; i++) {
  4.         factorY[i]  = ( 298 * (i 16))  >> 8;
  5.         factorRV[i] = ( 408 * (i 128)) >> 8;
  6.         factorGU[i] = (100 * (i 128)) >> 8;
  7.         factorGV[i] = (208 * (i 128)) >> 8;
  8.         factorBU[i] = ( 517 * (i 128)) >> 8;
  9.  
  10.         clipVals[i] = min(max(i 300, 0), 255);
  11.     }
  12. }

We also precalculated the clipping values, so we remove the comparisons in the inner loop.

No we can use these precalc tables in our code:

  1. int y0 = factorY[src[y_rpos    ]];
  2. int y1 = factorY[src[y_rpos + 1]];
  3.  
  4. int chromaR = factorRV[v];
  5. int chromaG = factorGU[u] + factorGV[v];
  6. int chromaB = factorBU[v];
  7.  
  8. int r0 = clipVals[y0 + chromaR + 300];
  9. int g0 = clipVals[y0 + chromaG + 300];
  10. int b0 = clipVals[y0 + chromaB + 300];
  11.  
  12. int r1 = clipVals[y1 + chromaR + 300];
  13. int g1 = clipVals[y1 + chromaG + 300];
  14. int b1 = clipVals[y1 + chromaB + 300];

We introduced a 300 offset into the array indexes to avoid problems when values are negative. We can avoid that index offset by doing the following:

  1. int *clipVals_ = &clipVals[300];
  2.  
  3.  
  4. int r0 = clipVals_[y0 + chromaR];
  5. int g0 = clipVals_[y0 + chromaG];
  6. int b0 = clipVals_[y0 + chromaB];
  7.  
  8. int r1 = clipVals_[y1 + chromaR];
  9. int g1 = clipVals_[y1 + chromaG];
  10. int b1 = clipVals_[y1 + chromaB];

We could write pixels in one single operation instead of using 4 write operations (one per single byte). We’ll create different precalc tables to store values for the R, G and B components with shift masks and alpha transparency already in place.

  1. for(i = 0; i < 1024; i++) {
  2.     clipVals[i]  = min(max(i 300, 0), 255);
  3.     clipValsR[i] = 0xFF000000 | (min(max(i 300, 0), 255) << 16);
  4.     clipValsG[i] = min(max(i 300, 0), 255) << 8;
  5.     clipValsV[i] = min(max(i 300, 0), 255);
  6. }
  7.  
  8. void inline renderFrame(unsigned char *src, char *dst, int rect[4], int frame, int dst_stride) {
  9.     int i, j;
  10.  
  11.     int *clipValsR_ = &clipValsR[300];
  12.     int *clipValsG_ = &clipValsG[300];
  13.     int *clipValsB_ = &clipValsB[300];
  14.     int *dsti = (int *) dst;
  15.  
  16.     for (i = 0; i < src_height; i++) {
  17.         int y_rpos = i * src_stride;
  18.         int uv_rpos = uvStart + ((i/2) * src_stride);
  19.         int wpos = i * dst_stride;
  20.  
  21.         for(j = 0; j < src_width; j++) {
  22.             int u = src[uv_rpos    ];
  23.             int v = src[uv_rpos + 1];
  24.  
  25.             int y0 = factorY[src[y_rpos    ]];
  26.             int y1 = factorY[src[y_rpos + 1]];
  27.  
  28.             int chromaR = factorRV[u];
  29.             int chromaG = factorGU[u] + factorGV[v];
  30.             int chromaB = factorBU[u];
  31.  
  32.             dst[wpos    ] = clipValsR_[y0 + chromaR] | clipValsG_[y0 + chromaG] | clipValsB_[y0 + chromaB];
  33.             dst[wpos + 1] = clipValsR_[y1 + chromaR] | clipValsG_[y1 + chromaG] | clipValsB_[y1 + chromaB];
  34.  
  35.             wpos += 2;
  36.             y_rpos += 2;
  37.             uv_rpos += 2;
  38.         }
  39.     }
  40. }

Execution time

Small graph showing the execution time of the same function with different optimisations. At the end we achieved around 600% performance increase.

exec_time

There are still quite a lot of things than can be done, stay tuned for the second part of this post!

JS1k 2014 entries

Last year I said I had a very ambitious idea for the js1k 2013 competition but I didn’t had the time and I thought to leave it for the following year. This year, once again, I could only work on my entry the day before the deadline, so I decided to do something else and start from scratch.

I saw this animated gif long time ago (which, by the way, I don’t know the original source or author)

So I thought it would be good to make a 1k version of it.

To be honest I’m not very happy with the result, I’m pretty sure it could be done better & nicer. Anyway I submitted it to the competition and here is the result:

Find below the full source code or check the details page on js1k

  1. //values are just trial & error
  2. w=window
  3. f=w.innerHeight/w.innerWidth
  4. w=1343
  5. h=w*f
  6.  
  7. _=Math
  8. C=_.cos
  9. S=_.sin
  10. R=_.random
  11.  
  12. F=512
  13. Z=2200
  14. P=6.28
  15.  
  16. l=[]
  17. A=[]
  18. B=[]
  19.  
  20. L=54
  21. p=P/L
  22. Y=N=0
  23.  
  24. //generate lamp posts in a circle on both sides and pointing to the center
  25. df=‘000100110010000001011010010110110010010011011010’.split()
  26. for(i=0;i<L;i+=.5){
  27.   v=-.25
  28.   for(k=4;k>=3;k) {
  29.     x=k*F*S(p*i)
  30.     z=k*F*C(p*i)
  31.     x2=(k+v)*F*S(p*i)
  32.     z2=(k+v)*F*C(p*i)
  33.  
  34.     for(j=0;j<16;j++) {
  35.       xv=x
  36.       zv=z
  37.       //shorter way to check for: j==10||j==11||j==14||j==15
  38.       if(j>9&&j&2){
  39.         xv=x2
  40.         zv=z2
  41.       }
  42.       A[N++]=xv+df[j*3  ]*10
  43.       A[N++]=-df[j*3+1]*512
  44.       A[N++]=zv+df[j*3+2]*8
  45.     }
  46.     i+=.5
  47.     v=-v
  48.   }
  49. }
  50.  
  51. //store where lamp posts end
  52. K=N
  53.  
  54. //generate random lines
  55. k=0
  56. for(i=L*8;i–;) {
  57.   l[k++]=R()*P
  58.   l[k++]=R()
  59.   l[k++]=-R()*12864
  60.   l[k++]=R()*.2+.2;
  61. }
  62.  
  63. N+=L*96;        //8*12
  64.  
  65. setInterval(function(){
  66.   Y+=.004
  67.   b.style.background=‘#000’
  68.  
  69.   //clear screen
  70.   a.width=w
  71.   a.height=h
  72.  
  73.   //generate trail lines & move them
  74.   j = K
  75.   k=i=0;
  76.   for(;i<L*8;i++) {
  77.     d=l[k++]
  78.     p=l[k++]
  79.     y=l[k++]
  80.     e=l[k++]
  81.     r=(3.1+p*.8)*F
  82.  
  83.     v=4*S(y+i)
  84.     g=[0,v,v,0]
  85.     for(v=0;v<4;v++) {
  86.       x=r*S(d)
  87.       z=r*C(d)
  88.  
  89.       A[j++]=x
  90.       A[j++]=y+g[v]
  91.       A[j++]=z
  92.       if(v&1) d+=e
  93.     }
  94.     l[k4]+=.02
  95.  
  96.     // // move lines in both directions
  97.     // if(i<L*2) {
  98.     //  l[k-4]-=.02
  99.     // } else {
  100.     //  l[k-4]+=.02
  101.     // }
  102.   }
  103.  
  104.   //rotation & transform 3d-2d all points
  105.   k=0
  106.   for(i=N;i–;) {
  107.     z=A[k+2]*C(Y)A[k]*S(Y)+Z
  108.     B[k  ]=F*(A[k+2]*S(Y)+A[k]*C(Y))/z+256
  109.     B[k+1]=F*A[k+1]/z+h*0.9
  110.     B[k+2]=z/Z
  111.     k+=3
  112.   }
  113.  
  114.   //draw everything, color is based on the poly index
  115.   // index < K -> lamp post
  116.   // index >= K – trail lines, half red #f22, half white #fff
  117.   k=i=0
  118.   for(;i<N/3;i++) {
  119.     c.globalAlpha=(2B[k+2])/2
  120.     c.fillStyle="#444"
  121.  
  122.     if(i*12>=K) {
  123.       if(i*12<K + L * 12*4) {
  124.         c.fillStyle="#f22"
  125.       } else {
  126.         c.fillStyle="#fff"
  127.       }
  128.     }
  129.  
  130.     //draw all lines, clip lamp post passing just in front of "camera"
  131.     c.beginPath()
  132.     for(j=4;j–;) {
  133.       if((i%4<2 && B[k+2] > 0.1)||i%4>=2) {
  134.         c.lineTo(B[k],B[k+1]);
  135.       }
  136.       k+=3
  137.     }
  138.     c.fill()
  139.  
  140.     //draw lamp post light after drawing the last top horizontal quad
  141.     //lamp posts are formed of 4 quads:
  142.     //0 & 1 vertical
  143.     //2 & 3 top horizontal
  144.     //avoid drawing lights on trail lights
  145.     if(i % 4 == 3 && i*12<K) {
  146.       c.fillStyle="#fff"
  147.       c.beginPath()
  148.       c.arc(B[k3], B[k2], (2B[k1])*13,0,P)
  149.       c.fill()
  150.     }
  151.   }
  152. }, 20)

Code is compressed under 1k thanks to Google Closure Compiler and RegPack v3

As I wasn’t really happy with the result I did another quick entry based on the old good daCube2 colors & idea:

Find below the result:

and the source code or check the details page on js1k:

  1. w=window
  2. h=w.innerHeight
  3. w=w.innerWidth
  4.  
  5. _=Math
  6. mC=_.cos
  7. mS=_.sin
  8. R=_.random
  9.  
  10. F=512
  11. Z=2048
  12.  
  13. N=L=X=Y=0
  14.  
  15. A=[]
  16. B=[]
  17. C=[]
  18. D=[]
  19.  
  20. addCube=function(s) {
  21.   k=N/3
  22.   j=‘000100010110001101011111’.split()
  23.   for(i=0;i<24;i++) {
  24.     A[N++]=(j[i].5)*2*s
  25.   }
  26.  
  27.   l=‘011332200445514662577367’.split()
  28.   for(i=0;i<24;i++) {
  29.     D[L++]=k+(l[i]|0)
  30.   }
  31. }
  32.  
  33. for(z=200;z–;){
  34.   addCube(100+z*5)
  35. }
  36. anim=.004
  37.  
  38. setInterval(function(){
  39.   Y+=anim*R()+anim
  40.   X+=anim*R()+anim
  41.   y=Y
  42.   x=X
  43.   //X+=.01
  44.   b.style.background=‘#444’
  45.  
  46.   //clear screen
  47.   a.width=w
  48.   a.height=h
  49.  
  50.   k=0
  51.   //dy rotation
  52.   for(i=N;i–;) {
  53.     if(k%24*3==0) y-=anim*2
  54.     C[k+2]=A[k+2]*mC(y)A[k]*mS(y)
  55.     C[k  ]=A[k+2]*mS(y)+A[k]*mC(y)
  56.     k+=3
  57.   }
  58.  
  59.   k=0
  60.   //dx rotation
  61.   for(i=N;i–;) {
  62.     if(k%24*3==0) x+=anim*2
  63.     B[k+1]=A[k+1]*mC(x)C[k+2]*mS(x)
  64.     B[k+2]=A[k+1]*mS(x)+C[k+2]*mC(x)
  65.     k+=3
  66.   }
  67.  
  68.   //transform 3d-2d
  69.   k=p=0
  70.   for(i=N;i–;) {
  71.     C[k++]=F*C[p  ]/(C[p+2]+Z)+w/2
  72.     C[k++]=F*B[p+1]/(C[p+2]+Z)+h/2
  73.     p+=3
  74.   }
  75.  
  76.  
  77.   p=0
  78.   for(i=0;i<L/24;i++) {
  79.     c.strokeStyle="#fff"
  80.     c.globalAlpha=(i/(L/24))
  81.     c.beginPath()
  82.     for(k=24;k–;){
  83.       j=D[p++]
  84.       c.moveTo(C[j*2],C[j*2+1]);
  85.       j=D[p++]
  86.       c.lineTo(C[j*2],C[j*2+1]);
  87.     }
  88.     c.stroke()
  89.   }
  90. }, 20)