Developer Log
With q5.js being a relatively new graphics library, there's an opportunity to make some changes that break backwards compatibility with p5.js, but are otherwise obvious improvements. Overall the Processing API is so well designed that I haven't felt a need to make breaking changes. Yet, there is one thing that's always irked me about p5: the default display size of images.
I'm currently working on a new system for determining the display size of an image when it's undefined by the user.
// what size should the image be displayed at?
image(img, x, y);
p5.js was created in 2013, back when only the most expensive phones had "retina" displays. But over a decade later, nearly every phone, tablet, laptop, and TV has a high pixel density display.
If you don't know, pixelDensity is auto-set by p5.js to match the devicePixelRatio, which is the ratio of the resolution in physical pixels to abstract CSS pixels for the current display device. This pixel scaling system enables developers to write a single program with hardcoded coordinates and dimension values, and it'll draw everything at a consistent size, relative to the canvas, on screens with different pixel densities. It's an elegant solution. So what's the problem?
My MacBook Air 2020 has a display density of 2. So in the following sketch, when createCanvas is run, the actual size of the 132x132 canvas is 264x264. All coordinates and dimensions in p5 are scaled by the pixel density.
let icon;
function preload() {
icon = loadImage('https://q5js.org/q5js_icon.png');
}
function setup() {
createCanvas(132, 132);
background(0);
// position to draw the image is given
// but not a width and height
image(icon, 2, 2);
stroke(200);
noFill();
square(2, 2, 64);
square(2, 2, 128);
}
But note that if pixel density is 2, then p5.js displays images at double their actual size by default. Images are effectively rendered as if pixel density was set to 1. So while the rest of the sketch looks crisp, images by default appear blurry with jagged aliased edges on modern displays.
A decade ago it was fine for images to be displayed with the assumption that pixel density would be 1 or close to it. But today, since most devices do have a pixel density of 2 or greater, it's not a good look.
Images displaying at double the size that users expect can cause confusion too. One student on the p5play discord thought the image function was broken because the assets they were trying to use were so high-res and their canvas was so small that p5.js only displayed a transparent part of the image within the bounds of the canvas.
Users can fix this issue by providing a destination width and height to the image function or by using scale, but I don't want q5 users to have to worry about that.
In q5, when the devicePixelRatio is 2, I want the previous code example to display the image at its actual size of 128x128, so 64x64 in CSS pixels.
Those edges look crisp when the image is displayed at its actual size.
To give a more practical example, here's "Sprites with Images" from https://p5play.org/learn/sprite.html?page=2
Before and after! Note that the latter example displays a high-res image at the same size that the low-res image was displayed at in the former example.
View fullscreen: https://github.com/user-attachments/assets/fd71af79-8425-4adf-a586-650c66cdd666
Now that we've seen it, there's no going back, right?
p5.js v1.8 introduced pixel density settings for images. Yet, I don't want to set image pixel density to 2 by default in q5, because as with canvas or graphics objects, that would change the image's width and height property values. That'd completely break backwards compatibility and make clipping subsections of the image more confusing. Let's not mess with the image objects themselves.
Having the pixel density always be set to 1 would make sketches developed on screens with a pd of 2 or 3 appear too large on low-res screens with a pd of 1. That's the problem that the pixel density system solves, being able to write code once and have it displayed at the same relative size on screens with different pixel densities.
Images can't always be displayed at actual size by default because then they'd be displayed at different sizes relative to the canvas at different pixel densities.
Instead, I'd like there to be a defaultImageScale that effectively specifies what pixel density images should display at by default. If the sketch pixel density is 2, a defaultImageScale multiplier of 0.5 should be applied to counteract the 2x scaling of images caused by the pixel density scaling. I'm making defaultImageScale(scale) a user facing function as future proofing, for when we're all developing art for 8k displays and 16k VR headsets lol. I imagine today's users starting to learn q5 will simply accept whatever the default image display size is and won't need to use or know about this function.
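For illustration, here's roughly what that default amounts to, continuing the earlier icon sketch. This isn't q5's internal code, just the math behind the scale; imgScale is an illustrative variable name.

```js
// counteract the pixel density scaling so the image displays
// at its actual pixel size
let imgScale = 1 / pixelDensity(); // 0.5 when the density is 2

// the icon displays at its actual 128x128 size (64x64 in CSS pixels)
// without the user having to pass a width and height
image(icon, 2, 2, icon.width * imgScale, icon.height * imgScale);
```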
I used to think p5's weaker performance compared to Java Processing was a good tradeoff for the ease of programming in JS and web share-ability. But now with q5 WebGPU, we can get the best of both worlds!
So how much faster is q5.js v2.8 than the original Java Processing? Initial testing shows up to 5x better performance. 👀
Here's the updated results for the disc dots test, which I modified to draw a consistent 50,502 points at varying stroke widths and stroke colors.
- q5.js WebGPU [100 fps]
- q5.js q2d [23 fps]
- Java Processing P3D [18 fps]
- p5.js P2D [10 fps]
- Java Processing P2D [6 fps]
- p5.js WEBGL [4 fps]
Solid 60 Hz display rate with power to spare: just 67% processor usage. Uses HDR colors.
Java Processing console printing a frame rate of 17 Hz. Limited to SDR colors.
This is unbelievable! I would've been hype about a 2x performance increase over Java Processing, but 5x is a dream come true. 😅
https://aijs.io/editor?user=quinton-ashley&project=discDots
Other tests yield similar results, but more testing is needed to have a fuller picture.
Most programmers would expect a Java program to run faster than an equivalent JavaScript program, since Java is a statically typed language. Part of the difference may be that q5 was written with performance as a primary goal, but this is also really more of an apples-to-oranges comparison between WebGPU, backed by the newer Metal and Vulkan GPU APIs, and OpenGL.
All q5 WebGPU modules now use the "triangle-strip" topology because it's faster to have less data in the Float32Array that gets copied to the GPU's vertex buffer. There's also no need to manually create an index buffer. I wish I had known about it from the start!
https://shi-yan.github.io/webgpuunleashed/Basics/triangle_strips.html
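For context, the topology is just one setting on the render pipeline. Here's a minimal sketch, assuming a `device` and `shaderModule` already exist (this isn't q5's actual pipeline code):

```js
// with "triangle-strip", a rect needs only 4 vertices instead of 6,
// and no index buffer has to be created
const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: { module: shaderModule, entryPoint: 'vertexMain' },
  fragment: {
    module: shaderModule,
    entryPoint: 'fragMain',
    targets: [{ format: navigator.gpu.getPreferredCanvasFormat() }]
  },
  primitive: { topology: 'triangle-strip' }
});
```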
Writing code once and being done with it can be fine in some scenarios, but not if you want optimal code! 🧼
I've found that experimentation and A/B testing are the best way to learn what's good for performance in the WebGPU + JavaScript ecosystem, because there are quite a lot of variables at play. Having a strong Computer Science background from my college courses at NYU has definitely helped, but just because something might seem like a better approach on paper doesn't guarantee it'll be better in practice.
The first idea I had was that instead of the vertexStack being a normal JS array that would have to be converted to a Float32Array to be placed in the vertex buffer, I would just make it a Float32Array from the start. This does come with the drawback of not being able to use push, and using set requires the creation of little JS arrays, so assigning to the array has gotten a bit uglier for sure. Typed arrays also need to be a fixed size, but it's fast to use slice to make a sub-array of just the data needed for the vertex buffer each frame.
This change alone (avoiding the typed array conversion) nearly doubled performance!
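Here's a simplified sketch of that approach. The names and sizes are illustrative, not q5's actual code:

```js
// preallocate a large typed array instead of pushing to a plain JS array
const vertexStack = new Float32Array(1_000_000);
let vertIndex = 0;

function addVertex(x, y, colorIndex, transformIndex) {
  // direct assignment replaces push()
  vertexStack[vertIndex++] = x;
  vertexStack[vertIndex++] = y;
  vertexStack[vertIndex++] = colorIndex;
  vertexStack[vertIndex++] = transformIndex;
}

// each frame, only the portion that was actually filled gets copied
// to the GPU vertex buffer, then the index is reset
function flushVertices(device, vertexBuffer) {
  device.queue.writeBuffer(vertexBuffer, 0, vertexStack.slice(0, vertIndex));
  vertIndex = 0;
}
```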
Next up, I thought I could save some space by storing the colorIndex and transformIndex of a vertex separately, since in some use cases many shapes are drawn with the same color and transform. I used instancing to achieve this, but it didn't pan out and actually decreased performance.
So I tried a different way to decrease the amount of vertex data: using an index buffer. https://webgpufundamentals.org/webgpu/lessons/webgpu-vertex-buffers.html
Without an index buffer, 2 of a rect's 4 vertices are stored twice, as the 6 vertices of two triangles. An index buffer stores the order to draw vertices in, so each vertex is only stored once. A vertex buffer memory saving of 33% seems like it would equate to a 33% increase in speed, but after implementing this technique I found that was not the case.
Firstly, the total memory saved is much less. Without an index buffer, the total memory used for a rect is: 6 vertices * 4 floats (x, y, colorIndex, and transformIndex) * 4 bytes per float32 = 96 bytes. With an index buffer, the total memory used is: 4 vertices * 4 * 4 = 64 bytes, plus the index buffer's six indices, 6 * 4 = 24 bytes, which is 88 bytes total. So only about 8% less. When drawing rects, the memory saved is not enough to outweigh the increased complexity of using an index buffer. Yet, it's not slower by much. Perhaps it'd be a good tradeoff if this technique was beneficial for other shapes.
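To make the comparison concrete, here's an illustrative sketch of the two ways to submit a rect (only x and y are shown per vertex for brevity; this isn't q5's actual code):

```js
// without an index buffer: 6 vertices, two of them duplicated
const verts = new Float32Array([
  0, 0,  1, 0,  0, 1,  // triangle 1
  1, 0,  1, 1,  0, 1   // triangle 2
]);
// pass.draw(6);

// with an index buffer: 4 unique vertices plus 6 indices
const uniqueVerts = new Float32Array([0, 0,  1, 0,  1, 1,  0, 1]);
const indices = new Uint32Array([0, 1, 3,  1, 2, 3]);
// pass.setIndexBuffer(indexBuffer, 'uint32');
// pass.drawIndexed(6);
```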
What about ellipses? They're made up of triangles with a lot of shared vertices.
In my tests without an index buffer, drawing 59,040 small circles (10 pixels in diameter) requires 4,250,880 vertices. With an index buffer the same test only requires 1,653,120 vertices. The result is a 20% decrease in scripting time.
But wait, there's more... 😅
I also found that the fragment shader I was using was not really what I wanted. Simple color interpolation can be accomplished by just passing the color along and returning it from the fragment shader; the interpolation between vertices is done on the GPU internally with the default inter-stage interpolation setting. https://webgpufundamentals.org/webgpu/lessons/webgpu-inter-stage-variables.html
In addition to the performance improvements I also wanted to use multi-sample anti-aliasing, which is easy to do in WebGPU. https://webgpu.github.io/webgpu-samples/?sample=helloTriangleMSAA
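For reference, this is roughly what the MSAA setup looks like, based on the linked sample. The `device`, `encoder`, `context`, and `canvas` objects are assumed to already exist; this isn't q5's exact code:

```js
// a 4x multisampled texture that the render pass draws into,
// which then gets resolved to the regular canvas texture
const msaaTexture = device.createTexture({
  size: [canvas.width, canvas.height],
  sampleCount: 4,
  format: navigator.gpu.getPreferredCanvasFormat(),
  usage: GPUTextureUsage.RENDER_ATTACHMENT
});
// the render pipeline must also be created with multisample: { count: 4 }

const pass = encoder.beginRenderPass({
  colorAttachments: [{
    view: msaaTexture.createView(),
    resolveTarget: context.getCurrentTexture().createView(),
    loadOp: 'clear',
    storeOp: 'store'
  }]
});
```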
...and the results? 🥁🥁🥁
Now q5 WebGPU is 8x faster than p5.js P2D and 32x faster than p5.js WebGL in the rainbow loops test! https://aijs.io/editor?user=quinton-ashley&project=rainbowLoops
The search for more optimal code continues though. 😅 I just found out about triangle strips. I'll have to test if it's even faster to just have the GPU handle triangulation for drawing shapes without needing to manually create an index buffer. https://shi-yan.github.io/webgpuunleashed/Basics/triangle_strips.html
After that and implementing https://github.com/q5js/q5.js/issues/77 and https://github.com/q5js/q5.js/issues/76, I'll work on q5's documentation.
It's not uncommon for games to render a lot of small bits of text that are always on screen, such as gamer tags, chat logs, quest info, item inventories, etc. Fast text rendering is important, since it frees up the processor to do other things and saves battery power.
I'm excited to announce that q5 WebGPU can display a realistically unlimited amount of text at 60fps!
The max on my computer is 20,000 text calls displayed on screen at a barely readable size of just 4px.
Check out the documentation for more info about how q5's WebGPU text rendering module works: https://github.com/q5js/q5.js/tree/main/src#webgpu-text
The following tests of the text function were performed with p5.js 1.10.0 and q5.js v2.5.0 on my Apple M1 chip 2020 MacBook Air in Google Chrome v128.0.6613.138 (arm64), letting each test run for 5 seconds and double checking to make sure the results were consistent. rotate was used to rotate text a full 360 degrees in all these tests. Admittedly, I had to crank up the number of calls to text to illustrate the differences, which would be easier to show on more average or low-end hardware.
Results are shown as the percentage of CPU+GPU processing power used if 60fps was achieved, otherwise as an fps count.
https://aijs.io/editor?user=quinton-ashley&project=textTest
Calls | Test Scenario | p5 WebGL | p5 P2D | q5 q2d | q5 WebGPU |
---|---|---|---|---|---|
289 | unique text content | ✅ 25% | ✅ 18% | ✅ 5% | ✅ 4% |
1681 | 2 char pattern | ❌ 20fps | ✅ 20% | ✅ 12% | ✅ 9% |
1681 | unique text content | ❌ 15fps | ✅ 50% | ✅ 25% | ✅ 11% |
6561 | 2 char pattern | ❌ 1fps | ✅ 20% | ✅ 17% | |
6561 | unique text content | ❌ 0fps | ❌ 50fps | ✅ 30% | |
----- | ---------------------- | -------- | ------ | ------ | --------- |
289 | unique content & color | ✅ 30% | ✅ 25% | ✅ 20% | ✅ 4% |
1681 | unique content & color | ❌ 30fps | ✅ 70% | ✅ 11% | |
6561 | unique content & color | ❌ 0fps | ❌ 10fps | ❌ 30fps | ✅ 35% |
Even up against Canvas2D's built-in, fast HTML5 based text rasterization, q5 WebGPU blasts past the competition without even breaking a sweat!
Side note: Curiously, q5 q2d is faster than p5 P2D, even though they both use the Canvas2D renderer. When testing q5 q2d's performance, I noticed that changing ctx.font is a slow operation, so q5 only changes it when necessary instead of every time text is used. I think this must be enabling some kind of internal Chrome optimization.
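A simplified sketch of that optimization (the idea, not q5's exact code):

```js
// only assign ctx.font when the font settings actually change,
// since setting it is a slow Canvas2D operation even with the same value
let lastFont = '';

function applyFont(ctx, size, family) {
  const f = `${size}px ${family}`;
  if (f !== lastFont) {
    ctx.font = f;
    lastFont = f;
  }
}
```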
In p5.js WebGL mode, text is drawn directly to the canvas. This is a complex task, since letters have intricate geometry: thus many triangles must be used to render text at high resolutions. This is not necessarily a bad approach, but p5's implementation is slow. p5 also depends on opentype.js for this, which is 528kb (171kb minified), not including .ttf or .otf font files which can be several megabytes in size, so a different approach was needed to keep q5 lightweight.
Even starting from the MSDF text rendering example, adapting it to work with q5 text settings and transformations was quite the challenge. https://webgpu.github.io/webgpu-samples/?sample=textRenderingMsdf#msdfText.wgsl
I actually don't think I'm done yet either. Can I further optimize q5's WebGPU text renderer by caching char data? 🤞🏻 Stay tuned... 😅
I've realized that while caching text as images might be a fine approach for applications in which the text may rotate but not change in scale or color, such as in game menu UI, it's not good enough for general use.
I came across this awesome video on GPU text rendering and my interest was piqued by the SDF rendering technique. https://youtu.be/SO83KQuuZvg?si=7zhjyiuUccCIMg3C&t=1703
At the end of the video's coverage of SDF, this paper is referenced: "Shape Decomposition for Multi-channel Distance Fields" by Viktor Chlumsky.
MSDF improves on SDF by preserving sharp corners in text, which greatly increases the overall quality of the final rendered text. It seems perfect for q5.js as it wouldn't require opentype.js (171kb minified).
The author of this technique created msdfgen, an open source, MIT licensed, C++ based tool for creating MSDF files. Sadly, it seems no one has made a JS or WASM port of it yet.
I found msdf-bmfont-xml, which is Node.js based, but it uses msdfgen behind the scenes. https://github.com/soimy/msdf-bmfont-xml/blob/master/index.js
Recently Chlumsky authored his own C++ tool for text atlas generation, msdf-atlas-gen, last updated in June 2024. It seems that over the years he expanded on his previous technique to create 4 channel "MTSDF" files, which utilize the alpha channel to store "Soft effects": "effects that use true distance, such as glows, rounded outlines, or simplified shadows". How incredible!
I would take the higher quality text rendering in exchange for slightly slower performance compared to SDF, yet the source files for msdfgen weigh in at over 300kb, not including build files. These are quite verbosely written C++ files though, so I'd expect a minified JS port could be significantly smaller.
But in a catch-22, it seems that to actually create the MSDF textures from .ttf or .otf font files, a JS port of msdfgen would need to load the vectors from those files, which would de facto require using opentype.js or a similar library, which is what we wanted to avoid in the first place. Ahhh.
Although SDF texture generation from a typical font file could be done just using Canvas2D and compute shaders, I don't want to cop out and do that. I'm determined to provide something newer and better in q5, otherwise what's even the point?
https://protectwise.github.io/troika/troika-three-text/
https://medium.com/@clashofcoins/implementing-sdf-text-rendering-in-pixi-js-3cf78614071d
So all this to say that users would need to provide their own msdf texture and atlas files for realtime text rendering using this technique in q5, which I don't think is a bad tradeoff given the results.
Check out this website, which makes an API call to a server that runs msdf-bmfont-xml; it enables users to convert characters from a font file into .png MSDF images and corresponding text atlases in .json format. https://msdf-bmfont.donmccurdy.com/
Luckily, MSDF support has already been implemented in WebGPU! All I need to do is figure out how to integrate it into q5. 😅 https://webgpu.github.io/webgpu-samples/?sample=textRenderingMsdf#msdfText.wgsl
When I started working on the experimental q5 WebGPU renderer, I'd never done any lower level graphics programming before. I really enjoy using the WebGPU API (especially with help from ChatGPT) but I had many misconceptions about it (and I still might be wrong about some of this info lol) so I thought I’d reflect on my current understanding.
I thought that bind groups were going to be pipeline specific, since when a pipeline is created it takes bind group layouts as input. Yet, it seems bind groups are more like a shared pool of memory. You only get four groups, but they can have a lot of entries.
I thought that I could place an array of images into a bind group and easily access them in the shader, but that's not really possible. 2D texture arrays can be made, but they can't store different-sized textures. Bind groups can store many textures as separate entries, but then they can't be accessed in the fragment shader function via a switch or if statement, which does make sense: checking conditions would be very slow over millions of image texture sample calls. So instead, it seems that before drawing each image, it needs to be set in bind group memory. Does this mean only one image can be stored in GPU memory at a time? Not sure.
I think it’ll be best to organize bind group memory by the frequency that it's modified. Bind group 0 can store uniforms like the size of the canvas, which rarely or never changes. Bind group 1 can store data that’s updated either never or every frame: transformations. Bind group 2 can store data that’s updated every frame: colors. Bind group 3 is for image data which may be set multiple times per frame.
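In terms of the API, that organization just comes down to which group index each resource is bound to. A rough sketch, with illustrative names for the bind groups and draw data (not q5's actual code):

```js
// group 0: uniforms that rarely or never change (e.g. canvas size)
// group 1: transforms, updated never or once per frame
// group 2: colors, updated every frame
// group 3: image textures, possibly set multiple times per frame
pass.setBindGroup(0, uniformsBindGroup);
pass.setBindGroup(1, transformsBindGroup);
pass.setBindGroup(2, colorsBindGroup);

for (let img of imagesToDraw) {
  pass.setBindGroup(3, img.bindGroup);
  pass.draw(img.vertexCount, 1, img.vertexOffset);
}
```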
Vertex buffers are also shared memory and I found it's best to just keep them separate, so the shape drawing vertices are at buffer 0 and image vertices are at buffer 1, with an empty buffer layout at 0 to pad the vertex layout for the image pipeline.
Speaking of vertex buffers, ChatGPT recently informed me that it's possible to map memory directly to the GPU, so there's no need to convert to a Float32Array in CPU memory just to transfer it to the GPU. Although I haven't been able to get this working yet, this approach could potentially be a big optimization, since conversion to Float32Array takes up to 20% of render time in my tests.
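Here's a rough sketch of what that could look like with a buffer created in the mapped state. As noted above, I haven't gotten this working yet, so treat it as an untested outline; `vertexData` stands in for the per-frame vertex data:

```js
// create the vertex buffer already mapped, write the data straight into
// GPU-visible memory through a Float32Array view, then unmap for rendering
const vertexBuffer = device.createBuffer({
  size: vertexData.length * 4,
  usage: GPUBufferUsage.VERTEX,
  mappedAtCreation: true
});
new Float32Array(vertexBuffer.getMappedRange()).set(vertexData);
vertexBuffer.unmap();
```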
Still a lot of work to do before the official v3 release! Next on my list is text rendering.
q5 is on track to become the world's first 2D WebGPU library!
For v2.2, I spent a lot of time trying out different implementations of transformations to see what would be fastest. I learned that matrix multiplication is very fast on GPUs, since they have hardware dedicated to it.
I also learned that translations are really slow with the Canvas2D renderer. p5play uses translate when drawing every sprite lol, so I may try editing p5play to avoid using the translate function when a sprite has no rotation.
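Here's a minimal sketch of that workaround idea (not p5play's actual code):

```js
// only pay for translate/rotate when the sprite actually has a rotation
function drawSprite(sprite) {
  if (sprite.rotation === 0) {
    image(sprite.img, sprite.x, sprite.y);
  } else {
    push();
    translate(sprite.x, sprite.y);
    rotate(sprite.rotation);
    image(sprite.img, 0, 0);
    pop();
  }
}
```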
It's rumored that Apple will release WebGPU in Safari on September 16th, making it standard for ~90% of internet users. I'd love to have image rendering in webgpu and an initial version of the docs ready by then!
q5's WebGPU renderer can run sketches up to:
- 3x faster than q5's q2d renderer
- 4x faster than p5's p2d renderer
- 16x faster than p5's webgl renderer
The goal with q5.js is for it to become a superior alternative to p5.js for real time rendering of interactive art and games. Even though the q5 WebGPU renderer is still in the "proof of concept" stage of development, these are encouraging results!
Better performance also increases accessibility since it enables users to do more on a wide variety of devices.
Let's take a look at "Rainbow Loops", one of the sketches I used to test performance.
https://aijs.io/editor?user=quinton-ashley&project=rainbowLoops
tests performed on an M1 MacBook Air 2020
(let's "optimize" the sketch by removing over half the rects 🥲)
Here's the p5.js WEBGL version of the "Rainbow Loops" sketch.
let size;
function setup() {
createCanvas(windowWidth, windowHeight, WEBGL);
size = Math.min(width, height) / 2 - 10;
angleMode('degrees');
noStroke();
}
function draw() {
background(0);
for (let i = 0; i < 720; i++) {
let r = cos(frameCount + i * 2) * 128 + 128;
let g = sin(frameCount + i * 3) * 128 + 128;
let b = cos(frameCount + i) * 128 + 128;
fill(r, g, b);
let x = cos(frameCount + i / 2 + mouseX) * size * sin(frameCount + i);
x *= sin(frameCount + i);
let y = sin(frameCount + i / 2 + mouseY) * size * cos(frameCount + i);
for (let j = -5; j <= 5; j++) {
rect(x + j * 20, y, 10, 10);
}
}
}
Here's the q5 WebGPU version.
let size;
function setup() {
createCanvas();
size = Math.min(canvas.w, canvas.h) / 2 - 10;
angleMode('degrees');
noStroke();
}
function draw() {
background(0);
for (let i = 0; i < 720; i++) {
let r = cos(frameCount + i * 2) * 0.5 + 0.5;
let g = sin(frameCount + i * 3) * 0.5 + 0.5;
let b = cos(frameCount + i) * 0.5 + 0.5;
fill(r, g, b);
let x = cos(frameCount + i / 2 + mouseX) * size * sin(frameCount + i);
x *= sin(frameCount + i);
let y = sin(frameCount + i / 2 + mouseY) * size * cos(frameCount + i);
for (let j = -5; j <= 5; j++) {
rect(x + j * 20, y, 10, 10);
}
}
}
Q5.webgpu();
Learn more about q5's WebGPU renderer in the Module Info section of the readme.md file in q5's src folder.
WebGPU rendering is so fast that seemingly insignificant differences in JS code can make a sizable difference in render time.
I used Chrome's Dev Tools for Performance testing and let it record for ~5 seconds each test. ChatGPT provided helpful advice and explanations of performance differences.
In general, instructing a computer to do less always leads to better performance. But performance optimization is about getting the same or roughly equivalent results with greater efficiency.
Let's take a deep dive into some discoveries I made while developing q5.
Initially, when developing the q5 WebGPU renderer, I had everything in one file and used local variables for stacks like the drawStack, which stores vertex counts for each shape.
// main render loop
for (let i = 0; i < drawStack.length; i++) {
$.pass.draw(drawStack[i], 1, o, 0);
o += drawStack[i];
}
Yet when trying to make the code modular, like the q2d renderer modules, I noticed that nearly the same code was slower. ($ is a reference to this, a q5 instance.)
for (let i = 0; i < $.drawStack.length; i++) {
$.pass.draw($.drawStack[i], 1, o, 0);
o += $.drawStack[i];
}
Accessing properties of an object involves more overhead because the JavaScript engine needs to look up the property in the object's prototype chain. But accessing local variables is very fast because they're stored in the function's stack frame and can be accessed directly.
Storing a local variable reference to an object property is beneficial if the property would be accessed very frequently. In my test the main render loop iterates 7920 times.
let drawStack = $.drawStack;
This little change brought scripting usage down from 14% to 10%! I was super surprised by this because I've never even thought of it before.
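Putting the two snippets together, the modular render loop with the cached reference looks like this:

```js
// cache the property lookup once, outside the hot loop
let drawStack = $.drawStack;
for (let i = 0; i < drawStack.length; i++) {
  $.pass.draw(drawStack[i], 1, o, 0);
  o += drawStack[i];
}
```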
Which is faster? This bit of WGSL shader code always calculates the mix between colors, even if colorIndex is an integer.
let index = u32(colorIndex);
return mix(uColors[index], uColors[index + 1u], fract(colorIndex));
This code checks if colorIndex is an integer to avoid doing the color mixing calculation.
let index = u32(colorIndex);
let f = fract(colorIndex);
if (f == 0) {
return uColors[index];
}
return mix(uColors[index], uColors[index + 1u], f);
The first code segment actually performs ~20% better! That's because there's not much difference for the GPU between doing the simple calculation or not, but the if statement does cause a slowdown. Why?
Conditional branching is when code execution forks into two or more possible paths based on statements such as if, else, and switch.
To achieve high performance, modern processors are pipelined. They consist of multiple parts that each partially process an instruction, feed their results to the next stage in the pipeline, and start working on the next instruction in the program. This design expects instructions to execute in a particular unchanging sequence. Conditional branch instructions make it impossible to know this sequence. So conditional branches can cause "stalls" in which the pipeline has to be restarted on a different part of the program. - Wikipedia : Branch (computer science)
When a modern processor encounters a branch, it can be optimal for it to evaluate both paths rather than stall while deciding which path to take; once a decision is made, the processor selects the result of the correct path and halts or discards the results of the other paths.
Conditional branching is at the heart of what makes a computer different from other machines. Games necessarily run hundreds of conditional branches every frame to produce different output based on user input. But branching should be avoided in fragment shader code when possible, since it may be run thousands or even millions of times each frame.