Running Neural Networks Inside the Web Browser? Only if You Know How!



In our previous case study, we introduced one of our clients, ShareTheBoard: an ed-tech startup whose product is designed to bridge the gap between digital and analog solutions.

Our dedicated team developed a custom neural network that excels at identifying and digitizing handwritten content on whiteboards while preserving its visual structure. This enhancement improves content visibility on output video streams, ensuring a seamless user experience across different devices and platforms. Remarkably, we accomplished all of this in under 12 months.

But now that the neural network is ready, it’s time for the next step...


Running a neural network in real time on the server side is technically straightforward, but it consumes large amounts of processing power, leading to high infrastructure costs. As a result, this approach limits commercialization options, making it impractical.

To avoid these issues, we decided to run our proprietary convolutional neural network on the client side. However, due to the client's business constraints, the application had to run within the web browser. Although advancements in web technologies such as WebAssembly, WebGL, and WebGPU make it possible to run small neural networks in the browser, larger networks still present significant challenges.
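As a rough illustration of what running in the browser implies, a client-side application typically has to probe which of these acceleration technologies the user's browser actually supports before deciding how to run a model. The sketch below is a generic feature check, not code from the ShareTheBoard application:

```javascript
// Illustrative sketch (not ShareTheBoard's code): probe which GPU-accelerated
// web APIs are available before choosing how to run a model client-side.
function detectAcceleration() {
  if (typeof document === 'undefined') return 'none'; // not running in a browser
  if (typeof navigator !== 'undefined' && 'gpu' in navigator) return 'webgpu';
  const canvas = document.createElement('canvas');
  if (canvas.getContext('webgl2')) return 'webgl2';
  if (canvas.getContext('webgl')) return 'webgl';
  return 'cpu'; // fall back to plain JavaScript / WebAssembly
}

console.log(detectAcceleration());
```

On older hardware this check may report `webgl` while the GPU is still too weak for a large network, which is exactly the kind of gap that made this project challenging.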

Once the neural network was completed, it was time to integrate it into the test application. Unfortunately, ShareTheBoard soon encountered severe performance issues: the application would freeze the browser completely, rendering it unresponsive for multiple seconds and making it unusable.

Our challenge was clear: optimize the performance of the neural network within the ShareTheBoard application. This task required innovative solutions and a thorough understanding of both the framework and the application's requirements to ensure a smooth and responsive user experience.


During the optimization process, we tested various approaches, each ending in failure. Here are some examples:

  1. We tried TensorFlow.js, which offers two implementations: one built on WebGL and another using pure JavaScript. The former monopolized the GPU and blocked the UI, while the latter proved too slow for production use.
  2. We tested ONNX.js, Microsoft's alternative to TensorFlow.js. Despite our high hopes, its performance was similarly disappointing. Both solutions caused the user interface to freeze, degrading the speed and user experience of ShareTheBoard's app.
  3. We experimented with GPU.js, which accelerated the neural network on computers equipped with dedicated GPUs but slowed performance on older laptops with integrated graphics.

However, there was a breakthrough. Both TensorFlow.js and ONNX.js provided only high-level APIs (Application Programming Interfaces) that limited operations to the entire model, restricting our efforts. This led us to consider solutions that offer a low-level approach.


Developing a solution took several trials and experiments but ultimately led to a great outcome. We created a proprietary library for running neural networks inside the web browser. We based it on WebGL, a JavaScript API for GPU-accelerated graphics, using shader programs written in GLSL.
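To give a flavor of the general technique (this is an illustrative sketch, not the actual proprietary shaders), each layer of the network can be expressed as a GLSL fragment shader that reads activations from an input texture and writes results to an output texture. The example below defines a shader for a simple element-wise ReLU layer:

```javascript
// Sketch of the general technique, not ShareTheBoard's production shaders:
// a network layer implemented as a GLSL fragment shader. Activations are
// packed into a texture; the shader applies ReLU to each texel.
const reluShaderSource = `
  precision highp float;
  uniform sampler2D u_activations; // input layer values packed into a texture
  varying vec2 v_texCoord;         // supplied by a matching vertex shader
  void main() {
    float x = texture2D(u_activations, v_texCoord).r;
    gl_FragColor = vec4(max(x, 0.0), 0.0, 0.0, 1.0); // ReLU
  }
`;

// In the browser, the shader is compiled and run once per layer:
if (typeof document !== 'undefined') {
  const gl = document.createElement('canvas').getContext('webgl');
  const shader = gl.createShader(gl.FRAGMENT_SHADER);
  gl.shaderSource(shader, reluShaderSource);
  gl.compileShader(shader);
  // ...then link it into a program, bind the activation texture,
  // and draw a full-screen quad so the GPU evaluates every texel.
}
```

Real convolution layers are considerably more involved, but the principle is the same: one shader pass per operation, with textures carrying intermediate results between passes.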

This implementation allowed us to break the execution of the neural network down into individual layers. Although this slightly increased total inference time, it let the web browser's rendering process access the GPU between steps, effectively overcoming the UI freezing issue.
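The scheduling idea can be sketched as follows (a simplified model, not the production library): after each layer's work is submitted, control is yielded back to the event loop so the browser can render UI frames in between.

```javascript
// Simplified sketch (not the production library): run the network one layer
// at a time, yielding to the event loop between layers so the browser's
// rendering process can use the GPU and keep the UI responsive.
async function runByLayers(layers, input) {
  let activations = input;
  for (const layer of layers) {
    activations = layer(activations); // one layer's worth of work
    // Yield control; in a browser, requestAnimationFrame is a natural fit.
    await new Promise(resolve => setTimeout(resolve, 0));
  }
  return activations;
}

// Toy usage: three "layers" as plain functions standing in for GPU passes.
const layers = [
  xs => xs.map(x => x * 2),
  xs => xs.map(x => x + 1),
  xs => xs.map(x => Math.max(x, 0)), // ReLU
];

runByLayers(layers, [-1, 0, 2]).then(out => console.log(out)); // → [0, 1, 5]
```

Running the whole model in one synchronous call would block the main thread for the full inference time; yielding per layer trades a small amount of latency for a UI that never freezes.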

We gained granular control over the operations, leading to significant performance improvements and a smoother, more responsive user experience for ShareTheBoard.


After these changes, we successfully released the content-detection system, built on our dedicated convolutional neural network, which operates in real time within the web application.

The app was tested across multiple scenarios with diverse hardware and software configurations. It performed successfully, leading to commercial success for ShareTheBoard and satisfying the client's needs.
