Object detection, tracking, and 6DoF pose estimation in web browser
Better than a long explanation, a short video:
Do you have a lighter? Let the dragon light it on our live demo.
This repository hosts a full Integrated Training Environment. You can train your own neural network models directly in the web browser and interactively monitor the training process.
The main use case is object detection, tracking, and pose estimation: you can train a neural network model to detect and track a real-world object (for example, a lighter) with 6 degrees of freedom. Once trained, this model can be used with WebAR.rocks.object to create a web-based augmented reality application. For instance, you could have a genie pop out of the lighter in augmented reality, as if it were a magic lamp.
This software is fully standalone. It does not require any third party machine learning framework (Google TensorFlow, Torch, Keras, etc.).
- /css/: Styles for the integrated training environment
- /images/: Images used in the integrated training environment
- /libs/: Third-party libraries
- /src/: Source code
  - /glsl/: Shader code
  - /js/: JavaScript code
    - /preprocessing/: Image preprocessing modules
    - /problemProviders/: Problem providers (e.g., object detection or image classification)
    - /share/: Contains the GL debugger
    - /trainer/: Main directory for the integrated training environment
      - /core/: Core object model (trainer, neural network, and subcomponents)
      - /imageProcessing/: Image transformations for pre- and post-problem provider execution
      - /UI/: User interface components
      - /webgl/: Wrappers for WebGL objects
- /trainingData/: Data used for neural network training
  - /images/: Image-based datasets
  - /models3D/: 3D object datasets
- /trainingScripts/: Scripts used to train neural networks within the integrated environment
- /tutorials/: Tutorials to help you get started
- /players/: Web application demos using trained neural networks
  - /webar-rocks-object-boilerplate/: React + Vite web application boilerplate with a lighter
  - /webar-rocks-object-lighter-dragon/: React + Vite web application displaying a dragon lighting a lighter
- /trainer.html: The integrated training environment

Training neural networks on GPUs in the browser may seem unconventional, but it has several advantages:
Object detection and tracking are performed using synthetic data generated in real time with THREE.js from 3D models. This approach offers significant benefits:
What happens on the GPU stays on the GPU.
WebGL provides direct access to powerful GPU capabilities in the browser, while JavaScript itself can be relatively slow. Data transfers between the GPU and CPU are also slow, so it is crucial to keep as much work as possible on the GPU.
A single training iteration for object detection and tracking includes:
All these steps run entirely on the GPU, which requires assembling the training samples into a minibatch kept in GPU memory. With a medium-range gaming GPU, we can run about 500 training cycles per second. The GPU usage should be close to 100%.
Contact us at contact__at__webar.rocks if you need:
We have already trained networks to detect and track:
We strongly recommend following this tutorial before going further: WebAR Application Tutorial: A Dragon Lights a Lighter.
Clone this GitHub repository.
Start a static web server (assuming Python 3 is installed):
python -m http.server
Open the following URL in your web browser: http://localhost:8000/trainer.html?code=ImageMNIST_0.js.
You should see the integrated training interface with the script located at /trainingScripts/ImageMNIST_0.js. Click RUN in the CONTROLS section to start the training. This script trains a model to classify the MNIST dataset (handwritten digits). The model learns to distinguish digits 0 through 9.
Switch to the LIVE GRAPH VIEW tab (under MONITORING) to watch the training progress in real time. The expected output digit is shown in red, and the bars on the chart represent the network’s current predictions.
WebAR.rocks.train uses WebGL1 and requires the following widely supported extensions:
OES_texture_float, OES_texture_float_linear, WEBGL_color_buffer_float.
You can check your available WebGL extensions at webglreport.com.
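You can also check these requirements programmatically by querying the WebGL1 context for each extension. The helper below is a minimal sketch (the function name and the idea of passing the context in are illustrative assumptions, not part of the WebAR.rocks.train API):

```javascript
// Required WebGL1 extensions, mirroring the list above.
const REQUIRED_EXTENSIONS = [
  'OES_texture_float',
  'OES_texture_float_linear',
  'WEBGL_color_buffer_float'
];

// Return the required extensions missing from the given WebGL context.
// An empty array means the trainer can run on this device.
// Hypothetical helper, for illustration only.
function getMissingExtensions(gl) {
  // gl.getExtension() returns null when the extension is unavailable
  return REQUIRED_EXTENSIONS.filter((name) => gl.getExtension(name) === null);
}
```

In a browser, you would obtain `gl` with `canvas.getContext('webgl')`; if the returned array is non-empty, the trainer cannot run on that device.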
The interface is designed for desktop use. We strongly recommend using a dedicated desktop computer with an NVIDIA GeForce GPU. Laptops or mobile devices are generally not suited for continuous, intensive GPU usage.
The players and other WebAR.rocks libraries that use trained neural network models for real-time inference do not have these strict compatibility constraints. They implement alternative execution paths (using WebGL2, or handling floating-point linear texture filtering with dedicated shaders) so they work on any device, including low-end mobile devices.
A full documentation of the training script API is forthcoming. For now, feel free to adapt one of the provided scripts in /trainingScripts/ to suit your needs.
The target object must have an aspect ratio between 1/2.5 and 2.5. An object with an aspect ratio of 1 fits into a square (equal width and height). For example, the standard Red Bull can has an aspect ratio of 2.5 (height/diameter).
Elongated objects, such as a fork, a pen, or a knife, do not meet this requirement. In such cases, it may be easier to target only a specific part of the object (e.g., the end of the fork). We only detect objects that fully fit within the camera's field of view (i.e., objects that are not partially visible).
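As a quick sanity check before preparing a 3D model, the aspect-ratio constraint can be expressed in a few lines. This helper (function and constant names are illustrative assumptions, not part of the WebAR.rocks.train API) checks that the height/width ratio falls within [1/2.5, 2.5]:

```javascript
// Maximum allowed aspect ratio for a target object (see text above).
const MAX_ASPECT_RATIO = 2.5;

// Check whether an object's bounding dimensions satisfy the constraint.
// E.g., a standard Red Bull can (height/diameter = 2.5) just passes,
// while a pen or a fork does not. Hypothetical helper, illustration only.
function isAspectRatioValid(width, height) {
  const ratio = height / width;
  return ratio >= 1 / MAX_ASPECT_RATIO && ratio <= MAX_ASPECT_RATIO;
}
```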
Highly reflective objects, such as shiny metallic items, are harder to detect. Similarly, refractive materials are more challenging due to their high variability.
The 3D model should be in one of the following file formats: .OBJ, .GLTF, or .GLB. The textures should have power-of-two dimensions, and their highest dimension (width or height) must be 2048 pixels or less.
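The texture constraints above can likewise be checked up front. The following sketch (hypothetical helper names, not part of the WebAR.rocks.train API) validates that each texture side is a power of two and that the largest side does not exceed 2048 pixels:

```javascript
// Maximum allowed texture side length, in pixels (see text above).
const MAX_TEXTURE_DIM = 2048;

// A positive integer is a power of two iff it has a single set bit.
function isPowerOfTwo(n) {
  return Number.isInteger(n) && n > 0 && (n & (n - 1)) === 0;
}

// Check a texture's dimensions against both constraints.
// Hypothetical helper, for illustration only.
function isTextureSizeValid(width, height) {
  return isPowerOfTwo(width) && isPowerOfTwo(height)
    && Math.max(width, height) <= MAX_TEXTURE_DIM;
}
```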
If necessary, the 3D model should embed the PBR textures (typically the metallic-roughness texture).
We provide 3D modelling support.
We can train a neural network to detect multiple objects simultaneously. The first detected object is then tracked (we currently do not support simultaneous multi-object tracking).
We have not yet tested the system's limitations. The more objects you need to detect and track, the less accurate the neural network becomes, and this also depends on the similarity between the objects. It works reliably with three objects; for more than three, you will need to conduct additional tests.
The more generic the object, the more challenging the task becomes. For instance, detecting a generic lighter is more difficult than detecting a specific lighter model. If you require a neural network that generalizes extensively, you must use multiple 3D models for the same target and apply material tweakers to randomly alter certain materials.
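The idea of a material tweaker can be sketched as a function that, for each synthetic training sample, returns a randomized copy of a base material description. The parameter names and jitter ranges below are illustrative assumptions, not the WebAR.rocks.train API:

```javascript
// Sketch of a material tweaker: given a base PBR material description,
// return a randomized copy for one synthetic training sample, so the
// network does not overfit one specific appearance.
// Hypothetical helper, for illustration only.
function tweakMaterial(base, rand = Math.random) {
  const clamp01 = (v) => Math.min(1, Math.max(0, v));
  return {
    ...base,
    // jitter the hue so color is not a reliable cue
    hueShift: (rand() - 0.5) * 0.2,
    // jitter PBR parameters around their base values, clamped to [0, 1]
    roughness: clamp01(base.roughness + (rand() - 0.5) * 0.3),
    metalness: clamp01(base.metalness + (rand() - 0.5) * 0.2)
  };
}
```

Passing the random source as a parameter keeps the tweaker deterministic under test while using `Math.random` in production.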
A tutorial is included in this repository to help you get started:
More to come! Stay tuned by following us on LinkedIn or X.
This software may be used for long-term (multi-month) training on high-end GPUs, so stability is critical.
For reference, WebAR.rocks.train is part of a larger project that includes other problem providers for: