In this article, I’m going to explain how to implement motion controls in the browser. That means you’ll be able to create an application where you can move your hand and make gestures, and the elements on the screen will respond.

Here’s an example:

The above code does the following:

Let’s take a closer look at the handData object since that’s where the magic happens. Inside handData is multiHandLandmarks, a collection of 21 coordinates for the parts of each hand detected in the video feed. Here’s how those coordinates are structured:

A couple of notes:

Here’s a visual representation of the hand coordinates:

Now that you have the hand landmark coordinates, you can build a cursor to follow your index finger. To do that, you’ll need to get the index finger’s coordinates.

You could use the array directly like this handData.multiHandLandmarks[0][5], but I find that hard to keep track of, so I prefer labeling the coordinates like this:

And then you can get the coordinates like this:

I found cursor movement more pleasant to use with the middle part of the index finger rather than the tip because the middle is more steady.

Now you’ll need to make a DOM element to use as a cursor. Here’s the markup:

And here are the styles:

A few notes about these styles:

Now that we’ve created a cursor element, we can move it by converting the hand coordinates into page coordinates and applying those page coordinates to the cursor element.

Now that we have a working cursor, we’re ready to move on.

The next step in our journey is to detect gestures, specifically pinch gestures.

First, what do we mean by a pinch? In this case, we’ll define a pinch as a gesture where the thumb and forefinger are close enough together.

To designate a pinch in code, we can look at when the x, y, and z coordinates of the thumb and forefinger have a small enough difference between them. “Small enough” can vary depending on the use case, so feel free to experiment with different ranges. Personally, I found 0.08, 0.08, and 0.11 to be comfortable for the x, y, and z coordinates, respectively. Here’s how that looks:

It would be nice if that’s all we had to do, but alas, it’s never that simple.

What happens when your fingers are on the edge of a pinch position? If we’re not careful, the answer is chaos.

With slight finger movements as well as fluctuations in coordinate detection, our program can rapidly alternate between pinched and not pinched states. If you’re trying to use a pinch gesture to “pick up” an item on the screen, you can imagine how chaotic it would be for the item to rapidly alternate between being picked up and dropped.

The trick is that the delay must be long enough to be reliable but short enough to feel quick.

We’ll get to the debounce code soon, but first, we need to prepare by tracking the state of our gestures:

Now we can write a function to update the pinched state:

Here’s what updatePinchState() is doing:

We can run updatePinchState(handData) each time we get hand data so that we can put it in our onResults function like this:

Now that we can reliably detect a pinch state change, we can use our custom events to define whatever behavior we want when a pinch is started, moved, or stopped. Here’s an example:

Now that we’ve covered how to respond to movements and gestures, we have everything we need to build an application that can be controlled with hand motions.

Here are some examples:

If you’ve made it this far, you’ve seen how to implement motion controls with a browser and a webcam. You’ve read camera data using browser APIs, you’ve gotten hand coordinates via machine learning, and you’ve detected hand motions with JavaScript. With these ingredients, you can create all sorts of motion-controlled applications.

What use cases will you come up with? Let me know in the comments!

source