After experimenting with voice commands using the Kinect, the next thing I wanted to experiment with was tracking the user's movement.
Using Rumen's examples I was quickly able to start recognizing all the joints of the body that the Kinect is able to track. The Kinect can capture positional and rotational data for twenty-five joints: the center of the hips, the spine, the neck, the head, the left and right shoulders, elbows, wrists, and hands, the left and right hips, knees, ankles, and feet, the spine at the shoulders, and the left and right hand tips and thumbs. That is an impressive number of joints; however, there are a few issues.
First, notice the Kinect does not track individual fingers other than the thumbs. This means any intricate finger gestures, like pointing or flashing the peace sign, cannot be tracked. What it can do is recognize whether a user's hand is open or closed, such as in a fist, using a combination of the hand, hand-tip, and thumb joints.
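For what it's worth, the official Kinect SDK can report a hand state directly, but if your wrapper only exposes raw joint positions, a rough heuristic like the sketch below can approximate open versus closed. The threshold and the names here are my own assumptions, not anything from the SDK:

```csharp
using UnityEngine;

// A rough sketch, not the Kinect SDK's own method: infer open vs. closed
// hand from the spread between the hand, hand-tip, and thumb joints.
public static class HandStateHeuristic
{
    // Threshold is an assumption; tune it for your setup.
    const float OpenSpreadMeters = 0.08f;

    public static bool IsHandOpen(Vector3 hand, Vector3 handTip, Vector3 thumb)
    {
        // When a fist is made, the tip and thumb collapse toward the palm,
        // so the combined spread shrinks well below the open-hand value.
        float spread = Vector3.Distance(hand, handTip) + Vector3.Distance(hand, thumb);
        return spread > OpenSpreadMeters;
    }
}
```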
A second issue arises when the Kinect loses joints, either because they move outside the camera's view or because they become hidden behind other parts of the user's body. When this happens, the Kinect seems to guess where the joints might be located and how they might be rotated, and the data it generates from this guess is not particularly accurate or useful.
That brings me to the third and most troubling issue the Kinect presents, which is the generally dirty data it generates for the position and rotation of joints. The term “dirty” maybe isn't completely fair, so let me describe what I mean. The Kinect returns positional and rotational data for the user's joints roughly 30 times a second. In that data, assuming it can see the joint clearly, there is quite a bit of jitter, basically because the Kinect is trying to return very precise data. However, when the jittery data is translated onto a game object, something in the 3D scene, it often results in very shaky movement even though the user might be remaining relatively still. Even worse, when the Kinect loses sight of a joint, the data often results in some impossible translation of the displayed game object.
To fix the dirtiness, it is wise to write some code in Unity that applies what is called smoothing and damping to the Kinect data before applying it to your game objects. This takes the jittery data and produces a smoother, generally more stable, average. The side effect is a bit of lag between the user's movement and the game object that represents it. It's not horrible, and you can very easily adjust the trade-off between jittery-but-immediate and smooth-but-laggy.
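Here's a minimal sketch of that smoothing in Unity. The GetRawJointPosition and GetRawJointRotation calls are hypothetical stand-ins for whatever your Kinect wrapper exposes, and the smoothing values are just starting points to tune:

```csharp
using UnityEngine;

// Minimal sketch: smooths noisy Kinect joint data before applying it
// to a game object.
public class JointSmoother : MonoBehaviour
{
    [Tooltip("Lower = snappier but jittery; higher = smoother but laggier.")]
    public float positionSmoothTime = 0.1f;
    public float rotationLerpSpeed = 10f;

    private Vector3 velocity; // used internally by SmoothDamp

    void Update()
    {
        Vector3 rawPos = GetRawJointPosition();    // hypothetical wrapper call
        Quaternion rawRot = GetRawJointRotation(); // hypothetical wrapper call

        // Smooth position: averages out per-frame jitter at the cost of lag.
        transform.position = Vector3.SmoothDamp(
            transform.position, rawPos, ref velocity, positionSmoothTime);

        // Smooth rotation: frame-rate-independent exponential slerp.
        transform.rotation = Quaternion.Slerp(
            transform.rotation, rawRot,
            1f - Mathf.Exp(-rotationLerpSpeed * Time.deltaTime));
    }

    Vector3 GetRawJointPosition() { /* read from the Kinect SDK */ return Vector3.zero; }
    Quaternion GetRawJointRotation() { /* read from the Kinect SDK */ return Quaternion.identity; }
}
```

The positionSmoothTime field is exactly the jittery-vs-laggy dial described above: raising it calms the shaking but makes the on-screen object trail further behind the user.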
There is no great way to deal with the loss of joint data midstream. In my opinion, the best approach may be to place limits on both the position and rotation of the joints in Unity code. At least then the displayed game objects will never be forced into impossible or unknown positions or rotations.
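A sketch of that limiting idea is below; the actual limits are illustrative placeholders, not values from the project:

```csharp
using UnityEngine;

// Sketch: clamp a joint-driven transform so lost tracking can't fling it
// into an impossible pose.
public class JointLimiter : MonoBehaviour
{
    public Vector3 minPosition = new Vector3(-1f, 0f, -1f);
    public Vector3 maxPosition = new Vector3(1f, 2f, 1f);
    public float maxPitch = 60f; // degrees up/down
    public float maxYaw = 80f;   // degrees left/right

    public void Apply(Vector3 pos, Quaternion rot)
    {
        // Clamp each positional axis independently.
        pos.x = Mathf.Clamp(pos.x, minPosition.x, maxPosition.x);
        pos.y = Mathf.Clamp(pos.y, minPosition.y, maxPosition.y);
        pos.z = Mathf.Clamp(pos.z, minPosition.z, maxPosition.z);
        transform.position = pos;

        // Clamp pitch and yaw; discard roll entirely.
        Vector3 e = rot.eulerAngles;
        float pitch = Mathf.Clamp(NormalizeAngle(e.x), -maxPitch, maxPitch);
        float yaw = Mathf.Clamp(NormalizeAngle(e.y), -maxYaw, maxYaw);
        transform.rotation = Quaternion.Euler(pitch, yaw, 0f);
    }

    // Map 0..360 to -180..180 so clamping works around zero.
    static float NormalizeAngle(float a) => a > 180f ? a - 360f : a;
}
```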
That's enough general information about how the Kinect tracks motion; let's talk about the specific experiments I ran that might be helpful in our teaching simulator.
The first experiment that came to mind was to give the user a way to expand on a static view in order to look and move around the virtual classroom. My initial thought was to rotate the camera in the virtual scene by tracking the rotation of the user's head. After some experimentation, this presented some unexpected issues.
Unlike when wearing some type of VR headset, the monitor displaying the virtual classroom is in a fixed position in front of the user. Therefore, to look at the display, the user must generally always be facing forward. As users turn their heads to rotate the virtual camera, they are actually turning away from the screen. To improve upon this obvious defect in the strategy, I wrote some code in Unity that takes into account how close the user is to the screen, and consequently the Kinect. This assumes the Kinect is located directly above the monitor.
I decided that as a user gets closer to the screen, their ability to rotate the virtual camera should increase, and as they back away, it should decrease. This made logical sense to me because your ability to see more through a window increases as you approach the window itself. When right up against a window, you can look left, right, up, and down to get a better view of what is on the other side. As you back away from a window, and its edges come fully into view, the effect of looking around through it disappears. So the ability to manipulate the rotation of the virtual camera, via head rotation, is inversely proportional to the distance from the screen.
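Something along these lines could implement that window effect in Unity. The distance bounds and the source of the head yaw are assumptions on my part:

```csharp
using UnityEngine;

// Sketch of the "window" idea: head rotation drives the virtual camera,
// with more rotation authority the closer the user stands to the Kinect.
public class HeadLookCamera : MonoBehaviour
{
    public float maxDistance = 3f;   // beyond this, head look has no effect
    public float minDistance = 0.5f; // at this range, full rotation authority
    public float maxCameraYaw = 90f; // camera yaw at the closest distance

    // Call each frame with the head yaw and user distance from the Kinect.
    public void UpdateLook(float headYawDegrees, float userDistanceMeters)
    {
        // 1 when the user is at minDistance, fading to 0 at maxDistance.
        float t = Mathf.InverseLerp(maxDistance, minDistance, userDistanceMeters);

        // Scale the user's head yaw into camera yaw proportionally.
        float yaw = Mathf.Clamp(headYawDegrees * t, -maxCameraYaw, maxCameraYaw);
        transform.localRotation = Quaternion.Euler(0f, yaw, 0f);
    }
}
```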
This experiment met with mediocre results. Though it worked well functionally, once smoothing and damping were applied, the overall motion felt unnatural. Also, a side effect of the smoothing was a feeling of being at sea, with the motion on screen never quite settling on what you really wish to look at. Watch the video below for a demo of head-tracking rotation.
The second experiment I did was to see how it felt to move deeper into the virtual space as you move closer to the Kinect. My theory here was that it might be important for a teacher to walk closer to particular students, who would presumably be sitting in rows. This was relatively easy to achieve by just moving the virtual camera forward and back along the Z axis according to the user's distance from the Kinect.
This movement had a nice feel to it, with the smoothing applied; however, it did raise an unforeseen issue concerning the scale of the virtual room relative to the physical space the user has to move around in. Say the virtual classroom is 20 ft deep by 20 ft wide; does this mean the user's physical space also has to be 20′×20′? For the movement to be represented at a one-to-one ratio, the answer would be yes. However, with a little bit of programming magic it is not necessary to stick to a strict one-to-one ratio. Using a multiplier on the user's physical movement, I was able to make the virtual camera move as much, or as little, as I wanted in the virtual space. Watch the video below for a demo of how the virtual camera can move in parallel with movement in a physical space.
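A minimal sketch of the multiplier idea, assuming a hypothetical GetUserHipPosition wrapper around the Kinect data:

```csharp
using UnityEngine;

// Sketch: map the user's physical position (from the Kinect) to the
// virtual camera with a tunable multiplier, so a small room can cover
// a larger classroom.
public class ScaledMovement : MonoBehaviour
{
    public float movementMultiplier = 2f; // 1 = one-to-one, >1 amplifies
    private Vector3 physicalOrigin;       // where the user started

    void Start()
    {
        physicalOrigin = GetUserHipPosition();
    }

    void Update()
    {
        Vector3 offset = GetUserHipPosition() - physicalOrigin;

        // Amplify X (side to side) and Z (toward/away from the Kinect);
        // keep the camera height fixed.
        transform.position = new Vector3(
            offset.x * movementMultiplier,
            transform.position.y,
            offset.z * movementMultiplier);
    }

    Vector3 GetUserHipPosition() { /* read from the Kinect SDK */ return Vector3.zero; }
}
```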
Another issue with sticking to a strict one-to-one ratio of movement: what if one user's physical room differs in size from another's? The virtual classroom will be the same size for every user, so even though users' physical spaces might differ, we must be able to provide each of them with the ability to traverse the entire virtual space. We can do this by establishing a ratio between the physical space and the virtual one.
The virtual space will obviously be a size that we dictate, but we may not have any control over a user's physical space. So to get the ratio between the two, we need to give the user the ability to tell the program how much space they have. I did this by writing a room-measurer algorithm where all the user has to do is walk to the extremes of the room, front, back, left, and right, to establish how much space they have. The Kinect, along with my Unity code, can then figure out the ratio of the physical space relative to the virtual space and establish a multiplier so the movement reacts accordingly.
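This isn't the project's actual measurer, just a sketch of how such a calibration pass might work if you feed it the user's hip position each frame while they walk to the room's extremes:

```csharp
using UnityEngine;

// Sketch of a room-measuring step: track the extremes the user reaches,
// then derive physical-to-virtual movement multipliers from them.
public class RoomCalibrator
{
    private float minX = float.MaxValue, maxX = float.MinValue;
    private float minZ = float.MaxValue, maxZ = float.MinValue;

    // Feed this the user's hip position each frame while calibrating.
    public void Record(Vector3 userPosition)
    {
        minX = Mathf.Min(minX, userPosition.x);
        maxX = Mathf.Max(maxX, userPosition.x);
        minZ = Mathf.Min(minZ, userPosition.z);
        maxZ = Mathf.Max(maxZ, userPosition.z);
    }

    // Multipliers that stretch the measured physical extents over the
    // virtual room, so any room size can traverse the whole classroom.
    public Vector2 GetMultipliers(float virtualWidth, float virtualDepth)
    {
        float physicalWidth = Mathf.Max(maxX - minX, 0.01f);
        float physicalDepth = Mathf.Max(maxZ - minZ, 0.01f);
        return new Vector2(virtualWidth / physicalWidth,
                           virtualDepth / physicalDepth);
    }
}
```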
With lateral movement feeling pretty good and natural in front of the Kinect, I wanted to reevaluate the rotation mechanic. As I stated before, tracking the user's head rotation in order to rotate the virtual camera had some issues. But I had not given up on giving the user the ability to change what they are looking at, even though, physically, they would always be looking straight ahead at the monitor.
The plan I came up with was to let the user control what they are looking at by simply turning the virtual camera toward whomever they are talking to!
Using voice commands to control the rotation of the camera eliminates the wavy, at-sea messiness that occurred when I was tracking the user's head rotation. It also allows the user to look at different things in the virtual scene without having to turn their head away from the monitor.
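A rough sketch of how voice-driven rotation might be wired up; the phrase-to-target mapping and method names are my assumptions, with the speech recognition itself left to whatever the voice-command setup already provides:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch: rotate the camera toward a named target when a voice command
// is recognized.
public class VoiceLook : MonoBehaviour
{
    public float turnSpeed = 120f; // degrees per second
    private readonly Dictionary<string, Transform> targets = new Dictionary<string, Transform>();
    private Quaternion goal;

    void Start() { goal = transform.rotation; }

    // Associate a spoken phrase with a scene object, e.g. a student.
    public void RegisterTarget(string phrase, Transform target)
    {
        targets[phrase] = target;
    }

    // Call this from the speech recognizer's callback.
    public void OnPhraseRecognized(string phrase)
    {
        if (targets.TryGetValue(phrase, out Transform t))
        {
            // Face the target, ignoring any vertical offset.
            Vector3 dir = t.position - transform.position;
            dir.y = 0f;
            if (dir.sqrMagnitude > 0.001f)
                goal = Quaternion.LookRotation(dir);
        }
    }

    void Update()
    {
        // Turn at a constant rate toward the goal rotation.
        transform.rotation = Quaternion.RotateTowards(
            transform.rotation, goal, turnSpeed * Time.deltaTime);
    }
}
```

Quaternion.RotateTowards gives a constant, predictable turn rate, which is what keeps this feeling steady instead of drifting the way the smoothed head tracking did.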
This method of rotation, combined with the ability to proportionally translate movement in the physical space to movement in the virtual space, seems pretty natural and, I think, works rather well. Watch the video below to see it all in action.