3D Head Tracking in Video: How a simple model can do big « Free-D

Written by: urbeller, May 14, 2013

In this post, I describe my own implementation of a head tracker. 3D head tracking (HT) consists of inferring the 3D orientation and displacement of the head, often from a single video source. Here, the video source is a Logitech C910 webcam, though any webcam will do. Video grabbing and
image processing are done with the OpenCV library.

The outline of the algorithm is as follows:

1. Grab a frame and detect 2D features.
2. Initialize the head pose.
3. Compute 3D features → FT_old.
4. Grab a frame and detect 2D features.
5. Compute 3D features → FT_new.
6. Compute the motion that registers FT_new with FT_old.
7. Update the head pose.
8. Set FT_old = FT_new and go to step 4.
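The "update head pose" step amounts to composing the incremental motion with the accumulated pose. Below is a minimal sketch in plain C++ under these assumptions: the pose (R, t) maps head-model coordinates to camera coordinates, the tracker starts from some initial pose (for example the identity rotation for a fronto-parallel head), and the incremental motion (dR, dt) from the registration step maps the camera-frame 3D features at time t-1 onto those at time t. The types Vec3, Mat3 and Pose and the functions mul, det and composePose are hypothetical; the post does not show this part of the code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical minimal types for illustration (not from the original code).
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Head pose: maps head-model coordinates to camera coordinates, X_cam = R * X + t.
struct Pose {
    Mat3 R;
    Vec3 t;
};

Mat3 mul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

Vec3 mul(const Mat3& A, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            r[i] += A[i][k] * v[k];
    return r;
}

double det(const Mat3& A) {
    return A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
         - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
         + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]);
}

// Update head pose: if the incremental motion moves camera-frame points as
// X_t = dR * X_{t-1} + dt, then X_t = (dR * R) * X + (dR * t + dt).
Pose composePose(const Pose& pose, const Pose& delta) {
    // A proper rotation has determinant +1; anything else signals a bad estimate.
    assert(std::fabs(det(delta.R) - 1.0) < 1e-6);
    Pose out;
    out.R = mul(delta.R, pose.R);
    const Vec3 rt = mul(delta.R, pose.t);
    for (int i = 0; i < 3; ++i)
        out.t[i] = rt[i] + delta.t[i];
    return out;
}
```

For instance, starting from the identity rotation with t = (0, 0, 600) and applying a 90-degree yaw about the camera y axis plus a 10-unit shift along x gives t = (610, 0, 0). Accumulating increments this way is what carries the pose from frame to frame; it is also why small per-frame errors can build up into drift.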

At first glance, the toughest step in this outline seems to be the 2D→3D feature conversion. It turns out to be among the easiest, thanks to a simple idea: the cylindrical head model. In a nutshell, each 2D feature is back-projected from the camera center along its viewing ray until the ray hits a virtual cylinder that approximates the head. These intersections provide the
sought 3D positions of the image features. But first things first…

Grabbing an image is easy with OpenCV. The boilerplate code is a loop that looks like this:

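#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>

using namespace cv;
using namespace std;

int main()
{
    Mat frame, image;
    VideoCapture capture;
    const int dev_id = 1;  // Device number (0 is usually the default webcam).
    const string window_name = "Head tracker";

    capture.open(dev_id);
    if (!capture.isOpened()){
        cerr << "Failed to open video device " << dev_id << endl;
        return 1;
    }

    for (;;){
        capture >> frame;             // Grab the next frame.
        if (frame.empty())
            continue;                 // Skip frames the driver failed to deliver.

        frame.copyTo(image);          // Process a copy of the grabbed frame.
        imshow(window_name, image);
        char key = (char) waitKey(5); // Also lets HighGUI refresh the window.

        if (key == ' ')               // Space bar quits.
            break;
    }
    return 0;
}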

In each input frame, 2D features are detected. Among the myriad of feature types, KLT features are probably the best suited to our real-time needs: they are cheap to compute because no descriptor is computed and no scale-space analysis is involved (unlike SIFT). With OpenCV, KLT features are retrieved as follows:

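#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;

const int MAX_COUNT = 100;  // Upper bound on the number of detected features.
// Stop sub-pixel refinement after 20 iterations or when a corner moves < 0.3 px.
TermCriteria termcrit(TermCriteria::COUNT | TermCriteria::EPS, 20, 0.3);
// We use two sets of points in order to swap previous and current features.
// Corner functions and optical flow expect float points (Point2f).
vector<Point2f> points[2];
Size subPixWinSize(10,10), winSize(21,21);
Mat gray, prev_gray;

void detectFeatures(const Mat& image)
{
    // VideoCapture delivers BGR frames: convert to gray scale.
    cvtColor(image, gray, COLOR_BGR2GRAY);

    // Feature detection is performed here (Shi-Tomasi corners).
    goodFeaturesToTrack(gray, points[1], MAX_COUNT,
                        0.01,   // qualityLevel
                        10,     // minDistance (pixels)
                        Mat(),  // mask: search the whole image
                        3,      // blockSize
                        false,  // useHarrisDetector
                        0.04);  // k, only used by Harris
    cornerSubPix(gray, points[1], subPixWinSize,
                 Size(-1,-1), termcrit);
}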
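With the Harris detector disabled, as in the call above, goodFeaturesToTrack scores each pixel with the Shi-Tomasi criterion: the smaller eigenvalue of the 2x2 structure tensor built from image gradients Ix, Iy summed over a blockSize x blockSize window. Corners scoring below qualityLevel times the best score are rejected, and a corner closer than minDistance to a stronger one is dropped. The function below is a plain C++ sketch of the score alone; the name shiTomasiScore is hypothetical and not part of OpenCV.

```cpp
#include <cassert>
#include <cmath>

// Shi-Tomasi corner score: smaller eigenvalue of the structure tensor
//   M = | sxx  sxy |
//       | sxy  syy |
// where sxx = sum(Ix*Ix), sxy = sum(Ix*Iy), syy = sum(Iy*Iy) over the window.
double shiTomasiScore(double sxx, double sxy, double syy) {
    assert(sxx >= 0.0 && syy >= 0.0);  // sums of squares are never negative
    const double mean = 0.5 * (sxx + syy);
    const double halfDiff = 0.5 * (sxx - syy);
    // Eigenvalues of a symmetric 2x2 matrix: mean +/- sqrt(halfDiff^2 + sxy^2).
    return mean - std::sqrt(halfDiff * halfDiff + sxy * sxy);
}
```

A flat patch (all sums near zero) and a straight edge (e.g. sxx = 100, sxy = syy = 0) both score 0; only a corner, with strong gradients in two directions (e.g. sxx = syy = 100), scores high. The same matrix is the one Lucas-Kanade inverts when tracking, so a large smallest eigenvalue also means a well-conditioned tracking step.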

Once the features are detected, they are unprojected and intersected with the virtual cylinder. The exact solution to this ray-cylinder intersection is easy to find online. With the 3D positions of the features at time t-1 in hand, the same features are tracked in the next frame using OpenCV's pyramidal Lucas-Kanade optical flow routine:


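// points[0]/prev_gray: previous frame; points[1]/gray: current frame.
vector<uchar> status;  // status[i] == 1 if feature i was found in the new frame.
vector<float> err;     // Tracking error of each feature.
calcOpticalFlowPyrLK(prev_gray, gray,
                     points[0], points[1],
                     status, err, winSize);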
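The ray-cylinder intersection mentioned above can be written in a few lines. This is one possible sketch in plain C++, not the author's code, assuming a calibrated pinhole camera with hypothetical intrinsics fx, fy, cx, cy (cx, cy is the principal point, not necessarily the image center) and a cylinder described in camera coordinates by a point on its axis, a unit axis direction and a radius. For a fronto-parallel head the axis is simply the camera's y axis. The names Intrinsics, Cylinder and unprojectToCylinder are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Pinhole intrinsics in pixels (hypothetical struct; values come from calibration).
struct Intrinsics { double fx, fy, cx, cy; };

// Virtual head cylinder in camera coordinates: a point on its axis, a unit
// axis direction and a radius (same length unit as the 3D points).
struct Cylinder { Vec3 center; Vec3 axis; double radius; };

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Component of v perpendicular to the unit vector n.
Vec3 perp(const Vec3& v, const Vec3& n) {
    const double k = dot(v, n);
    return {v[0] - k * n[0], v[1] - k * n[1], v[2] - k * n[2]};
}

// Back-projects pixel (u, v) along its viewing ray from the camera center and
// intersects the ray with the cylinder. On success, writes the nearest
// intersection in front of the camera to `point` and returns true.
bool unprojectToCylinder(double u, double v, const Intrinsics& K,
                         const Cylinder& cyl, Vec3& point) {
    assert(K.fx > 0 && K.fy > 0 && cyl.radius > 0);
    assert(std::fabs(dot(cyl.axis, cyl.axis) - 1.0) < 1e-9);  // unit axis
    // Ray P(s) = s * d, with d the normalized image coordinates (z = 1).
    const Vec3 d = {(u - K.cx) / K.fx, (v - K.cy) / K.fy, 1.0};
    // Distance to the axis is measured orthogonally to it:
    // |s * dp + m|^2 = r^2, with m the perpendicular part of (origin - center).
    const Vec3 dp = perp(d, cyl.axis);
    const Vec3 m = perp({-cyl.center[0], -cyl.center[1], -cyl.center[2]}, cyl.axis);
    const double A = dot(dp, dp);
    const double B = dot(dp, m);  // half of the linear coefficient
    const double C = dot(m, m) - cyl.radius * cyl.radius;
    if (A < 1e-12) return false;  // ray parallel to the axis
    const double disc = B * B - A * C;
    if (disc < 0) return false;   // ray misses the cylinder
    const double root = std::sqrt(disc);
    double s = (-B - root) / A;   // nearer hit: the visible side of the head
    if (s <= 0) s = (-B + root) / A;  // camera inside the cylinder
    if (s <= 0) return false;         // cylinder behind the camera
    point = {s * d[0], s * d[1], s * d[2]};
    return true;
}
```

For example, with fx = fy = 500, principal point (320, 240) and a cylinder of radius 80 whose axis runs along y through (0, 0, 600), the pixel (320, 240) lands at (0, 0, 520), on the front of the cylinder, while the pixel (820, 240) misses it. Taking the nearer root keeps features on the side of the head facing the camera.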

The result of this tracking is a set of 2D features at time t. To get the change in head pose, we register the 3D features at time t-1 with the 2D features at time t using a Perspective-n-Point (PnP) algorithm (e.g. OpenCV's solvePnP). Because the virtual cylinder represents the head (a rough approximation!), it must be updated with the incremental pose
just computed. In a sense, the cylinder holds the state of the tracked head.
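To make the registration objective explicit: PnP solvers such as OpenCV's solvePnP (in its default iterative mode) look for the rotation and translation that minimize the reprojection error, i.e. the pixel distance between each tracked 2D feature at time t and the projection of its 3D counterpart from time t-1 after the candidate motion. The sketch below only evaluates that error for a given (R, t) and then moves the cylinder by the same motion; it assumes a distortion-free pinhole camera, and the names apply, project, rmsReprojectionError and moveCylinder are hypothetical, not OpenCV API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

struct Intrinsics { double fx, fy, cx, cy; };  // pinhole model, no lens distortion

// Rigid motion X' = R * X + t.
Vec3 apply(const Mat3& R, const Vec3& X, const Vec3& t) {
    Vec3 out;
    for (int r = 0; r < 3; ++r)
        out[r] = R[r][0] * X[0] + R[r][1] * X[1] + R[r][2] * X[2] + t[r];
    return out;
}

// Pinhole projection of a camera-frame point (z > 0) to pixel coordinates.
Vec2 project(const Vec3& X, const Intrinsics& K) {
    return {K.fx * X[0] / X[2] + K.cx, K.fy * X[1] / X[2] + K.cy};
}

// RMS reprojection error of a candidate incremental motion (R, t): the 3D
// features from time t-1 are moved, projected, and compared with the 2D
// features tracked at time t. A PnP solver searches for the (R, t) minimizing it.
double rmsReprojectionError(const std::vector<Vec3>& pts3d,
                            const std::vector<Vec2>& pts2d,
                            const Mat3& R, const Vec3& t, const Intrinsics& K) {
    assert(!pts3d.empty() && pts3d.size() == pts2d.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < pts3d.size(); ++i) {
        const Vec2 p = project(apply(R, pts3d[i], t), K);
        const double du = p[0] - pts2d[i][0];
        const double dv = p[1] - pts2d[i][1];
        sum += du * du + dv * dv;
    }
    return std::sqrt(sum / static_cast<double>(pts3d.size()));
}

// Once (R, t) is found, the cylinder (the head state) moves with the head:
// a point on its axis undergoes the full motion, the axis direction only the rotation.
void moveCylinder(Vec3& axisPoint, Vec3& axisDir, const Mat3& R, const Vec3& t) {
    axisPoint = apply(R, axisPoint, t);
    axisDir = apply(R, axisDir, Vec3{0.0, 0.0, 0.0});
}
```

Updating the cylinder with the estimated motion is what lets the next frame's unprojection use the head's current position and orientation.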

The head tracker runs comfortably on a 2.4 GHz laptop with a Logitech C910 webcam, as shown in the accompanying video.



7 Comments

urbeller says:

May 14, 2013 at 19:42

Good!

Vince says:

June 16, 2013 at 13:49

Nice job! Thanks for sharing. I think I will try this technique one of these days.

KD says:

June 16, 2013 at 14:58

Really interesting.

Graham Fogarty says:

June 17, 2013 at 13:21

Nice demonstration, Jamil. Have you tried other tracking methods? With LK tracking, points drift over time and it can't handle occlusion, as you are no doubt aware. I would be interested in trying other tracking techniques based on your code if you are willing to share it.

Regards, Graham

- urbeller says:
  
  June 17, 2013 at 14:00
  
  Hey there! Thank you for stopping by 🙂
  Actually, my strategy was to redetect points "on demand". When the face
  turns left, for example, I redetect points on the opposite side (the one
  that is not occluded). The beauty of the cylindrical model is that
  it holds the "state" of the head at any time, so drift is
  less likely to happen.
  
Kiran says:

January 15, 2014 at 17:29

Jamil, thanks. I have lost count of how many times I watched your video on YouTube :). I have some questions.

1. I use POSIT instead of solvePnP, as in the ehci project, which uses a sinusoidal head model. If the head is not placed centrally in the video, do I need to translate the image points to the origin? How do I do that? ehci subtracts 160 from x and 120 from y on every image point for a 320x240 resolution. I get a wrong rotation whenever the head is not placed centrally in the video. Kindly suggest.

2. Your code detects up to 100 corner points, but your YouTube video doesn't show that many. Surprisingly, there are no tracking points on the mouth corners. No matter how many times I run cvGoodFeaturesToTrack, with whatever parameters, I get features at the mouth corners. You don't have them!!

3. Does cornerSubPix provide any improvement for faces?

4. Can you elaborate a little on detecting points on the non-occluded side of the face 🙂? I didn't see that trick in the YouTube video.

Regards,
Kiran.

- urbeller says:
  
  January 16, 2014 at 02:26
  
  Kiran… ray of light? 🙂
  
  Thank you for your interest in my little project. Before going any further, I should mention that I am working on
  a second version that will include two major additions.
  
  1/ I am not familiar with ehci, though I saw a video demo of it. In my case, I assume
  that the initial face position is fronto-parallel (basically, the rotation is the identity).
  Also, for efficiency, the face must be in a region of interest (this could be a rectangle).
  Then, tracking determines the rotation and translation of the face at the
  same time. I noticed that PnP gave better results. POSIT relies on a scaled orthographic (affine)
  approximation of the projection… I think!
  
  2/ The 100 in my code is the maximum number of feature points. In practice, fewer than that are
  found. Of course, I am only interested in reliable features. I do get some features at the mouth
  corners when I pretend to speak 🙂
  3/ I haven't tested the improvement. Since computation time didn't suffer from cornerSubPix, I kept
  it.
  4/ This is a work in progress. Once the points are detected and tracked, their normals can be estimated
  (they lie on a cylinder). I use the normal direction to weigh each feature's contribution. I haven't
  talked about it on my blog because it's not finished yet. Stay tuned!!!
  