
TED-SMPLX

A Tool for Extracting 3D Avatar-Ready Gesture Animations from Monocular Videos

Andrew Feng, Samuel Shin, Youngwoo Yoon

MIG 2022

[Paper]  [Video]  [Code] 


3D Avatar-Ready Gesture Animations from Monocular Videos


Abstract

Modeling and generating realistic human gesture animations from speech audio has a great impact on creating believable virtual humans that can interact with human users and mimic real-world face-to-face communication. Large-scale datasets are essential for data-driven research, but creating multi-modal gesture datasets with 3D gesture motions and corresponding speech audio is either expensive via traditional workflows such as motion capture, or produces subpar results via pose estimation from in-the-wild videos. As a result of these limitations, existing gesture datasets suffer from either short duration or low animation quality, making them less than ideal for training gesture synthesis models. Motivated by the key limitations of previous datasets and by recent progress in human mesh recovery (HMR), we developed a tool for extracting avatar-ready gesture motions from monocular videos with improved animation quality. The tool utilizes a variational autoencoder (VAE) to refine raw gesture motions. The resulting gestures are in a unified pose representation that includes both body and finger motions and can be readily applied to a virtual avatar via online motion retargeting. We validated the proposed tool on existing datasets and created the refined dataset TED-SMPLX by re-processing videos from the original TED dataset. The new dataset will be made available for future research. Samples showing the extracted gesture motion can be found in the video at https://youtu.be/nmef_FUavzU.

Video


Download

We provide the SMPL-X pose parameters corresponding to each video clip from the original TED-Gesture dataset, along with tutorial code for visualizing the data.

You can request access to the download here.
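As a rough illustration of how per-clip SMPL-X pose parameters might be read, here is a minimal Python sketch. Note that the file name (`clip_0001.npz`) and the key names (`body_pose`, `left_hand_pose`, `right_hand_pose`) are assumptions for illustration only; the tutorial code shipped with the dataset documents the actual format. Only the joint counts (21 body joints and 15 joints per hand, in axis-angle form) come from the SMPL-X model definition.

```python
# Sketch: reading per-clip SMPL-X pose parameters.
# ASSUMPTION: file name and key layout below are hypothetical;
# see the dataset's tutorial code for the real format.
import numpy as np

NUM_BODY_JOINTS = 21   # SMPL-X body joints (excluding global orientation)
NUM_HAND_JOINTS = 15   # SMPL-X joints per hand

def load_clip(path):
    """Return (T, J, 3) axis-angle arrays for body and both hands."""
    data = np.load(path)
    body = data["body_pose"].reshape(-1, NUM_BODY_JOINTS, 3)
    lhand = data["left_hand_pose"].reshape(-1, NUM_HAND_JOINTS, 3)
    rhand = data["right_hand_pose"].reshape(-1, NUM_HAND_JOINTS, 3)
    return body, lhand, rhand

if __name__ == "__main__":
    # Write a synthetic 30-frame clip so the sketch runs end to end.
    T = 30
    np.savez("clip_0001.npz",
             body_pose=np.zeros((T, NUM_BODY_JOINTS * 3)),
             left_hand_pose=np.zeros((T, NUM_HAND_JOINTS * 3)),
             right_hand_pose=np.zeros((T, NUM_HAND_JOINTS * 3)))
    body, lhand, rhand = load_clip("clip_0001.npz")
    print(body.shape, lhand.shape, rhand.shape)
```

Once loaded, each frame's axis-angle vectors can be converted to rotation matrices or quaternions and retargeted onto an avatar's skeleton.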


Referencing the TED-SMPLX Dataset

@conference{ted_smplx:MIG:2022,
  title = {A Tool for Extracting 3D Avatar-Ready Gesture Animations from Monocular Videos},
  author = {Andrew Feng and Samuel Shin and Youngwoo Yoon},
  booktitle = {Motion, Interaction and Games},
  pages = {0000--0000}, 
  year = {2022}, 
}
Contact

If you have any questions or comments regarding the dataset, please contact Andrew Feng (feng@ict.usc.edu).

Created by USC Institute for Creative Technologies.

© 2022 USC Institute for Creative Technologies