A collection of text-to-motion generation methods, mostly published at top conferences. Continuously updated.
Ahuja, Chaitanya, and Louis-Philippe Morency. "Language2Pose: Natural language grounded pose forecasting." 2019 International Conference on 3D Vision (3DV). IEEE, 2019.
http://chahuja.com/language2pose
Ghosh, Anindita, et al. "Synthesis of compositional animations from textual descriptions." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
https://github.com/anindita127/Complextext2animation
Guo, Chuan, et al. "Generating diverse and natural 3d human motions from text." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
https://ericguo5513.github.io/text-to-motion
Petrovich, Mathis, Michael J. Black, and Gül Varol. "TEMOS: Generating diverse human motions from textual descriptions." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
https://mathis.petrovich.fr/temos/
Athanasiou, Nikos, et al. "TEACH: Temporal action composition for 3d humans." 2022 International Conference on 3D Vision (3DV). IEEE, 2022.
Tevet, Guy, et al. "MotionCLIP: Exposing human motion generation to CLIP space." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
https://guytevet.github.io/motionclip-page/
Hong, Fangzhou, et al. "AvatarCLIP: Zero-shot text-driven generation and animation of 3D avatars." ACM Transactions on Graphics (TOG) 41.4 (2022): 1-19.
https://github.com/hongfz16/AvatarCLIP
Guo, Chuan, et al. "TM2T: Stochastic and tokenized modeling for the reciprocal generation of 3d human motions and texts." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
https://ericguo5513.github.io/TM2T/
Lin, Junfan, et al. "Being comes from not-being: Open-vocabulary text-to-motion generation with wordless training." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
https://github.com/junfanlin/oohmg
Kim, Jihoon, Jiseob Kim, and Sungjoon Choi. "FLAME: Free-form language-based motion synthesis & editing." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 7. 2023.
https://github.com/kakaobrain/flame
Dabral, Rishabh, et al. "MoFusion: A framework for denoising-diffusion-based motion synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
https://vcai.mpi-inf.mpg.de/projects/MoFusion/
Zhang, Jianrong, et al. "T2M-GPT: Generating human motion from textual descriptions with discrete representations." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
https://mael-zys.github.io/T2M-GPT/
Wang, Yin, et al. "Fg-T2M: Fine-grained text-driven human motion generation via diffusion model." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
Zhou, Zixiang, and Baoyuan Wang. "UDE: A unified driving engine for human motion generation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
https://zixiangzhou916.github.io/UDE/
Guo, Chuan, et al. "MoMask: Generative Masked Modeling of 3D Human Motions." arXiv preprint arXiv:2312.00063 (2023).
https://ericguo5513.github.io/momask/
Barquero, German, Sergio Escalera, and Cristina Palmero. "Seamless Human Motion Composition with Blended Positional Encodings." arXiv preprint arXiv:2402.15509 (2024).
https://barquerogerman.github.io/FlowMDM/
Petrovich, Mathis, et al. "Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation." arXiv preprint arXiv:2401.08559 (2024).