Spatio-semantic Task Recognition: Unsupervised Learning of Task-discriminative Features for Segmentation and Imitation

Abstract: Discovering task subsequences from a continuous video stream facilitates robot imitation of sequential tasks. In this research, we develop unsupervised learning of task subsequences that does not require a human teacher to provide supervised labels for the subsequences. A task-discriminative feature, in the form of sparsely activated cells called task capsules, is proposed and self-trained to preserve the spatio-semantic information of the visual input. The task capsules are sparsely and exclusively activated with respect to the spatio-semantic context of the task subsequence: the type and location of the object. Therefore, the purpose generalized across multiple videos is discovered without supervision according to the spatio-semantic context, and the demonstration is segmented into task subsequences in an object-centric way. In comparison with existing studies on unsupervised task segmentation, our work makes the following distinct contributions: 1) a task provided as a video stream can be segmented without any pre-defined knowledge, and 2) the trained features preserve spatio-semantic information, so the segmentation is object-centric. Our experiment shows that the recognized task subsequences can be applied to robot imitation of a sequential pick-and-place task by providing the semantic and location information of the object to be manipulated.
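
To make the idea of sparse, exclusive capsule activation and the resulting segmentation more concrete, the following minimal Python/PyTorch sketch encodes each frame into task-capsule activations, applies an entropy-style sparsity penalty, and splits a demonstration wherever the dominant capsule changes. This is only an illustrative sketch under assumed design choices; the class and function names (TaskCapsuleEncoder, sparsity_loss, segment_by_capsule), the network architecture, and the loss are hypothetical and are not the implementation or training objective used in the paper.

# Hypothetical sketch (not the authors' code): a convolutional encoder whose
# output "task capsules" are encouraged to be sparsely and exclusively active,
# plus a segmentation step that splits a frame sequence wherever the dominant
# capsule changes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskCapsuleEncoder(nn.Module):
    """Maps an RGB frame to activation scores over `num_capsules` task capsules."""

    def __init__(self, num_capsules: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.capsule_head = nn.Linear(64, num_capsules)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, 3, H, W) -> capsule activations: (B, num_capsules)
        return F.softmax(self.capsule_head(self.features(frames)), dim=-1)


def sparsity_loss(activations: torch.Tensor) -> torch.Tensor:
    # Entropy penalty: pushes each frame toward a single (exclusive) active capsule.
    return -(activations * torch.log(activations + 1e-8)).sum(dim=-1).mean()


def segment_by_capsule(activations: torch.Tensor):
    # Split a frame sequence into (start, end, capsule_id) subsequences
    # wherever the dominant capsule changes.
    labels = activations.argmax(dim=-1).tolist()
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t, labels[start]))
            start = t
    return segments


if __name__ == "__main__":
    encoder = TaskCapsuleEncoder(num_capsules=4)
    video = torch.rand(16, 3, 64, 64)   # 16 frames of a demonstration video
    acts = encoder(video)                # (16, 4) capsule activations
    print("sparsity loss:", sparsity_loss(acts).item())
    print("segments:", segment_by_capsule(acts))

In this sketch the segmentation boundaries follow whichever capsule dominates each frame; the paper's self-training additionally ties capsule activations to the type and location of the manipulated object, which is what makes the segmentation object-centric.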

Bibtex

@article{park2021spatio,
  title={Spatio-semantic Task Recognition: Unsupervised Learning of Task-discriminative Features for Segmentation and Imitation},
  author={Park, J Hyeon and Kim, Jigang and Kim, H Jin},
  journal={International Journal of Control, Automation and Systems},
  volume={19},
  number={10},
  pages={3409--3418},
  year={2021},
  publisher={Springer}
}