    Kei Shimonishi

    Generating natural human motion is a key technique for multimodal dialogue systems with a human-like avatar. In particular, natural and expressive lip motion synthesis is necessary to enrich conversation between a user and an avatar. However, such expressive lip motion is often difficult to generate automatically because it changes depending on phonemic context and prosody. To address this difficulty, we introduce a novel motion generation method based on modulating a set of dynamic models learned from neutral motion data. As a suitable model for lip motion generation, we adopt a hybrid dynamical system, which consists of a linear dynamical system for each motion unit and a symbolic automaton for switching between these units. We show that, from the viewpoint of control theory, it is possible to modulate linear dynamical systems for various types of motion. Early results demonstrate the applicability of the proposed method using lip motion synthesis for simple phoneme sequences.
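    The hybrid structure described above (one linear dynamical system per motion unit, plus a symbolic layer that switches between units) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's model: the scalar state standing in for lip opening, the unit names `open` and `close`, the parameter values, a fixed label sequence standing in for the automaton, and the choice of modulation (scaling the offset term so that each unit's equilibrium moves while its convergence rate is unchanged).

```python
from dataclasses import dataclass


@dataclass
class LinearUnit:
    """Scalar linear dynamical system x[t+1] = a * x[t] + b for one motion unit.

    Hypothetical illustration: a real lip model would use a vector state.
    """
    a: float       # convergence rate, |a| < 1 for a stable unit
    b: float       # offset term; the equilibrium is b / (1 - a)
    duration: int  # steps spent in this unit before switching

    def equilibrium(self) -> float:
        return self.b / (1.0 - self.a)

    def modulated(self, gain: float) -> "LinearUnit":
        # Assumed modulation: scale the equilibrium by `gain`
        # while keeping the same convergence rate `a`.
        return LinearUnit(self.a, self.b * gain, self.duration)


def generate(units, sequence, x0=0.0):
    """Run the units in the order given by `sequence` (stand-in for the automaton)."""
    x = x0
    trajectory = [x]
    for label in sequence:
        unit = units[label]
        for _ in range(unit.duration):
            x = unit.a * x + unit.b
            trajectory.append(x)
    return trajectory


# Units "learned from neutral motion" are simply hard-coded here.
neutral_units = {
    "open": LinearUnit(a=0.5, b=0.5, duration=20),   # converges toward 1.0
    "close": LinearUnit(a=0.5, b=0.0, duration=20),  # converges toward 0.0
}
neutral = generate(neutral_units, ["open", "close"])

# Modulate every unit to obtain a more exaggerated motion from the same models.
expressive_units = {k: u.modulated(1.5) for k, u in neutral_units.items()}
expressive = generate(expressive_units, ["open", "close"])
```

    In this sketch the same switching sequence drives both trajectories; only the parameters of the linear systems differ, which is the sense in which the neutral models are modulated rather than relearned.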
    When we choose an item among alternatives, we sometimes face a mismatch between what we actually want and the item we select. Therefore, if an interactive system can probe our interests from several modalities (e.g., eye movements and speech recognition) and reduce this mismatch, the system can help users make decisions they are satisfied with. To build such interactive decision support systems, the systems need to estimate both users' interests (selection criteria for that decision) and users' knowledge about the content domain. Here, not only users' knowledge but also users' selection criteria can change; for example, users' selection criteria converge in reaction to the system's recommendations. Therefore, the system needs to understand the dynamics of users' selection criteria in order to choose appropriate actions. What makes this more difficult is that the dynamics of users' internal states themselves can change depending on a phase of...
    This paper analyzes a self-height estimation method from a single-shot image using a convolutional architecture. To estimate the height at which the image was captured, the method utilizes object-related scene structure contained in a single image, in contrast to SLAM methods, which use geometric calculation on sequential images. Therefore, a variety of application domains can be considered, from wearable computing (e.g., estimation of the wearer’s height) to the analysis of archived images. This paper shows that (1) fine-tuning from a pretrained object-recognition architecture also contributes to self-height estimation and that (2) not only visual features but also their locations in the image are fundamental to the self-height estimation task. We verify these two points by comparing different learning conditions, such as preprocessing and initialization, and through visualization and sensitivity analysis using a dataset obtained in indoor environments.
    Multiple criteria decision making (MCDM) is a fundamental part of our daily lives. To support solving problems with multiple conflicting criteria, several analysis methods for MCDM have been proposed. Here, the choice of evaluation criteria is key to successful decision-making support, including interactive assistance systems. With appropriate evaluation criteria, decision-support systems can help users find an importance weight for each criterion and organize their selection interests. This process of preference structuring is helpful for users who want to select a target from alternatives, particularly when they have uncertain preferences and have not yet detailed their needs enough to search for targets using appropriate keywords. Here, the questions are how to prepare evaluation criteria beforehand and how to estimate user preferences by using these evaluation criteria. This thesis introduces aspects that represent “why the users look at items,” which provide possible v...
    While many spoken dialog systems have recently been developed, users need to clearly summarize and convey what they want the system to do. However, in human dialog, a speaker often summarizes what to say incrementally, provided that there is a good listener who responds to the speaker's utterances with appropriate timing. We consider that generating backchannel responses that, where appropriate, overlap with the user's utterances is crucial for an artificial listener system that can promote the user's utterances, since such overlaps are the norm in human dialogs. Toward realizing such a listener system, in this paper we propose a voting-based algorithm for predicting the end of utterances early (i.e., before the utterances end) using audio-visual information. In the evaluation, we demonstrate the effectiveness of using audio-visual information and the applicability of the voting-based prediction algorithm with some early results.
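    The abstract does not spell out how the voting works, so the following is only one plausible reading, sketched under stated assumptions: at each frame some audio-visual estimator (not shown, hypothetical) outputs the number of frames it believes remain until the utterance ends, each estimate casts a vote for an absolute end frame, and an early prediction is issued once one candidate end frame collects enough votes. The function name, `min_votes`, and `tolerance` are all illustrative choices.

```python
from collections import Counter


def predict_end_by_voting(remaining_estimates, min_votes=3, tolerance=1):
    """Predict the utterance end frame before it happens by accumulating votes.

    remaining_estimates[t] is a hypothetical per-frame estimate of how many
    frames remain until the utterance ends (None when no estimate is available).
    Returns (decision_frame, predicted_end_frame), or None if no candidate
    ever reaches `min_votes`.
    """
    votes = Counter()
    for t, remaining in enumerate(remaining_estimates):
        if remaining is None:
            continue
        candidate = t + remaining
        # Each estimate also supports end frames within `tolerance` of its candidate.
        for end in range(candidate - tolerance, candidate + tolerance + 1):
            votes[end] += 1
        # Most-voted end frame; ties are broken in favor of the earlier frame.
        best_end, count = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))
        # Only commit while the predicted end is still in the future.
        if count >= min_votes and best_end > t:
            return t, best_end
    return None


# Frames 1-3 roughly agree that the utterance ends around frame 10.
result = predict_end_by_voting([None, 9, 8, 8, 6, 5])
```

    With these inputs the decision is made at frame 3, several frames before the predicted end, which is the property a listener system would need in order to start an overlapping backchannel response.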
    While eye gaze data contain promising clues for inferring the interests of viewers of digital catalog content, viewers often dynamically switch their focus of attention. As a result, a direct application of conventional behavior analysis techniques, such as topic models, tends to be affected by items or attributes of little or no interest to the viewer. To overcome this limitation, we need to identify “when” the user compares items and to detect “which attribute types/values” reflect the user’s interest. This paper proposes a novel two-step approach to addressing these needs. Specifically, we introduce a likelihood-based short-term analysis method as the first step of the approach to simultaneously determine comparison phases of browsing and detect the attributes on which the viewer focuses, even when the attributes cannot be directly obtained from gaze points. Using probabilistic latent semantic analysis, we show that this short-term analysis step greatly improves the results of th...