O. Schreer


    RUSHES—an annotation and retrieval engine for multimedia semantic units
    In this work, a novel and fast algorithm for real-time 3D body reconstruction from stereo sequences is proposed. The main contributions are a novel approach to statistically guided stereo processing and a data-parallel iteration scheme for 3D estimation that includes temporal predecessors from a local spatial neighbourhood. A purely GPU-based implementation is provided that exhibits nearly linear scaling of the runtime with the number of GPUs. Hardware-supported texture lookups additionally yield inherent sub-pixel processing. Our implementation is able to process 4K (UHD) stereo streams on a 4×4 grid at 30 fps on a single state-of-the-art consumer graphics card. The algorithmic performance of our approach is demonstrated in the context of an immersive TV application.
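The "inherent sub-pixel processing" mentioned above comes from GPU texture units, which return bilinearly filtered samples at fractional coordinates essentially for free. A minimal NumPy sketch of what such a hardware lookup computes (the function name and border handling are ours, purely for illustration):

```python
import numpy as np

def texture_lookup(img, y, x):
    """Bilinear sample of img at fractional coordinates (y, x),
    mimicking a hardware-filtered GPU texture fetch."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    fy, fx = y - y0, x - x0
    # clamp the far corner to the image border (one of several policies)
    y1, x1 = min(y0 + 1, img.shape[0] - 1), min(x0 + 1, img.shape[1] - 1)
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x1]
            + fy * (1 - fx) * img[y1, x0] + fy * fx * img[y1, x1])
```

Sampling between pixel centres in this way is what lets a stereo matcher evaluate candidate correspondences at sub-pixel positions without any explicit interpolation code.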
    Up until now, TV has been a one-to-many proposition, apart from a few exceptions. TV stations produced and packaged their shows, and consumers had to tune in at a specific time to watch their favourite show. However, new technologies are changing the way we watch and produce television programs. For example, viewers often use second-screen applications and engage in lively discussions via social media channels while watching TV. Nevertheless, immediate live interaction with broadcast media is still not possible. In this paper, the latest results of the European-funded project ACTION-TV, which is developing novel forms of user interaction based on advanced Computer Vision and Mixed-Reality technologies, are presented. The aim of this research project is to let viewers actively participate in pre-produced live-action television shows. This expands the horizon of the interactive television concept towards engaging television shows. The paper explains the concept, challenges ...
    Annotation of digital recordings in humanities research is still, to a large extent, a process that is performed manually. This paper describes the first pattern-recognition-based software components developed in the AVATecH project and their integration into the annotation tool ELAN. ...
    This survey paper discusses the 3D image processing challenges posed by present and future immersive telecommunications, especially immersive video conferencing and television. We introduce the concepts of presence, immersion and co-presence, and discuss their relation with virtual collaborative environments in the context of communications. Several examples are used to illustrate the current state of the art. We highlight the crucial need for real-time, highly realistic video with adaptive viewpoint in future immersive communications, and identify calibration, multiple-view analysis, tracking, and view synthesis as the fundamental image processing modules addressing this need. For each topic, we sketch the basic problem and representative solutions from the image processing literature.
    Recent advances in volumetric capture technology have started to enable the creation of high-quality 3D video content for free-viewpoint rendering on VR and AR glasses. This allows highly immersive viewing experiences, which are currently limited to experiencing pre-recorded content. However, for an immersive experience, interaction with virtual humans plays an important role. In this paper, we address interactive applications of free-viewpoint volumetric video and present a new framework for the creation of interactive volumetric video content of humans as well as real-time rendering and streaming. Re-animation and alteration of an actor’s performance captured in a volumetric studio becomes possible through semantic enrichment of the captured data and new hybrid geometry- and video-based animation methods that allow a direct animation of the high-quality data itself instead of creating an animatable model that resembles the captured data. As interactive content presents new challenge...
    ABSTRACT Unedited audio-visual footage known as rushes shares many features with general-purpose multimedia data, but it also shows special characteristics. Rushes are often single-shot sequences at a single location, sparsely edited, with repetitive content, and the soundtrack is frequently irrelevant. This leads to additional challenges beyond the existing ones in multimedia indexing and retrieval. The joint effort of a number of research groups all over the world resulted in a ‘rushes exploitation’ task in the TRECVID video analysis international benchmark organized this year. In addition, the European FP6 project RUSHES is fully dedicated to research and development of a system for indexing, accessing and delivering raw, unedited audio-visual footage, and to enabling indexing, search and retrieval of rushes archives to ease in-house postproduction or reuse in a media-professional environment. In this special session, the latest research results on indexing, search and retrieval with a focus on raw, unedited audiovisual content are presented.
    ABSTRACT We identify why current teleconferencing systems tend to be rigid, system-specific solutions. We present a flexible teleconferencing system. This is both scalable (depending on the display size and number of conferees, axis-parallel narrow-baseline stereo camera pairs for image capture and disparity estimation can be added or removed) and modular (the multi-view synthesis process is dynamic enough to produce virtual views from an N-camera set-up).
    This paper presents a novel, real-time disparity algorithm developed for immersive teleconferencing. The algorithm combines the Census transform with a hybrid block- and pixel-recursive matching scheme. Computational effort is minimised by the efficient selection of a small number of candidate vectors, guaranteeing both spatial and temporal consistency of disparities. The latter aspect is crucial for 3-D videoconferencing applications, where novel views of the remote conferees must be synthesised with the correct motion parallax. This application requires video processing at ITU-Rec. 601 resolution. The algorithm generates disparity maps in real time, for both directions (left-to-right and right-to-left), with good quality on an 800 MHz Pentium III processor.
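The Census transform referenced above is a standard local descriptor: each pixel is encoded as a bit string of brightness comparisons against its neighbours, so matching cost reduces to a Hamming distance between bit strings. A minimal NumPy sketch of the general technique (not the paper's optimised implementation; the 3×3 window and wrap-around border handling are illustrative choices):

```python
import numpy as np

def census_transform(img, window=3):
    """Encode each pixel's neighbourhood as a bit string of comparisons
    against the centre pixel (bit set where neighbour < centre)."""
    r = window // 2
    codes = np.zeros(img.shape, dtype=np.uint32)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            # np.roll wraps at the borders -- fine for a sketch
            neighbour = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            codes = (codes << np.uint32(1)) | (neighbour < img).astype(np.uint32)
    return codes

def hamming_cost(c1, c2):
    """Matching cost between two census code maps: per-pixel Hamming distance."""
    x = c1 ^ c2
    count = np.zeros_like(x)
    while np.any(x):          # popcount, one bit position per iteration
        count += x & np.uint32(1)
        x >>= np.uint32(1)
    return count
```

Because the codes depend only on brightness orderings, the cost is robust to radiometric differences between the two cameras, which is one reason the transform is popular in real-time stereo.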
    We introduce a new concept of an immersive 3D video-conference system and immersive tele-collaboration. Based on the principle of a shared virtual table environment (SVTE), the key features of the new system concept are explained and compared to other SVTE approaches such as tele-cubicles. Furthermore, we present a complete system architecture based on MPEG-4 technology and discuss first implementations.
    Abstract The paper discusses an advanced approach for 3DTV services that is based on the concept of an N-times video-plus-depth data representation. It particularly considers aspects of interoperability, scalability, and adaptability for the case that different multi-baseline ...
    This paper introduces newly developed hardware components and configuration aspects for a complex video-processing system based on PCI architectures. Originally designed for an immersive videoconference system, the modular, scalable and cost-efficient multi-processor structure can also be applied to other demanding signal processing applications. Starting from the videoconference system, we derive and explain the structure of a PCI subsystem based on four TriMedia processors and an open interface architecture. First application results for the introduced components are presented regarding PCI compatibility and processing power, as well as the system architecture for an immersive videoconference system.
    Abstract A new approach for real-time shadow detection and elimination is presented. In contrast to existing methods, hue and saturation are approximated directly in the YUV colour space. We show the linear influence of shadow on the YUV values and exploit this ...
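The linearity observation, i.e. that a cast shadow attenuates the background's YUV values roughly multiplicatively while leaving chrominance nearly unchanged, suggests a simple per-pixel test against a background model. The sketch below is a generic background-subtraction shadow check in that spirit; the function, its name, and the thresholds `y_lo`, `y_hi`, `uv_tol` are illustrative assumptions, not the paper's actual criterion:

```python
import numpy as np

def shadow_mask(frame_yuv, bg_yuv, y_lo=0.4, y_hi=0.9, uv_tol=10):
    """Heuristic shadow test in YUV: a shadow darkens luminance Y by a
    roughly constant factor while chrominance U, V stays almost unchanged.
    Inputs are H x W x 3 arrays; returns a boolean H x W mask."""
    y, u, v = (frame_yuv[..., i].astype(float) for i in range(3))
    yb, ub, vb = (bg_yuv[..., i].astype(float) for i in range(3))
    ratio = y / np.maximum(yb, 1.0)   # luminance attenuation factor
    return ((ratio >= y_lo) & (ratio <= y_hi) &
            (np.abs(u - ub) <= uv_tol) & (np.abs(v - vb) <= uv_tol))
```

Working directly in YUV avoids the per-pixel conversion to HSV that many earlier shadow detectors required, which is what makes a real-time implementation plausible.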
    ABSTRACT In future 3D videoconferencing systems, depth estimation is required to support autostereoscopic displays and, even more importantly, to provide eye contact. Real-time 3D video processing is currently possible, but within some limits. Since traditional CPU-centred sub-pixel disparity estimation is computationally expensive, the depth resolution of fast stereo approaches is directly linked to the pixel quantization and the selected stereo baseline. In this work we present a novel, highly parallelizable algorithm that is capable of dealing with arbitrary depth resolutions while avoiding texture-interpolation-related runtime penalties through a GPU-centred design. The cornerstone of our patch-sweeping approach is the fusion of space-sweeping and patch-based 3D estimation techniques. Especially for narrow-baseline multi-camera configurations, as commonly used in 3D videoconferencing systems (e.g. [1]), it preserves the strengths of both techniques and avoids their shortcomings at the same time. Moreover, we provide a sophisticated parameterization and quantization scheme that gives our algorithm very good scalability in terms of computation time and depth-estimation quality. Furthermore, we present an optimized CUDA implementation for a multi-GPU setup in a cluster environment. For each GPU, it performs three pairwise high-quality depth estimations for a trifocal narrow-baseline camera configuration on a 256×256 image block in real time.
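The space-sweeping idea that patch-sweeping builds on, testing a discrete set of depth hypotheses and scoring each with a local photometric cost, can be illustrated for the rectified two-view case, where a depth hypothesis reduces to a disparity shift. This is a deliberately simplified CPU sketch (SAD cost, wrap-around borders via np.roll), not the paper's GPU patch-sweep:

```python
import numpy as np

def sweep_disparity(left, right, max_disp=8, patch=5):
    """Score every disparity hypothesis d with a patch SAD cost and keep
    the per-pixel argmin (winner-takes-all)."""
    r = patch // 2
    h, w = left.shape
    costs = np.empty((max_disp, h, w))
    for d in range(max_disp):
        # warp the right image according to the hypothesis (wrap at borders)
        shifted = np.roll(right, d, axis=1)
        ad = np.abs(left.astype(float) - shifted.astype(float))
        # aggregate absolute differences over a patch via shifted sums
        costs[d] = sum(np.roll(np.roll(ad, dy, 0), dx, 1)
                       for dy in range(-r, r + 1) for dx in range(-r, r + 1))
    return np.argmin(costs, axis=0)
```

In a true space sweep the shift becomes a per-plane homography, and in the patch-sweep variant the cost is evaluated on small oriented 3D patches; the hypothesis-test-argmin structure is the same, and it is what makes the method so amenable to GPU parallelization.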
    Interest in immersive 3D video conference systems has existed for many years, from both a commercialization and a research perspective. Still, one of the major bottlenecks in this context is the computational complexity of the required algorithmic modules. This paper discusses this problem from a hardware point of view. We use new
    ABSTRACT Multi-view camera calibration is an essential task in the field of 3D reconstruction, especially for immersive media applications like 3D video communication. Although the problem of multi-view calibration is basically solved, there is still room to improve the calibration process and to increase the accuracy of calibration-pattern acquisition. It is commonly known that robust and accurate calibration requires feature points that are equally distributed in 3D space, covering the whole volume of interest. In this paper, we propose a user-guided calibration based on a graphical user interface, which drastically simplifies the correct acquisition of calibration patterns. Based on an optimized selection of patterns and their corresponding feature points, the multi-view calibration becomes much faster in terms of data acquisition as well as computational effort, while reaching the same accuracy as standard unguided acquisition of calibration patterns.
    ABSTRACT The application of 3D scene reconstruction techniques in the area of automatic semantic annotation, search and retrieval of unedited video footage has become an interesting field of research for some specific types of video content. Usually, static key-frames extracted from a sequence of images are analyzed in order to annotate the content. In the case of a moving camera, the temporal properties of the video can be exploited as well. Based on state-of-the-art camera self-calibration techniques, a powerful analysis chain has been developed which allows annotation with regard to specific properties of the 3D scene structure. It is demonstrated that the reconstructed 3D scene information can be used to generate accurate low-level scene descriptors as well as meaningful medium- and high-level semantic information. The specific frame-based properties of the triangulated 3D scene contain a lot of potential for semantic annotation, which goes beyond standard 2D scene descriptors.
    ABSTRACT In this paper we discuss the application of 3D scene reconstruction techniques in the area of automatic semantic annotation, search and retrieval of unedited video footage. Rather than working with static key-frames, we exploit the time-dependent dynamic properties of a moving camera. Based on state-of-the-art camera self-calibration techniques, we develop a powerful analysis chain. We demonstrate that the reconstructed 3D scene information can be used to generate both accurate low-level scene descriptors and meaningful medium- and high-level semantic information. We show that the proposed algorithms work even in the case of sparse data sets. The proposed algorithms provide a powerful working base for further investigations in the area of low-, medium- and high-level extraction of semantic information for unedited video.
    ABSTRACT This paper presents a novel real-time approach for robust, high-precision and high-quality depth estimation. It extends recent work on real-time patch-sweeping by combining the advantages of a robust hybrid stereo-based disparity estimator with the high accuracy of the patch-sweeping approach. It overcomes limitations of the existing patch-sweep approach, such as its limited search range. Further, it implicitly benefits from the high robustness as well as the temporal consistency of the disparity estimator. The presented overall algorithmic system concept introduces a powerful alternative to traditional real-time depth estimation approaches. Additionally, the proposed algorithmic structures allow a high degree of parallelization. Based on this, the computational effort can be efficiently balanced between GPU and CPU processing. The target platform of the proposed algorithmic chain is a real-time immersive 3D video communication system which requires highly accurate 3D estimation results for high-quality virtual eye-contact generation.
    That all modern languages evolve and change is a well-known fact. Recently, however, this change has reached a pace never seen before, which results in the loss of the vast amount of information encoded in every language. In order to preserve this rich heritage, and to carry out linguistic research, properly annotated recordings of world languages are necessary. Since creating those annotations is a very laborious task, often taking 100 times longer than the length of the annotated media, innovative video processing algorithms are needed in order to improve the efficiency and quality of the annotation process. This is the scope of the AVATecH project presented in this article.
    ... The first full-length 3D movie was then presented in Los Angeles in 1922, and, in 1928, John Logie Baird applied the ... and viewing conditions as well as related human-factor aspects [17] had to be taken into account by the cameraman during shooting, like Panum's ...
    ... Serap Askar, Peter Kauff, Nicole Brandenburg, Oliver Schreer ... [3] J.-R. Ohm, “Bildverarbeitung II“, Skript, TU Berlin, Institut für Nachrichtentechnik und Theoretische Elektrotechnik, 1999 [4] http://www.pcigeomatics.com/cgi-bin/pcihlp/REGlBACKGROUND [5] http://www.cee.hw.ac ...

    And 33 more