
    Chin-Wan Chung

    This paper investigates the MaxRS problem in spatial databases. Given a set O of weighted points and a rectangular region r of a given size, the goal of the MaxRS problem is to find a location of r such that the sum of the weights of all the points covered by r is maximized. This problem is useful in many location-based applications, such as finding the best place for a new franchise store with a limited delivery range and finding the most attractive place for a tourist with a limited reachable range. However, the problem has been studied mainly in theory, particularly in computational geometry. The existing algorithms from the computational geometry community are in-memory algorithms that do not guarantee scalability. In this paper, we propose a scalable external-memory algorithm (ExactMaxRS) for the MaxRS problem, which is optimal in terms of I/O complexity. Furthermore, we propose an approximation algorithm (ApproxMaxCRS) for the MaxCRS problem that is a circle vers...
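    To pin down the problem definition, here is a naive in-memory baseline. It is not ExactMaxRS and makes no attempt at I/O efficiency; it is a minimal sketch assuming a closed, axis-parallel rectangle and positive weights. It relies on a simple observation: any placement can be slid right and up until its left edge touches the leftmost covered point and its bottom edge touches the lowest covered point without losing any coverage, so only lower-left corners built from data-point coordinates need to be tried. The function name `naive_max_rs` is hypothetical.

```python
from itertools import product


def naive_max_rs(points, width, height):
    """Brute-force MaxRS for a closed axis-parallel width x height rectangle.

    points: list of (x, y, weight) tuples with positive weights.
    Returns (best_weight, (x, y)), where (x, y) is the lower-left corner
    of one optimal placement, or (0, None) if there are no points.
    """
    if not points:
        return 0, None
    xs = sorted({x for x, _, _ in points})
    ys = sorted({y for _, y, _ in points})
    best_weight, best_corner = 0, None
    # Some optimal placement has its left edge on a point's x-coordinate
    # and its bottom edge on a point's y-coordinate, so these corners suffice.
    for cx, cy in product(xs, ys):
        covered = sum(
            w for x, y, w in points
            if cx <= x <= cx + width and cy <= y <= cy + height
        )
        if covered > best_weight:
            best_weight, best_corner = covered, (cx, cy)
    return best_weight, best_corner


points = [(0, 0, 1), (1, 1, 2), (5, 5, 4), (5.5, 5.5, 1), (10, 0, 3)]
print(naive_max_rs(points, width=1, height=1))
```

    This baseline takes O(n^3) time on n points, which is the kind of cost that motivates specialized exact and approximate algorithms for large inputs.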
    This paper describes a decision tree model and a 3-dimensional representation of information retrieved from various weblogs in relation to argumentative logics. The weblogs are treated as datasets that show significant correlations between the queries applied to them. We extracted a compact set of rules to support the dataset with the queries and employed effective evaluation metrics to evaluate the weighted average of the weblogs, categorized into different types. The opinions from the weblogs are retrieved and represented as an object-oriented 3-dimensional system. The goal of our approach is to generate rules from rough sets and to represent them in a 3-dimensional interactive program, Blog Cosmos. We used rough set theory as a candidate framework for query refinement.
    Learning-enhanced relevance feedback has been one of the most active research areas in content-based image retrieval in recent years. However, few relevance-feedback methods are currently available for processing relatively complex queries on large image databases. For complex image queries, the feature space and the distance function of the user's perception usually differ from those of the system. This difference leads to a query being represented by multiple clusters (i.e., regions) in the feature space, so it is necessary to handle disjunctive queries in the feature space. In this paper, we propose a new content-based image retrieval method that uses adaptive classification and cluster merging to find the multiple clusters of a complex image query. When the measures of a retrieval method are invariant under linear transformations, the method can achieve the same retrieval quality regardless of the shapes of a query's clusters. Our method a...
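    The remark about measures that are invariant under linear transformations can be illustrated with the Mahalanobis distance, which has that property. The sketch below is an illustration under stated assumptions, not the paper's adaptive classification and cluster-merging method: it treats a disjunctive query as a set of 2-D clusters, scores a point by its smallest squared Mahalanobis distance to any cluster, and checks that the score is unchanged when every point is mapped by the same invertible matrix. All function names and the sample data are hypothetical.

```python
def mean_and_cov(samples):
    """Sample mean and 2x2 sample covariance (sxx, sxy, syy) of (x, y) points."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate cluster)
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def disjunctive_score(point, clusters):
    """Score against a multi-cluster query: the nearest cluster wins."""
    return min(mahalanobis_sq(point, *mean_and_cov(c)) for c in clusters)


def apply_linear(matrix, point):
    """Map a 2-D point by a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = matrix
    return (a * point[0] + b * point[1], c * point[0] + d * point[1])


clusters = [
    [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.2)],
    [(5, 5), (6, 5.5), (5.5, 7), (7, 6), (6, 6)],
]
query_point = (2, 3)
A = ((2, 1), (0, 3))  # any invertible matrix

before = disjunctive_score(query_point, clusters)
after = disjunctive_score(
    apply_linear(A, query_point),
    [[apply_linear(A, p) for p in c] for c in clusters],
)
print(before, after)  # equal up to floating-point rounding
```

    Because each per-cluster distance is invariant, the minimum over clusters is invariant too, so the ranking of images does not depend on how the clusters are stretched or sheared.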
    With the advances in multimedia databases on the World Wide Web, it has become more important to provide users with the capability to search distributed multimedia data. There have been many studies of database selection and collection fusion for text databases. Multimedia databases on the Web, however, are autonomous and heterogeneous, and they mainly use content-based retrieval. The collection fusion problem for multimedia databases is concerned with merging the results retrieved by content-based retrieval from heterogeneous multimedia databases on the Web. This problem is crucial for search in distributed multimedia databases, but it has not yet been studied. This paper provides novel algorithms for collection fusion over heterogeneous multimedia databases on the Web. We propose two heuristic algorithms for estimating the number of objects to be retrieved from local databases and an algorithm using linear regression. Extensive ex...
    Betweenness centrality is a measure of the relative participation of a vertex in the shortest paths of a graph. In many cases, we are interested only in the k vertices with the highest betweenness centrality rather than in all the vertices of a graph. In this paper, we study an efficient algorithm for finding exactly the k vertices with the highest betweenness centrality.
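    As a reference point for the exact problem, the k highest-betweenness vertices can always be found by scoring every vertex and keeping the k largest. The sketch below does that with Brandes' well-known algorithm for unweighted graphs; it is a baseline, not the more efficient algorithm the paper studies, and the function names are hypothetical.

```python
import heapq
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph: dict mapping each vertex to an iterable of its neighbors, with
    every edge listed in both directions. Each pair is counted once from
    each endpoint, so raw scores are halved at the end.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}


def top_k_betweenness(graph, k):
    """Return the k (vertex, score) pairs with the highest betweenness."""
    scores = betweenness(graph)
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(top_k_betweenness(path, 2))
```

    The baseline always computes every score, in O(nm) time on n vertices and m edges, even when only a handful of top vertices are wanted.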
    For the Semantic Web, which was proposed to overcome the limitations of the Web, OWL has been recommended as the ontology language for giving a well-defined meaning to diverse data. OWL is the representative ontology language suggested by the W3C. Efficient retrieval of OWL data requires a well-constructed storage schema. In this paper, we propose a storage schema construction technique that supports more efficient query processing, together with a corresponding retrieval technique. OWL data includes inheritance information for classes and properties, so hierarchy information must be considered when OWL data is extracted. For this reason, an additional XML document is created to preserve the hierarchy information and is stored in an XML database system. An existing numbering scheme is used to extract ancestor/descendant relationships, and the order information of nodes is added as attribute values of elements in the XML document. Thus, it is possible to ...
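    The abstract does not say which numbering scheme is used. Pre/post-order numbering is one widely used choice, and the sketch below shows why such schemes make ancestor/descendant tests cheap: after one depth-first pass, u is a proper ancestor of v exactly when u is visited before v and finished after v. It assumes a tree-shaped class hierarchy (OWL also allows multiple inheritance, which a single tree numbering does not capture), and the class names and function names are hypothetical.

```python
def number_hierarchy(children, root):
    """Assign (pre, post) numbers to every node of a tree by depth-first search.

    children: dict mapping a node to its list of child nodes.
    """
    numbers = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]
        counters["pre"] += 1
        for child in children.get(node, []):
            visit(child)
        numbers[node] = (pre, counters["post"])  # post assigned on exit
        counters["post"] += 1

    visit(root)
    return numbers


def is_ancestor(numbers, u, v):
    """True if u is a proper ancestor of v in the numbered tree."""
    pre_u, post_u = numbers[u]
    pre_v, post_v = numbers[v]
    return pre_u < pre_v and post_v < post_u


classes = {
    "Thing": ["Agent", "Place"],
    "Agent": ["Person", "Organization"],
    "Person": ["Student"],
}
numbers = number_hierarchy(classes, "Thing")
print(is_ancestor(numbers, "Agent", "Student"))
```

    Storing the two numbers as attributes lets a database answer "all subclasses of X" with a range condition instead of a recursive traversal.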
    Content-Based Image Retrieval (CBIR) stores and retrieves images using feature descriptions of image contents. To support more accurate image retrieval, it has become necessary to develop features that can effectively describe image contents. Commonly used low-level features, such as color, texture, and shape, may not map directly to human visual perception. In addition, such features cannot effectively describe a single image that contains multiple objects of interest. As a result, research on feature descriptions has shifted toward higher-level features, such as spatial relationships between objects, whose representations are closer to human visual perception. Nevertheless, prior work on representing spatial relations still has shortcomings, particularly with respect to supporting rotational invariance. Rotational invariance is a key requirement for a feature description to provide robust and accurate retrieval of ima...
    Efficient query processing for complex spatial objects is one of the most challenging requirements in non-traditional applications such as geographic information systems, computer-aided design, and multimedia databases. The performance of spatial query processing can be improved by decomposing a complex object into a small number of simple components. This paper investigates the natural trade-off between the number and the complexity of decomposed components. In particular, we propose a new object decomposition method that can control the number of components using a parameter. This method enables the user to select the optimal trade-off by controlling the parameter. The proposed method is compared with traditional decomposition methods by an analytical study and experimental measurements. These comparisons show that our decomposition method outperforms traditional decomposition methods.
    We present ONTOMS2, an efficient and scalable ONTOlogy Management System with incremental reasoning. ONTOMS2 stores an OWL document and processes OWL-QL and SPARQL queries. In particular, ONTOMS2 supports SPARQL Update queries with incremental instance reasoning over inverseOf, symmetric, and transitive properties.
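    To make "incremental instance reasoning" concrete, the sketch below keeps inferred facts up to date as each triple is inserted, instead of recomputing the closure from scratch. It is a hypothetical in-memory illustration (the class, its methods, and the example properties are invented), not ONTOMS2's storage layout or algorithm. A newly inserted edge for a transitive property is joined with existing edges on both sides, and inverse and symmetric counterparts are queued as further insertions.

```python
from collections import defaultdict


class IncrementalStore:
    """Hypothetical triple store with incremental reasoning for owl:inverseOf,
    owl:SymmetricProperty and owl:TransitiveProperty."""

    def __init__(self, inverse_of=None, symmetric=(), transitive=()):
        self.inverse_of = dict(inverse_of or {})
        # inverseOf works both ways: p inverseOf q implies q inverseOf p
        for p, q in list(self.inverse_of.items()):
            self.inverse_of.setdefault(q, p)
        self.symmetric = set(symmetric)
        self.transitive = set(transitive)
        self.edges = defaultdict(set)  # property -> set of (subject, object)

    def insert(self, s, p, o):
        pending = [(s, p, o)]
        while pending:
            s, p, o = pending.pop()
            if (s, o) in self.edges[p]:
                continue  # already known, nothing new to derive
            self.edges[p].add((s, o))
            if p in self.inverse_of:
                pending.append((o, self.inverse_of[p], s))
            if p in self.symmetric:
                pending.append((o, p, s))
            if p in self.transitive:
                # join the new edge with existing edges a->s and o->b
                for a, b in list(self.edges[p]):
                    if b == s:
                        pending.append((a, p, o))
                    if a == o:
                        pending.append((s, p, b))

    def holds(self, s, p, o):
        return (s, o) in self.edges.get(p, ())


store = IncrementalStore(
    inverse_of={"hasPart": "partOf"},
    symmetric={"adjacentTo"},
    transitive={"partOf", "hasPart"},
)
store.insert("wheel", "partOf", "car")
store.insert("car", "partOf", "fleet")
print(store.holds("fleet", "hasPart", "wheel"))
```

    Each insertion only does work proportional to the facts it touches, which is the point of reasoning incrementally when SPARQL Update queries arrive one at a time.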
    This article studies I/O-efficient algorithms for the triangle listing problem and the triangle counting problem, whose solutions are basic operators for many other graph problems. In the former problem, given an undirected graph G, the objective is to find all the cliques involving 3 vertices in G. In the latter problem, the objective is to report just the number of such cliques without having to enumerate them. Both problems have been well studied in internal memory, but they remain difficult when G does not fit in memory, which makes it crucial to minimize the number of disk I/Os performed. Although previous research has attempted to tackle these challenges, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all. The algorithm uses ideas drastically dif...
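    For orientation, the internal-memory versions of both problems are short. The sketch below uses the standard technique of ranking vertices by degree, orienting each edge from the lower-ranked to the higher-ranked endpoint, and intersecting out-neighbor sets, so every triangle is reported exactly once. It assumes the whole graph fits in memory and that vertex ids are mutually comparable, which is precisely the setting the article moves beyond; it is not the article's I/O-efficient algorithm, and the function names are hypothetical.

```python
def list_triangles(edges):
    """In-memory triangle listing via degree-based edge orientation.

    edges: iterable of undirected edges (u, v); self-loops and duplicate
    edges are ignored. Yields each triangle exactly once.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # rank by (degree, id) and keep only edges pointing to higher rank
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    out = {v: {w for w in nbrs if rank[w] > rank[v]} for v, nbrs in adj.items()}
    for u in adj:
        for v in out[u]:
            for w in out[u] & out[v]:
                yield (u, v, w)


def count_triangles(edges):
    """Triangle counting reuses the listing without storing the triangles."""
    return sum(1 for _ in list_triangles(edges))


k4_plus_tail = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(count_triangles(k4_plus_tail))
```

    Orienting edges by degree keeps every out-neighbor set small, which bounds the intersection work; the same idea is a common starting point for external-memory variants.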
    This article presents a novel type of query in spatial databases, called the direction-aware bichromatic reverse k nearest neighbor (DBRkNN) query, which extends the bichromatic reverse nearest neighbor query. Given two disjoint sets, P and S, of spatial objects, and a query object q in S, the DBRkNN query returns a subset P′ of P such that the k nearest neighbors of each object in P′ include q and each object in P′ has a direction toward q within a pre-defined distance. We formally define the DBRkNN query and then propose an efficient algorithm, called DART, for processing it. Our method uses a grid-based index to cluster the spatial objects and a B+-tree to index the direction angle. We adopt the filter-refinement framework that is widely used in algorithms for reverse nearest neighbor queries. In the filtering step, DART eliminates all objects that are farther from the query object than a pre-defined distance or that have an invalid direction angle. In the refinement step, each remaining object is checked to verify whether the query object is actually one of its k nearest neighbors. As a major extension of DART, we also present an improved algorithm, called DART+, for DBRkNN queries. Extensive experiments with several datasets show that DART outperforms an R-tree-based naive algorithm in both indexing time and query processing time. In addition, the extended algorithm, DART+, performs significantly better than DART.
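    The query semantics are easiest to see in a brute-force evaluation that follows the same filter-refinement pattern without any index. The sketch below is an illustration under stated assumptions, not DART or DART+: each object in P carries a heading in radians, a direction is treated as valid when it lies within a hypothetical angular tolerance `max_angle` of the bearing from the object to q, and q counts as one of an object's k nearest neighbors when fewer than k other objects of S are strictly closer. The function name and parameters are hypothetical.

```python
import math


def dbrknn_naive(P, S, q, k, max_dist, max_angle):
    """Brute-force filter-and-refine evaluation of a DBRkNN-style query.

    P: list of (x, y, heading) objects, heading in radians.
    S: list of (x, y) objects; q must be one of them.
    max_angle: assumed tolerance between an object's heading and its
    bearing to q (the exact validity rule is not given in the abstract).
    """
    result = []
    for px, py, heading in P:
        dx, dy = q[0] - px, q[1] - py
        dist_q = math.hypot(dx, dy)
        # filter step: distance bound, then direction bound
        if dist_q > max_dist:
            continue
        bearing = math.atan2(dy, dx)
        # smallest absolute angle between heading and bearing, in [0, pi]
        diff = abs((bearing - heading + math.pi) % (2 * math.pi) - math.pi)
        if diff > max_angle:
            continue
        # refinement step: q is a kNN if fewer than k other sites are closer
        closer = sum(
            1 for s in S
            if s != q and math.hypot(s[0] - px, s[1] - py) < dist_q
        )
        if closer < k:
            result.append((px, py, heading))
    return result


S = [(0, 0), (10, 0)]
P = [(1, 0, math.pi), (2, 0, 0.0), (9, 0, math.pi), (30, 0, math.pi)]
print(dbrknn_naive(P, S, q=(0, 0), k=1, max_dist=20, max_angle=math.pi / 4))
```

    Here the object at (2, 0) is pruned because it faces away from q, the one at (30, 0) is too far, and the one at (9, 0) survives filtering but fails refinement for k = 1 because (10, 0) is closer to it than q is. An index-based method replaces the linear scans, not the semantics.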
    In the maximizing range sum (MaxRS) problem, given (i) a set P of 2D points, each associated with a positive weight, and (ii) a rectangle r of specific extents, we need to decide where to place r in order to maximize the covered weight of r, that is, the total weight of the data points covered by r. Algorithms that solve the problem exactly entail expensive CPU or I/O cost. In practice, exact answers are often not compulsory in a MaxRS application, where slight imprecision can often be comfortably tolerated, provided that approximate answers can be computed considerably faster. Motivated by this, the present paper studies the (1 - ε)-approximate MaxRS problem, which admits the same inputs as MaxRS but aims instead to return a rectangle whose covered weight is at least (1 - ε)m*, where m* is the optimal covered weight and ε can be an arbitrarily small constant between 0 and 1. We present fast algorithms that settle this problem with strong theoretical guarantees.
    Like HTML documents, many XML documents reside on native file systems. Because XML data is irregular and verbose, it wastes disk space and network bandwidth. To overcome this verbosity problem, compressors for XML data have been studied. Some XML compressors do not support querying compressed data, while others that do support querying blindly encode tags and data values using predefined encoding methods. Moreover, existing XML compressors do not support updates on compressed XML data. In this article, we propose XPRESS, an XML compressor that supports direct updates and efficient evaluation of queries on compressed XML data. XPRESS adopts a novel encoding method, called reverse arithmetic encoding, which encodes the label paths of XML data, and it applies diverse encoding methods depending on the types of data values. Experimental results with real-life data sets show that XPRESS achieves significant improvements in query performance for ...
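    The abstract names reverse arithmetic encoding without detailing it. As a hedged illustration of the general idea only, the sketch below gives each label a sub-interval of [0, 1) sized by a hypothetical label frequency and encodes a label path by narrowing intervals starting from the last label and working backward. As a result, a path's interval nests inside the interval of every suffix of that path, so a suffix query such as //chapter/title reduces to an interval-containment test. XPRESS's actual encoder, its treatment of data values, and its update support are not reproduced here, and all function names are hypothetical.

```python
from fractions import Fraction


def base_intervals(label_freqs):
    """Split [0, 1) into one sub-interval per label, sized by frequency."""
    total = sum(label_freqs.values())
    intervals, low = {}, Fraction(0)
    for label, freq in label_freqs.items():
        high = low + Fraction(freq, total)
        intervals[label] = (low, high)
        low = high
    return intervals


def encode_path(path, intervals):
    """Encode a label path like 'book/chapter/title' as a sub-interval of [0, 1).

    Labels are applied from last to first, so the result always lies
    inside the interval of each suffix of the path.
    """
    labels = path.split("/")
    low, high = intervals[labels[-1]]
    for label in reversed(labels[:-1]):
        width = high - low
        sub_low, sub_high = intervals[label]
        low, high = low + width * sub_low, low + width * sub_high
    return low, high


def matches_suffix(path_interval, suffix_interval):
    """Suffix test by interval containment (assumes positive frequencies
    for at least two labels, so sibling intervals are disjoint)."""
    return (suffix_interval[0] <= path_interval[0]
            and path_interval[1] <= suffix_interval[1])


intervals = base_intervals({"book": 1, "chapter": 2, "title": 1})
query = encode_path("chapter/title", intervals)
print(matches_suffix(encode_path("book/chapter/title", intervals), query))
```

    Exact fractions are used here to keep the containment test free of rounding; a real compressor would need a bounded-precision representation.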
    This article studies the MaxRS problem in spatial databases. Given a set O of weighted points and a rectangle r of a given size, the goal of the MaxRS problem is to find a location of r such that the sum of the weights of all the points covered by r is maximized. This problem is useful in many location-based services, such as finding the best place for a new franchise store with a limited delivery range and finding the hotspot with the largest number of nearby attractions for a tourist with a limited reachable range. However, the problem has been studied mainly from a theoretical perspective, particularly in computational geometry. The existing algorithms from the computational geometry community are in-memory algorithms that do not guarantee scalability. In this article, we propose a scalable external-memory algorithm (ExactMaxRS) for the MaxRS problem that is optimal in terms of I/O complexity. In addition, we propose an approximation algorithm (ApproxMaxCRS) for the MaxCRS problem, which is the circle version of the MaxRS problem. We prove the correctness and optimality of the ExactMaxRS algorithm along with the approximation bound of the ApproxMaxCRS algorithm. Furthermore, motivated by the fact that all the existing solutions simply assume that there is no tied area for the best location, we extend the MaxRS problem to a more fundamental problem, namely AllMaxRS, so that all the locations with the same best score can be retrieved. We first prove that the AllMaxRS problem cannot be trivially solved by applying the techniques for the MaxRS problem. Then we propose an output-sensitive external-memory algorithm (TwoPhaseMaxRS) that gives the exact solution for the AllMaxRS problem through two phases. We also prove both the soundness and completeness of the result returned from TwoPhaseMaxRS.
From extensive experimental results, we show that ExactMaxRS and ApproxMaxCRS are several orders of magnitude faster than methods adapted from existing algorithms, the approximation bound in practice is much better than the theoretical bound of ApproxMaxCRS, and TwoPhaseMaxRS is not only much faster but also more robust than the straightforward extension of ExactMaxRS.
    This paper presents an object-oriented image retrieval mechanism that provides the content model, the indexing scheme, and the query processing techniques as a whole. Three types of image content, i.e., visual features, semantic features, and keywords, are defined, and they are represented using the object-oriented data model. Three types of index structures corresponding to the identified features are
