Research Interests:
Engineering, Computer Science, Graph Theory, Audio Signal Processing, Content Analysis, Music Industry, Multidisciplinary, Collaborative Filtering, Similarity, Language Culture and Communication, Filtering, HTTP, Internet, Psychology and Cognitive Sciences, Measurement Accuracy, and Similitude
This paper proposes a notion of time complexity in splicing systems. The time complexity of a splicing system at length n is defined to be the smallest integer t such that all the words of the system having length n are produced within t rounds. For a function t from the set of natural numbers to itself, the class of languages with splicing system time complexity t(n) is denoted by SPLTIME[t(n)]. This paper presents fundamental properties of SPLTIME and explores its relation to classes based on standard computational models, in terms of both upper bounds and lower bounds. As to upper bounds, it is shown that for any function t(n), SPLTIME[t(n)] is included in 1-NSPACE[t(n)], i.e., the class of languages accepted by a t(n)-space-bounded non-deterministic Turing machine with a one-way input head. Expanding on this result, it is shown that 1-NSPACE[t(n)] is characterized in terms of splicing systems: it is the class of languages accepted by a t(n)-space uniform family of extend...
Research Interests:
Computational Complexity, Complexity Theory, Production, Theoretical Computer Science, Computing, DNA computing, DNA, Finite State Automaton, Finite Automata, Mathematical Sciences, Computer Model, Word, Splicing, Time Complexity, Turing machine, Lower Bound, Upper Bound, Height, Production System, and Stack
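The round-based definition in the splicing-system abstract above can be made concrete with a small simulation. The following is a minimal sketch, not the paper's construction: it assumes a one-sided splicing rule of the form (u1, u2, u3, u4) that turns x = x1·u1·u2·x2 and y = y1·u3·u4·y2 into x1·u1·u4·y2, and it caps both word length and the number of rounds so the search terminates. The cap is an assumption of the sketch; words reachable only through longer intermediate words would be missed. All function names are hypothetical.

```python
from itertools import product


def splice(x, y, rule):
    """Apply one rule (u1, u2, u3, u4) to the ordered pair (x, y).

    Assumed rule form: x = x1 u1 u2 x2 and y = y1 u3 u4 y2 yield x1 u1 u4 y2.
    """
    u1, u2, u3, u4 = rule
    left, right = u1 + u2, u3 + u4
    results = set()
    i = x.find(left)
    while i != -1:
        prefix = x[:i + len(u1)]                  # x1 u1
        j = y.find(right)
        while j != -1:
            results.add(prefix + y[j + len(u3):])  # ... u4 y2
            j = y.find(right, j + 1)
        i = x.find(left, i + 1)
    return results


def first_rounds(initial, rules, max_rounds, max_len):
    """Map each generated word to the first round in which it appears."""
    seen = {w: 0 for w in initial}
    for t in range(1, max_rounds + 1):
        current = list(seen)
        new = set()
        for x, y in product(current, repeat=2):
            for rule in rules:
                for z in splice(x, y, rule):
                    if len(z) <= max_len and z not in seen:
                        new.add(z)
        if not new:          # nothing new under the caps: stop early
            break
        for z in new:
            seen[z] = t
    return seen


def time_at_length(rounds, n):
    """Smallest t by which every generated word of length n has appeared."""
    hits = [t for w, t in rounds.items() if len(w) == n]
    return max(hits) if hits else None


# Toy system: the single word "cab" and one rule; it generates c a^k b.
rounds = first_rounds({"cab"}, [("a", "b", "c", "a")], max_rounds=5, max_len=10)
for n in range(3, 8):
    print(n, time_at_length(rounds, n))
```

In this toy system each round can at most double the number of a's, so the round count grows logarithmically with word length.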
We show the following results regarding complete sets. NP-complete sets and PSPACE-complete sets are many-one autoreducible. Complete sets of any level of PH, MODPH, or the Boolean hierarchy over NP are many-one autoreducible. EXP-complete sets are many-one mitotic. NEXP-complete sets are weakly many-one mitotic. PSPACE-complete sets are weakly Turing-mitotic. If one-way permutations and quick pseudo-random generators exist, then NP-complete languages are m-mitotic. If there is a tally language in NP ∩ coNP − P, then, for every ε > 0, NP-complete sets are not 2^{n(1+ε)}-immune. These results solve several of the open questions raised by Buhrman and Torenvliet in their 1994 survey paper on the structure of complete sets.
Valiant (SIAM J. Comput. 8 (1979) 410-421) showed that the problem of computing the number of simple s-t paths in graphs is #P-complete both in the case of directed graphs and in the case of undirected graphs. Welsh (Complexity: Knots, Colourings and Counting, Cambridge University Press, Cambridge, 1993, p. 17) asked whether the problem of computing the number of self-avoiding walks of a given length in the complete two-dimensional grid is complete for #P1, the tally-version of #P. This paper offers a partial answer to the question of Welsh: it is #P-complete to compute the number of self-avoiding walks of a given length in a subgraph of a two-dimensional grid. Several variations of the problem are also studied and shown to be #P-complete. This paper also studies the problem of computing the number of self-avoiding walks in a subgraph of a hypercube. Similar completeness results are shown for the problem. By scaling the computation time to exponential, it is shown that computing the...
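To make the counting problem concrete, here is a brute-force sketch of counting self-avoiding walks of a given length in a subgraph of the two-dimensional grid. It assumes the subgraph is the one induced by a given set of vertices and that walks start at a fixed vertex; both are assumptions of this illustration, and the function name is hypothetical. The running time is exponential in the walk length, which is consistent with, but of course does not demonstrate, the hardness result stated above.

```python
def count_self_avoiding_walks(vertices, start, length):
    """Count walks of `length` edges from `start` that never revisit a vertex.

    `vertices` is a collection of (x, y) grid points; two points are adjacent
    when they differ by 1 in exactly one coordinate (induced subgraph).
    """
    vertices = set(vertices)
    if start not in vertices:
        return 0

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        x, y = pos
        total = 0
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in vertices and nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)   # backtrack
        return total

    return extend(start, {start}, length)


# A 2x2 block of grid points forms a 4-cycle.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(count_self_avoiding_walks(square, (0, 0), 3))
```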
Convolutional neural networks (CNNs) have been successfully applied to both discriminative and generative modeling for music-related tasks. For a particular task, the trained CNN contains information representing the decision-making or abstracting process. One can hope to manipulate existing music based on this 'informed' network and create music with new features corresponding to the knowledge obtained by the network. In this paper, we propose a method to utilize the stored information from a CNN trained on a musical genre classification task. The network was composed of three convolutional layers, and was trained to classify five-second song clips into five different genres. After training, randomly selected clips were modified by maximizing the sum of outputs from the network layers. In addition to the potential of such CNNs to produce interesting audio transformations, more information about the network and the original music could be obtained from the analysis of the g...
Music is not only for entertainment and pleasure; it has been used for a wide range of purposes because of its social and physiological effects. Traditionally, musical information has been retrieved and/or classified based on standard reference information, such as the name of the composer and the title of the work. These basic pieces of information will remain essential, but information retrieval based on them is far from satisfactory. Huron points out that since the preeminent functions of music are social and psychological, the most useful characterization would be based on four types of information: the style, emotion, genre, and similarity [Huron, 2000].
Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem, with numerous applications in many areas of computer science. Locality-sensitive hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that since they probe only a single bucket in a hash table, they need to use a large number of hash tables to achieve a high query accuracy. For ANNS-L2, a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in a hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in L1 distance. Another contribution of this work is to explain why a state-of-the-art ANNS-L1 solution called Cauchy projection LSH (CP-LSH) is fundamentally not suitable for multi-probe extension. We show that MP-RW-LSH uses 15 to 53 times fewer hash tables than CP-LSH for achieving similar query accuracies.
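The general multi-probe idea mentioned above (probing several buckets of one hash table instead of building many tables) can be sketched as follows. This is not MP-RW-LSH and not CP-LSH: it is a generic illustration that assumes Gaussian random projections with a fixed bucket width, as commonly used for L2 distance, and it probes every bucket that differs from the query's home bucket by one step in one coordinate, without the success-probability ordering a real multi-probe scheme would use. The class and parameter names are hypothetical.

```python
import math
import random
from collections import defaultdict


class MultiProbeTable:
    """One hash table whose key is a tuple of quantized random projections."""

    def __init__(self, dim, num_hashes=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)
        ]
        self.offsets = [rng.uniform(0.0, width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def key(self, v):
        # h_i(v) = floor((a_i . v + b_i) / width) for each hash function i
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, v)) + b) / self.width)
            for proj, b in zip(self.projections, self.offsets)
        )

    def insert(self, label, v):
        self.buckets[self.key(v)].append(label)

    def probe_keys(self, v):
        # Home bucket first, then every bucket one step away in one coordinate.
        home = self.key(v)
        keys = [home]
        for i in range(len(home)):
            for delta in (-1, 1):
                keys.append(home[:i] + (home[i] + delta,) + home[i + 1:])
        return keys

    def query(self, v):
        found = []
        for k in self.probe_keys(v):
            found.extend(self.buckets.get(k, []))
        return found


table = MultiProbeTable(dim=3)
table.insert("p", [1.0, 2.0, 3.0])
print(table.query([1.0, 2.0, 3.0]))
```

Probing neighbouring buckets trades extra lookups per table for fewer tables overall, which is the trade-off the abstract quantifies for MP-RW-LSH against CP-LSH.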
Research Interests:
Computer Science, Data Mining, Audio Signal Processing, Taxonomy, Music Information Retrieval, Information Processing, Signal Analysis, Music Genre Classification, Feature Extraction, Digital music, Internet, Hierarchical Classification, Multiple Signal Classification, Information Retrieval systems, and Confusion Matrix
Deficits in motor movement in children with autism spectrum disorder (ASD) have typically been characterized qualitatively by human observers. Although clinicians have noted the importance of atypical head positioning (e.g. social peering and repetitive head banging) when diagnosing children with ASD, a quantitative understanding of head movement in ASD is lacking. Here, we conduct a quantitative comparison of head movement dynamics in children with and without ASD using automated, person-independent, computer-vision-based head tracking (Zface). Because children with ASD often exhibit preferential attention to nonsocial versus social stimuli, we investigated whether children with and without ASD differed in their head movement dynamics depending on stimulus sociality. The current study examined differences in head movement dynamics in children with (n = 21) and without ASD (n = 21). Children were video-recorded while watching a 16-min video of social and nonsocial stimuli. Three dimens...
Research Interests:
Mathematics, Communication, Medical Informatics, Machine Learning, Data Mining, Implementation Science, California, Humans, Computer Simulation, Feasibility Studies, Diffusion of Innovation, Information Storage and Retrieval, Sensitivity and Specificity, Translational Medical Research, Records as Topic, and Medical and Health Sciences
ABSTRACT Making better use of the cache is important for modern computer programs and systems. One key step is understanding data locality. In this article, we investigate one effective and important model of data locality: reference affinity. This model is superior to previous ones because it can express whole-scale locality in a more accurate and flexible way. Traces collected from different applications are a rich source for the analysis of reference affinity. In this article, we extend strict reference affinity to weak reference affinity ...
Research Interests:
Clustering and Hierarchy
This paper presents the PlanMine sequence mining algorithm to extract patterns of events that predict failures in databases of plan executions. New techniques were needed because previous data mining algorithms were overwhelmed by the staggering number of very frequent, but entirely unpredictive patterns that exist in the plan database. This paper combines several techniques for pruning out unpredictive and redundant patterns, which together reduce the size of the returned rule set by more than three orders of magnitude. PlanMine has also been fully integrated into two real-world planning systems. We experimentally evaluate the rules discovered by PlanMine, and show that they are extremely useful for understanding and improving plans, as well as for building monitors that raise alarms before failures happen. Keywords: Sequence Mining, Predicting Plan Failures, Plan Monitoring. The University of Rochester Computer Science Department supported this work. Supported by NSF grants CCR-9705594, CC...
Introduction: Fidelity monitoring and feedback are critical components for implementing effective preventive interventions. However, these are also resource intensive for the research team even in efficacy/effectiveness trials, and generally too expensive to maintain when the program is implemented fully within the host institution or community. A reasonable goal is to reduce the cost of obtaining reliable and valid fidelity ratings for behavioral intervention programs by an order of magnitude or more. We present a proof of concept of a computational method that measures fidelity in the Familias Unidas intervention. Familias Unidas is a family-focused prevention intervention targeting externalizing behaviors among Hispanic youth. It is delivered by school counselors in Spanish to parents and in the home in English and Spanish. Our work uses speech analysis, knowledge engineering, and computational linguistics to measure fidelity. Methods/Results: The existing method for rating fide...
Abstract Classification algorithms are difficult to apply to sequential examples because there is a vast number of potentially useful features for describing each example. Past work on feature selection has focused on searching the space of all subsets of features, which is intractable for large feature sets. We adapt sequence mining techniques to act as a preprocessor to select features for standard classification algorithms such as Naive Bayes and Winnow. Our experiments on three different datasets show that the features produced ...
A typical behavioral intervention requires recording the audio or video of the session led by an intervention agent (facilitator). These recordings offer valuable information for training, supervision, and analysis of intervention impact. Yet, they are costly and resource intensive. During efficacy and effectiveness trials, typically more than 90% of the session recordings are not coded for fidelity. During implementation of prevention programs, even fewer sessions are coded for fidelity, and often the required procedures for recording the session and assessing fidelity become unsustainable due to cost and time. We propose a way to use computational methods to analyze fidelity automatically. These methods show promise for enabling local organizations to better implement carefully tested evidence-based programs. Our method analyzes the communication between the facilitator and at least one individual who is the target of the intervention. We demonstrate how these methods can u...
Abstract We propose a method to classify songs in the Million Song Dataset according to song genre. Since songs have several data types, we train sub-classifiers on different types of data. These sub-classifiers are combined using both classifier authority and classification confidence for a particular instance. In the experiments, the combined classifier surpasses all of these sub-classifiers as well as the SVM classifier using concatenated vectors from all data types. Finally, the genre labels for the Million Song Dataset are provided.
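One way to picture combining sub-classifiers by authority and per-instance confidence is a weighted vote. The sketch below is an illustration under stated assumptions, not the combination rule from the paper: it assumes each sub-classifier reports a confidence per genre for the instance, that "authority" is a single weight per sub-classifier (for example, its validation accuracy), and that the two are multiplied and summed. The function name, data-type names, and numbers are hypothetical.

```python
def combine(predictions, authority):
    """Weighted vote over sub-classifiers.

    predictions: {classifier_name: {genre: confidence for this instance}}
    authority:   {classifier_name: weight for that classifier}
    Returns the winning genre and the full score table.
    """
    scores = {}
    for name, confidences in predictions.items():
        for genre, confidence in confidences.items():
            # Assumed rule: authority times instance-level confidence.
            scores[genre] = scores.get(genre, 0.0) + authority[name] * confidence
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical sub-classifiers trained on two different data types.
predictions = {
    "audio": {"rock": 0.6, "jazz": 0.4},
    "lyrics": {"rock": 0.2, "jazz": 0.8},
}
label, scores = combine(predictions, {"audio": 0.9, "lyrics": 0.5})
print(label, scores)
```

With these weights the confident lyrics classifier outvotes the more authoritative audio classifier; lowering its authority flips the decision.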
Purpose – Library data are often hard to analyze because these data come from unconnected sources, and the data sets can be very large. Furthermore, the desire to protect user privacy has prevented the retention of data that could be used to correlate library data to non-library data. The research team used data mining to determine library use patterns and to determine whether library use correlated to students’ grade point average. Design/methodology/approach – A research team collected and analyzed data from the libraries, registrar and human resources. All data sets were uploaded into a single, secure data warehouse, allowing them to be analyzed and correlated. Findings – The analysis revealed patterns of library use by academic department, patterns of book use over 20 years and correlations between library use and grade point average. Research limitations/implications – Analysis of more narrowly defined user populations and collections will help develop targeted outreach efforts...
Abstract Social tags have been acknowledged as a highly useful resource for retrieving music by mood or topic. However, since anyone can assign social tags, some of them are inaccurate. In this paper, we present a new framework to identify accurate social tags of songs. In our framework, we first clean and filter music tags. Then we apply an improved hierarchical clustering algorithm to group the tags to build a tag category. Based on the category, we classify songs using lyrics. In order to extend the semantic ...