    Packet header traces are widely used in network analysis. Header traces are the aggregate of traffic from many concurrent applications. We present a methodology, based on machine learning, that can break the trace down into clusters of traffic where each cluster has different traffic characteristics. Typical clusters include bulk transfer, single and multiple transactions and interactive traffic, amongst others. The paper includes a description of the methodology, a visualisation of the attribute statistics that aids in recognising cluster types and a discussion of the stability and effectiveness of the methodology.
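As a rough illustration of the clustering step (not the paper's exact attributes or algorithm), a plain k-means over hypothetical per-flow features can separate bulk-transfer-like flows from interactive-like ones:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels

# Hypothetical per-flow attributes: (total bytes, packet count,
# mean inter-arrival time). The paper's real attribute set differs.
flows = [
    (5e6, 4000, 0.001), (7e6, 5000, 0.001),   # bulk-transfer-like
    (2e3, 10, 0.5),     (3e3, 12, 0.6),       # interactive-like
]
labels = kmeans(flows, k=2)
```

On these toy flows the two bulk transfers end up in one cluster and the two interactive flows in the other, mirroring the kind of traffic classes the methodology recovers.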
    Data Mining: Practical Machine Learning Tools and ... Witten and Frank's textbook was one of two books that I used for a data mining class in the Fall of 2001. ...
    More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining (35). These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times.
    Association rule mining is a data mining technique that reveals interesting relationships in a database. Existing approaches employ different parameters to search for interesting rules. This fact and the large number of rules make it difficult to compare the output of confidence ...
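Support and confidence, the two parameters most commonly used to judge rule interestingness, can be sketched minimally as follows (single-item rules only, exhaustive enumeration rather than any particular mining algorithm such as Apriori):

```python
from itertools import combinations

def association_rules(transactions, min_support=0.5, min_conf=0.6):
    """Enumerate rules A -> B over single-item antecedents and
    consequents, keeping those above support and confidence thresholds."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        for ante, cons in (({a}, {b}), ({b}, {a})):
            s = support(ante | cons)
            if s >= min_support and support(ante) > 0:
                conf = s / support(ante)
                if conf >= min_conf:
                    rules.append((ante, cons, s, conf))
    return rules

# Invented toy transactions.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"},
           {"bread"}, {"milk"}]
rules = association_rules(baskets)
```

Here `{butter} -> {bread}` has confidence 1.0 while `{bread} -> {butter}` has confidence 2/3: the same itemset yields rules of very different strength, which is exactly why comparing outputs across parameter settings is hard.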
    Inducing classifiers that make accurate predictions on future data is a driving force for research in inductive learning. However, how users can gain information from the models produced is also important. Unfortunately, some of the most powerful inductive learning algorithms generate "black boxes"; that is, the representation of the model makes it virtually impossible to gain any insight into what has been learned. This paper presents a technique that can help the user understand why a classifier makes the predictions that it does by providing a two-dimensional visualization of its class probability estimates. It requires the classifier to generate class probabilities, but most practical algorithms are able to do so (or can be modified to this end).
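The visualization rests on evaluating a classifier's class probability estimates over a two-dimensional grid, which any probability-producing classifier supports. A minimal sketch with a stand-in logistic classifier (all names and numbers invented, not the paper's actual plotting code):

```python
import math

def prob_grid(predict_proba, xs, ys):
    """Class-probability estimates at every cell of a 2D grid; each cell
    could then be shaded by the probability of the predicted class."""
    return [[predict_proba((x, y)) for x in xs] for y in ys]

def toy_proba(point):
    """Stand-in classifier: logistic model with a fixed, hand-chosen
    decision boundary at x + y = 1. Returns (P(class0), P(class1))."""
    x, y = point
    p1 = 1.0 / (1.0 + math.exp(-(x + y - 1.0)))
    return (1.0 - p1, p1)

xs = [i / 10 for i in range(11)]          # grid over [0, 1] x [0, 1]
grid = prob_grid(toy_proba, xs, xs)       # grid[row][col] = (p0, p1)
```

Cells near the boundary get probabilities near 0.5 and would render as uncertain regions, which is the insight the visualization conveys about a black-box model.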
    The much-publicized Netflix competition has put the spotlight on the application domain of collaborative filtering and has sparked interest in machine learning algorithms that can be applied to this sort of problem. The demanding nature of the Netflix data has led to some interesting and ingenious modifications to standard learning methods in the name of efficiency and speed. Three basic methods have been applied in most approaches to the Netflix problem so far: stand-alone neighborhood-based methods, latent factor models based on singular-value decomposition, and ensembles consisting of variations of these techniques. In this paper we investigate the application of forward stage-wise additive modeling to the Netflix problem, using two regression schemes as base learners: ensembles of weighted simple linear regressors and k-means clustering, the latter being interpreted as a tool for multivariate regression in this context. Experimental results show that our methods produce competitive results.
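Forward stage-wise additive modeling can be sketched generically: each stage fits a weak regressor to the current residuals and adds a shrunken copy to the ensemble. The sketch below uses simple one-attribute linear regressors as base learners; it is an illustrative reimplementation under invented data, not the paper's code:

```python
def simple_linear_fit(x, r):
    """Least-squares slope and intercept of r against one attribute x."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    var = sum((xi - mx) ** 2 for xi in x)
    slope = (sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / var
             if var else 0.0)
    return slope, mr - slope * mx

def stagewise(X, y, stages=50, shrinkage=0.5):
    """Each stage: fit a simple linear regressor to the residuals on
    every attribute, keep the best one, and add a shrunken copy."""
    n, d = len(X), len(X[0])
    pred = [0.0] * n
    model = []
    for _ in range(stages):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        best = None
        for j in range(d):
            col = [row[j] for row in X]
            a, b = simple_linear_fit(col, resid)
            sse = sum((ri - (a * xi + b)) ** 2
                      for xi, ri in zip(col, resid))
            if best is None or sse < best[0]:
                best = (sse, j, a, b)
        _, j, a, b = best
        model.append((j, shrinkage * a, shrinkage * b))
        pred = [pi + shrinkage * (a * X[i][j] + b)
                for i, pi in enumerate(pred)]
    return model, pred

# Invented data where the target is exactly 2 * attribute 0.
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [4.0, 1.0]]
y = [2.0, 4.0, 6.0, 8.0]
model, fitted = stagewise(X, y)
```

With shrinkage 0.5 the residual roughly halves per stage, so after 50 stages the fit is numerically exact on this toy data; the shrinkage is what makes the stage-wise ensemble robust on real data like Netflix ratings.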
    Logistic Model Trees have been shown to be very accurate and compact classifiers [8]. Their greatest disadvantage is the computational complexity of inducing the logistic regression models in the tree. We address this issue by using the AIC criterion [1] instead of cross-validation ...
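The appeal of AIC here is that it scores a model by fit plus a penalty per parameter, so selecting among candidate models needs only one fit each instead of repeated cross-validation runs. A generic least-squares version of the criterion (not necessarily the paper's exact formulation, and with invented numbers):

```python
import math

def aic_least_squares(rss, n, k):
    """AIC for a least-squares model with k parameters on n examples:
    n * ln(RSS / n) + 2k, with constant terms dropped."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical comparison: a 3-parameter model barely beats a
# 2-parameter one on residual error, so AIC prefers the simpler model.
aic_small = aic_least_squares(rss=10.0, n=100, k=2)
aic_big   = aic_least_squares(rss=9.9,  n=100, k=3)
```

The lower AIC wins; the extra parameter only pays off if it reduces the residual sum of squares enough to offset the +2 penalty.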
    Profile Hidden Markov Models (PHMMs) have been widely used as models for Multiple Sequence Alignments. By their nature, they are generative one-class classifiers trained only on sequences belonging to the target class they represent. Nevertheless, they ...
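A PHMM scores a sequence by its likelihood under the model, computed with the forward algorithm. The sketch below does this for a plain two-state HMM (all probabilities invented); a real PHMM runs the same computation over match, insert, and delete states laid out along the alignment columns:

```python
import math

def logsumexp(xs):
    """Numerically stable log of a sum of exponentials."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_log_prob(obs, states, start, trans, emit):
    """Log-likelihood of an observation sequence under an HMM,
    via the forward algorithm in log space."""
    alpha = {s: math.log(start[s]) + math.log(emit[s][obs[0]])
             for s in states}
    for o in obs[1:]:
        alpha = {s: math.log(emit[s][o]) + logsumexp(
                     [alpha[p] + math.log(trans[p][s]) for p in states])
                 for s in states}
    return logsumexp(list(alpha.values()))

# Toy 2-state model over a DNA-like alphabet (all numbers invented).
states = ("M1", "M2")
start = {"M1": 0.9, "M2": 0.1}
trans = {"M1": {"M1": 0.2, "M2": 0.8}, "M2": {"M1": 0.3, "M2": 0.7}}
emit = {"M1": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        "M2": {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}}

in_class = forward_log_prob("AT", states, start, trans, emit)
off_class = forward_log_prob("CC", states, start, trans, emit)
```

A sequence matching the model's preferred emissions scores a much higher log-likelihood than one that does not, which is the basis for using the generative model as a one-class classifier via a score threshold.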
    We investigate a simple semi-naive Bayesian ranking method that combines naive Bayes with induction of decision tables. Naive Bayes and decision tables can both be trained efficiently, and the same holds true for the combined semi-naive model. We show that the resulting ranker, compared to either component technique, frequently significantly increases AUC. For some datasets it significantly improves on both techniques. This is also the case when attribute selection is performed in naive Bayes and its semi-naive variant.
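The AUC used for these comparisons can be computed directly from any ranker's scores as the probability that a random positive is ranked above a random negative, with ties counted as half. A minimal sketch with hypothetical scores and labels:

```python
def auc(scores, labels):
    """AUC: fraction of positive/negative pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1]
perfect = auc([0.9, 0.8, 0.2, 0.1, 0.7], labels)   # every pair correct
uninformative = auc([0.5] * 5, labels)             # all ties
```

A perfect ranker scores 1.0 and an uninformative one 0.5, so "significantly increases AUC" means the combined model pushes this pairwise ranking probability measurably above that of either component.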
    The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, convenient interactive graphical user ...
    ... Vazirgiannis:
        1. Data Pre-processing and Quality Assessment (663)
        2. Evaluation of Classification Methods (664)
        3. Association Rules (671)
        4. Cluster Validity (675)
        References (694)
    31. Data Mining Model Comparison (697), Paolo Giudici:
        1. Data Mining and Statistics (697)
        2. Data Mining ...