
    Kiril Simov

    The Linked Open Data movement is maturing. The LOD cloud grows by billions of triples yearly. Technologies and guidelines about how to produce LOD fast, how to assure their quality, and how to provide vertically oriented data services are being developed (LOD2, LATC, baseKB). Little is said, however, about how to include reasoning in the LOD framework and how to cope with its diversity. This paper deals with this topic. It presents a data service, FactForge, the biggest body of general knowledge from LOD on which inference ...
    This paper is an improvement over the work done on POS disambiguation for Bulgarian via Neural Networks (Vlasseva 1999). Our improvements are in several directions: (1) we extended the range of grammatical features predicted by the system to cover almost all paradigmatic members of Bulgarian words, (2) we changed the encoding schemata for grammatical features in order to minimize the computation and to use the context layer of the network more extensively, (3) we changed the evaluation of the network output in order to minimize the side effects from evaluating cases that are not relevant in a particular instance of ambiguity. Besides the neural network improvements, we improved the choice of the training corpus and added a rule-based preprocessing component to disambiguate the cases for which there are rules ensuring 100% correct results.
    This paper discusses in detail the design and implementation phases during the creation of the Bulgarian HPSG-based treebank (BulTreeBank). First, the interconnection of the HPSG language model, the linguistic parameters of the annotation scheme and the underlying ...
    D. Dicheva and D. Dochev (Eds.): AIMSA 2010, LNAI 6304, pp. 269–270, 2010. © Springer-Verlag Berlin Heidelberg 2010 ... Mapping Data Driven and Upper Level Ontology ... Mariana Damova, Svetoslav Petrov, and Kiril Simov ... Ontotext AD, Tsarigradsko Chosse 135, ...
    There has been a long tradition in the digitization and manual documentation of cultural heritage data, yet the need for indexing and retrieval that goes beyond mere bibliographic information has only recently been recognized. This chapter reports on completed work aimed at highlighting textual cultural resources that, as of yet, remain under-exploited by creating the necessary infrastructure with the support and customization of Language Technologies (LT). The ultimate goal was to promote the study of cultural heritage of the neighboring areas of Greece and Bulgaria and to raise awareness about their common cultural identity, the focus being on literature, folklore and language. To this end, a bilingual collection of literary and folklore texts in Greek and Bulgarian was developed along with a number of accompanying resources. The authors present the methodology adopted for the automatic annotation of the textual data at various levels of linguistic analysis elaborating on the Greek and Bulgarian text processing tools that are integrated in the cross-lingual search and retrieval mechanisms, and discuss issues and problems encountered in the course of the project life-cycle.
    The paper describes an approach for semantic annotation of multimedia objects stored in a Digital Library implemented as a Web Service. The Library has its own fixed annotation schema and provides a set of functions accessible as Web Service operations. The main objective of semantic annotations (supported by ontologies) is to extend both the Library functionality and the scope of the knowledge in it.
    Introduction. Our group is currently working on a Knowledge Control System (KCS), which is considered a backbone for robust ontology middleware. The KCS is part of the On-To-Knowledge project and uses Sesame as a repository access layer. The following features have been considered: versioning (tracking changes) of knowledge bases; an access control (security) system; meta-information for knowledge bases. These three aspects are interrelated as depicted on the following scheme. [Figure: Knowledge Control System, relating Meta-Information, Access Control, and Tracking Changes.] Tracking Changes, Versioning. We aim to provide versioning of RDF(S) on a structural level in the spirit of software source control systems. The main principles are outlined below: VPR1: The RDF statement is the smallest directly manageable piece of knowledge. VPR2: An RDF statement cannot be ch ...
    Abstract: The chapter introduces the process of designing two upper-level ontologies, PROTON and UMBEL, into reference ontologies and their integration in the so-called Reference Knowledge Stack (RKS). It is argued that RKS is an important step in the efforts of the Linked Open Data (LOD) project to transform the Web into a global data space with diverse real data, available for review and analysis. RKS is intended to make the interoperability between published datasets much more efficient than it is now. The ...
    Classifying linguistic objects is a widespread and important linguistic task, but hand deducing a classificatory system from a general linguistic theory can consume much effort and introduce pernicious errors. We present an abstract prototype device that effectively deduces an accurate classificatory system from a finite linguistic theory.
    The paper discusses shallow semantic annotation of the Bulgarian treebank. Our goal is to construct the next layer of linguistic interpretation over the morphological and syntactic layers that have already been encoded in the treebank. The annotation is called shallow because it encodes only the senses for the non-functional words and the relations between the semantic indices connected to them. We
    One of the goals of the “Language Technology for LifeLong Learning” project is the creation of an appropriate methodology to support both formal and informal learning. Services are being developed that are based on the interaction between a formal representation of (domain) knowledge in the form of an ontology created by experts and a social component which complements it, that is tags and social networks. It is expected that this combination will improve learner interaction, knowledge discovery as well as knowledge co- ...
    In this paper we report on an ongoing project, LT4eL (Language Technology for eLearning), which aims to improve the effectiveness of retrieval and the accessibility of learning objects within a learning management system. We elaborate on the process of building the domain ontology and present the multilingual support offered to the application.
    Introduction. In this paper we describe the architecture and the intended applications of the CLaRK system. The development of the CLaRK system started under the Tübingen-Sofia International Graduate Programme in Computational Linguistics and Represented Knowledge (CLaRK). The main aim behind the design of the system is the minimization of human work during the creation of corpora. Creation of corpora is
    The paper outlines a hybrid architecture for a partial parser based on regular grammars over XML documents. The parser is used to support the annotation process in the BulTreeBank project. Thus the parser annotates only the 'sure' cases. To maximize the number of analyzed phrases, the parser applies a set of grammars in a dynamic fashion. Each grammar determines not only the constituent structure (plus some syntactic dependencies internal to the structure), but also a description of the local and global context of the recognized phrase. The grammars available to the parser are arranged in a network. The order in which the grammars are applied depends on the initial ordering in the network and the descriptions associated with the grammars. Thus the traversal is not deterministic. Additionally, the application of the grammars can be interleaved with the application of other XML tools such as remove, insert and transform operations. This architecture provides a flexible means for g...
    CLaRK an XML-based System for Corpora Development * Kiril Simov, Alexander Simov, Hristo Ganev, Milen Kouylekov, Ilko Grigorov, Krasimira Ivanova. BulTreeBank Project http://www.BulTreeBank.org Linguistic Modelling Laboratory, Bulgarian Academy of Sciences ...
    Reliable automatic semantic annotation systems do not exist for many languages. Their creation depends in many respects on construction of gold standard corpora. In this paper we present a system for supporting the semi-automatic construction of such corpora. The ...
    ... 1998. [Gonzalo et al. 1998] Gonzalo, Julio, Felisa Verdejo, Irina Chugur and Juan Cigarran. ... [Strzalkowski et al. 98] Strzalkowski, Tomek, Louise Guthrie, Jussi Karlgren, Jim Leistensnider, Fang Lin, Jose Perez-Carballo, Troy Straszheim, Jin Wang and Jon Wilding. ...
    The paper presents the strategies and principles for converting BulTreeBank into the Universal Dependencies annotation scheme. The mappings are discussed from linguistic and technical points of view. The mapping from the original resource to the new one has been done on the morphological and syntactic levels. The first release of the treebank was issued in May 2015. It contains 125 000 tokens, which cover roughly half of the corpus data.
    This report is an extension of a paper published at the RANLP 2001 conference. The paper is an improvement over the work done on POS disambiguation for Bulgarian via Neural Networks (Vlasseva 1999). Our improvements are in several directions: (1) we extended the range of grammatical features predicted by the system to cover almost all paradigmatic members of Bulgarian words, (2) we changed the encoding schemata for grammatical features in order to minimize the computation and to use the context layer of the network more extensively, (3) we changed the evaluation of the network output in order to minimize the side effects from evaluating cases that are not relevant in a particular instance of ambiguity. Besides the neural network improvements, we improved the choice of the training corpus and added a rule-based preprocessing component to disambiguate the cases for which there are rules ensuring 100% correct results.