Co-presented by Vincenzo Orabona, Enterprise Architect, Eustema Spa
In this demo, we report on the application of our semantic CMS to a real case study: the “Intrage” Portal (http://www.intrage.it). In particular, we aim to show the advantages of applying semantic technologies to content management within a Web Information Portal. We first present the capabilities of our solution, then show at which stages of the Content Lifecycle Process we create added value with respect to a standard CMS solution, and finally describe the measures used to evaluate the results. The framework offers the following features:
- Entity Extraction: extraction from a text of the entities that belong to a particular dataset;
- Classification: classification of structured information (for example, held in a database) and unstructured information with respect to a reference model, such as a taxonomy; classification is supported by inference mechanisms and is not purely keyword-based;
- Relations Extraction: ability to extract relations between concepts reported in a text;
- Language Detection: capability to identify the language used in a text;
- Document Similarity: given a document or a portion of text, proposal of similar documents based on their content;
- Query Expansion: expansion of the initial search keyword with a series of related terms (extracted from the ontology/taxonomy);
- Linked Open Data: linking of extracted entities to public datasets, to make documents easier to find from the web and to improve SEO ranking.
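As an illustration of the query expansion feature, the following minimal sketch expands a search keyword with narrower and related terms taken from a taxonomy. It is not the framework's implementation: the `TAXONOMY` dictionary, its concept labels and the `expand_query` function are hypothetical, and a plain Python dictionary stands in for the real ontology/taxonomy.

```python
# Toy SKOS-like taxonomy: each concept lists its narrower and related concepts.
# The labels below are hypothetical and not taken from the actual taxonomy.
TAXONOMY = {
    "retirement": {
        "narrower": ["pension", "early retirement"],
        "related": ["social welfare"],
    },
    "health": {
        "narrower": ["prevention"],
        "related": ["insurance"],
    },
}


def expand_query(keyword, taxonomy=TAXONOMY):
    """Return the keyword followed by its narrower and related terms, without duplicates."""
    key = keyword.lower()
    entry = taxonomy.get(key, {})
    terms = [key]
    for relation in ("narrower", "related"):
        for term in entry.get(relation, []):
            if term not in terms:
                terms.append(term)
    return terms


print(expand_query("Retirement"))
```

A keyword that is not in the taxonomy is returned unchanged, so the search falls back to plain keyword matching.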
The Intrage Web Portal (http://www.intrage.it/) was created to help people over 50 with their everyday needs and to offer guidance on future issues such as retirement and social assistance. Its main topics are therefore: jobs, retirement, health, insurance, taxation, social welfare, home life, and consumer affairs. The ontology implemented to enhance content metadata is represented as a SKOS taxonomy whose “top” concepts were derived from the existing thematic channels (the main topics above) and enriched with further concepts from the “Nuovo Soggettario” taxonomy, published by the National Central Library of Florence (http://thes.bncf.firenze.sbn.it/). SKOS properties link the concepts to one another, together with some references to Wikipedia. Additional resources were prepared by filtering NER data from a DBpedia dump and collecting geospatial data from GeoNames; these enable the entity annotation and linking process, according to the LOD paradigm. Finally, semantic retrieval facilities are available to find contents of interest for both editing and browsing/searching purposes. Below, we summarize the main advantages of applying semantic technologies to the content management problem in the proposed case study.
- To facilitate the content editing task by suggesting similar, previously published content during the production step – When new content is created, the system suggests previously published content with “similar” keywords or topics. The editor can thus link to other sources and verify the “originality” of what is being produced.
- To improve content annotation by suggesting a set of metadata and tags automatically extracted from the content itself – In addition to Named Entity Recognition (NER) utilities, our system uses a statistical analysis of the text to provide a set of metadata and tags that can be useful for describing a piece of content. This information is then exploited in the indexing stage to create an index of terms.
- To improve the “visibility” of the published pages by injecting into the HTML code particular tags for search engines – In the Content Editor tool, the annotation results obtained from the text analysis are integrated as microdata within the HTML page, using additional span tags. These annotations reuse the standard Schema.org vocabulary. Since search engines can process this additional information and exploit it in the results they produce, they may rank pages containing this data more highly, increasing their visibility.
- To provide effective search mechanisms over the produced content for both editors and final users, allowing queries based on the criteria used in the information extraction stage – While producing new content, users are given a tool to find correlated articles in the Knowledge Base. In the search phase, users can browse content using the index of terms and exploit relations among ontology concepts to improve search results.
- To enhance user experience through content recommendation facilities based on the user profile (favourite pages, subscribed channels, etc.) – Recommendation systems exploit user profiles to suggest content related to particular topics, concepts or entities. For the Intrage Portal, content recommendation is implemented in the “My Home” section, where users can view targeted recommendation boxes. These are determined by an ad-hoc algorithm, which assigns a score to contents of interest on the basis of user feedback about followed channels, favourite tags, etc.
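The microdata injection described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Content Editor's actual code: the `annotate` function, the input format (a dictionary from entity mention to Schema.org type) and the sample sentence are all hypothetical. Only the use of span tags and of the Schema.org vocabulary comes from the description above.

```python
import re
from html import escape


def annotate(text, entities):
    """Wrap each entity mention in span tags carrying Schema.org microdata.

    `entities` maps a mention (exact string) to a Schema.org type name,
    e.g. {"Rome": "Place"}. This input format is an assumption.
    """
    if not entities:
        return escape(text)
    # Try longer mentions first so that a long mention wins over its prefix.
    mentions = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in mentions))
    parts, last = [], 0
    for match in pattern.finditer(text):
        # Escape the plain text between two annotated mentions.
        parts.append(escape(text[last:match.start()]))
        schema_type = entities[match.group(0)]
        parts.append(
            f'<span itemscope itemtype="https://schema.org/{schema_type}">'
            f'<span itemprop="name">{escape(match.group(0))}</span></span>'
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


html_fragment = annotate("The office is in Rome.", {"Rome": "Place"})
print(html_fragment)
```

The text is scanned in a single pass, so markup that has already been inserted is never matched again by a later mention.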
We also attempt to “measure” the benefits that applying semantic technologies to a CMS environment brings to final users. In particular, measurements were performed to “quantify” the effective utility of the advantages discussed, defining different types of indicators based on a “five stars” rating model. During the demo we’ll show all these advantages, together with the indicators and their values used for evaluating the results.