Data Mining: Semantic Web Mining
Presented by: R.S.Ch.Sridevi,M.B.Sruthi,K.N.V.Bhavani
4th year B.Tech,
M.V.G.R. College of Engineering
Abstract
This paper gives an overview of semantic
web mining, semantic web languages and
applications of semantic web. Semantic Web Mining
aims at combining the two fast-developing research
areas Semantic Web and Web Mining. The idea is to
improve, on the one hand, the results of Web Mining
by exploiting the new semantic structures in the
Web; and to make use of Web Mining, on the other
hand, for building up the Semantic Web.
Keywords: Semantic Web, Web Mining, World
Wide Web, Metadata, RDF, Web Development.
I INTRODUCTION
Semantic Web Mining aims at combining the two
areas Semantic Web and Web Mining. This vision
follows our observation that trends converge in both
areas: increasing numbers of researchers work on
improving the results of Web Mining by exploiting (the
new) semantic structures in the Web, and make use of
Web Mining techniques for building the Semantic Web.
Last but not least, these techniques can be used for
mining the Semantic Web itself. The wording Semantic
Web Mining emphasizes this spectrum of possible
interaction between both research areas: it can be read
both as Semantic (Web Mining) and as (Semantic Web)
Mining.
In the past few years, there have been many
attempts at breaking the syntax barrier on the Web. A
number of them rely on the semantic information in text
corpora that is implicitly exploited by statistical
methods. Some methods also analyze the structural
characteristics of data; they profit from standardized
syntax like XML. In this paper, we concentrate on
markup and mining approaches that refer to an explicit
conceptualization of entities in the respective domain.
These relate the syntactic tokens to background
knowledge represented in a model with formal
semantics. When we use the term semantic, we thus
have in mind a formal logical model to represent
knowledge.
The aim of this paper is to give an overview of
where the two areas of Semantic Web and Web Mining
meet today. In our survey, we will first describe the
current state of the two areas and then discuss, using an
example, their combination, thereby outlining future
research topics. We will provide references to typical
approaches. Most of them have not been developed
explicitly to close the gap between the Semantic Web
and Web Mining, but they fit naturally into this scheme.
II SEMANTIC WEB
The Semantic Web was thought up by Tim
Berners-Lee, inventor of the WWW, URIs, HTTP, and
HTML. There is a dedicated team of people at the World
Wide Web consortium (W3C) working to improve,
extend and standardize the system, and many languages,
publications, tools and so on have already been
developed.
Today it is almost impossible to integrate
information that is spread over several Web or intranet
pages. Consider, e.g., the query for a data mining expert
in a company intranet, where the only explicit
information stored are the relationships between people
and the projects they work in on the one hand, and
between projects and the topics they address on the other
hand. In that case, a skills management system should be
able to combine the information on the employees' home
pages with the information on the projects' home pages
in order to find the respective expert. To realize such
scenarios, metadata have to be interpreted and
appropriately combined by machines.
The process of building the Semantic Web is
still in genesis, but first standards, e.g. for the underlying
data model and an ontology language, have already appeared.
However, those structures are now to be filled with life
in applications. In order to make this task feasible, one
should start with the simpler tasks first. The following
steps show the direction where the Semantic Web is
heading:
1. providing a common syntax for machine
understandable statements,
2. establishing common vocabularies,
3. agreeing on a logical language,
4. using the language for exchanging proofs.
2.1.Layers of Semantic Web
Fig1. Layers of Semantic Web
The Semantic Web layers were suggested by
Berners-Lee. On the first two layers, a common syntax
is provided. Uniform resource identifiers (URIs) provide
a standard way to refer to entities, while Unicode is a
standard for exchanging symbols. On top of that sits
syntactic interoperability in the form of XML. The
Extensible Markup Language (XML) fixes a notation for
describing labeled trees, and XML Schema allows the
definition of grammars for valid XML documents. XML
documents can refer to different namespaces to make
explicit the context (and therefore meaning) of different
tags.
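The namespace mechanism can be illustrated with a short Python sketch (the second vocabulary and the document itself are invented for illustration; the Dublin Core namespace URI is the real one):

```python
# Sketch: XML namespaces make the context, and hence the intended
# meaning, of identically named tags explicit.
import xml.etree.ElementTree as ET

doc = """<root xmlns:dc="http://purl.org/dc/elements/1.1/"
              xmlns:ex="http://example.org/projects#">
  <dc:title>Semantic Web Mining</dc:title>
  <ex:title>Internal project name</ex:title>
</root>"""

root = ET.fromstring(doc)
# The parser expands each prefix to its full namespace URI, so the two
# "title" tags remain distinguishable after parsing.
for child in root:
    print(child.tag, "->", child.text)
```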
The Resource Description Framework (RDF)
can be seen as the first layer where information becomes
machine-understandable: According to the W3C
recommendation, RDF is a foundation for processing
metadata; it provides interoperability between
applications that exchange machine-understandable
information on the Web.
RDF documents consist of 3 types of entities:
resources, properties, and statements. Resources may be
Web pages, parts or collections of Web pages, or any
objects which are not directly part of the WWW.
Properties are specific attributes, characteristics, or
relations describing resources. A resource together with
a property having a value for that resource forms an
RDF statement. A value is either a literal, a resource, or
another statement. Statements can thus be considered as
object-attribute-value triples.
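This triple model can be sketched in a few lines of Python (the tuple encoding is ours, for illustration only; the URIs follow the example of Fig. 2):

```python
# Sketch: RDF statements as (resource, property, value) triples.
# Values may be literals or resources, exactly as described above.
statements = {
    ("URI-SWMining", "title", "Semantic Web Mining"),  # value is a literal
    ("URI-AHO", "cooperates-with", "URI-GST"),         # value is a resource
}

def values_of(resource, prop):
    """All values that the given property has for the resource."""
    return {v for (r, p, v) in statements if r == resource and p == prop}

print(values_of("URI-SWMining", "title"))
```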
Example of RDF statements: In fig. 2, two of
the authors of the present paper are represented as
resources URI-GST and URI-AHO. The statement on
the lower right consists of the resource URI-AHO and
the property cooperates-with with the value URI-
GST. The resource URI-SWMining has as value for the
property title the literal Semantic Web Mining.
RDF Schema was designed to be a simple data
typing model for RDF. Using RDF Schema, we can
create properties and classes, and define slightly more
advanced constructs such as ranges and domains for
properties. In fig. 2, URI-SWMining is an instance of
the concept Project, and thus by inheritance also of the
concept Top. RDF Schema provides additional
properties: the schema allows us to say that one class or
property is a subclass or subproperty of another, and its
ranges and domains let us say what classes the subject
and object of each property must belong to. RDF
Schema also contains a set of properties for annotating
schemata, providing comments, labels, and the like.
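The inheritance behaviour of rdfs:subClassOf can be sketched as follows (the class hierarchy mirrors Fig. 2; the dictionary encoding is an illustrative simplification that supports only single inheritance):

```python
# Sketch: subclass inheritance as in RDF Schema. An instance of a
# class is also an instance of every superclass of that class.
subclass_of = {"Project": "Top"}  # Project rdfs:subClassOf Top

def is_subclass_or_same(cls, query_class):
    """True if cls equals query_class or (transitively) inherits from it."""
    while cls is not None:
        if cls == query_class:
            return True
        cls = subclass_of.get(cls)
    return False

# URI-SWMining is declared a Project, hence by inheritance also a Top.
print(is_subclass_or_same("Project", "Top"))
```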
Fig 2. Example of RDF Statements
The next layer is the ontology vocabulary. An
ontology is an explicit formalization of a shared
understanding of a conceptualization. This high-level
definition is realized differently by various research
communities and thereby in ontology representation
languages. However, most of these languages have a
certain understanding in common, as most of them
include a set of concepts, a hierarchy on them, and
relations between concepts. Some of them also include
axioms in some specific logic. We will discuss the most
prominent approaches in more detail in the next section.
Logic is the next layer according to Figure 1.
However, research nowadays usually considers the
ontology and the logic levels together, as ontologies are
already based on logic and should allow for logical
axioms. By applying logical deduction, one can infer
new knowledge from the information which is stated
explicitly. For instance, the axiom saying that the
cooperates-with relation is symmetric (Figure 2) allows
one to logically infer that the person addressed by URI-
AHO is cooperating with the person addressed by URI-
GST although only the person GST specifies his
cooperation with the person AHO. The kind of
inference that is possible depends heavily on the logics
chosen.
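The symmetric-property deduction just described can be sketched directly (the triples follow Fig. 2; the closure computation is a simplified illustration, not a full reasoner):

```python
# Sketch: one inference step for a symmetric property. Only the triple
# (GST cooperates-with AHO) is stated; the reverse triple is deduced.
stated = {("URI-GST", "cooperates-with", "URI-AHO")}
symmetric_properties = {"cooperates-with"}

def symmetric_closure(triples):
    inferred = set(triples)
    for (s, p, o) in triples:
        if p in symmetric_properties:
            inferred.add((o, p, s))  # symmetry: s p o  implies  o p s
    return inferred

print(("URI-AHO", "cooperates-with", "URI-GST") in symmetric_closure(stated))
```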
Proof and trust are the remaining layers. They
follow the understanding that it is important to be able to
check the validity of statements made in the (Semantic)
Web. Therefore the creators of statements should be able
to provide a proof which is verifiable by a machine. At
this level, it is not required that the machine of the reader
of the statements find the proof itself; it just has to
check the proof provided by the creator. These two
layers are rarely tackled in today's research.
III LANGUAGES & TOOLS
The Semantic Web should enable greater access not only
to content but also to services on the Web. Users and
software agents should be able to discover, invoke,
compose, and monitor Web resources offering particular
services and having particular properties, and should be
able to do so with a high degree of automation if desired.
Powerful tools should be enabled by service
descriptions, across the Web service lifecycle.
Ontologies: Languages and Tools. Any knowledge
representation mechanism can play the role of a
Semantic Web language. Frame Logic (or FLogic) is
one candidate, since it provides a semantically founded
knowledge representation based on the frame-and-slot
metaphor. The most popular formalisms at the moment
are Description Logics (DLs). DLs are subsets of first-
order logic which aim at being as expressive as possible
while still being decidable.
The description logic SHIQ provides the basis
for DAML+OIL, which, in turn, is a result of joining
the efforts of two projects: The DARPA Agent Markup
Language DAML was created as part of a research
programme started in August 2000 by DARPA, a US
governmental research organization. OIL (Ontology
Inference Layer) is an initiative funded by a European
Union programme. A successor of DAML+OIL has been
released as a W3C Recommendation under the
name OWL.
OWL:
An important goal for Semantic Web markup
languages, then, is to establish a framework within
which these descriptions are made and shared. Web sites
should be able to employ a standard ontology, consisting
of a set of basic classes and properties, for declaring and
describing services, and the ontology structuring
mechanisms of OWL provide an appropriate, Web-
compatible representation language framework within
which to do this. The OWL-S ontology serves as a
language for describing services, reflecting the fact that
it provides a standard vocabulary that can be used
together with the other aspects of the OWL description
language to create service descriptions.
OWL-S supports three kinds of tasks:
1. Automatic Web service discovery. Automatic
Web service discovery is an automated process
for location of Web services that can provide a
particular class of service capabilities, while
adhering to some client-specified constraints.
2. Automatic Web service invocation. Automatic
Web service invocation is the automatic
invocation of a Web service by a computer
program or agent, given only a declarative
description of that service, as opposed to when
the agent has been pre-programmed to be able to
call that particular service.
3. Automatic Web service composition and
interoperation. This task involves the automatic
selection, composition, and interoperation of
Web services to perform some complex task,
given a high-level description of an objective.
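The first of these tasks can be sketched as a simple matching procedure. Real OWL-S discovery matches logic-based service profiles; the service records, capability names, and budget constraint below are invented for illustration:

```python
# Sketch: automatic service discovery as constraint matching over
# advertised service descriptions. All records are invented.
services = [
    {"name": "BookFinder", "capability": "BookSelling", "price": 30},
    {"name": "RareBooks", "capability": "BookSelling", "price": 500},
    {"name": "FlightDesk", "capability": "TravelBooking", "price": 900},
]

def discover(capability, budget):
    """Services advertising the capability whose price fits the budget."""
    return [s["name"] for s in services
            if s["capability"] == capability and s["price"] <= budget]

print(discover("BookSelling", 100))
```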
DAML:
DAML is a language created by DARPA as an
ontology and inference language based upon RDF.
DAML takes RDF Schema a step further, by giving
us more in depth properties and classes. DAML
allows one to be even more expressive than with
RDF Schema, and brings us back on track with our
Semantic Web discussion by providing some simple
terms for creating inferences.
DAML+OIL
DAML gives us ways to express things
such as inverses, unambiguous properties, unique
properties, lists, restrictions, cardinalities, pairwise
disjoint lists, data types, and so on.
One DAML construct that we shall run through is
the daml:inverseOf property. Using this property, we
can say that one property is the inverse of another.
Both the rdfs:range and the rdfs:domain of
daml:inverseOf are rdf:Property. Here is an example
of daml:inverseOf being used:
:hasName daml:inverseOf :isNameOf .
:Sean :hasName "Sean" .
"Sean" :isNameOf :Sean .
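The effect of daml:inverseOf on a triple store can be sketched as follows (plain tuples stand in for the RDF syntax above):

```python
# Sketch: deriving the inverse triple. Given :hasName daml:inverseOf
# :isNameOf and the stated triple, the reversed triple follows.
inverse_of = {"hasName": "isNameOf"}
stated = {(":Sean", "hasName", '"Sean"')}

inferred = set(stated)
for (s, p, o) in stated:
    if p in inverse_of:
        inferred.add((o, inverse_of[p], s))  # swap subject and object

print(('"Sean"', "isNameOf", ":Sean") in inferred)
```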
The second useful DAML construct that we shall
go through is the daml:UnambiguousProperty
class. Saying that a Property is a
daml:UnambiguousProperty means that if the
object of the property is the same, then the
subjects are equivalent. For example:
foaf:mbox rdf:type daml:UnambiguousProperty .
:x foaf:mbox <mailto:sean@example.org> .
:y foaf:mbox <mailto:sean@example.org> .
implies that :x daml:equivalentTo :y .
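This inference, too, can be sketched in a few lines (the mailbox value is invented; what matters is only that it is identical for both subjects):

```python
# Sketch: daml:UnambiguousProperty. If two subjects carry the same
# object for such a property, they are inferred to be equivalent.
unambiguous = {"foaf:mbox"}
triples = {
    (":x", "foaf:mbox", "mailto:sean@example.org"),
    (":y", "foaf:mbox", "mailto:sean@example.org"),
}

equivalent = set()
for (s1, p1, o1) in triples:
    for (s2, p2, o2) in triples:
        if s1 != s2 and p1 == p2 and p1 in unambiguous and o1 == o2:
            equivalent.add((s1, s2))  # same mbox, hence same individual

print((":x", ":y") in equivalent)
```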
DAML is only one of a series of languages
in use for the Semantic Web.
TOOLS:
Several tools are in use for the creation and
maintenance of ontologies and metadata, as well as for
reasoning within them. OntoEdit is an ontology editor
which is connected to Ontobroker, an inference engine
for FLogic. It provides means for semantics-based
query handling over distributed resources. FLogic has
also influenced the development of Triple, an inference
engine based on Horn logic, which allows the modelling
of features of UML, Topic Maps, or RDF Schema. It can
interact with other inference engines, for example with
FaCT or RACER.
FaCT provides inference services for the
description logic SHIQ. Reasoning within SHIQ
and its relationship to DAML+OIL have been discussed
in the literature; this reasoning is implemented in the FaCT inference engine,
which also underlies the ontology editor OilEd.
SAHA:
SAHA is an annotation tool. Saha is used with a
web browser, and it supports collaborative distributed
creation of metadata by centrally storing annotations,
which can be viewed and edited by different annotators.
Saha supports collaborative annotation of web
documents, and it can utilize ontology services for
sharing URIs and importing concepts defined in various
external ontologies. The tool is targeted especially for
creating metadata of web resources in semantic web
portals.
In Saha, our primary goal has not been the
automation of the annotation process, but rather to
support the creation of annotations that cannot be
produced automatically. Although requiring a lot of
work, such annotation can be seen as a collaborative
effort, comparable to the creation of different kinds of
Wikis. The basic architecture of Saha is depicted in
figure 3. It consists of the following functional parts:
1) Saha application, which is run on a web-server. It
stores and distributes annotations and creates web
pages which form the user interface used in creating
annotations.
2) A PostgreSQL database, which is used to store the
Jena ontology model containing the schema and
annotations.
3) Annotators using web browsers to interact with the
system.
4) The ONKI ontology service, which is used to fetch
concepts defined in external ontologies and to share
instances created by the annotators.
5) Applications using the annotations created with
Saha. Annotations can be retrieved in RDF/XML using
HTTP GET.
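Part 5 of this architecture can be sketched as a simple lookup (the store, the document URIs, and the RDF/XML payload are invented; Saha's actual interface is the HTTP GET mentioned above):

```python
# Sketch: a centrally stored annotation set, queried per document URI,
# returning RDF/XML as an HTTP GET handler in Saha would.
annotation_store = {
    "http://example.org/doc1":
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        '<rdf:Description rdf:about="http://example.org/doc1"/></rdf:RDF>',
}

def get_annotations(document_uri):
    """Return the RDF/XML annotations for a document, or "" if none."""
    return annotation_store.get(document_uri, "")

print("rdf:RDF" in get_annotations("http://example.org/doc1"))
```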
Saha is a web application implemented using the
Apache Cocoon and Jena frameworks. It is designed as a
web application in order to impose as few requirements
as possible on the end user's computational environment.
To use Saha, all an annotator needs is an appropriate
web browser with an Internet connection. Saha makes
extensive use of techniques such as JavaScript and Ajax
in order to provide annotators with a simple and
versatile user interface.
Fig 3. Architecture of SAHA
Annotations created in Saha are based on
different metadata annotation schemas, which are
defined in OWL. The schema helps annotators to
describe resources in a consistent way and it can be
effectively used to construct a generic user-interface for
the application. In Saha, annotations are instantiated
classes and properties of an annotation schema, which
are linked to the document being described. The linking
plays an essential role in annotations, because 1) an
annotation is separate from the document it describes,
and 2) the way linking is done affects the meaning of the
relation between the annotation and the document. There
are two ways to associate an annotation with a document
in Saha. The first one makes the assertion that the
document is an instance of one specific class defined in
the schema. It can thus be used efficiently to classify
documents. An example of this kind of annotation is
expressed in RDF/XML below:
[RDF/XML example: the SemanticComputingResearchGroup page typed as an instance of a schema class; markup not preserved]
The second method associates an annotation
with the document by using a named property, which is
defined in the annotation schema. This idea is similar,
e.g., to the usage of the property annotates in Annotea:
[RDF/XML example: an annotation linked to the SemanticComputingResearchGroup page via a named property; markup not preserved]
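The two linking methods can be contrasted in a schematic sketch (all names below are illustrative; this is not the RDF/XML of the original examples):

```python
# Sketch of the two ways to associate an annotation with a document.
doc = "http://example.org/SemanticComputingResearchGroup"

# Method 1: assert that the document itself is an instance of one
# specific class defined in the annotation schema.
typing_annotation = (doc, "rdf:type", ":ResearchGroupPage")

# Method 2: link a separate annotation resource to the document via a
# named property from the schema (cf. the property annotates in Annotea).
property_annotation = (":annotation1", ":annotates", doc)

print(typing_annotation[1], property_annotation[1])
```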
Open Source Tools
The project has developed the OntoViews tool
including the semantic search engine Ontogator [28] and
the recommendation server Ontodella [41] for creating
semantic portals. In our work, we have generalized the
multi-facet search paradigm into using semantic web
ontologies, reasoning, and standards [14]. The first
application demonstrating the usability of our method
and tool set was the MuseumFinland portal. Since
then, the tool has been extended and applied in various
other demonstrational portals. Creation of metadata is a
central bottleneck in fulfilling the vision of the semantic
web. In order to enable distributed metadata annotation
by using centralized ontology servers, we have created a
prototype of the SAHA annotation editor.
The project also develops tools for
semiautomatic content annotation, such as Terminator
used for term extraction, and Annomobile for matching
keywords with ontology URIs. Natural language
processing techniques are being developed for creating
tools for annotating Finnish text documents.
IV APPLICATION AREAS
Different application areas benefit from the
Semantic Web. We briefly present some areas which
are currently under development.
E-Learning
Sharing knowledge is the main idea of
education. With the growing amount of educational
material in the WWW, this idea gets a new dimension
and generates new technical challenges. Metadata
schemes for the exchange of educational Web resources
have been in use for a number of years. These metadata
schemes, for example LOM (Learning Objects Meta-
data), usually extend the Dublin Core standard.
However, these standards lack a precise machine-
interpretable semantics to describe the content of the
learning objects. In the literature, an approach for accessing and
browsing distributed learning repositories is described.