Similarity definitions with Spring
You can use the provided Spring namespace to define your measures easily. Your configuration file should look something like this:
<?xml version="1.0" encoding="UTF-8" ?>
<beans xmlns="http://www.springframework.org/schema/beans"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
http://www.gusto.com/schema/semsim http://www.gusto.com/schema/semsim.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ss="http://www.gusto.com/schema/semsim">
<!-- Define a similarity measure, here random value -->
<ss:randomValue id="randomValueSim" />
</beans>

Measures
You can define as many measures as you want and combine them using composed, compound and property similarities.

Identity
Returns 1.0 if the resources or values are identical and 0.0 otherwise. You can also define a list of stop words, to which the identity does not apply.
<ss:identity id="identity">
<ss:stopwords>
<ss:stopword>other</ss:stopword>
<ss:stopword>sample</ss:stopword>
</ss:stopwords>
</ss:identity>

Intervals
<ss:interval id="interval">
<ss:entries>
<ss:intervalEntry from="0" to="2" sim="1" />
<ss:intervalEntry from="2" to="5" sim="0.82" />
<ss:intervalEntry from="5" to="9" sim="0.6" />
<ss:intervalEntry from="9" to="15" sim="0.4" />
<ss:intervalEntry from="15" to="20" sim="0.2" />
</ss:entries>
</ss:interval>
<ss:interval id="runtimeInterval">
<ss:entries>
<ss:intervalEntry from="0" to="10" sim="1" />
<ss:intervalEntry from="10" to="25" sim="0.81" />
<ss:intervalEntry from="25" to="40" sim="0.6" />
</ss:entries>
</ss:interval>
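The interval examples above do not spell out how an entry is selected. The following is a minimal Python sketch, assuming the measure takes the absolute difference between two numeric values and returns the sim of the entry whose range contains it. The function name, the half-open range [from, to) and the 0.0 fallback are all assumptions made for illustration, not the library's documented behaviour.

```python
# Hypothetical sketch: the real interval measure may treat bounds differently.
# Entries mirror the "runtimeInterval" example as (from, to, sim) tuples.
RUNTIME_INTERVALS = [
    (0, 10, 1.0),
    (10, 25, 0.81),
    (25, 40, 0.6),
]


def interval_similarity(a, b, entries, default=0.0):
    """Return the sim of the entry whose range contains |a - b|.

    Assumptions: ranges are half-open [from, to), and a difference that
    falls in no range yields `default`.
    """
    diff = abs(a - b)
    for low, high, sim in entries:
        if low <= diff < high:
            return sim
    return default


# Two runtimes that differ by 30 minutes fall into the 25-40 entry.
print(interval_similarity(120, 90, RUNTIME_INTERVALS))
```

Under the same assumptions, a dateInterval would apply this lookup to a difference expressed in the configured unit (years in the example that follows).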
<ss:dateInterval id="releaseDateInterval" unit="year">
<ss:entries>
<ss:intervalEntry from="0" to="2" sim="0.76" />
<ss:intervalEntry from="2" to="5" sim="0.5" />
<ss:intervalEntry from="5" to="15" sim="0.35" />
<ss:intervalEntry from="15" to="30" sim="0.2" />
</ss:entries>
</ss:dateInterval>

String
<ss:jaroWinkler id="jarowinklers" />
<ss:wordnet id="wns" firstWordOnly="false"
wordnetConfig="${wordnet.config}"
infocontent="${wordnet.infocontent}"
mapping="${wordnet.mapping}" />
You can set the three WordNet parameters directly in the configuration or define them in a properties file:
wordnet.config=config/wordnet/wordnet.xml
wordnet.infocontent=file:config/wordnet/ic-bnc-resnik-add1.dat
wordnet.mapping=file:config/wordnet/domain_independent.txt

Domain Specific
These are the measures that are specific to a type of data, such as ZIP codes.
The zip code measure calculates the similarity between two ZIP codes. It is currently designed for 5-digit codes. level1 is the similarity when all 5 digits are the same, level2 when 4 digits are the same, and so on.
<ss:zipCode id="zip" level1="1.0" level2="0.71" level3="0.61" level4="0.47" level5="0.21" />

Matrix
In the first example we define the matrix inline.
<ss:matrix id="mpaaMatrix2" prefix="http://www.ini-cerist.dz/movie-lens.owl#">
<ss:entries>
<ss:matrixEntry val1="PG" val2="PG-13" sim="0.7" />
<ss:matrixEntry val1="R" val2="NC-17" sim="0.8" />
<ss:matrixEntry val1="R" val2="PG-13" sim="0.3" />
</ss:entries>
</ss:matrix>
If the matrix has too many entries, it is better to move them to an external file. Here it is a classpath resource named matrix-mpaa.properties containing:
PG***PG-13=0.7
R***NC-17=0.81
R***PG-13=0.3
Note that we can choose the separator between the two dimensions. Here we have chosen ***.
<ss:matrix id="mpaaMatrix" file="classpath:config/movielens/matrix-mpaa.properties" fileSeparator="***" prefix="">
<ss:stopwords>
<ss:stopword>other</ss:stopword>
</ss:stopwords>
</ss:matrix>
It is also possible to define stop words, as shown above.

Sets
JaccardBinary can be defined on values (first example) or on resources (second example).
<ss:jaccardBinary id="vjss" type="VALUE">
<ss:stopwords>
<ss:stopword>other</ss:stopword>
<ss:stopword>misc</ss:stopword>
</ss:stopwords>
</ss:jaccardBinary>
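For readers unfamiliar with the Jaccard index, here is a minimal Python sketch of what a measure such as vjss presumably computes: the size of the intersection of two sets divided by the size of their union. Removing the stop words before comparing, returning 0.0 when both sets are empty, and the function name itself are assumptions made for illustration, not confirmed behaviour of jaccardBinary.

```python
def jaccard_similarity(left, right, stopwords=()):
    """Jaccard index |A & B| / |A | B| over two collections of values.

    Assumptions: stop words are dropped from both sides before the
    comparison, and two empty sets are given a similarity of 0.0.
    """
    stop = set(stopwords)
    a = set(left) - stop
    b = set(right) - stop
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Keyword sets of two movies; "other" and "misc" are ignored as stop words,
# leaving {"war", "love"} and {"war", "spy"}: one shared value out of three.
print(jaccard_similarity(["war", "love", "other"],
                         ["war", "misc", "spy"],
                         stopwords=["other", "misc"]))
```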
<ss:jaccardBinary id="rjss" type="RESOURCE">
<ss:stopwords>
<ss:stopword>other</ss:stopword>
<ss:stopword>misc</ss:stopword>
</ss:stopwords>
</ss:jaccardBinary>
The similarity measure used by ressemblance is set via the similarity attribute. In the first example we use the language similarity; in the second we use rvs. Note that these similarities must be defined elsewhere in the configuration.
<ss:ressemblance id="languagesRess" type="RESOURCE" similarity="language">
<ss:stopwords>
<ss:stopword>other</ss:stopword>
<ss:stopword>misc</ss:stopword>
</ss:stopwords>
</ss:ressemblance>
<ss:ressemblance id="rRESSEBLANCEss" type="RESOURCE" similarity="rvs" />

Edge Counting
Edge counting derives the similarity of two resources from their positions in a hierarchy of terms. The implemented method is 'Wu & Palmer'. To configure EdgeCounting you set the maximal depth (here 8), define the parent properties, which are the properties used to navigate to the parent element, and define the stop resources, which are special resource ids that are not treated as resources and are therefore ignored in the process.
<ss:edgeCounting id="language" depth="8">
<ss:parents>
<ss:parent>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</ss:parent>
<ss:parent>http://www.w3.org/2000/01/rdf-schema#subClassOf</ss:parent>
</ss:parents>
<ss:stops>
<ss:stop>http://www.w3.org/2000/01/rdf-schema#Resource</ss:stop>
<ss:stop>http://www.w3.org/2002/07/owl#Class</ss:stop>
<ss:stop>http://www.w3.org/2000/01/rdf-schema#Class</ss:stop>
</ss:stops>
</ss:edgeCounting>
You can simplify your measure configuration by using a stereotype. If you define your measure with the SEMANTIC stereotype, the properties are injected automatically.
<ss:edgeCounting id="languageBis" depth="8" stereotype="SEMANTIC">
<!-- No need to define the properties -->
<!-- You can specify extra properties, in addition to the stereotype ones -->
</ss:edgeCounting>

Composition
Each param consists of a property, a type (Value|Resource|List), a weight and the similarity to apply to the property value.
<ss:composed id="movieSim">
<ss:composedParam type="VALUE" weight="7" similarity="jarowinklers" property="hasTitle" />
<ss:composedParam type="VALUE" weight="4" similarity="jarowinklers" property="hasAlternativeTitle" />
<ss:composedParam type="VALUE" weight="2" similarity="jarowinklers" property="hasTagline" />
<ss:composedParam type="VALUE" weight="1" similarity="jarowinklers" property="hasPlotOutline" />
<ss:composedParam type="SET" weight="3" similarity="vjss" property="hasKeyWords" />
<ss:composedParam type="VALUE" weight="3" similarity="releaseDateInterval" property="hasReleaseDate" />
<ss:composedParam type="VALUE" weight="2" similarity="runtimeInterval" property="hasRuntime" />
<ss:composedParam type="SET" weight="5" similarity="languagesRess" property="hasLanguage" />
<ss:composedParam type="SET" weight="1" similarity="rjss" property="hasColor" />
<ss:composedParam type="SET" weight="9" similarity="rjss" property="hasGenre" />
<ss:composedParam type="RESOURCE" weight="3" similarity="mpaaMatrix" property="hasMPAA" />
<ss:composedParam type="RESOURCE" weight="3" similarity="identity" property="hasCompany" />
<ss:composedParam type="RESOURCE" weight="1" similarity="identity" property="hasAspectRation" />
</ss:composed>
<ss:composed id="movieGenreSim">
<ss:composedParam type="SET" weight="2" similarity="rjss" property="hasGenre" />
</ss:composed>

Property
A property similarity is defined by its type (Value|Resource|List) and the similarity applied to the property value. In this example, the similarity is based on the single property hasTitle, to which we apply the jarowinklers similarity.
<ss:property id="movie2Sim" type="VALUE" similarity="jarowinklers" property="hasTitle" />

Compound
A compound similarity that will be applied to an object of type User.
<ss:compound id="zipcountrySim" similarity="zip">
<ss:property name="country" property="hasCountry" />
<ss:property name="zip" property="hasZipCode" />
</ss:compound>
A composed similarity that integrates several properties plus the compound one defined above.
<ss:composed id="userSim">
<ss:composedParam type="RESOURCE" weight="4" similarity="identity" property="hasOccupation" />
<ss:composedParam type="VALUE" weight="1" similarity="identity" property="hasSex" />
<ss:composedParam type="VALUE" weight="3" similarity="interval" property="hasAge" />
<ss:composedParam type="VALUE" weight="3" similarity="zip" property="hasZipCode" />
<!-- Integrating the compound similarity -->
<ss:composedParam type="RESOURCE" similarity="zipcountrySim" />
</ss:composed>
Note that a composedParam can target:
- a property, if the property attribute is defined;
- the object itself, if the property attribute is not defined.
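To make the role of the weights and of the optional property attribute more concrete, here is a minimal Python sketch. It assumes that a composed similarity is a weighted average of the per-parameter similarities; the page does not state how the library actually normalizes the weights, so treat this as one plausible reading. The names composed_similarity and identity, and the use of plain dictionaries for users, are hypothetical.

```python
def identity(a, b):
    """Hypothetical identity measure: 1.0 when both values are equal."""
    return 1.0 if a == b else 0.0


def composed_similarity(left, right, params):
    """Weighted average of several similarities (assumed normalization).

    `params` is a list of (property, weight, measure) tuples. When the
    property is None the measure is applied to the objects themselves,
    mirroring a composedParam without a property attribute.
    """
    total_weight = sum(weight for _, weight, _ in params)
    if total_weight == 0:
        return 0.0
    score = 0.0
    for prop, weight, measure in params:
        if prop is None:
            value = measure(left, right)
        else:
            value = measure(left.get(prop), right.get(prop))
        score += weight * value
    return score / total_weight


# Two users that share occupation and age but not sex.
user_a = {"hasOccupation": "engineer", "hasSex": "F", "hasAge": 30}
user_b = {"hasOccupation": "engineer", "hasSex": "M", "hasAge": 30}
params = [
    ("hasOccupation", 4, identity),
    ("hasSex", 1, identity),
    ("hasAge", 3, identity),
]
# (4 * 1.0 + 1 * 0.0 + 3 * 1.0) / 8 = 0.875
print(composed_similarity(user_a, user_b, params))
```

In this reading, a higher weight only matters relative to the other weights, which is why hasGenre (weight 9) would dominate hasColor (weight 1) in the movieSim example.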