Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources

ABSTRACT
The quality of web sources has traditionally been evaluated using
exogenous signals such as the hyperlink structure of the web graph. We
propose a new approach that relies on endogenous signals, namely,
the correctness of factual information provided by the source. A
source that has few false facts is considered to be trustworthy.
The facts are automatically extracted from each source by information
extraction methods commonly used to construct knowledge
bases. We propose a way to distinguish errors made in the extraction
process from factual errors in the web source per se, by using
joint inference in a novel multi-layer probabilistic model.
We call the resulting trustworthiness score Knowledge-Based
Trust (KBT). On synthetic data, we show that our method can reliably
compute the true trustworthiness levels of the sources. We
then apply it to a database of 2.8B facts extracted from the web,
and thereby estimate the trustworthiness of 119M webpages. Manual
evaluation of a subset of the results confirms the effectiveness
of the method.
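
As a rough illustration of the idea, and not the multi-layer model
itself, the sketch below scores each source by the share of its facts
that are believed correct, down-weighting facts that may be extraction
errors. The function name toy_trust_scores, the tuple layout, and the
two probabilities are all hypothetical; here they are simply given as
input, whereas the approach described above estimates them by joint
inference.

```python
from collections import defaultdict


def toy_trust_scores(observations):
    """Toy per-source trust score (an illustrative assumption, not the paper's model).

    observations: iterable of (source, p_extracted, p_true) tuples, where
      p_extracted is an assumed probability that the source really states the
        fact (i.e. the extraction is not an error), and
      p_true is an assumed probability that the fact itself is correct.

    Returns a dict mapping each source to the average of p_true over its
    facts, weighted by p_extracted.
    """
    weighted_true = defaultdict(float)
    total_weight = defaultdict(float)
    for source, p_extracted, p_true in observations:
        weighted_true[source] += p_extracted * p_true
        total_weight[source] += p_extracted
    # Sources with zero total weight have no reliable extractions; skip them.
    return {
        source: weighted_true[source] / total_weight[source]
        for source in total_weight
        if total_weight[source] > 0
    }


observations = [
    ("a.example", 1.0, 1.0),  # clearly stated, correct
    ("a.example", 1.0, 0.0),  # clearly stated, false
    ("b.example", 1.0, 1.0),  # clearly stated, correct
    ("b.example", 0.0, 0.0),  # false fact, but attributed to an extraction error
]
scores = toy_trust_scores(observations)
```

In this toy input, the false fact attributed to b.example carries no
weight because it is treated as an extraction error, so it does not
lower that source's score, while the false fact that a.example clearly
states does.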
