Abstract
We present ActivePDB, a novel framework for uncertain data management. We are given a relational probabilistic database in which each tuple is correct with some probability; e.g., a database constructed from textual data using information extraction. Given a query, we want to determine the correctness of its results. Unlike in the usual probabilistic database setting, we have access to an oracle that can resolve the uncertainty, such as a domain expert who can verify data against its sources. Since verification may be costly, our goal is to determine the correct output of the query while asking the oracle to verify as few tuples as possible. ActivePDB provides an end-to-end solution to this problem. In a nutshell, we first track provenance to identify which input tuples contribute to the derivation of each output tuple, and in what ways. We then design an active learning solution that iteratively chooses tuples to be verified, based on the provenance structure and on an evolving estimate of the probability of each tuple's correctness. We will demonstrate ActivePDB in the context of the NELL database of extracted facts, allowing participants both to pose queries and to play the role of oracles.
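To make the verification loop concrete, the following is a minimal sketch and not the ActivePDB algorithm itself. It assumes that the provenance of a single output tuple is given as a disjunction of derivations, each a conjunction of input tuple identifiers, and that tuple probabilities are independent. The selection rule (take the derivation most likely to hold and verify its least likely tuple) is an illustrative heuristic chosen for this sketch; the names `evaluate`, `choose_next`, and `resolve` are hypothetical.

```python
from math import prod

def evaluate(dnf, known):
    """Three-valued evaluation: True, False, or None if still undecided."""
    undecided = False
    for clause in dnf:
        if any(known.get(t) is False for t in clause):
            continue  # this derivation is refuted
        if all(known.get(t) is True for t in clause):
            return True  # one fully verified derivation suffices
        undecided = True
    return None if undecided else False

def choose_next(dnf, probs, known):
    """Illustrative heuristic (an assumption of this sketch): pick the
    unrefuted derivation most likely to hold, then its least likely
    unverified tuple. Call only while evaluate() returns None."""
    live = [c for c in dnf if not any(known.get(t) is False for t in c)]

    def clause_prob(clause):
        # Probability that all still-unverified tuples are correct,
        # assuming independence between tuples.
        return prod(probs[t] for t in clause if t not in known)

    best = max(live, key=clause_prob)
    return min((t for t in best if t not in known), key=lambda t: probs[t])

def resolve(dnf, probs, oracle):
    """Query the oracle until the output tuple's correctness is decided.
    Returns (correctness, number of tuples verified)."""
    known = {}
    while (value := evaluate(dnf, known)) is None:
        t = choose_next(dnf, probs, known)
        known[t] = oracle(t)
    return value, len(known)

# Toy instance: the output tuple is derived as (t1 AND t2) OR t3.
provenance = [("t1", "t2"), ("t3",)]
probs = {"t1": 0.9, "t2": 0.8, "t3": 0.5}
truth = {"t1": True, "t2": True, "t3": False}  # hidden ground truth
result = resolve(provenance, probs, truth.__getitem__)
print(result)
```

In this toy instance the loop verifies `t2` and then `t1`, at which point the output tuple is known to be correct and `t3` is never sent to the oracle. A fuller treatment would also update the probability estimates after each answer, as the abstract describes, which this sketch omits.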
Original language | English |
---|---|
Pages (from-to) | 3638-3641 |
Number of pages | 4 |
Journal | Proceedings of the VLDB Endowment |
Volume | 15 |
Issue number | 12 |
DOIs | |
State | Published - 2022 |
Event | 48th International Conference on Very Large Data Bases, VLDB 2022, Sydney, Australia. Duration: 5 Sep 2022 → 9 Sep 2022 |
Bibliographical note
Publisher Copyright: © 2022, VLDB Endowment. All rights reserved.