Optimised Storage for Datalog Reasoning
Xinyue Zhang
Pan Hu
Yavor Nenov
Ian Horrocks
[Paper]
[Benchmarks]

Abstract

Materialisation facilitates Datalog reasoning by precomputing all consequences of the facts and the rules so that queries can be directly answered over the materialised facts. However, storing all materialised facts may be infeasible in practice, especially when the rules are complex and the given set of facts is large. We observe that for certain combinations of rules, there exist data structures that compactly represent the reasoning result and can be efficiently queried when necessary. In this paper, we present a general framework that allows for the integration of such optimised storage schemes with standard materialisation algorithms. Moreover, we devise optimised storage schemes targeting at transitive rules and union rules, two types of (combination of) rules that commonly occur in practice. Our experimental evaluation shows that our approach significantly improves memory consumption, sometimes by orders of magnitude, while remaining competitive in terms of query answering time.


Code

The linux executables are available here and are explained as follows.
  • standard: applies the seminaive algorithm for materialisation and uses just normal tables for storage.
  • multiScheme: uses plain table, TC and Union schemes discussed in this paper.
  • TCModule [1]: applies an optimised application of TC rules, and a standard seminaive algorithm for the remaining rules, but only a plain table is used for storage.
The source code is available upon request at xinyue.zhang AT cs.ox.ac.uk.
[1] Hu, Pan, Boris Motik, and Ian Horrocks. "Modular materialisation of datalog programs." Artificial Intelligence 308 (2022): 103726.


Datasets and Datalog Programs

The test datasets and Datalog programs used in our paper are available here. The data description is shown below:
  • facts/ (input facts)
    • DBpedia/ (the subsections of DBpedia_2016-05.tar.gz)
    • DAG.bz2 (facts in DAG, can be used without decompressing)
    • Relations.bz2 (facts in Relations, can be used without decompressing)

  • rules/ (containing Datalog programs used in the evaluation)
    • DAG-R.dlog (the rules of DAG)
    • Relations.dlog (the rules of Relations)
    • DBpedia-SKOS.dlog (the SKOS rules of DBpedia)

This repo also includes the instructions to reproduce the experiment results.


Paper and Supplementary Material

Xinyue Zhang, Pan Hu, Yavor Nenov, Ian Horrocks.
Optimised Storage for Datalog Reasoning.
In AAAI, 2024.
camera ready



@inproceedings{zhang2024optimised,
  title={Optimised Storage for Datalog Reasoning},
  author={Zhang, Xinyue and Hu, Pan and Nenov, Yavor and Horrocks, Ian},
  booktitle={The 38th Annual AAAI Conference on Artificial Intelligence},
  year={2024}
}


Acknowledgements: The design of this page was modified from the template made by Phillip Isola and Richard Zhang.