Artificial Intelligence and the Black Box Problem

David Rito

February 4, 2021

The complexity of society due to the availability of big data demands novel approaches that only AI can offer. In the case of Tax Administrations, AI is of paramount importance for processing all the data needed to enforce tax compliance. Black box models are one of the available options for dealing with the complexity and scale of the processed data. However, implementing black box models raises challenges, legal concerns in particular. Based on Poland's implementation of System Teleinformatyczny Izby Rozliczeniowej (STIR) as well as an analysis of the Taxation Modernization Act implemented by Germany, this paper shows that certain requirements need to be met when using AI for processing data. It is suggested that tools already in place to detect discriminatory targeting on the web can serve as a model for the tax domain, making black box models GDPR-proof and, in doing so, compliant with EU law.

Legal Information Retrieval & Entailment with Legal Embeddings and Boosting

Houda Alberts, Akin Ipek, Roderick Lucas, and Phillip Wozny

December 11, 2020

In this paper we investigate three different methods for several legal document retrieval and entailment tasks: new low-complexity pre-trained embeddings trained specifically on documents in the legal domain, transformer models, and boosting algorithms. Task 1, a case law retrieval task, utilized a pairwise CatBoost model, resulting in an F1 score of 0.04. Task 2, a case law entailment task, utilized a combination of BM25+, embeddings, and natural language inference (NLI) features, winning third place in the competition with an F1 of 0.6180. Task 3, a statutory information retrieval task, utilized the aforementioned pre-trained embeddings in combination with TF-IDF features, resulting in an F2 score of 0.4546. Lastly, Task 4, a statutory entailment task, utilized BERT embeddings with XGBoost and achieved an accuracy of 0.5357. Our findings illustrate that using legal embeddings and auxiliary linguistic features, such as NLI, shows the most promise for future improvements.
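
The abstract reports F1 for Tasks 1 and 2 but F2 for Task 3 without defining either. As a minimal sketch, assuming the standard F-beta definition (the abstract does not state its formula), the two metrics differ only in how heavily recall is weighted. The function name `f_beta` and the precision and recall values below are illustrative and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta = 1 weights both equally (F1); beta = 2 weights recall more (F2).
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Illustrative values only (not results from the paper):
# a system with higher recall than precision scores better under F2 than F1.
f1 = f_beta(0.4, 0.8, beta=1.0)
f2 = f_beta(0.4, 0.8, beta=2.0)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

Under this definition, a retrieval system that returns most of the relevant statutes at the cost of some false positives is rewarded more by F2 than by F1, which is one plausible reason a recall-oriented metric would be used for a statutory retrieval task.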