Automatic Analysis of Annual Financial Reports: A Case Study

Jasmina Smailović, Martin Žnidaršič, Aljoša Valentinčič, Igor Lončarski, Marko Pahor, Pedro Tiago Martins, Senja Pollak


The main goal of reporting in the financial system is to ensure that high-quality, useful information about the financial position of firms is available to a wide range of users, including existing and potential investors, financial institutions, employees, and the government. Formal reports contain both strictly regulated financial sections and unregulated narrative parts. Our research starts from the hypothesis that business performance is related not only to the content but also to the linguistic properties of the unregulated parts of annual reports. In this paper we first present our dataset of financial reports and the techniques we used to extract the unregulated textual parts. Next, we introduce our approaches to differential content analysis and to the analysis of correlation with financial aspects. The differential content analysis is based on TF-IDF weighting and aims to find the characteristic terms for each year (i.e., the terms that were not prevalent in previous reports by the same firm). To correlate the linguistic characteristics of reports with financial aspects, we considered an array of linguistic features and used selected financial indicators. The linguistic features range from simple measurements, such as the ratio of personal to impersonal pronouns, to assessments of characteristics such as financial sentiment, trust, and doubt, and of discursive features expressing certainty, modality, etc. Some features correlate strongly with industry (e.g., reports from the IT industry are shorter and more personal than those from the automotive industry), whereas words expressing doubt and communication, and to some extent necessity and cognition, are positively correlated with failure.
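The idea behind the differential content analysis can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes that each yearly report of a single firm is treated as one document, that tokenization is a simple lowercase word split, and that TF-IDF is computed as relative term frequency times log(N / document frequency). The function name `characteristic_terms` and the toy report texts are hypothetical and serve only as an illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def characteristic_terms(reports_by_year, top_n=3):
    """Return the highest-scoring TF-IDF terms for each year of one firm's reports.

    reports_by_year maps a year to the narrative text of that year's report.
    Terms that occur in every year receive an IDF of zero and are dropped.
    """
    years = sorted(reports_by_year)
    counts = {year: Counter(tokenize(reports_by_year[year])) for year in years}

    # Document frequency: in how many yearly reports does each term occur?
    doc_freq = Counter()
    for year_counts in counts.values():
        doc_freq.update(year_counts.keys())

    n_docs = len(years)
    result = {}
    for year in years:
        total = sum(counts[year].values())
        scores = {
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in counts[year].items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        result[year] = [term for term, score in ranked[:top_n] if score > 0]
    return result


# Toy data, invented for illustration only.
reports = {
    2008: "revenue growth revenue products customers",
    2009: "revenue restructuring impairment restructuring customers",
    2010: "revenue cloud cloud products customers",
}
terms = characteristic_terms(reports)
for year in sorted(terms):
    print(year, terms[year])
```

In this toy example, terms that appear in every year, such as "revenue" and "customers", score zero, while terms concentrated in a single year rise to the top. A real pipeline would likely add stop-word removal, lemmatization, and multi-word terms.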


Financial reports, 10-K, differential content analysis, linguistic characteristics, financial indicators
