CATMA – Computer Aided Textual Markup & Analysis
CATMA is a practical and intuitive tool for literary scholars, students and other parties with an interest in text analysis and literary research. The newest version is implemented as a web application, which makes it easy to exchange analytical results over the internet and to work collaboratively.
CATMA integrates three interactive modules: the Tagger, the Analyzer and the Visualizer. The Tagger module offers an intuitive graphical interface and a wide range of options for defining Tags suitable for marking up a text. Because it uses Feature Structures, CATMA remains flexible while still conforming to the relevant XML and TEI standards, which supports interoperability with other tools. Since Feature Structures are a Standoff Markup technique, they also permit overlapping Markup. The Analyzer module contains various text-analytical functions as well as a natural-language-based Query Builder, which lets the user run complex and powerful Queries without having to learn a complicated Query language. The Visualizer module can generate distribution charts from the results of analyses, which makes those results easier to evaluate.
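To illustrate why Standoff Markup permits overlapping Markup, here is a minimal Python sketch. It is not CATMA's data model or file format: the class `StandoffTag`, the helper functions and the sample sentence are all hypothetical, and the feature structure is reduced to a tuple of feature/value pairs. The point is only that Tags stored as character offsets outside the text can overlap freely, which inline XML elements cannot do without workarounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffTag:
    """A hypothetical annotation kept outside the text, addressed by offsets."""
    name: str
    start: int            # inclusive character offset into the text
    end: int              # exclusive character offset into the text
    features: tuple = ()  # simplified feature structure: (feature, value) pairs


def tag_phrase(text, name, phrase, features=()):
    """Create a tag for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return StandoffTag(name, start, start + len(phrase), features)


def covered_text(text, tag):
    """Return the stretch of text a tag points at; the text itself is never modified."""
    return text[tag.start:tag.end]


def overlaps(a, b):
    """True if two tags share characters but neither fully contains the other."""
    shares = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shares and not (a_in_b or b_in_a)


text = "It was a dark and stormy night when she left."
setting = tag_phrase(text, "setting", "a dark and stormy night", (("mood", "gloomy"),))
time_clause = tag_phrase(text, "time", "night when she left", (("kind", "relative"),))

print(covered_text(text, setting))     # the span marked as "setting"
print(covered_text(text, time_clause)) # the span marked as "time"
print(overlaps(setting, time_clause))  # the two spans cross each other
```

Both Tags cover the word "night" while each also covers text the other does not, so they could not be nested as well-formed inline elements. Stored as offsets, they simply coexist.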
Within the framework of the heureCLÉA project, we aim to implement further (semi-)automated functions using machine learning. The goal is to enable CATMA to generate automated Markup of time-related phenomena in literary texts up to a certain level of complexity, and to point out the cases in which automated Markup is impossible because of high complexity or ambiguity.
HeidelTime is a multilingual, cross-domain temporal tagger. It extracts temporal expressions from text documents and normalizes them according to the TIMEX3 annotation standard, which is part of the temporal markup language TimeML. In contrast to most other temporal taggers, HeidelTime is not focused on the news domain alone, but aims to extract and normalize temporal expressions with high quality across multiple domains, e.g., news documents, narrative-style documents (such as Wikipedia articles), and colloquial text (e.g., tweets). Different domains have different characteristics and therefore pose different challenges for temporal taggers. Only if these challenges are taken into account can temporal taggers achieve high-quality extraction and normalization results across domains.
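What extraction and normalization to TIMEX3 means can be shown with a small Python sketch. This is not HeidelTime's implementation or rule set: the function `tag_dates` and its single pattern are hypothetical and handle only explicit English dates of the form "March 5, 1998". A real temporal tagger must also resolve relative and underspecified expressions such as "yesterday" or "last summer", which is where the domain-specific challenges arise.

```python
import re

# Month names mapped to their numbers; dict order is preserved in the pattern.
MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}

# Matches explicit dates such as "March 5, 1998" and nothing else.
DATE_PATTERN = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


def tag_dates(text):
    """Wrap each matched date in a TIMEX3 element with a normalized value."""
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        month = MONTHS[match.group(1)]
        day = int(match.group(2))
        year = int(match.group(3))
        value = f"{year:04d}-{month:02d}-{day:02d}"  # ISO 8601 style date value
        return (f'<TIMEX3 tid="t{counter}" type="DATE" value="{value}">'
                f"{match.group(0)}</TIMEX3>")

    return DATE_PATTERN.sub(replace, text)


sentence = "The treaty was signed on March 5, 1998 in Geneva."
print(tag_dates(sentence))
```

The extraction step finds the surface expression, and the normalization step assigns it a machine-readable `value` attribute, so that differently worded mentions of the same date can be compared.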