The Ivy Text Analyser lets you perform text analysis on structured, semi-structured and unstructured text sources (e.g. plain text, delimited, JSON, and CSV files, as well as kdb+ and ODBC data sources). Using the Text Analyser, analysts can perform the following operations:
- Group documents or tokens in a variety of ways using clustering algorithms
- Find terms and phrases related to a token
- Search by keywords or regular expressions
- Search by similarity to a document or similarity to a collection of documents
- Search with multi-token co-occurrence
- Search with lack of co-occurrence in an n-sentence window
- Automatically recognize entities such as proper nouns, dates, times, phone numbers, email addresses, URLs, money, and postal/zip codes
- Parse dates, times, and money
- Keyword detection and relevance ranking
- Tokenizing, stemming and sentence detection
- Find word frequency differences between document collections
- Create stop word lists
- Manage and maintain complex multi-step search history and document collection histories
- Visualize unstructured text
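To make the two co-occurrence searches in the list above more concrete, here is a minimal Python sketch of one way such searches could work. It is not the Text Analyser's implementation or API (the product exposes its functions through q); the function names, the naive sentence splitter, and the reading of "n-sentence window" as any run of n consecutive sentences are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Set of lowercased word tokens in one sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def cooccurrences(text, terms, window=1):
    """Start indices of `window`-sentence spans that contain every term."""
    sents = [tokens(s) for s in split_sentences(text)]
    wanted = {t.lower() for t in terms}
    hits = []
    # If the text is shorter than the window, check it once as a single span.
    for i in range(max(len(sents) - window + 1, 1)):
        seen = set().union(*sents[i:i + window])
        if wanted <= seen:
            hits.append(i)
    return hits


def without_cooccurrence(text, term, absent, window=1):
    """Indices of sentences containing `term` where `absent` does not occur
    in any `window`-sentence span that includes that sentence."""
    sents = [tokens(s) for s in split_sentences(text)]
    term, absent = term.lower(), absent.lower()
    hits = []
    for i, sent in enumerate(sents):
        if term not in sent:
            continue
        lo = max(i - (window - 1), 0)
        hi = i + window  # exclusive upper bound
        if not any(absent in s for s in sents[lo:hi]):
            hits.append(i)
    return hits


# Hypothetical sample text, used only to exercise the functions.
sample = (
    "Ahab sailed the Pequod. The whale surfaced at dawn. "
    "Ishmael watched Ahab. The sea was calm. Ahab slept below."
)

print(cooccurrences(sample, ["Ahab", "whale"], window=2))
print(without_cooccurrence(sample, "Ahab", "whale", window=2))
```

With a two-sentence window, the first call reports the spans where "Ahab" and "whale" appear close together, while the second reports only the sentences where "Ahab" appears with no "whale" nearby. Widening the window makes the first search more permissive and the second more restrictive.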
Because all of the tables generated by the Text Analyser are kdb+ tables, analysts can use the Visual Query Builder and Visual Inspector to search, filter, and visualize their data. In addition, specialized visualizations have been created to display keyword occurrences and references to dates and times. In the following example, keyword occurrences for various topics in Moby Dick are shown relative to each other. This is just one of the standard visualizations available from the Visual Inspector.
All of the unstructured text processing functions are available through an API, so more complex queries and analytics functions can be written in the q programming language. This means that an entire workflow (import, transformation, unstructured text analysis, filtering, and querying) can be automated for any kind of data source.
In the video below, an analyst explores the text of 845 unstructured news articles to trace events surrounding a fictitious kidnapping as part of one of the VAST challenges. The video illustrates several of the Text Analyser features such as keyword searching, document similarity, automatic entity recognition, and datasource history.