Blog
Here’s an overview of our latest blog posts on enterprise search, document intelligence and legal tech.
Blog
26.08.2015
Language Identification and Language Chunking
Identifying the language of a given text is a crucial preprocessing step for almost all text analysis methods. It has been considered a solved problem for more than 20 years. Available solutions build on the simple observation that every language has typical letter sequences (letter n-grams) that occur significantly more frequently in that language than in others.
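To make the n-gram observation concrete, here is a minimal sketch in Python. It is not the implementation discussed in the post: the function names (`trigram_profile`, `similarity`, `identify`), the choice of trigrams, the cosine similarity measure and the two tiny training samples are all assumptions made for illustration. A real system would build its profiles from far larger corpora.

```python
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count letter trigrams in the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sum(c * c for c in a.values()) ** 0.5
    norm_b = sum(c * c for c in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy training samples (hypothetical, far too small for real use).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then "
          "the cat went with them to the house",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann "
          "ging die katze mit ihnen zu dem haus",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def identify(text: str) -> str:
    """Return the language whose trigram profile is most similar to the text."""
    query = trigram_profile(text)
    return max(PROFILES, key=lambda lang: similarity(query, PROFILES[lang]))


print(identify("the dog and the cat"))
print(identify("der hund und die katze"))
```

The sketch scores a text against one profile per language and picks the best match. Handling a text that mixes several languages would need a different approach, for example scoring shorter windows separately.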
Blog
07.07.2015
The difference between stemming and lemmatization
"Stemming" as well as "Lemmatization" are commonly used buzzwords in the field of Information Retrieval (IR), particularly in the development of powerful search engines. [...]
So what exactly is the difference between these two methods? What are the advantages and disadvantages and which one should be preferred? [...]
Blog
13.04.2015
Approximative data structures for natural language processing
Some say software developers draw their motivation from minimizing or maximizing numbers in any given problem. That's a smug innuendo. In my experience, developers are always on the lookout for beautiful solutions, of which numbers are but a symptom. The use of approximative data structures for language processing is one such example of a beautiful idea with nice numbers.
Questions? We’re happy to answer them!
Have feedback or a question about a blog post?
Or would you like to learn more about a specific topic?