Tag: nlp

Classifying messy documents: A common-sense approach (Part II)

by Daria Rostovtseva • August 16, 2021

In Part I of this blog post, I provided an overview of the approach my team and I took to tackle the problem of classifying diverse, messy documents at scale. I shared the details of how we chose to preprocess the data and how we created features from documents of interest […]

Classifying messy documents: A common-sense approach (Part II) was published on SAS Users.

Classifying messy documents: A common-sense approach (Part I)

by Daria Rostovtseva • August 4, 2021

Unstructured text data is ubiquitous in both business and government, and extracting value from it at scale is a common challenge. Organizations that have been around for a while often have vast paper archives. Digitizing these archives does not necessarily make them usable for search and analysis, since documents are […]

Classifying messy documents: A common-sense approach (Part I) was published on SAS Users.

Convert written numbers to Arabic numerals

by Heuristic Andrew • March 22, 2012

In Natural Language Processing, it can be helpful to standardize the written numbers in a larger body of text to Arabic numerals. For example, we will change “I am forty-six years old” to “I am 46 years old,” so the age … C…
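The excerpt above is cut off before it shows how the conversion is done, so the following is not the original post's method. It is a minimal Python sketch of the same idea, assuming English cardinal numbers written as words separated by spaces or hyphens (as in “forty-six”). The names `words_to_int` and `replace_written_numbers` are hypothetical helpers introduced for illustration; only the standard `re` module is used.

```python
import re

# Assumption: only English cardinal number words are recognized.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

ALL_WORDS = list(UNITS) + list(TENS) + ["hundred"] + list(SCALES)
# Longest alternatives first so that e.g. "fourteen" is tried before "four".
WORD = "(?:" + "|".join(sorted(ALL_WORDS, key=len, reverse=True)) + ")"
# A run of number words separated by whitespace and/or hyphens, e.g. "forty-six".
NUMBER_RUN = re.compile(rf"\b{WORD}(?:[\s-]+{WORD})*\b", re.IGNORECASE)


def words_to_int(tokens):
    """Convert a sequence of lowercase number words to an integer."""
    total, current = 0, 0
    for tok in tokens:
        if tok in UNITS:
            current += UNITS[tok]
        elif tok in TENS:
            current += TENS[tok]
        elif tok == "hundred":
            # "hundred" multiplies the group built so far ("one hundred" -> 100).
            current = (current or 1) * 100
        else:
            # "thousand", "million", "billion" close the current group.
            total += (current or 1) * SCALES[tok]
            current = 0
    return total + current


def replace_written_numbers(text):
    """Replace each run of written number words with its Arabic-numeral form."""
    def _convert(match):
        tokens = re.split(r"[\s-]+", match.group(0).lower())
        return str(words_to_int(tokens))
    return NUMBER_RUN.sub(_convert, text)


print(replace_written_numbers("I am forty-six years old"))  # → I am 46 years old
```

Because each run of adjacent number words is treated as a single number, the sketch deliberately ignores several cases: “one hundred and five” becomes “100 and 5”, ordinals such as “seventy-seventh” are only partly converted, and a spoken list like “one two three” would be summed to 6. A production version would need rules for these cases.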