Working with non-numerical data can be tough, even for experienced data scientists. A typical machine learning model expects its features to be numbers, not words, emails, website pages, lists, graphs, or probability distributions. To be useful, data has to be transformed into a vector space first. But how? One popular approach would be to treat a non-numerical feature as categorical. This could work well if the number of categories is small (for example, if data indicates a profession or a country). However, if we try to apply this method to emails, we will likely get as many categories as there…

This story continues at The Next Web
The text above is a summary, you can read full article here.