Word spacing is an important part of preprocessing for Korean text analysis, and accurate spacing strongly affects the accuracy of subsequent analysis. `KoSpacing` provides fairly accurate automatic word spacing and works especially well on online text originating from SNS.
`KoSpacing` is based on a deep learning model trained on a large corpus (more than 100 million NEWS articles from [Chan-Yub Park](https://github.com/mrchypark)).
#### Performance
| Test Set | Accuracy |
|---|---|
| Sejong(colloquial style) Corpus(1M) | 97.1% |
| OOOO(literary style) Corpus(3M) | 94.3% |
- Accuracy = # correctly spaced characters/# characters in the test data.
- Performance might improve if compound words are normalized.
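
The accuracy definition above can be illustrated with a minimal sketch. It is not the evaluation script used to produce the table. It assumes that a character counts as "correctly spaced" when the prediction agrees with the reference on whether a space follows that character; the function names `spacing_labels` and `spacing_accuracy` are hypothetical.

```python
def spacing_labels(text):
    """Return (chars, labels) for the non-space characters of `text`.

    labels[i] is 1 if at least one space follows chars[i], else 0.
    Assumption: this per-character labeling is how "correctly spaced"
    is judged; the original evaluation may differ.
    """
    chars, labels = [], []
    for ch in text.strip():
        if ch.isspace():
            if labels:
                labels[-1] = 1  # mark the preceding character
        else:
            chars.append(ch)
            labels.append(0)
    return chars, labels


def spacing_accuracy(gold, predicted):
    """Accuracy = # correctly spaced characters / # characters."""
    gold_chars, gold_labels = spacing_labels(gold)
    pred_chars, pred_labels = spacing_labels(predicted)
    if gold_chars != pred_chars:
        raise ValueError("gold and predicted must contain the same characters")
    if not gold_chars:
        raise ValueError("input contains no characters")
    correct = sum(g == p for g, p in zip(gold_labels, pred_labels))
    return correct / len(gold_chars)


if __name__ == "__main__":
    gold = "아버지가 방에 들어가신다"
    predicted = "아버지 가방에 들어가신다"
    print(f"{spacing_accuracy(gold, predicted):.3f}")
```

In this example the two sentences share 11 characters and disagree on the spacing after two of them (`지` and the first `가`), so 9 of 11 characters count as correctly spaced.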
#### Install
You need to install the conda binary from https://www.anaconda.com/download/. Please install Python 3.6 or later.