Commit db64b463 authored by haven-jeon

some work

parent ccd73228
+4 −4
include kospacing/resources/dicts/*
include kospacing/resources/models/* 
include kospacing/embedding_maker.py
include kospacing/kospacing.py
include pykospacing/resources/dicts/*
include pykospacing/resources/models/* 
include pykospacing/embedding_maker.py
include pykospacing/kospacing.py
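
The include patterns above change only in their top-level directory (`kospacing` to `pykospacing`). As a rough sanity check of what that rename means for packaging, the sketch below tests sample paths against the new patterns. It is only an approximation: `fnmatch` lets `*` match path separators, which packaging tools may treat differently, and the helper name `is_included` is made up for illustration.

```python
from fnmatch import fnmatch

# The four patterns from the new side of the diff above.
NEW_PATTERNS = [
    "pykospacing/resources/dicts/*",
    "pykospacing/resources/models/*",
    "pykospacing/embedding_maker.py",
    "pykospacing/kospacing.py",
]


def is_included(path, patterns=NEW_PATTERNS):
    """Return True if `path` matches at least one include pattern.

    Hypothetical helper: approximates include matching with fnmatch,
    where `*` also matches `/` (real packaging globs may be stricter).
    """
    return any(fnmatch(path, pattern) for pattern in patterns)
```

Under this approximation, a file such as `pykospacing/resources/dicts/c2v.dic` is matched, while the same file under the old `kospacing/` directory is not, which is why the patterns had to change together with the directory name.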
+7 −8
KoSpacing 
PyKoSpacing 
---------------

Python package for automatic Korean word spacing.

R version can be found [here](https://github.com/haven-jeon/PyKoSpacing).

[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](http://www.gnu.org/licenses/gpl-3.0)


#### Introduction

Word spacing is one of the important parts of the preprocessing of Korean text analysis. Accurate spacing greatly affects the accuracy of subsequent text analysis. `KoSpacing` has fairly accurate automatic word spacing performance, especially good for online text originated from SNS.
Word spacing is one of the important parts of the preprocessing of Korean text analysis. Accurate spacing greatly affects the accuracy of subsequent text analysis. `PyKoSpacing` has fairly accurate automatic word spacing performance, especially good for online text originated from SNS.

`KoSpacing` is based on Deep Learning model trained from large corpus(more than 100 million NEWS articles from [Chan-Yub Park](https://github.com/mrchypark)). 
`PyKoSpacing` is based on Deep Learning model trained from large corpus(more than 100 million NEWS articles from [Chan-Yub Park](https://github.com/mrchypark)). 


#### Performance
@@ -27,16 +28,14 @@ Word spacing is one of the important parts of the preprocessing of Korean text a

#### Install

You need to install conda binary from https://www.anaconda.com/download/. Please install Python 3.6 version or later.

To install from GitHub, use

    pip install git+git://github.com/haven-jeon/.git
    pip install git+git://github.com/haven-jeon/PyKoSpacing.git


#### Example 

    >>> from kospacing import spacing
    >>> from pykospacing import spacing
    >>> spacing("김형호영화시장분석가는'1987'의네이버영화정보네티즌10점평에서언급된단어들을지난해12월27일부터올해1월10일까지통계프로그램R과KoNLP패키지로텍스트마이닝하여분석했다.")
    "김형호 영화시장 분석가는 '1987'의 네이버 영화 정보 네티즌 10점 평에서 언급된 단어들을 지난해 12월 27일부터 올해 1월 10일까지 통계 프로그램 R과 KoNLP 패키지로 텍스트마이닝하여 분석했다."

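The example above calls `spacing` on a single string. For several sentences, one possible pattern is a small wrapper like the sketch below. It assumes only what the example shows (one string in, one string out); the wrapper name `space_all` and its `spacer` parameter are hypothetical and not part of the package.

```python
from typing import Callable, Iterable, List


def space_all(sentences: Iterable[str], spacer: Callable[[str], str]) -> List[str]:
    """Apply a spacing function to each non-empty sentence.

    `spacer` is any callable that maps one string to one string,
    for example `pykospacing.spacing` as used in the example above.
    """
    results = []
    for sentence in sentences:
        stripped = sentence.strip()
        if not stripped:
            # Skip blank input rather than passing it to the model.
            continue
        results.append(spacer(stripped))
    return results
```

With the package installed, this could be called as `space_all(lines, spacing)` after `from pykospacing import spacing`. Passing the function in as an argument keeps the wrapper usable with a stand-in callable when the model is unavailable.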
@@ -51,7 +50,7 @@ To install from GitHub, use
```markdowns
@misc{heewon2018,
author = {Heewon Jeon},
title = {KoSpacing: Automatic Korean word spacing with R},
title = {KoSpacing: Automatic Korean word spacing},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/KoSpacing}}
+0 −0

File moved.

+3 −3
@@ -8,12 +8,12 @@ os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import numpy as np
from keras.models import load_model

from kospacing.embedding_maker import load_vocab, encoding_and_padding
from pykospacing.embedding_maker import load_vocab, encoding_and_padding
import pkg_resources, warnings


model_path = pkg_resources.resource_filename('kospacing', os.path.join("resources", "models", "kospacing"))
dic_path = pkg_resources.resource_filename('kospacing', os.path.join("resources", "dicts", "c2v.dic"))
model_path = pkg_resources.resource_filename('pykospacing', os.path.join("resources", "models", "kospacing"))
dic_path = pkg_resources.resource_filename('pykospacing', os.path.join("resources", "dicts", "c2v.dic"))
model = None
model = load_model(model_path)
model._make_predict_function()