Update readme (59cc8645) · Commits · git-mirror / BallonsTranslator

README_EN.md

+8 −29

		@@ -209,41 +209,20 @@ This project is heavily dependent upon [manga-image-translator](https://github.c
		[Sugoi translator](https://sugoitranslator.com/) is created by [mingshiba](https://www.patreon.com/mingshiba).

		## Text detection
		* Supports English and Japanese text detection; training code and more details can be found at [comic-text-detector](https://github.com/dmMaze/comic-text-detector).
		* Supports text detection via [Starriver Cloud (Tuanzi Manga OCR)](https://cloud.stariver.org.cn/). A username and password must be filled in; the program logs in automatically each time it is launched.

		* For detailed instructions, see Tuanzi OCR Instructions: ([Chinese](doc/团子OCR说明.md) & [Brazilian Portuguese](doc/Manual_TuanziOCR_pt-BR.md) only)
		* You can find information about text detection modules [here.](doc/modules/detector.md)

		## OCR
		* All mit* models are from manga-image-translator and support English, Japanese and Korean recognition as well as text color extraction.
		* [manga_ocr](https://github.com/kha-white/manga-ocr) is from [kha-white](https://github.com/kha-white) and provides Japanese text recognition, with a main focus on Japanese manga.
		* Supports OCR via [Starriver Cloud (Tuanzi Manga OCR)](https://cloud.stariver.org.cn/). A username and password must be filled in; the program logs in automatically each time it is launched.
		* The current implementation runs OCR on each text block individually, which is slower and does not noticeably improve accuracy, so it is not recommended. If needed, use the Tuanzi Detector instead.
		* When using the Tuanzi Detector for text detection, it is recommended to set OCR to none_ocr so the text returned by the detector is read directly, saving time and reducing the number of requests.
		* For detailed instructions, see Tuanzi OCR Instructions: ([Chinese](doc/团子OCR说明.md) & [Brazilian Portuguese](doc/Manual_TuanziOCR_pt-BR.md) only)
		* PaddleOCR has been added as an "optional" module. In Debug mode you will see a message stating that it is not installed; you can install it by following the instructions shown there. If you don't want to install the package yourself, just uncomment (remove the `#` from) the lines with paddlepaddle(gpu) and paddleocr. Do this at your own risk. For me (bropines) and two testers everything installed fine, but you may run into an error; if so, report it in an issue and tag me.

		* You can find information about OCR modules [here.](doc/modules/ocr.md)

		## Inpainting
		* AOT is from [manga-image-translator](https://github.com/zyddnys/manga-image-translator).
		* All lama* are finetuned using [LaMa](https://github.com/advimman/lama)
		* PatchMatch is an algorithm from [PyPatchMatch](https://github.com/vacancy/PyPatchMatch), this program uses a [modified version](https://github.com/dmMaze/PyPatchMatchInpaint) by me.

		* You can find information about Inpainting modules [here.](doc/modules/inpaint.md)

		## Translators
		Available translators: Google, DeepL, ChatGPT, Sugoi, Caiyun, Baidu, Papago, and Yandex.
		* Google has shut down its translation service in China; please set the corresponding 'url' in the config panel to *.com.
		* The [Caiyun](https://dashboard.caiyunapp.com/), [ChatGPT](https://platform.openai.com/playground), [Yandex](https://yandex.com/dev/translate/), [Baidu](http://developers.baidu.com/), and [DeepL](https://www.deepl.com/docs-api/api-access) translators require a token or API key.
		* DeepL & Sugoi translator (and its CT2 Translation conversion) thanks to [Snowad14](https://github.com/Snowad14).
		* Sugoi translates Japanese to English completely offline. Download the [offline model](https://drive.google.com/drive/folders/1KnDlfUM9zbnYFTo6iCbnBaBKabXfnVJm) and move the "sugoi_translator" folder into BallonsTranslator/ballontranslator/data/models.
		* [Sakura-13B-Galgame](https://github.com/SakuraLLM/Sakura-13B-Galgame): if you run it locally on a single device and it crashes due to VRAM OOM, check `low vram mode` in the config panel (enabled by default).
		* DeepLX: Please refer to [Vercel](https://github.com/bropines/Deeplx-vercel) or [deeplx](https://github.com/OwO-Network/DeepLX).
		* Added the [Translators](https://github.com/UlionTse/translators) library, which supports access to some translator services without api keys. You can find out about supported services [here](https://github.com/UlionTse/translators#supported-translation-services).
		* Supports two versions of an OpenAI-compatible translator, which work with the official OpenAI API or with third-party LLM providers that expose a compatible API. Both must be configured in the settings panel.
		  * The version without a suffix consumes fewer tokens but splits sentences less reliably, which may cause problems with long texts.
		  * The version with the 'exp' suffix uses more tokens but is more stable and includes a "jailbreak" in its prompt, making it better suited to long texts.

		For other good offline English translators, please refer to this [thread](https://github.com/dmMaze/BallonsTranslator/discussions/515).
		To add a new translator, please refer to [how_to_add_new_translator](doc/how_to_add_new_translator.md). It is as simple as subclassing a BaseClass and implementing two interfaces, after which you can use it in the application. Contributions to the project are welcome.
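
A minimal sketch of that "subclass and implement two interfaces" pattern follows. To stay self-contained it defines its own stand-in base class: `BaseTranslatorSketch`, the method names `_setup_translator` and `_translate`, and the toy `UppercaseTranslator` are assumptions for illustration only. The project's actual base class, method signatures, and registration step are described in how_to_add_new_translator.md.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseTranslatorSketch(ABC):
    """Hypothetical stand-in for the project's translator base class."""

    def __init__(self, params: Dict[str, str]) -> None:
        self.params = params
        self._setup_translator()

    @abstractmethod
    def _setup_translator(self) -> None:
        """Interface 1 (assumed): one-time setup such as API keys or clients."""

    @abstractmethod
    def _translate(self, src_list: List[str]) -> List[str]:
        """Interface 2 (assumed): translate a batch of text blocks in order."""

    def translate(self, src_list: List[str]) -> List[str]:
        """Public entry point; checks that every input got exactly one output."""
        out = self._translate(src_list)
        if len(out) != len(src_list):
            raise ValueError("expected one translation per source text")
        return out


class UppercaseTranslator(BaseTranslatorSketch):
    """Toy 'translator' that only upper-cases text, to show a subclass's shape."""

    def _setup_translator(self) -> None:
        self.prefix = self.params.get("prefix", "")

    def _translate(self, src_list: List[str]) -> List[str]:
        return [self.prefix + text.upper() for text in src_list]


if __name__ == "__main__":
    t = UppercaseTranslator({"prefix": "> "})
    print(t.translate(["hello", "world"]))  # ['> HELLO', '> WORLD']
```

Keeping the batch-level `_translate` separate from the public `translate` lets the base class enforce shared invariants (here, one output per text block) so each new translator only has to supply the service-specific part.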

		* You can find information about Translators modules [here.](doc/modules/translators.md)

		## FAQ & Misc
		* If your computer has an Nvidia GPU or Apple silicon, the program will enable hardware acceleration.
		@@ -251,4 +230,4 @@ To add a new translator, please reference [how_to_add_new_translator](doc/how_to
		* Most modules use [PyTorch](https://pytorch.org/get-started/locally/), so performance is accelerated if you have an [NVIDIA CUDA](https://pytorch.org/docs/stable/notes/cuda.html) or [AMD ROCm](https://pytorch.org/docs/stable/notes/hip.html) device.
		* Fonts are from your system's fonts.
		* Thanks to [bropines](https://github.com/bropines) for the Russian localization.
		* Added Export to photoshop JSX script by [bropines](https://github.com/bropines). </br> To read the instructions, improve the code and just poke around to see how it works, you can go to `scripts/export to photoshop` -> `install_manual.md`.
		~~* Added Export to photoshop JSX script by [bropines](https://github.com/bropines). </br> To read the instructions, improve the code and just poke around to see how it works, you can go to `scripts/export to photoshop` -> `install_manual.md`.~~ (This script is unstable on versions other than PS 2020)

doc/modules/detector.md

0 → 100644

+60 −0

		# Ballon Translator: Detector Modules

		- [Ballon Translator: Detector Modules](#ballon-translator-detector-modules)
		  - [General Information about Detectors](#general-information-about-detectors)
		  - [Available Detector Modules](#available-detector-modules)
		    - [CTD (Comic Text Detector)](#ctd-comic-text-detector)
		    - [Starriver Detector (Tuanzi Manga Detector)](#starriver-detector-tuanzi-manga-detector)

		---

		## General Information about Detectors

		* Detector modules are responsible for identifying and locating text regions within an image. These detected regions are then typically passed to an OCR module for text recognition.
		* CTD (Comic Text Detector) is based on the [comic-text-detector](https://github.com/dmMaze/comic-text-detector) project. It is designed for detecting text in comics and manga.
		* Starriver Detector (Tuanzi Manga Detector) is the text detection component of [Starriver Cloud (Tuanzi Manga OCR)](https://cloud.stariver.org.cn/). It is an online service requiring account credentials.

		---

		## Available Detector Modules

		### CTD (Comic Text Detector)

		* Source: [comic-text-detector](https://github.com/dmMaze/comic-text-detector)
		* Purpose: Designed for detecting text specifically in comics and manga images.
		* Operation: Local, offline processing.

		Settings Fields:

		* detect_size: Specifies the size of the input image for the detection model. Available options: `896`, `1024`, `1152`, `1280`. Larger sizes may improve detection of small text but increase processing time and resource usage.
		* det_rearrange_max_batches: Controls the maximum number of batches for rearranging detected text boxes. Options: `1`, `2`, `4`, `6`, `8`, `12`, `16`, `24`, `32`. Adjusting this parameter can affect memory usage and processing efficiency, especially for images with a large amount of text.
		* device: Choose between `CPU` or `CUDA` for processing. `CUDA` is recommended for faster detection if a compatible NVIDIA GPU is available.
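
To see why `detect_size` matters, here is a sketch of a common aspect-preserving resize: the longer side is scaled to `detect_size`, the rest is padded, and detected boxes are mapped back by dividing by the scale. Whether CTD pads exactly this way (right/bottom) is an assumption, and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def letterbox_params(width: int, height: int, detect_size: int) -> Tuple[float, int, int]:
    """Return (scale, pad_right, pad_bottom) that fit an image into a
    detect_size x detect_size square without changing its aspect ratio."""
    scale = detect_size / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    return scale, detect_size - new_w, detect_size - new_h


def box_to_original(box: Box, scale: float) -> Box:
    """Map a box predicted on the resized image back to original pixels."""
    x1, y1, x2, y2 = box
    return (x1 / scale, y1 / scale, x2 / scale, y2 / scale)


if __name__ == "__main__":
    # An 800 x 1200 page detected at detect_size=1024.
    scale, pad_right, pad_bottom = letterbox_params(800, 1200, 1024)
    print(scale, pad_right, pad_bottom)
    print(box_to_original((100, 200, 300, 400), scale))
```

Under this scaling, a 12 px tall glyph on a 2400 px tall page shrinks to 4.48 px at `detect_size` 896 but stays at 6.4 px at 1280, which is why larger sizes tend to help with small text.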

		---

		### Starriver Detector (Tuanzi Manga Detector)

		* Provider: [Starriver Cloud (Tuanzi Manga OCR)](https://cloud.stariver.org.cn/). This is the detection component of the Tuanzi OCR service.
		* Requires Account: Yes. API access requires a Starriver Cloud account (username and password).
		* Operation: Online, API-based text detection.

		Settings Fields:

		* User: Your Starriver Cloud account username.
		* Password: Your Starriver Cloud account password.
		* Security Note: Passwords are stored in plain text. Be cautious when using the program on shared or public computers.
		* expand_ratio: Controls the expansion ratio for detected text boxes. A value like `0.01` expands the boxes slightly, potentially capturing more of the text and surrounding area.
		* refine: Enable text refinement processing. May improve the quality of detected text regions.
		* filtrate: Enable text filtration processing. May help to filter out noise or unwanted elements from detected regions.
		* disable_skip_area: Disables the use of predefined skip areas during text detection.
		* detect_scale: Scaling factor for text detection. Increase for better detection of small text.
		* merge_threshold: Threshold for merging detected text boxes into text blocks. Adjust it to control how close text boxes must be before they are grouped together.
		* low_accuracy_mode: Enable a faster, less accurate detection mode for quicker processing.
		* force_expand: Forcefully expand detected text regions, potentially capturing more text.
		* font_size_offset: Offset value for font size detection. (Further details are needed on how this affects detection.)
		* font_size_min (set to -1 to disable): Minimum font size for text detection. Set to `-1` to disable minimum size filtering.
		* font_size_max (set to -1 to disable): Maximum font size for text detection. Set to `-1` to disable maximum size filtering.
		* font_size_multiplier: Multiplier for font size detection. (Further details are needed on how this affects detection.)
		* 更新 Token (Update Token): Button to manually update the API authentication token for Starriver Cloud.
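
The `-1` convention for `font_size_min` / `font_size_max`, and the kind of effect `expand_ratio` has, can be illustrated with a small client-side sketch. The `TextBlock` type, both function names, and the rule that `expand_ratio` grows each side by that fraction of the box's width or height are hypothetical; the real filtering and expansion happen inside the Starriver service, whose exact rules are not documented here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


@dataclass
class TextBlock:
    box: Box
    font_size: float


def filter_by_font_size(blocks: List[TextBlock],
                        font_size_min: float = -1,
                        font_size_max: float = -1) -> List[TextBlock]:
    """Keep blocks whose font size lies within [min, max]; -1 disables that bound."""
    kept = []
    for b in blocks:
        if font_size_min != -1 and b.font_size < font_size_min:
            continue
        if font_size_max != -1 and b.font_size > font_size_max:
            continue
        kept.append(b)
    return kept


def expand_box(box: Box, expand_ratio: float, img_w: int, img_h: int) -> Box:
    """Assumed rule: grow each side by expand_ratio times the box width (x)
    or height (y), then clamp to the image bounds."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * expand_ratio
    dy = (y2 - y1) * expand_ratio
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(float(img_w), x2 + dx), min(float(img_h), y2 + dy))


if __name__ == "__main__":
    blocks = [TextBlock((0, 0, 10, 10), s) for s in (8, 20, 60)]
    print([b.font_size for b in filter_by_font_size(blocks, 10, -1)])
    print(expand_box((100, 100, 200, 150), 0.01, 1000, 1000))
```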

		---

doc/modules/inpaint.md

0 → 100644

+93 −0

		# Ballon Translator: Inpainting Modules


		- [Ballon Translator: Inpainting Modules](#ballon-translator-inpainting-modules)
		  - [General Information about Inpainting](#general-information-about-inpainting)
		  - [Available Inpainting Modules](#available-inpainting-modules)
		    - [AOT Inpainter](#aot-inpainter)
		    - [Lama Inpainters (lama\_large\_512px, lama\_mpe)](#lama-inpainters-lama_large_512px-lama_mpe)
		      - [lama\_large\_512px](#lama_large_512px)
		      - [lama\_mpe](#lama_mpe)
		    - [PatchMatch Inpainter](#patchmatch-inpainter)
		    - [OpenCV-tela Inpainter](#opencv-tela-inpainter)

		---

		## General Information about Inpainting

		* Inpainting modules are used to fill in masked regions of an image, often to remove text or other unwanted elements and reconstruct the background.
		* AOT Inpainter is sourced from the [manga-image-translator](https://github.com/zyddnys/manga-image-translator) project.
		* Lama Inpainters (`lama_large_512px`, `lama_mpe`) are fine-tuned models based on the [LaMa](https://github.com/advimman/lama) inpainting technique.
		* PatchMatch Inpainter utilizes a modified version of the [PatchMatch algorithm](https://github.com/vacancy/PyPatchMatch) from [PyPatchMatch](https://github.com/vacancy/PyPatchMatch), adapted by [dmMaze](https://github.com/dmMaze/PyPatchMatchInpaint).

		---

		## Available Inpainting Modules

		### AOT Inpainter

		* Source: [manga-image-translator](https://github.com/zyddnys/manga-image-translator) project.
		* Technology: Based on the AOT-GAN architecture (Aggregated Contextual Transformations GAN), a generative adversarial network designed for high-resolution image inpainting.
		* Suitable for: General inpainting tasks, potentially optimized for manga/anime style images.

		Settings Fields:

		* inpaint_size: Specifies the processing size for the inpainting model. Available options may include: `512`, `768`, `1024`, `1536`, `2048`. Larger sizes may offer better inpainting quality but require more computational resources.
		* device: Choose between `CPU` or `CUDA` for processing. `CUDA` is recommended for faster inpainting if a compatible NVIDIA GPU is available.

		---

		### Lama Inpainters (lama\_large\_512px, lama\_mpe)

		* Technology: Based on the [LaMa (Large Mask Inpainting)](https://github.com/advimman/lama) model architecture.
		* Characteristics: Lama models are known for their ability to handle large and irregular masks effectively.

		#### lama\_large\_512px

		* Model Size: "Large" model variant.
		* Input Size: Optimized for input images around 512x512 pixels. While it can handle larger images, performance and quality might be best around this size.
		* Available Inpaint Sizes: `512`, `768`, `1024`, `1536`, `2048`.

		Settings Fields:

		* inpaint_size: Specifies the processing size. Options: `512`, `768`, `1024`, `1536`, `2048`.
		* device: Choose `CPU` or `CUDA`.
		* precision: Allows selecting the numerical precision for computation. Options might include `bf16` (BFloat16), `fp16` (Float16), `fp32` (Float32). Lower precision like `bf16` or `fp16` can speed up processing and reduce memory usage, especially on GPUs with Tensor Cores, but may slightly impact accuracy. `fp32` offers the highest precision but is generally slower and more memory-intensive.
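
What lower precision means numerically can be shown with Python's standard `struct` module, which packs IEEE 754 half precision via the `'e'` format. bfloat16 has no `struct` code, so the sketch emulates it by rounding a float32 bit pattern to its upper 16 bits (round to nearest even; NaN and infinity are not handled). This only illustrates the number formats; it is not how the inpainting module casts its tensors.

```python
import struct


def to_fp32(x: float) -> float:
    """Round-trip through IEEE 754 single precision (fp32)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x: float) -> float:
    """Round-trip through IEEE 754 half precision (fp16): 10 mantissa bits,
    largest finite value 65504; struct raises OverflowError above that."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bfloat16: fp32's 8-bit exponent but only 7 mantissa bits.
    The low 16 bits of the fp32 pattern are rounded away (nearest even)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even bias
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    for v in (0.1, 1.0 + 2**-9, 70000.0):
        try:
            half = to_fp16(v)
        except OverflowError:
            half = "overflow"
        print(v, to_fp32(v), half, to_bf16(v))
```

The trade-off matches the description above: fp16 keeps more mantissa bits than bf16 (1 + 2^-9 survives in fp16 but rounds to 1.0 in bf16), while bf16 keeps fp32's range, so 70000.0 becomes 70144.0 instead of overflowing as it does in fp16.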

		#### lama\_mpe

		* Model Variant: "MPE" likely refers to a specific fine-tuning or variant of the LaMa model, potentially optimized for a different type of image or mask. (Further details needed on "MPE" specifics)
		* Available Inpaint Sizes: `512`, `768`, `1024`, `1536`, `2048`.

		Settings Fields:

		* inpaint_size: Specifies the processing size. Options: `512`, `768`, `1024`, `1536`, `2048`.
		* device: Choose `CPU` or `CUDA`.

		---

		### PatchMatch Inpainter

		* Algorithm: [PatchMatch](https://github.com/vacancy/PyPatchMatch) algorithm, via a modified version of PyPatchMatch, [PyPatchMatchInpaint](https://github.com/dmMaze/PyPatchMatchInpaint).
		* Technology: A non-learning-based algorithm that excels at fast, context-aware image completion by finding and transferring similar patches from unmasked regions to the masked area.
		* Characteristics: Generally faster than deep learning-based methods, especially for larger images, but may be less effective at hallucinating completely new content or handling complex semantic inpainting.
		* Suitable for: Filling in relatively simple or texture-based areas, extending existing patterns, and when speed is prioritized.

		Settings Fields:

		* No configurable settings are documented for PatchMatch. Its parameters (such as patch size) are algorithm-specific and are presumably fixed internally or exposed only through more advanced settings.
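
The core idea behind PatchMatch inpainting, filling each hole pixel from the best-matching known patch, can be shown with a brute-force search. Real PatchMatch replaces the exhaustive search below with randomized initialization, propagation, and random search, which is what makes it fast. The function name, grayscale list-of-lists images, and scanline fill order are assumptions of this sketch, not the PyPatchMatchInpaint API.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale, row-major
Mask = List[List[bool]]    # True = pixel to fill


def fill_by_patch_search(img: Image, mask: Mask, radius: int = 1) -> Image:
    """Fill masked pixels in scanline order, copying the centre of the known
    source patch whose neighbourhood best matches the target's known pixels
    (sum of squared differences). Brute force, not PatchMatch's random search."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    known = [[not m for m in row] for row in mask]
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    # Candidate sources: patch centres whose whole patch lies inside the
    # image and contains no masked pixel.
    sources = [(y, x)
               for y in range(radius, h - radius)
               for x in range(radius, w - radius)
               if all(known[y + dy][x + dx] for dy, dx in offsets)]
    if not sources:
        raise ValueError("no fully known patch to copy from")
    for y in range(h):
        for x in range(w):
            if known[y][x]:
                continue
            best: Optional[Tuple[int, int]] = None
            best_cost = float("inf")
            for sy, sx in sources:
                cost = 0.0
                for dy, dx in offsets:
                    ty, tx = y + dy, x + dx
                    # Compare only target pixels that are in bounds and known.
                    if 0 <= ty < h and 0 <= tx < w and known[ty][tx]:
                        diff = out[ty][tx] - out[sy + dy][sx + dx]
                        cost += diff * diff
                if cost < best_cost:
                    best, best_cost = (sy, sx), cost
            out[y][x] = out[best[0]][best[1]]
            known[y][x] = True  # filled pixels guide later matches
    return out
```

On a striped texture the search picks a source patch with the same phase, so the hole continues the pattern rather than copying whichever neighbour happens to be closest.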

		---

		### OpenCV-tela Inpainter

		* Technology: Utilizes inpainting methods available within the OpenCV (Open Source Computer Vision Library).
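		* Characteristics: OpenCV inpainting methods are generally CPU-based and offer basic inpainting capabilities. `OpenCV-tela` presumably refers to Telea's algorithm (`cv2.INPAINT_TELEA`), which is based on the Fast Marching Method rather than texture synthesis: it fills the masked region from its boundary inward, estimating each pixel from a weighted average of already-known neighbouring pixels.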
		* Suitable for: Quick and basic inpainting, potentially for less demanding tasks or as a fallback option.

		Settings Fields:

		* No configurable settings are documented for the OpenCV-tela inpainter. OpenCV's `cv2.inpaint` function itself only takes an inpainting radius and an algorithm flag.
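
Telea's method fills the hole from its boundary inward in fast-marching order, estimating each pixel from a weighted combination of known neighbours. The sketch below keeps only the boundary-inward idea: a simplified "onion-peel" fill that averages known 4-neighbours one layer at a time, with no fast-marching ordering or gradient weighting. It is a hypothetical illustration (the function name and list-of-lists images are assumptions), not OpenCV's implementation; the module itself presumably calls `cv2.inpaint` with the `cv2.INPAINT_TELEA` flag.

```python
from typing import List

Image = List[List[float]]  # grayscale, row-major
Mask = List[List[bool]]    # True = pixel to fill


def onion_peel_fill(img: Image, mask: Mask) -> Image:
    """Fill masked pixels layer by layer, from the mask boundary inward.

    Each pass gives every still-unknown pixel that touches a known
    4-neighbour the mean of those neighbours; a pass's values are committed
    together, so one pass fills exactly one boundary layer.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    known = [[not m for m in row] for row in mask]
    remaining = sum(row.count(False) for row in known)
    while remaining:
        layer = []
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                vals = [out[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and known[ny][nx]]
                if vals:
                    layer.append((y, x, sum(vals) / len(vals)))
        if not layer:
            raise ValueError("nothing known to propagate from")
        for y, x, v in layer:
            out[y][x] = v
            known[y][x] = True
        remaining -= len(layer)
    return out
```

Because every value is a plain average of its neighbours, this kind of fill smooths rather than reproduces texture, which is why such methods suit quick fixes on flat backgrounds better than detailed artwork.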

		---

doc/modules/ocr.md

0 → 100644

+227 −0

File added.


doc/modules/translators.md

0 → 100644

+266 −0

File added.
