How to scan books into ebooks creating ePub for Kindle?
Books are friends for most of us. After the pandemic, people who have the habit of reading are the most stable, as there are millions of them who are getting unstable due to not getting allowed to step out of their house. This is due to getting bored passing their time as opposed to utilizing.
But, how often have you thought, especially just before going to bed…if you had a handy device, it would make you more comfortable right from holding it to changing the size of the font according to your comfort.
Some people might not prefer it, but, eventually, everyone has to get used to it. As time flies, it is inevitable to avoid such gadgets. Let us in this article learn how to convert a physical book from scanning to converting it to an ebook.
Scanning Books
There are many types of scanners to scan photos, books, films, using these different types of scanners, depending on your requirement and time. If you want to fasten up the process, then you might have to cut open the spine of the book.
If you have enough time and you don’t want to have your books ripped, then there is an intact type of scanning where you don’t cut open the spine of the books available.
These are the overhead scanners that have cameras on each side that capture the pages of the books separately and store them on the computer.
There are other slower ways like scanning on the bed of flatbed scanners. This type of scanning is useful, especially with book covers. The Epson scanners scan these covers and store them as color photos.
All these types of scanning just scan the pages and store them in the computers. This is not all we want to enable searchability and convert them to eBooks.
It is the software that we discuss further in this article that enhances the pages of the scans, to keep them ready for the conversion.
Enhancing Pages
By enhancing pages, we mean improving the pages with respect to clarity and increasing searchability. It is here that you enable searchability by using open-source software.
The best and the most often used software is ScanTailor. This is a boon to the people who have any kind of book and who want to convert them to eBooks.
We have seen people surprised to see the result of the scans that we do at ScanJunction. ScanTailor increases the following qualities of the book that helps improve the conversion of image to text.
You can check out a step-by-step process on how to use ScanTailor in this article.
Sharpness
The most important part of the text enabling process is the sharpness of the text. The better the sharpness of the scanned text, the better the accuracy of the text.
Distributed Light
The second important thing while scanning and enhancing the pages is the distributed light. If the light is bright at a certain place and dark at some other place, then the fonts might blot.
This for sure makes it difficult for the software engines to detect the text accurately, resulting in poor accuracy of the converted text.
Straight text
Though this does not matter to the text conversion to that extent as the previous points, it is important that we maintain the content in a straight line as not doing so might impact the readability.
Once you enhance the pages, the next step is to enable searchability. Let us see what exactly this means and how do we do this in the next section.
Enabling Searchability
By enabling searchability, we mean that we are converting the images that we have just scanned and enhanced to text. This process is done using OCR.
OCR is expanded as Optical Character Recognition. This is a program or a software engine that runs on the images and converts the pixels into text.
The more intelligent the software is, the better the accuracy of the conversion. In addition, the quality of the scans in terms of resolution and enhancements in terms of the sharpness of the text matters the most here.
There are tons of OCR software that can do this job. This article covers a step-by-step process of doing this. We will explain to you the overall picture of how this is done.
We will, in this article explain to you how to use one such useful software that is open source.
The basic engine is called Tesseract OCR. There are various Graphical User Interfaces (GUIs) available for this engine and one such GUI is gImageReader, which runs both on Windows and Linux.
Mac Users can try this software engine called PDF OCRX from this link.
Some developers have built this GUI for lame people like us to use the beautiful software for free. So, let us see how to use this.
- Open a set of images that we have just scanned or the directory where there are these images.
- The images will be displayed as pages one after the other on the left sidebar
- You can either select all the images together or select individual images by clicking the Ctrl key
- Once you select the pages you want to read, you can click on the Recognize Selection from the top middle button
- This starts running the OCR engine on the text, only to start converting the scanned image to text
- Once you have all the images recognized, you can check the extracted image on the right side of the application
- You can now export the text in formats like a text file or PDF
Converting to eBooks
Once you have the text extracted into text, it has to be in a format that is readable in devices like kindle and other such small devices.
The disadvantage of PDFs in small devices is that you cannot change the font size or alignment. This is due to the fact that the alignment and typeface are stored in the PDF.
They are a part of the PDF and are rigid. For this reason, if we are considering reading the eBooks using a smaller device, it is better to convert these books into an ePub.
Epub is a format that fits into any device. The alignment of the text automatically gets adjusted to the size of the device. So is the font size.
You can see the below images to better understand these two formats.
But before creating the ePub, you will have to proofread the text that you just extracted from the scanned copy.
You can skip proofreading, if you are fine reading the book with spelling mistakes here and there.
You can do this by copying the text and pasting them in a browser that has Grammarly installed. Grammarly is a browser extension that corrects grammar and spelling mistakes.
Paste the text and keep fixing the errors (with only the ones that make sense).
Once you fix them, you can use applications like Sigil ePub creators and paste them there.
Create required headings and save the file as an ePub.
You can export to other devices like kindle by sending an email to your kindle email id.
This will be it. You just now converted by scanning your old physical book into an eBook adding them to your favorite device like Kindle. It is up to you to select the required scanner, resolution of the pages, converting to text using a reliable OCR software, proofreading, and finally, a software converting to ePub.