Optical Character Recognition (OCR) converts scanned paper documents into searchable PDF documents. This technology has been available in Acrobat for about ten years. While OCR accuracy and language support have improved over the years, the default OCR "flavor", Searchable Image, was the only useful choice.

Searchable Image retains the underlying scanned image and adds an invisible layer of text on top, which may be selected.

Searchable Image OCR has some shortcomings:

- For 300 dpi black and white scans, a typical file size is 15-40 KB per page. Scanning at higher resolutions (600 dpi vs. 300 dpi) increases file size about three to four times.
- Because of the image-heavy content, searchable image PDFs can take a long time to print.
- At 300 dpi, scanned documents are easily distinguishable in quality from computer-generated files.

In Acrobat 9, Adobe engineers added a new flavor of OCR called ClearScan. ClearScan offers improved text quality with a decrease in file size.

I've recently completed some benchmarking which shows dramatic file size decreases and quality gains. Read on to learn about size comparisons, how to use ClearScan OCR, and a bit more about how it all works.

The test documents were:

- 78-page image-only PDF document scanned at 300 dpi
- 78-page image-only PDF document scanned at 600 dpi

I ran OCR and compared file sizes on my ThinkPad W500. The test machine ran Vista Enterprise in 32-bit mode and has 4 GB of RAM. In addition to Acrobat, I also had Excel running. The W500 is a current model laptop which runs an Intel Core 2 Duo CPU at 2.8 GHz. The test machine has an IBM standard 320 GB laptop hard drive running at 7200 rpm.

Visual Results and Total File Size

At 300 dpi, ClearScan offered improved visual quality at about one-third the total file size. At 600 dpi, the ClearScan file was seven times smaller and looked better.

How does ClearScan work?

ClearScan works by turning the images which represent text characters on the page into smoothed vector outlines. Each character on the page is compared, and all matching characters are replaced with an outline character.

ClearScan does not replace the font with your system fonts. Rather, a custom font is created to match the visual appearance of the pixels. In fact, if you run ClearScan OCR, choose File > Document Properties, and click on the Fonts tab, you'll see that custom fonts are created.

Besides better visual appearance, print time is reduced. Instead of sending large images to the printer, Acrobat can send the compact font information.

How can I try ClearScan OCR?

ClearScan OCR is not the default in Acrobat 9, so you'll need to change a setting to use it.

1. Choose: Document > OCR Text Recognition > Recognize Text using OCR.
2. Change the PDF Output Style to ClearScan.

Here are a few answers to the most common questions about ClearScan OCR.

Is OCR accuracy any different between ClearScan and Searchable Image styles?
The accuracy will be identical for input files of the same dpi. However, since ClearScan files are so much smaller, you might consider using a 600 dpi input file as a starting point, since there is little downside other than processing time.

Can I make changes to the text in a ClearScan file?
The Touchup Text Tool does not currently work on ClearScan files.

Are ClearScan files admissible in court?
ClearScan files are generally an accurate representation of the original document. To our knowledge, there has never been a challenge.

ClearScan creates a custom font to match the character shape. It does not rely on system fonts or any other font that may be installed on your system.

For single-page typical legal documents, you may not see much difference in file size. Multiple-page documents will show the greatest file size reduction. Documents which vary greatly in the type and number of fonts will show less return on file size.

Using Adobe PDF Services API to OCR PDF files

With OCR (Optical Character Recognition) you can unlock scanned PDFs to extract text and create searchable files. Using our powerful cloud-based APIs, integrate OCR into any document workflow for the perfect solution to archiving, copying text, and creating searchable document indexes. Create searchable archives from scanned PDF repositories to unlock important information and save time with quick searchability. Or apply OCR to your PDFs from uploaded scans to allow them to be edited for use in onboarding workflows.

Developers can get started in just a few minutes with the ready-to-run sample files provided for OCR. This tutorial covers the basics of how to run your first PDF Services API OCR operation using sample files for languages including Node.js and Java.

Step 1: Create your credentials and set up your environment
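As a rough illustration of the environment setup that Step 1 of the PDF Services API tutorial refers to, the sketch below reads a client ID and secret from environment variables and fails early if either is missing. It makes no calls to any Adobe SDK. The variable names `PDF_SERVICES_CLIENT_ID` and `PDF_SERVICES_CLIENT_SECRET`, the `OcrCredentials` type, and the `load_credentials` helper are all assumptions made for this example, so substitute whatever your own credentials setup defines.

```python
import os
from typing import Mapping, NamedTuple


class OcrCredentials(NamedTuple):
    """Hypothetical container for this sketch; not a type from any Adobe SDK."""
    client_id: str
    client_secret: str


# Assumed variable names, chosen for illustration only.
CLIENT_ID_VAR = "PDF_SERVICES_CLIENT_ID"
CLIENT_SECRET_VAR = "PDF_SERVICES_CLIENT_SECRET"


def load_credentials(env: Mapping[str, str] = os.environ) -> OcrCredentials:
    """Read both values from the environment, failing early if either is missing or empty."""
    missing = [name for name in (CLIENT_ID_VAR, CLIENT_SECRET_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return OcrCredentials(env[CLIENT_ID_VAR], env[CLIENT_SECRET_VAR])
```

Keeping the secret in the environment rather than in source files means the sample code can be shared or committed without exposing it. Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.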