AWS tools can finally play nicer with Word and PDF documents


Amazon Web Services (AWS) is launching into the fight to cut down on unnecessary paperwork with a new tool that allows much easier document scanning.

The cloud computing giant has revealed that its AWS Comprehend tool will now be able to quickly and painlessly scan a range of business documents, saving time and resources for businesses of all sizes.  

Going forward, AWS Comprehend will be able to scan the likes of PDF, Word and raw text documents and quickly extract the key information - cutting down on the need for extra paper.

AWS scans

In a blog post announcing the launch, AWS says the new feature combines natural language processing (NLP) and optical character recognition (OCR) to reduce the amount of preprocessing or post-processing needed to handle documents.

The tool can process a number of different document layouts, including not just blocks of text (as had previously been the case for AWS Comprehend), but also lists or bullets in PDF and Word files.

Users can also integrate custom named entity recognition (NER) on more document types without needing to convert their files to raw text, greatly streamlining the process.

AWS says the new feature can be useful across a wide range of business use cases, helping save huge amounts of time previously needed to preprocess documents. For example, finance, mortgage and insurance documents in a variety of layouts and formats can now be quickly scanned and processed to draw out the key information needed.

Users will still need to meet some baseline requirements to utilise the service. It can't be used with just a single file: training starts at a minimum of 250 documents and 100 annotations per entity type, so that AWS can build a model that meets your needs.
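As a rough illustration of those thresholds, the sketch below checks a labelled training set against the two minimums quoted above before anything is uploaded. It is not part of any AWS SDK: the function name, the constants and the (document, entity type) pair format are all hypothetical choices made for this example, and the limits themselves are taken from the article rather than verified against current AWS documentation.

```python
from collections import Counter

# Minimums quoted in the article; treat them as assumptions that may change.
MIN_DOCUMENTS = 250
MIN_ANNOTATIONS_PER_ENTITY = 100


def check_training_set(annotations, document_count):
    """Return a list of problems; an empty list means the set meets both minimums.

    `annotations` is an iterable of (document_name, entity_type) pairs,
    a hypothetical in-memory format chosen for this sketch.
    """
    problems = []
    if document_count < MIN_DOCUMENTS:
        problems.append(
            f"only {document_count} documents, need at least {MIN_DOCUMENTS}"
        )
    # Count how many annotations exist for each entity type.
    per_entity = Counter(entity_type for _, entity_type in annotations)
    for entity_type, count in sorted(per_entity.items()):
        if count < MIN_ANNOTATIONS_PER_ENTITY:
            problems.append(
                f"entity type {entity_type!r} has {count} annotations, "
                f"need at least {MIN_ANNOTATIONS_PER_ENTITY}"
            )
    return problems


if __name__ == "__main__":
    # Made-up sample: enough documents, but one entity type is under-annotated.
    sample = [("loan_001.pdf", "BORROWER")] * 120 + [("loan_002.pdf", "LENDER")] * 40
    for problem in check_training_set(sample, document_count=260):
        print(problem)
```

A check like this would only catch a shortfall in volume; it says nothing about whether the annotations themselves are good enough to train a useful model.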

"The information locked within documents is important to business operations and by using AI, you can now automate the process while reducing manual efforts and improving productivity, which delivers answers to customers faster," wrote Andrea Morton-Youmans, a Product Marketing Manager on the AI Services team at AWS, in a separate blog post announcing the news.

Mike Moore
Deputy Editor, TechRadar Pro

Mike Moore is Deputy Editor at TechRadar Pro. He has worked as a B2B and B2C tech journalist for nearly a decade, including at one of the UK's leading national newspapers and fellow Future title ITProPortal, and when he's not keeping track of all the latest enterprise and workplace trends, can most likely be found watching, following or taking part in some kind of sport.
