An easy-to-use, free web service to extract text from PDFs and other documents - OCR support included!
Give Me Text is an online service for converting many complex file formats into simple text. This is useful for all sorts of things, especially in the area of document processing and indexing. Using the form above you can upload any file and see what the Apache Tika software behind the site makes of it. The service will even run Optical Character Recognition on image formats in order to give you text from images. The list of file formats supported is long and can be found here.
Behind the site is an instance of the Apache Tika Server, which takes the files and processes them with the Tika engine. The endpoint for the complete API is at http://givemetext.okfnlabs.org/tika. The endpoint for the text extraction service is at http://givemetext.okfnlabs.org/tika/tika, and there are several other useful services too. Check out the Tika Documentation for full details, and don't forget that paths here carry an extra 'tika' compared to those in the documentation.
Here's a simple example using curl to get text from a TIF image:
curl -T my_image.tif http://givemetext.okfnlabs.org/tika/tika
That command uses a PUT request. If you'd like to POST, for example using a web form, you can use http://givemetext.okfnlabs.org/tika/tika/form.
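For scripted use, the same PUT upload that curl performs can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names `build_put_request` and `extract_text` are made up for illustration, the `Accept: text/plain` header is an assumption about how you want the response formatted, and the endpoint is the text extraction URL given above.

```python
import urllib.request

# Text extraction endpoint as stated on this page (note the extra 'tika').
TIKA_TEXT_ENDPOINT = "http://givemetext.okfnlabs.org/tika/tika"


def build_put_request(path, endpoint=TIKA_TEXT_ENDPOINT):
    """Build (but do not send) a PUT request whose body is the file's raw bytes.

    Hypothetical helper for illustration; it mirrors what `curl -T` sends.
    """
    with open(path, "rb") as handle:
        payload = handle.read()
    return urllib.request.Request(
        endpoint,
        data=payload,
        method="PUT",
        # Assumption: explicitly ask the server for a plain-text response.
        headers={"Accept": "text/plain"},
    )


def extract_text(path, endpoint=TIKA_TEXT_ENDPOINT, timeout=60):
    """Send the file to the service and return the response body as a string."""
    request = build_put_request(path, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Decode leniently, since the response encoding is not guaranteed here.
        return response.read().decode("utf-8", errors="replace")
```

Calling `extract_text("my_image.tif")` would upload the file and return whatever text the server extracts. Building the request separately from sending it makes it easy to inspect the method, URL, and headers before any network traffic happens.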
Note that there is some special advice on using OCR with the Tika server here.
Current TODOs include: