help needed on converting PDF to Word format
Thread poster: Luke Mersh

Luke Mersh | United Kingdom | Spanish to English
Dear colleagues.
I have been sent some PDFs which are really scanned documents saved as PDFs. My problem is that when I convert them to Word format they are still images, so I am unable to do a word count without re-typing the PDF into a Word document.
Can anybody tell me if there is a way to convert these image-type PDFs into a Word document without re-typing the whole document, so that I am able to get a word count?
Many thanks

..... (X)
Hi Luke,
The technology you are looking for is called OCR (optical character recognition). It will convert scanned PDFs or images to text format. The quality of the extraction will depend both on the tool you use and the quality of the PDF (is the scan clear, is it straight, is it high enough resolution, etc.).
The most popular tool for this is probably made by the company Abbyy. There are also a few free options from other companies.
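For the word-count part of the question, here is a minimal sketch in Python, assuming the OCR tool has already exported its result as plain text. The function name `count_words` and the sample string are illustrative assumptions, not part of any OCR product, and Word or a CAT tool may count slightly differently (for example with hyphenated words or numbers).

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one word character.

    Tokens made only of punctuation (a common kind of OCR noise, e.g. '---')
    are ignored.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))


# Hypothetical OCR output, made up for illustration only.
sample = "Patient: J. Smith\nHeart rate 72 bpm --- sinus rhythm"
print(count_words(sample))
```

On a real job the text would be read from the file the OCR tool exports rather than from a hard-coded string.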
Kevin

Luke Mersh | United Kingdom | Spanish to English | TOPIC STARTER
Thank you.
My only concern is that some of the PDFs are ECG graphs with printed results on them.
Regards

Chris Pr | United Kingdom | German to English
Another two options to play with... | Aug 19, 2015
Hello Luke,
Kevin was entirely correct in his previous comment about optical recognition being the only applicable solution to your original question.
Two options you might like to (web search) check out:
Nuance PDF
Nitro PDF
Whether your ECGs will display correctly is entirely debatable, but these have been the best two performers I'm aware of to date.
And whether you're prepared to upload potentially sensitive docs for conversion remains another matter entirely. That said, trial versions for download are also available for (more discreet) local conversion offline. NDA items should never, ever be trusted to cloud solutions (not to mention TMs, termbases or any other material you'd consider sensitive or private).
On that last note, applicable to all translators generally, never underestimate that cloud=share (you're no longer in control of the information uploaded), period, full stop.
By the same simple equation, convenience=trade-off.
Best of British,
Chris

esperantisto | Member (2006) | English to Russian + ... | SITE LOCALIZER
Luke Mersh wrote:
when I convert them to Word format they are still like images
And how do you actually perform the conversion?

neilmac | Spain | Spanish to English + ...
Nitro or OmniPage are the best conversion programs I've found. I find Nitro easier and less complicated to use. However, if they are scanned PDFs, the results might never be optimal.
If the texts are short, I might prefer to retype/recreate the texts in Word. Or explain to the client that the format is causing problems and you will need to charge extra, or extend the agreed deadline, unless they can provide you with the text in a more workable format...
http://www.nuance.com/for-individuals/by-product/omnipage/index.htm
[Edited at 2015-08-19 09:32 GMT]

Paula Darwish | United Kingdom | Member (2013) | Turkish to English + ...
OCR software and different alphabets | Aug 19, 2015
In my experience, some are better than others at recognising different alphabets, so you need to try the software on your particular language. They can probably all do English OK, but in my translation language (Turkish) OmniPage is the best one I have found for recognising the characters of the Turkish alphabet.

For the ES-EN language pair, any commercially available OCR software should be OK. AFAIK the prices for a single computer licence are approx. EUR 100 - 150. With that price and a capacity of approx. 7 pages a minute, the return on investment is very quick. The OCR quality is also very high, but it still depends on the image quality.
I've recently worked on a PDF file with English text scanned from 34 paper pages. The character count was 53,500. The spellcheck found no more than 30 errors.
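As a rough check on those figures, the arithmetic can be sketched in Python. The numbers are the ones quoted in this post (34 pages, 53,500 characters, at most 30 spellcheck errors, approx. 7 pages a minute); the function names are illustrative assumptions. Note that a spellcheck only flags non-words, so the true OCR error rate may be somewhat higher than this estimate.

```python
def error_rate_percent(errors: int, characters: int) -> float:
    """Flagged errors as a percentage of all characters."""
    return 100 * errors / characters


def processing_minutes(pages: int, pages_per_minute: float) -> float:
    """Approximate OCR processing time at a given throughput."""
    return pages / pages_per_minute


# Figures quoted in the post; the throughput is an approximate value.
pages = 34
characters = 53_500
spellcheck_errors = 30
pages_per_minute = 7

print(f"Error rate: {error_rate_percent(spellcheck_errors, characters):.3f}% of characters")
print(f"OCR time:   {processing_minutes(pages, pages_per_minute):.1f} minutes")
```

In other words, roughly one flagged error per 1,800 characters, and about five minutes of processing for the whole file.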
Of course, for more complex page layouts (tables etc.), more manual work is still needed in order to give the output DOC file the 'as-original' look.
Let me repeat esperantisto's question:
Luke, what do you mean by: "my problem is that when I convert them to Word format they are still like images, so I am unable to do a word count without re-typing the PDF into a word document"?
That's not an image-to-text conversion, for sure.
Best regards
AM

Do you really need to convert the files? | Aug 19, 2015
In this situation (assuming you don't manage to get a good result with the OCR software) I would just quote a rate based upon the target language word count and type the translation into a Word file. Agencies have always been happy with this approach.

Chris Pr | United Kingdom | German to English
Very true... | Aug 19, 2015
Very true for agency work.
But direct clients can be charged a premium for providing a fully translated "clone" of the original PDF, complete with all charts, diagrams, images etc. perfectly formatted as in the original document, albeit in the docx format that these conversion programs tend to export.

One more option | Aug 20, 2015
I usually work with ABBYY FineReader and the quality is very good.
My two cents. Good luck!!

Luke Mersh | United Kingdom | Spanish to English | TOPIC STARTER
ABBYY FineReader | Aug 20, 2015
After reading your posts, and having already done a webinar on OCR, I have decided to use the trial of ABBYY FineReader, which seems to do a good job.

just quote a rate based upon the target language...? | Aug 20, 2015
Rachel Waddington wrote:
In this situation (assuming you don't manage to get a good result with the OCR software) I would just quote a rate based upon the target language word count and type the translation into a Word file. Agencies have always been happy with this approach.
So, do you think an agency would wait until the job is done, and only then calculate the translator's payment and tell the customer how much they should pay?
Well, in my country the agencies normally know the job size, whether in words or characters, when they ask translators for availability. A good agency is expected to have and use reliable OCR software just in order to tell their customer the price.
It's very unusual for an agency not to know the job size, as the work time and all invoices depend on it. It can happen when all the staff are young and inexperienced, but just once, and not again.
Let's not allow ourselves to do the agencies' job! Remember that they take a significant share of what the customers pay.
Regards
AM
[Edited at 2015-08-20 09:12 GMT]

Platary (X) | German to French + ...
Andrzej Mierzejewski wrote:
A good agency is expected to have and use reliable OCR software just in order to tell their customer the price.
So "a good agency" should be able to send the translator an editable text, right?
If not, it's not "a good agency". And that is the case here.
[Edited at 2015-08-20 09:27 GMT]
Andrzej Mierzejewski wrote:
Rachel Waddington wrote:
In this situation (assuming you don't manage to get a good result with the OCR software) I would just quote a rate based upon the target language word count and type the translation into a Word file. Agencies have always been happy with this approach.
So, do you think an agency would wait until the job is done, and only then calculate the translator's payment and tell the customer how much they should pay?
Well, in my country the agencies normally know the job size, whether in words or characters, when they ask translators for availability. A good agency is expected to have and use reliable OCR software just in order to tell their customer the price.
It's very unusual for an agency not to know the job size, as the work time and all invoices depend on it. It can happen when all the staff are young and inexperienced, but just once, and not again.
Let's not allow ourselves to do the agencies' job! Remember that they take a significant share of what the customers pay.
Regards
AM [Edited at 2015-08-20 09:12 GMT]
Yes, in cases where the agency cannot provide an editable text I would always propose invoicing based on the target text and this has never been a problem. It's becoming less common nowadays, but still happens occasionally. In any case I would regard it as the agency's job to do the OCRing, not mine. Direct clients are a different thing, obviously.