Word count with two different languages in the source document
Thread poster: Ana Lopez
Ana Lopez  Identity Verified
Mexico
Local time: 23:43
Member (2013)
English to Spanish
Jun 4, 2014

Hello!!

I'm working on a PDF document that has German and English in two "columns", and I only have to translate the English part. Do you know any way I can count ONLY the English words?

Trados has statistics, but I don't know if there is a tool to count by language.

The only way I can think of is to count them manually. Do you know anything faster?

Thank you.


 
Jack Doughty  Identity Verified
United Kingdom
Local time: 06:43
Russian to English
In memoriam
Convert to Word Jun 4, 2014

You can convert it to Word using OCR. ABBYY FineReader and ABBYY PDF Converter come to mind.

 
Ana Lopez  Identity Verified
Mexico
Local time: 23:43
Member (2013)
English to Spanish
TOPIC STARTER
Can Word count by language? Jun 4, 2014

Thanks! I already converted it to Word. However, since the columns are mixed with images, I cannot just "select" the English column. That's why I'm asking whether there is any other way than marking the text page by page. Maybe there isn't, just asking.

 
Tony M
France
Local time: 07:43
French to English
SITE LOCALIZER
Are languages set? Jun 4, 2014

When you did the conversion using OCR, were you able to set the languages of the relevant bits?

If the text DOES have its 'language' attributes correctly set, then you can do an ordinary word count in Word; then search and replace all for 'any character' + language attribute = (say) German, replacing with nothing.

Then do another word count, and this will give you the EN words without the German ones. In fact, you don't even need to have done the preliminary word count; I was just thinking of subtracting the EN from the total, since TOTAL - EN = German, of course!

Naturally, if the language attribute was NOT correctly set in the first place, this won't work; but at least you'll know for next time.

BTW, you say that the images are stopping you from selecting all of the EN column, but why? Are they in merged cells or something? You ought to be able to process your table in such a way as to unmerge all the cells, which will probably push all the images into the left-hand column or something, but will leave you with two clean columns you can select properly.

Are you SURE it is in a proper Word table? OCR conversions have a nasty habit of 'organizing' (well, that's not what I call it...) text into newspaper-style columns, in which case you'll have a harder job on your hands trying to sort it out. It might even be simpler to convert everything to single-column and remove all column breaks from the document, and then see what you have left...
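
For anyone who would rather script the language-attribute check than run the find and replace by hand, here is a minimal Python sketch. It assumes the OCR tool wrote an explicit `w:lang` element on each run in the .docx file's `word/document.xml`, which is where the WordprocessingML format stores a run's language. The function name `count_words_by_language` and the `"unset"` label are made up for illustration. Runs that inherit their language from a style end up under `"unset"`, and splitting on whitespace will not match Word's own word count exactly.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by WordprocessingML (the XML inside a .docx file).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def count_words_by_language(document_xml, default_lang="unset"):
    """Tally whitespace-separated words per explicit w:lang value, run by run.

    Hypothetical helper: runs without an explicit w:lang are grouped under
    default_lang, and a word split across two runs is counted twice.
    """
    root = ET.fromstring(document_xml)
    counts = Counter()
    for run in root.iter(f"{{{W_NS}}}r"):
        lang_el = run.find("w:rPr/w:lang", NS)
        lang = default_lang
        if lang_el is not None:
            lang = lang_el.get(f"{{{W_NS}}}val", default_lang)
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        n_words = len(text.split())
        if n_words:
            counts[lang] += n_words
    return dict(counts)


# Tiny made-up document: one German run and one English run.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:rPr><w:lang w:val="de-DE"/></w:rPr><w:t>Guten Morgen</w:t></w:r>'
    '<w:r><w:rPr><w:lang w:val="en-US"/></w:rPr><w:t>Good morning to you</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

print(count_words_by_language(SAMPLE))
```

For a real file, the XML can be read with `zipfile.ZipFile("file.docx").read("word/document.xml")` and passed to the same function. If nearly everything comes back as `"unset"`, the language attributes were not set during OCR and this approach will not help.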


 
Tony M
France
Local time: 07:43
French to English
SITE LOCALIZER
Failing that... Jun 4, 2014

...if the original document really is organized neatly into two columns, why not just do another 'dummy' OCR run on it, selecting ONLY the EN column as you go through? You'll then have a document at the end of it that contains ONLY the EN you need to translate. You might even be able to use this for your translation; at worst, it will be a useful intermediate stage for your word count.

[Edited at 2014-06-04 20:58 GMT]
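
The "select only one column" idea can also be sketched in code, for cases where the OCR or PDF tool can export each word together with its horizontal position on the page. This is only an illustration under stated assumptions: the `(x0, x1, text)` tuples, the page width, and the function name `count_words_in_column` are hypothetical, and the thread does not say which side the English column is on, so the right-hand side here is a guess.

```python
def count_words_in_column(words, page_width, column="right"):
    """Count words whose horizontal midpoint lies in the chosen half of the page.

    words: iterable of (x0, x1, text) tuples, where x0 and x1 are the left and
    right edges of the word box (hypothetical export format).
    """
    if column not in ("left", "right"):
        raise ValueError("column must be 'left' or 'right'")
    middle = page_width / 2
    total = 0
    for x0, x1, text in words:
        center = (x0 + x1) / 2
        in_right_half = center >= middle
        if in_right_half == (column == "right"):
            total += len(text.split())
    return total


# Made-up word boxes on a page 600 units wide: German left, English right.
PAGE_WORDS = [
    (50, 120, "Guten"),
    (130, 200, "Morgen"),
    (330, 380, "Good"),
    (390, 460, "morning"),
    (470, 500, "all"),
]

print(count_words_in_column(PAGE_WORDS, page_width=600, column="right"))
```

A simple midpoint split like this only works when the page really is two clean columns. Images or full-width headings that span both columns would need to be handled separately.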


 
Ana Lopez  Identity Verified
Mexico
Local time: 23:43
Member (2013)
English to Spanish
TOPIC STARTER
I'll try the option Jun 4, 2014

I'll try a dummy OCR conversion in ABBYY, identifying only English as the language, and see how it goes with the find & replace.

Thank you so much Tony M.!!


 
Ümit Karahan  Identity Verified
Turkey
Local time: 08:43
English to Turkish
Paste only text Jun 5, 2014

Hi.

Try copying everything with Ctrl+A, Ctrl+C, and then paste it as text only into a blank Word page. That way you can get rid of the images.



[Edited at 2014-06-05 01:14 GMT]


 

