MemoQ inserts space before/after every tag, wrong date/time format. How to fix/run a fix?
Thread poster: Mary McKee
Mary McKee
Mary McKee  Identity Verified
United States
Local time: 17:18
Spanish to English
May 27, 2020

I'm using MemoQ 8.3.8 to post edit .mqxliff files that a client has machine translated and sent to me. I'm having some issues with the files and wondered if you could help me figure out how to batch fix them to save a lot of time:

- Every single segment with a tag in the source has inserted a space before/after each tag in the MT output, which I must go through and manually delete. Every. Single. Tag.
- Every date is in the wrong date format (should be in the format 26 May 2020, not May 26, 2020)
- Every time is in the wrong time format (client has a preferential time format that is nonstandard)

These are likely caused by settings on their end that I cannot change. I have requested that the client change their settings, but of course I'm just one linguist, and they have not replied or made any changes to the documents. I'm wasting so much time on these little fixes that I wonder whether I can fix them myself, or at least run some kind of regex on the files when I receive them to fix all the errors at once.

I have never ever used RegEx so I'm not even sure how I would do this. But there has GOT to be a way. I'm spending multiple hours every day fixing these tiny things that the computer should be able to do on its own.

Please help!


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 01:18
Member (2006)
English to Afrikaans
@Mary May 27, 2020

Mary McKee wrote:
- Every single segment with a tag in the source has inserted a space before/after each tag in the MT output, which I must go through and manually delete. Every. Single. Tag.


This sounds like something Google Translate does (and perhaps certain other machine translators, too). Google Translate doesn't understand what tags are, and so treats them like words, and words have spaces. I know this information doesn't help you.

I'm not sure if there is a setting in MemoQ that can improve this or not. I've never used MT from inside MemoQ -- does MemoQ normally have this problem when it uses MT?

It may be quicker to edit the text in MS Word. Try experimenting with the various options of bilingual review. Right-click the file, Export > Export Bilingual > Table RTF.

Are you allowed to machine translate the text yourself, and then edit that? Or must you use the client's machine translated machine translation?


[Edited at 2020-05-27 20:48 GMT]


Mary McKee
Mary McKee  Identity Verified
United States
Local time: 17:18
Spanish to English
TOPIC STARTER
unfortunately no... May 27, 2020

"Are you allowed to machine translate the text yourself, and then edit that? Or must you use the client's machine translated machine translation?"

Thanks for weighing in. Unfortunately I'm not allowed to do my own MT :'( It would be so much faster if so. I don't know what tool they're using for the MT, they could just be using another MT tool and then requiring me to use MemoQ? I've also not used the MemoQ MT for my own purposes.

I like the idea to use Word, except that I have at least a 16,000 segment TM that I can use to help speed up SOME of this work, and I wouldn't be able to have things auto-populate through the documents if I did it in Word.

I wish I could figure out how to run a command like:

check if source has spaces around tag
if spaces are mismatched in target, follow source spacing
run

If only I were a computer programming whiz :/
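
For readers who want to see what that "command" could look like outside memoQ, here is a minimal Python sketch. It is only an illustration, not a memoQ feature: it assumes the tags appear as plain text in angle brackets (real .mqxliff tags may be stored differently), that the same tags appear in source and target, and the function name follow_source_spacing is made up for this example. If a tag occurs more than once in the source, the spacing of its last occurrence wins.

```python
import re

# Assumption: a tag is anything of the form <...> with optional whitespace around it.
TAG = re.compile(r"(\s*)(<[^<>]+>)(\s*)")


def follow_source_spacing(source: str, target: str) -> str:
    """Give each tag in the target the same surrounding whitespace it has in the source."""
    # Record the whitespace before and after each tag as seen in the source segment.
    spacing = {m.group(2): (m.group(1), m.group(3)) for m in TAG.finditer(source)}

    def fix(match: re.Match) -> str:
        tag = match.group(2)
        if tag not in spacing:
            # Tag is unknown in the source: leave this part of the target untouched.
            return match.group(0)
        before, after = spacing[tag]
        return before + tag + after

    return TAG.sub(fix, target)


source = "Haga clic en <b>Guardar</b> para continuar."
target = "Click <b> Save </b> to continue."
print(follow_source_spacing(source, target))
```

With the sample segments above, the extra spaces inside the bold pair are dropped while the spaces outside it are kept, because that is how the source is spaced.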


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 03:18
English to Russian
Sad but true May 28, 2020

MT is not supposed to process tags as Samuel mentioned above. By accepting MTPE jobs you agree to undertake all those issues.
I prefer to remove all tags (Ctrl+F8) before MTing the text. I think pressing one button to insert a tag is better than pressing several buttons to remove leading and trailing spaces or even move tags within the sentence.

As for replacing, you can use the following regex:

Find what: (January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{1,2}),\s(\d{4})
Replace with: $2 $1 $3
*Don't forget to enable the regex mode

This will change 'May 20, 2020' to '20 May 2020'.
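
If you want to try the same pattern outside memoQ before running it on real files, a small Python sketch is one way to do it. The pattern is the one given above; the function name to_day_month_year is just a label for this example, and note that Python's re module writes the replacement groups as \2 \1 \3 rather than $2 $1 $3.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Same pattern as in the post: month name, day, comma, four-digit year.
US_DATE = re.compile(r"(" + MONTHS + r")\s(\d{1,2}),\s(\d{4})")


def to_day_month_year(text: str) -> str:
    """Rewrite 'May 20, 2020' style dates as '20 May 2020'."""
    # Group 2 is the day, group 1 the month, group 3 the year.
    return US_DATE.sub(r"\2 \1 \3", text)


print(to_day_month_year("The contract was signed on May 20, 2020."))
```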


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 01:18
Member (2006)
English to Afrikaans
@Mary May 28, 2020

Stepan Konev wrote:
MT is not supposed to process tags as Samuel mentioned above. ... I prefer to remove all tags (Ctrl+F8) before MTing the text.


Some CAT tools do take steps to ensure that the spacing around tags is correct. I myself, when I machine translate text in Word, use a macro that adds dummy characters to spaces, which then confuses Google Translate in a more predictable way, and then a macro just removes them (and unnecessary spaces) afterwards. I know that OmegaT actually compares source and target before and after sending content to the machine translator, to ensure that the spaces are dealt with before presenting the text to the translator. I was sure most other CAT tools do it, too. This is why I thought that perhaps Mary's client did not use MemoQ itself for the machine translation (i.e. they exported the file to some other format, translated that file externally, and then imported the "translation" back into MemoQ).

Mary McKee wrote:
I like the idea to use Word, except that I have at least a 16,000 segment TM that I can use...


I did not mean that you should do the translation in Word, but rather that you should export some (or all) of the yet-to-edit segments to Word so that you can remove all the spaces next to tags using a few find/replace operations, and then import it back into MemoQ. This will remove *all* spaces, but... inserting spaces is a lot quicker than deleting spaces, don't you agree?

Are you aware that you can move your cursor faster using Ctrl+arrow? Fortunately, in MemoQ, using Ctrl+arrow always gets the cursor to the start of an object, or to after a space, predictably (not all editors do that predictably). Out of interest, when you want to remove spaces manually, do you move your cursor using the keyboard keys, or do you use the mouse to click in the right places?



[Edited at 2020-05-28 08:25 GMT]


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 03:18
English to Russian
Regex to remove all leading and trailing spaces around tags May 28, 2020

Samuel Murray wrote:
OmegaT actually compares source and target before and after sending content to the machine translator, to ensure that the spaces are dealt with before presenting the text to the translator.

Right. Because tags are presented as plain text in OmegaT. <t0/> etc.

@Mary McKee:
You can use this regex to remove all leading and trailing spaces around tags:

Find what: (\s*)(<.*?>)(\s*)
Replace with: $2

This will remove all whitespace before and after any tag. If you still need whitespace around some specific tags, don't use 'Replace all'.
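
To see the effect of this pattern on a sample string before running it in memoQ, here is a minimal Python sketch. It assumes the tags are visible as plain text in angle brackets; strip_tag_spaces is a made-up name, and Python writes the replacement as \2 instead of $2.

```python
import re

# Same pattern as in the post: optional whitespace, a tag, optional whitespace.
TAG_WITH_SPACES = re.compile(r"(\s*)(<.*?>)(\s*)")


def strip_tag_spaces(text: str) -> str:
    """Keep each tag (group 2) and drop any whitespace on either side of it."""
    return TAG_WITH_SPACES.sub(r"\2", text)


print(strip_tag_spaces("Click <b> Save </b> now"))
```

The sample shows both sides of the trade-off: the unwanted spaces inside the bold pair are gone, but so are the legitimate spaces outside it, which then have to be typed back in.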

[Edited at 2020-05-28 09:11 GMT]


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 01:18
Member (2006)
English to Afrikaans
Regex May 28, 2020

Stepan Konev wrote:
Find what: (\s*)(<.*?>)(\s*)
Replace with: $2


FWIW, "*" means "match zero or more" and "+" means "match 1 or more", so for me the regex only works if I change \s* to \s+ (although what you use will affect what happens, obviously). The "?" is supposed to make the expression lazy, but it doesn't prevent the regex from selecting multiple tags next to each other (although this isn't a problem because we retain $2 anyway).

Mary, I'm not sure how it works in your version of MemoQ, but in my case, I have to press Ctrl+H twice to bring up the "advanced" Find/Replace dialog, and select the option "Search within tags as well". Thanks, Stepan, I did not know that one could search tags or search within tags.


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 03:18
English to Russian
Bingo May 28, 2020

Samuel Murray wrote:
FWIW, "*" means "match zero or more"

This is exactly what we need to process all tags that may have or may not have a whitespace character before, or after, or before and after them. If you put + instead of *, some tags may remain with spaces.
According to my understanding, Mary wants all spaces around tags gone.
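
The difference between the two quantifiers can be checked on a small sample. This is a Python sketch rather than memoQ itself, so the behaviour of memoQ's own regex engine is assumed to be similar, not confirmed; the function names are only labels for the comparison.

```python
import re

# \s* allows zero spaces on either side, so every tag is matched.
STAR = re.compile(r"(\s*)(<.*?>)(\s*)")
# \s+ demands at least one space on BOTH sides of the tag.
PLUS = re.compile(r"(\s+)(<.*?>)(\s+)")


def remove_with_star(text: str) -> str:
    return STAR.sub(r"\2", text)


def remove_with_plus(text: str) -> str:
    return PLUS.sub(r"\2", text)


sample = "Total:<br/> 20 items"  # space only AFTER the tag
print(remove_with_star(sample))
print(remove_with_plus(sample))
```

In this sample the tag has a space on one side only, so the \s+ version finds nothing to replace and the space stays, while the \s* version removes it.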

[Edited at 2020-05-28 09:45 GMT]


 

