Finding and translating all subdirectories in an HTML file
Thread poster: Paul Lambert
Paul Lambert
Paul Lambert  Identity Verified
Sweden
Local time: 18:16
Member (2006)
Swedish to English
+ ...
Sep 25, 2020

I suspect this will be an obvious question to you younger tech-savvy types out there.
Lately, I have been getting plenty of jobs translating websites. Often, the client does not provide me with a nice set of folders containing all the HTML text, but rather just a link to the web page he wants translated. Now, if it is just the one page that needs translating, it is simple enough to go to "view page source" and get the HTML. However, I have now been asked to translate all of a rather elaborate site containing many pages, including pages with links that lead to pages with other links to pages with yet more links, and so on. I could use brute force, map out each page and gather the "view page source" for each page individually, but that would be painstaking and I would be prone to missing something. I have to believe there is an easy way to go about it. So, for instance, if I went to a site called http://paulspage.com, I would need to get the page source for that page, all of its subpages, the subpages of those subpages, etc.

Any ideas?


 
Thomas T. Frost
Thomas T. Frost  Identity Verified
Portugal
Local time: 17:16
Danish to English
+ ...
Expression Web Sep 25, 2020

You could use Microsoft Expression Web 4 (the successor to FrontPage) to import the site, as described at https://www.expression-web-tutorials.com/import-site-wizard.html .

Expression Web is now free to download: https://answers.microsoft.com/en-us/windows/forum/all/microsoft-expression-web-4-download/e6a4eba5-2d7e-4eed-8fab-c945a83215c4 .


 
Paul Lambert
Paul Lambert  Identity Verified
Sweden
Local time: 18:16
Member (2006)
Swedish to English
+ ...
TOPIC STARTER
Thanks Sep 25, 2020

Thanks, Thomas. I will check it out right now.

 
Thomas T. Frost
Thomas T. Frost  Identity Verified
Portugal
Local time: 17:16
Danish to English
+ ...
PS Sep 25, 2020

It's old software, but it still works, also on Windows 10.

Be sure to confirm with the client exactly which files with how many words you intend to translate. If they use advanced techniques such as SQL, Expression Web may not find them all.
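
For agreeing the word count with the client, a rough count of the visible text in each imported HTML file can help. What follows is only a sketch using Python's standard library: the names `TextExtractor` and `count_words` are made up for illustration, it skips `<script>` and `<style>` content, and it ignores translatable attributes such as `alt` or `title`, so treat the number as an estimate rather than a billable figure.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_words(html):
    """Return a whitespace-based word count of the visible text in one page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())
```

Running `count_words` over every saved page and summing the results gives a total to put in front of the client before starting.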


 
Paul Lambert
Paul Lambert  Identity Verified
Sweden
Local time: 18:16
Member (2006)
Swedish to English
+ ...
TOPIC STARTER
Worked like a charm Sep 25, 2020

Thanks again. What great advice. This software is excellent.

And yes, I will confirm on Monday that everything is included. This is an enormous task. No point missing anything.

Have a great weekend.


 
Thomas T. Frost
Thomas T. Frost  Identity Verified
Portugal
Local time: 17:16
Danish to English
+ ...
Glad it worked Sep 25, 2020

Thanks, you too.

 
Sheila Wilson
Sheila Wilson  Identity Verified
Spain
Local time: 17:16
Member (2007)
English
+ ...
My experience has been 100% negative Sep 25, 2020

The first couple of times I tried to gather all the text to work on, the client complained that I'd missed some and I had to do a rush job -- unpaid -- to complete it to their satisfaction. So then I insisted that the client (a communications agency) select the text. They grumbled but came up with it. A while after delivery, they came back with a hyper-urgent request for more text to be worked on. This time they'd missed it, and this time they had to pay my rush rate! I've since always insisted on receiving the text in Word or Excel files.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 18:16
Member (2006)
English to Afrikaans
+ ...
A web site ripper, I imagine Sep 25, 2020

Paul Lambert wrote:
I must believe there is an easy way to go about it. So for instance if I went to a site called http://paulspage.com, I need to get all the page source for that page and all the subpages and the subpages of the subpages etc etc.


Yes, there are such utilities (web site rippers, strippers, or sometimes "offline browsers"), and 10-20 years ago when the web was younger, they were fairly reliable tools. However, web sites are no longer simple and web servers are no longer all the same, so many of these web site ripper programs no longer work as expected or promised.

A well-known free one is HTTrack, but I've never had good results with it. I've had reasonable results with VWget for ripping large archives (say, 10 000 HTML files in nested subfolders), but it's not easy to use (I've had most success with the command-line version).

See also my post here where I recommend Web Downloader 2.2, which you can still find on some download sites if you look really hard. I just tried it again, and it still works for simple sites. I've uploaded it here for 7 days.
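
For anyone curious what such a ripper does under the hood, here is a minimal sketch in Python (standard library only). It is not how any of the tools above actually work; the function names, the `max_pages` limit and the UTF-8 assumption are all mine. It starts from one URL, follows `<a href>` links that stay on the same host, and keeps the HTML of every page it reaches. Like the real rippers, it will miss anything generated by JavaScript or hidden behind forms and logins.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs (without #fragments) linked from one page."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        links.append(absolute)
    return links


def fetch_page(url):
    """Download one page as text (assumes UTF-8; real sites may differ)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_page, max_pages=500):
    """Breadth-first walk of same-host pages; returns {url: html}."""
    host = urlparse(start_url).netloc
    pages = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        for link in extract_links(html, url):
            parsed = urlparse(link)
            same_site = parsed.scheme in ("http", "https") and parsed.netloc == host
            if same_site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl("http://paulspage.com/")` would return a dictionary mapping each reachable page address to its source, which could then be written to disk one file per page. The `fetch` parameter exists only so the logic can be tried against canned pages without touching the network.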

[Edited at 2020-09-25 17:29 GMT]


 
Endre Both
Endre Both  Identity Verified
Germany
Local time: 18:16
English to German
Have the client send you the source files Sep 25, 2020

Approaching it from the public (Internet) side of things as web rippers do is absolutely the wrong way to go. Your client has access to all the source files (unless they want to translate a third party's site without their knowledge), even if they may not be aware of this.

So you need to get them to send you all source files.
For static websites, this is a matter of copying all files from an FTP server.
For dynamic websites, they have to export the strings from the database that is used to dynamically generate the site.

None of this is your business – you need to insist on being provided with all relevant files without ripping them from a website. As Sheila says, this also puts the onus on them to catch all content.

When you have got all files, you need to check what types they are and how to best translate them.
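
As a first pass at that check, a short script can tally the delivered files by extension. This is only a sketch: the function name `count_file_types` is hypothetical, and the folder you pass in is wherever you unpacked what the client sent.

```python
from collections import Counter
from pathlib import Path


def count_file_types(folder):
    """Count files under `folder` (including subfolders) by lowercase extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

Printing `count_file_types("client_files").most_common()` lists the most frequent types first, which makes it easy to spot formats (say, `.php` or `.json`) that need a different workflow from plain `.html`.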


 
Paul Lambert
Paul Lambert  Identity Verified
Sweden
Local time: 18:16
Member (2006)
Swedish to English
+ ...
TOPIC STARTER
Thanks. Forget the answer I just erased. Sep 25, 2020

I just seemed like a jerk. I meant to say, thank you.

So, yes, thank you. Indeed, I will try to get the HTML files in question from the client, and if that does not work, then as a second resort I will use what I got from the software discussed above.

Take care!

[Edited at 2020-09-26 18:31 GMT]


 

