
Problems with Get Web Page DOM vs HTML
Last Post 04 May 2010 12:41 PM by mazel. 3 Replies.
tr3online

--
26 Apr 2010 12:27 AM
Hi ladies and gents,
I'm trying to scrape a foreign-language website and am running into character-encoding issues when viewing the page as DOM versus HTML.
If I scrape in DOM view, the characters are correct, but the line breaks are not preserved. If I view the page as HTML, the line breaks are preserved, but the text comes out as a mix of gibberish characters.
Is there any way for me to scrape data from a website, preserve the line breaks, and also scrape the right character encoding?
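One common cause of exactly this pattern (an assumption, since the thread never confirms the site's encoding): a line break is a plain ASCII byte, so it survives any decoding, while multi-byte characters get garbled when the raw HTML is decoded with the wrong charset. The minimal Python sketch below reproduces that. `show_mojibake` and the sample string are illustrative only, and it assumes the site serves UTF-8 while the viewer guesses Windows-1252.

```python
def show_mojibake(text: str, real_charset: str, guessed_charset: str) -> tuple[str, str]:
    """Encode text the way the server would, then decode it with a wrong and a right charset."""
    raw = text.encode(real_charset)
    # Wrong guess: multi-byte characters turn into gibberish, but "\n" (byte 0x0A)
    # is ASCII and decodes to "\n" in either codec, so line breaks survive.
    # errors="replace" stops bytes undefined in the guessed codec from raising.
    garbled = raw.decode(guessed_charset, errors="replace")
    # Right guess: both the characters and the line breaks come back intact.
    correct = raw.decode(real_charset)
    return garbled, correct


page_text = "潮(しゆ)ん満ち\n里(さとぅ"
garbled, correct = show_mojibake(page_text, "utf-8", "cp1252")
print(garbled)
print(correct)
```

In other words, the HTML view already has the line breaks; it only needs to be decoded with the page's real charset.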
Tijn

--
26 Apr 2010 09:39 AM
Hi,
What is the URL?
tr3online

--
26 Apr 2010 08:16 PM
mazel

--
04 May 2010 12:41 PM
If I open it in IE and set the page encoding to automatic, I get:

test1=15&test2=潮(しゆ)ん満ち里(さとぅ

Could that be right? It looks like a page-encoding issue. The Unicode text is there, but you need the right code page (encoding) to read it. Does that make sense?
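The fix hinted at above amounts to decoding the raw HTML with the charset the page actually declares. A minimal Python sketch, assuming the scraper has access to the raw response bytes and the HTTP Content-Type header (Djuggler's own settings are not covered here). `detect_charset` and `decode_page` are hypothetical helpers, and their check order (header, then `<meta>` tag, then a UTF-8 default) is a much simpler stand-in for a browser's automatic detection; byte-order marks are ignored.

```python
import re
from typing import Optional

# Hypothetical helpers: a simplified stand-in for "automatic" encoding detection.
_HEADER_RE = re.compile(r"charset=([\w.:-]+)", re.IGNORECASE)
_META_RE = re.compile(rb"""<meta[^>]+charset=["']?([\w.:-]+)""", re.IGNORECASE)


def detect_charset(content_type: Optional[str], raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared by the HTTP header or a <meta> tag, else `default`."""
    # 1. HTTP Content-Type header, e.g. "text/html; charset=Shift_JIS".
    if content_type:
        match = _HEADER_RE.search(content_type)
        if match:
            return match.group(1).lower()
    # 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">,
    #    searched for only near the top of the document.
    match = _META_RE.search(raw[:2048])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_page(content_type: Optional[str], raw: bytes) -> str:
    """Decode raw HTML bytes with the declared charset; line breaks are kept as-is."""
    charset = detect_charset(content_type, raw)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or wrong declaration: fall back to UTF-8 and mark bad bytes.
        return raw.decode("utf-8", errors="replace")


html = '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">\n<p>潮\n里</p>'
raw_bytes = html.encode("shift_jis")
text = decode_page(None, raw_bytes)
```

Decoding before any text extraction means the newlines from the HTML source stay in place, and the characters come out the same way they do in the DOM view.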



Forum participation and optional registration

You don't need to be registered to participate in the Djuggler forums; however, if you want to subscribe to email notifications, you need to register. You can also subscribe to the forum RSS feed.