
Web Viewer Problem
Last Post 11 Aug 2009 10:50 AM by Umesh. 4 Replies.
Umesh

--
29 Jul 2009 12:15 PM
Hi all,

I am trying to parse data from some web pages into an XML file. Because I need to present the data much like the original website does, I use some simple swapping logic that replaces all bold values with <b></b> tags, and does the same for <li></li> and <ul></ul>. That way I can show bullets and numbering in the web pages that read this XML file.

But the output is not a valid XML file. I tried to find out what the exact problem is, and I believe the bug is in the Web Viewer: it drops some tags entirely, so my script cannot find them.

For example, in IE and Mozilla the code is:

<ul>
<li>Financial Accounting</li>
<li>Managerial Accounting</li>
<li>Cost and Management Accounting</li>
<li>Income Tax Accounting </li>
<li>Auditing </li>
<li>And More!</li>
</ul>


but in the web viewer the same portion comes out as:

<ul>
<li>Financial Accounting
<li>Managerial Accounting
<li>Cost and Management Accounting
<li>Income Tax Accounting
<li>Auditing
<li>And More! </li></ul>

So my output XML is incorrect.
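To see why the output breaks, the following Python check feeds both versions of the list to the standard library's XML parser. It is an illustration and not part of the attached script. The browser version parses. The web viewer version raises a ParseError, because each unclosed <li> nests inside the previous one until the closing </ul> no longer matches.

```python
import xml.etree.ElementTree as ET

# Markup as seen in IE / Mozilla: every <li> is closed.
BROWSER_HTML = """<ul>
<li>Financial Accounting</li>
<li>Managerial Accounting</li>
<li>Cost and Management Accounting</li>
<li>Income Tax Accounting </li>
<li>Auditing </li>
<li>And More!</li>
</ul>"""

# The same list as produced by the web viewer: only the last <li> is closed.
VIEWER_HTML = """<ul>
<li>Financial Accounting
<li>Managerial Accounting
<li>Cost and Management Accounting
<li>Income Tax Accounting
<li>Auditing
<li>And More! </li></ul>"""


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(BROWSER_HTML))  # True
print(is_well_formed(VIEWER_HTML))   # False
```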

Please replace these XML values with their original tags; otherwise the browser will treat all of the left-hand values as plain strings.

&lt;br/&gt; : <br/>
&lt;ul&gt; : <ul>
&lt;/ul&gt; : </ul>
&lt;li&gt; : <li>
&lt;/li&gt; : </li>
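The sketch below assumes the left-hand values are the entity-escaped forms of the tags (for example `&lt;li&gt;`). It applies the replacement in Python with an explicit table. The table and the `restore_tags` helper are illustrative assumptions, not taken from the attached files. The standard library's `html.unescape` does the same job for any HTML entity.

```python
import html

# Hypothetical replacement table: escaped form -> original tag.
REPLACEMENTS = {
    "&lt;br/&gt;": "<br/>",
    "&lt;ul&gt;": "<ul>",
    "&lt;/ul&gt;": "</ul>",
    "&lt;li&gt;": "<li>",
    "&lt;/li&gt;": "</li>",
}


def restore_tags(text: str) -> str:
    """Replace each escaped tag listed in the table with its original tag."""
    for escaped, tag in REPLACEMENTS.items():
        text = text.replace(escaped, tag)
    return text


escaped = "&lt;ul&gt;&lt;li&gt;Auditing&lt;/li&gt;&lt;/ul&gt;"
print(restore_tags(escaped))   # <ul><li>Auditing</li></ul>
print(html.unescape(escaped))  # <ul><li>Auditing</li></ul>
```

An explicit table only touches the tags you list. `html.unescape` decodes every entity, including `&amp;`, which may not be what you want inside an XML file.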


Please find the attachment; any suggestions would be appreciated.

Thanks,
Umesh


chowan_University.zip

lameuse

--
29 Jul 2009 02:10 PM
Yes, this can be quite annoying. Djuggler uses IE's DOM, which differs from what IE's "View Source", Firefox, etc. show. For one, it often removes the quotes around attribute values, and apparently it occasionally removes </li> as well.

When you start writing a scraping script, use Djuggler's own browser (at least for comparison) to copy and view the HTML. I usually have a couple of browsers and HTTP viewers open, each suited to a different need (catching POSTs, for example). But for Copy Text Between actions in Djuggler scripts, use the built-in browser to be safe.

In this case, scrape between the <li> tags by doing a Copy Text Between with "<li>" as the pre string and "#13</li>" as the post string (#13 = <ENTER>). You only need to figure out how to get the last one, which ends with </li>.
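The same idea can be sketched in Python outside Djuggler. `copy_text_between` is a hypothetical stand-in for Djuggler's Copy Text Between action, not its actual implementation. The sketch assumes every item except the last ends at a carriage return (#13, `"\r"`), and picks up the last item by its closing </li>.

```python
# Web viewer output; #13 in Djuggler is the carriage return "\r".
VIEWER_HTML = (
    "<ul>\r\n"
    "<li>Financial Accounting\r\n"
    "<li>Managerial Accounting\r\n"
    "<li>Cost and Management Accounting\r\n"
    "<li>Income Tax Accounting\r\n"
    "<li>Auditing\r\n"
    "<li>And More! </li></ul>"
)


def copy_text_between(source: str, pre: str, post: str) -> list[str]:
    """Hypothetical stand-in for Copy Text Between: every substring
    that follows `pre` and ends just before the next `post`."""
    results = []
    start = source.find(pre)
    while start != -1:
        start += len(pre)
        end = source.find(post, start)
        if end == -1:  # no terminator left after this pre string
            break
        results.append(source[start:end])
        start = source.find(pre, end)
    return results


def extract_items(source: str) -> list[str]:
    last_start = source.rfind("<li>")
    if last_start == -1:
        return []
    # Every item before the last one ends at a line break (#13).
    items = copy_text_between(source[:last_start], "<li>", "\r")
    # The last item is the only one closed with </li>.
    items += copy_text_between(source[last_start:], "<li>", "</li>")
    return [item.strip() for item in items]


print(extract_items(VIEWER_HTML))
# ['Financial Accounting', 'Managerial Accounting', 'Cost and Management Accounting',
#  'Income Tax Accounting', 'Auditing', 'And More!']
```

Splitting the source at the last <li> keeps the two passes from picking up the same item twice.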

Hope this clarifies things a bit...


Umesh

--
03 Aug 2009 09:15 AM
I also think this can only be solved through programming. I have also found similar cases where the web viewer skips some closing or opening tags.
Could your team fix this kind of problem in the next release?
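One programmatic workaround is to re-insert the missing </li> tags before building the XML. The minimal Python sketch below does this and is an assumption, not a Djuggler feature. It uses the standard library's `html.parser.HTMLParser` and the HTML rule that an open <li> ends at the next <li> or at the closing </ul>/</ol> of its list. It also re-quotes unquoted attribute values. It does not handle other implicitly closed tags, such as <br> written without a slash, and it drops comments and doctypes.

```python
from html import escape
from html.parser import HTMLParser


class ListRepairer(HTMLParser):
    """Re-emit HTML with every <li> explicitly closed and every
    attribute value quoted. Comments and doctypes are dropped."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: entities arrive as text
        self.out: list[str] = []
        self.li_open: list[bool] = []  # one flag per open <ul>/<ol>

    def _tag(self, tag, attrs, close=""):
        parts = [tag] + [f'{k}="{escape(v or "")}"' for k, v in attrs]
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.li_open.append(False)
        elif tag == "li" and self.li_open:
            if self.li_open[-1]:
                self.out.append("</li>")  # next <li> closes the previous one
            self.li_open[-1] = True
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, close="/"))

    def handle_endtag(self, tag):
        if tag == "li" and self.li_open:
            if not self.li_open[-1]:
                return  # stray </li>: no item is open, so skip it
            self.li_open[-1] = False
        elif tag in ("ul", "ol") and self.li_open:
            if self.li_open.pop():
                self.out.append("</li>")  # close the last item before </ul>
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def repair_lists(markup: str) -> str:
    parser = ListRepairer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(repair_lists("<ul><li>A<li>B<li>C </li></ul>"))
# <ul><li>A</li><li>B</li><li>C </li></ul>
```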

Thanks for your kind support.

--Regards
Umesh


Tijn

--
03 Aug 2009 11:42 AM
The IE DOM is typically something you don't want to mess with.

If you don't need dynamic actions such as Execute Javascript or Fill Form field, you can use the Open Page IE source action instead of Open Web Page.

Then you get the same source as you would with IE's View Source menu. It will also speed up your browsing, because this action only retrieves the page source from the web server, without the overhead of images, stylesheets and dynamic events.
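The same idea can be sketched in Python with the standard library's `urllib.request`: fetch the raw HTTP response instead of reading a browser DOM. This is only an analogy and not how Djuggler implements Open Page IE source. The `fetch_source` name and the example URL are hypothetical. No scripts run and no images or stylesheets are requested, so the markup arrives exactly as the server sent it, including any closing </li> tags.

```python
from urllib.request import urlopen


def fetch_source(url: str, timeout: float = 30.0) -> str:
    """Download the raw page source exactly as the server sends it.

    Only the page itself is requested; nothing is rendered, so the
    markup is not rewritten by a browser DOM.
    """
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical usage:
# source = fetch_source("http://www.example.com/programs.html")
```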


Umesh

--
11 Aug 2009 10:50 AM
Thanks :)






Forum participation and optional registration

You don't need to be registered to participate in the Djuggler forums; however, if you want to subscribe to email notifications you need to register. You can also subscribe to the forum RSS feed.