Web Viewer Problem
Last Post 11 Aug 2009 10:50 AM by Umesh. 4 Replies.

Umesh
29 Jul 2009 12:15 PM

Hi all,

I am trying to parse data from some web pages into an XML file. Because the output needs to look much like the original website, I use simple replacement logic that wraps all bold values in <b></b> tags, and does the same for <li></li> and <ul></ul>, so that the web pages which load this XML file can show bullets and numbering.

However, the output is not a valid XML file. I tried to find out what the exact problem is, and the bug appears to be caused by the Web Viewer: it replaces some tags with nothing, so my script is not able to find those missing tags.

For example, in IE and Mozilla the code is:

    <ul>
    <li>Financial Accounting</li>
    <li>Managerial Accounting</li>
    <li>Cost and Management Accounting</li>
    <li>Income Tax Accounting </li>
    <li>Auditing </li>
    <li>And More!</li>
    </ul>

but in our web viewer the same portion comes out as:

    <ul>
    <li>Financial Accounting
    <li>Managerial Accounting
    <li>Cost and Management Accounting
    <li>Income Tax Accounting
    <li>Auditing
    <li>And More! </li></ul>

So my output XML is incorrect. Please replace these XML values with their original tags; otherwise the browser will treat all the remaining values as strings.

    <br/> : <br/>
    <ul> : <ul>
    </ul> : </ul>
    <li> : <li>
    <li> : </li>

Please find the attachments. Any suggestions would be appreciated.

Thanks,
Umesh

Attachment: chowan_University.zip
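
To make the failure in the post above concrete, here is a minimal sketch in Python (not a Djuggler script) that checks whether a fragment is well-formed XML. The two constants are shortened versions of the snippets quoted above, and `is_well_formed` is a hypothetical helper name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the two snippets quoted in the post.
BROWSER_SOURCE = "<ul> <li>Financial Accounting</li> <li>Auditing</li> </ul>"
WEB_VIEWER_DOM = "<ul> <li>Financial Accounting <li>Auditing </li></ul>"


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("browser source:", is_well_formed(BROWSER_SOURCE))
    print("web viewer DOM:", is_well_formed(WEB_VIEWER_DOM))
```

Omitting </li> is legal in HTML, which is why a browser DOM may serialize the list that way, but an XML parser requires every element to be closed, so the second fragment is rejected.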

lameuse
29 Jul 2009 02:10 PM

Yes, this can be quite annoying. Djuggler uses IE's DOM code, which differs from IE's "view source", Firefox, etc. For one thing, it often removes the quotes around a value, but apparently it also removes </li> occasionally.

When you start writing the scraping script, use Djuggler's own browser (at least for comparison) to copy and view the HTML. I usually have a couple of browsers and HTTP viewers open, each suited to a different need (catching POSTs, for example). But for Copy Text Between actions in Djuggler scripts, use the built-in browser to be safe.

In this case, scrape between the <li>'s by doing a Copy Text Between with "<li>" as pre string and "#13</li>" as post string (#13 = <ENTER>). You only need to figure out how to get the last one, which ends with an </li>.

Hope this clarifies things a bit...
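
For anyone post-processing the scraped HTML outside Djuggler, the same idea (take the text between consecutive <li> tags, whether or not they are closed) can be sketched in Python with the standard library. This is only an illustration under assumptions: `ListItemCollector`, `extract_items`, and `items_to_xml` are hypothetical names, and the sketch handles a flat list only, not nested lists.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of each <li>, tolerating missing </li> tags."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # text chunks of the open <li>, or None

    def flush(self):
        # Finish the item being read, collapsing runs of whitespace.
        if self._current is not None:
            text = " ".join("".join(self._current).split())
            if text:
                self.items.append(text)
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.flush()  # a new <li> implicitly closes the previous one
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            self.flush()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def extract_items(html):
    """Return the list-item texts found in an HTML fragment."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.items


def items_to_xml(items):
    """Rebuild a well-formed <ul> from the extracted item texts."""
    ul = ET.Element("ul")
    for item in items:
        ET.SubElement(ul, "li").text = item
    return ET.tostring(ul, encoding="unicode")
```

Feeding the web viewer's version of the list to `extract_items` and then `items_to_xml` yields a list in which every <li> is closed, so it can be embedded in the output XML file.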

Umesh
03 Aug 2009 09:15 AM

I also think this can only be solved through programming. I have found similar cases where the web viewer skips some closing or opening tags. Is it possible for your team to fix this kind of problem in the next release? Thanks for your kind support.

--Regards
Umesh

Tijn
03 Aug 2009 11:42 AM

The IE DOM is typically something you don't want to mess with. If you don't need any dynamic events such as Execute Javascript or Fill Form field, you can use the Open Page IE source action instead of Open Web Page. You then get the same source as you would with IE's menu option: view source. It will also speed up your browsing, because this action just retrieves the page source from the web server without the overhead of pictures, stylesheets, and dynamic events.
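
The general idea of working from the raw page source rather than a browser DOM can also be sketched outside Djuggler. The Python example below is an assumption-laden illustration, not the Open Page IE source action itself: `fetch_page_source` is a hypothetical helper, and the default timeout and UTF-8 fallback are arbitrary choices.

```python
from urllib.request import urlopen


def fetch_page_source(url, timeout=10.0):
    """Return the page source exactly as the server sends it.

    No DOM is built, so omitted or reordered tags are not introduced,
    and no pictures, stylesheets, or scripts are downloaded.
    """
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The trade-off is the one described above: content that only appears after JavaScript runs will not be present in the returned text.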

Umesh
11 Aug 2009 10:50 AM

Thanks :)