
Wait for Content...Better Option
Last Post 11 May 2010 10:53 AM by mazel. 3 Replies.
Don Lee

--
07 May 2010 02:44 PM
I have a script that loops through a drop down box and extracts data.

The problem I am having is this: once the script selects a new item in the dropdown, the page takes a while to load, depending on the size of the data. Nothing on each new page distinguishes it from the last except the data in the table, so "wait for content" is not possible. I tried a delay/refresh source function, but it is impossible to time, since the size of the data and the speed of the connection both play a part.

Are there any tricks of the trade I can use to have the script wait for an "on web page reload" or "on source refresh" event, or anything else that moves the script forward without requiring specific content?
mazel

--
10 May 2010 11:30 AM
You have run into the trickier part of web scraping: waiting for dynamic events. Unfortunately, the web browser itself doesn't have an event that fires when the update is complete. So you have to test for changes to be sure. If you can't or don't want to do that, just wait for a sufficient amount of time and/or build in a retry mechanism.

Two options:
1. Use WaitForContent for the disappearance of what is now in the dropdown (typically "choose...") or the appearance of any new content you know will be there. The web page source will be refreshed automatically after this action.
2. Just wait 10 seconds for the update. Assume everything went well after this, and extract the data as you normally would. THEN TEST the data. If the data appears valid, all is well. If the data is invalid, go back to the start: reload the page and set the dropdown again. Do this at most 3 times; after 3 failed attempts, treat it as a permanent error (retry mechanism).
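Option 2 can be sketched outside the tool as a small retry loop. This is only an illustration in Python, not Djuggler script: `reload_and_select`, `extract` and `is_valid` are hypothetical callables standing in for your own actions, and the validity test is whatever check you can actually make on the data (for example, that at least one row came back).

```python
import time
from typing import Callable, Sequence


def extract_with_retry(
    reload_and_select: Callable[[], None],   # hypothetical: reload page, set dropdown
    extract: Callable[[], Sequence],         # hypothetical: pull rows from the table
    is_valid: Callable[[Sequence], bool],    # hypothetical: your own sanity check
    wait_seconds: float = 10.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Sequence:
    """Fixed wait, extract, then test the data; start over if it is invalid."""
    for _ in range(max_attempts):
        reload_and_select()        # go back to the start
        sleep(wait_seconds)        # give the page time to update
        rows = extract()           # extract as if everything went well
        if is_valid(rows):         # THEN test the data
            return rows
    # All attempts produced invalid data: permanent error.
    raise RuntimeError(f"data still invalid after {max_attempts} attempts")
```

The `sleep` parameter is only there so the wait can be swapped out when trying the loop without a real page.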
don lee

--
10 May 2010 02:48 PM
Thanks Mazel. I toyed with some options to solve this problem. I used a wait for content to disappear (so it is looking for a potential refresh), followed by a wait for content to appear (wait for the refresh to finish) before proceeding. This worked on 13 of the 15 dropdown choices.

But the other 2 return extremely large amounts of data that need time to populate, and the data appears all at once rather than in pieces. It seems the page source disappears while the web page stays put for a while, and then the web page reloads all at once. Even though I have a wait for content to appear, the script runs before the page reload finishes, so the extraction fails.

In this script, I have a form fill dropdown selection, then a click on the submit button. I can't test for data because I don't know what it will return, and the dropdown choice is nowhere in the source. Nothing distinguishes this data from any past dropdown selection except the data I extract, and since I never know in advance what that will be, I can't isolate on it.

Thanks for the input. I think a wait for the web page to reload should be an action, and I will request it.

mazel

--
11 May 2010 10:53 AM
Well, it seems like you have a difficult situation here. I do not have a ready-made solution for this; you will have to tinker around to get it working. It seems the best way to do this is to skip Wait for Content and test manually for changes, using a loop and refresh browser source. You will be able to catch the reload that way as well. However, you will need a distinguishable change in content as a flag to know when the page is correctly loaded. Failing that, the only thing to do is wait for a really long time.
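To illustrate the kind of loop I mean, here is a rough sketch in Python rather than in Djuggler script. All names are hypothetical: `get_source` stands for "refresh browser source and return it", and `looks_loaded` is the flag you have to supply yourself (for example, that the table's closing tag is present). Requiring the changed source to stay identical for a few polls is my own addition to guard against catching the page halfway through the reload.

```python
import hashlib
import time
from typing import Callable


def fingerprint(source: str) -> str:
    """Short, comparable digest of a page source snapshot."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def wait_for_change(
    get_source: Callable[[], str],        # hypothetical: refresh and return the source
    before: str,                          # source captured before the dropdown was set
    looks_loaded: Callable[[str], bool],  # hypothetical: your "page is complete" flag
    timeout: float = 120.0,
    interval: float = 1.0,
    stable_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the source differs from `before`, passes `looks_loaded`,
    and has been identical for `stable_polls` consecutive polls."""
    old = fingerprint(before)
    deadline = time.monotonic() + timeout
    last = None
    streak = 0
    while time.monotonic() < deadline:
        source = get_source()
        current = fingerprint(source)
        if current == old or not looks_loaded(source):
            streak = 0       # unchanged, or caught in the blank in-between state
        elif current == last:
            streak += 1      # same new content as on the previous poll
        else:
            streak = 1       # first sighting of this new content
        last = current
        if streak >= stable_polls:
            return source
        sleep(interval)
    raise TimeoutError("page source did not change and settle in time")
```

Note that if a new selection happens to return exactly the same data as the previous one, this loop cannot tell and will time out, which is the "distinguishable change" limitation again.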

Alternatively, look at the POST variables that are sent back to the server (with a spy tool) and do a Post Web Page directly to circumvent the dropdown action. Post Web Page will wait for the page to finish loading.
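For anyone who wants to see what that amounts to outside the tool, here is a minimal Python sketch using only the standard library. The URL and the field names (`region`, `submit`) are made up; the real ones, including any hidden fields, are whatever the spy tool shows the browser sending.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post(url: str, fields: dict) -> Request:
    """Build a form-encoded POST like the one the browser sends on submit."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch(request: Request, timeout: float = 120.0) -> str:
    """Send the request; urlopen returns only once the server has responded."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical URL and form fields, for illustration only.
request = build_post(
    "http://example.com/report",
    {"region": "North West", "submit": "Go"},
)
# html = fetch(request)   # uncomment to actually send the request
```

Because the response is read in full before `fetch` returns, there is no separate "wait for reload" step to time.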

