
extract data from nested table
Last Post 16 Mar 2009 08:32 AM by Tijn. 5 Replies.
noviceUser

--
06 Mar 2009 02:44 PM
Hi,

I am new to Djuggler and just downloaded the software yesterday. I am trying to extract data from a website's frames (http://depatisnet.dpma.de/ipc/cipc....0&sci=i00) and go through each link inside the page to get other data. The script I used to try to get the data from the table is attached.

The problem is that the data I get from the page's table is incomplete. I tried adding more tables just in case, but the data in each table field still does not show up properly. Is my script wrong, or is it a bug in Get Table Content?

Thank you in advance for replying.

Taywin

test_app.djs

Tijn

--
08 Mar 2009 12:58 PM
Hi, it seems the table you want is not a valid table. That is why the Get Table Content action does not work.

In your page, the table has cells without a closing tag, like the second TD tag below:

<TR class=ipctitle>
<TD class=ipctitle><A name=tA01></A></TD>
<TD class=ipctitle>
<TD class=ipctitle><SPAN class=ipctitletext>Landwirtschaft</SPAN></TD></TR>

Attached is an example script that loops over the rows and tables, using a Copy Text Between action on the page source to get the table content.
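The .djs script itself is not reproduced here, so as a rough illustration only, this is the same idea in Python: take the text between `<TR` and `</TR>` for each row, then split the row on `<TD` so that a missing `</TD>` cannot merge two cells. `copy_text_between`, `row_cells` and `extract_rows` are hypothetical helpers that mimic the Copy Text Between idea; they are not Djuggler's API. The matching is case-sensitive because the quoted source uses upper-case tags.

```python
import re

# Row copied from the page source quoted above (the second TD is never closed).
SAMPLE = """<TR class=ipctitle>
<TD class=ipctitle><A name=tA01></A></TD>
<TD class=ipctitle>
<TD class=ipctitle><SPAN class=ipctitletext>Landwirtschaft</SPAN></TD></TR>"""


def copy_text_between(source, start, end):
    """Return every substring between `start` and the next `end` (case-sensitive).

    Hypothetical stand-in for a 'Copy Text Between' step, not Djuggler's API.
    """
    results = []
    pos = 0
    while True:
        i = source.find(start, pos)
        if i == -1:
            break
        i += len(start)
        j = source.find(end, i)
        if j == -1:
            break
        results.append(source[i:j])
        pos = j + len(end)
    return results


def row_cells(row):
    """Split one row on '<TD' so a missing '</TD>' cannot merge two cells."""
    cells = []
    for piece in row.split("<TD")[1:]:  # piece 0 holds the rest of the <TR ...> tag
        # Drop the rest of the <TD ...> opening tag.
        content = piece.split(">", 1)[1] if ">" in piece else ""
        text = re.sub(r"<[^>]*>", "", content)  # strip any remaining tags
        cells.append(text.strip())
    return cells


def extract_rows(source):
    return [row_cells(row) for row in copy_text_between(source, "<TR", "</TR>")]


if __name__ == "__main__":
    print(extract_rows(SAMPLE))  # [['', '', 'Landwirtschaft']]
```

Empty strings are kept so that each cell stays in its column, which matters when the rows are later copied into a grid.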

Regards,
Tijn

dpma.djs

noviceUser

--
09 Mar 2009 11:33 AM
Thank you. I will try it.

By the way, will the software handle malformed HTML table tags in the future, or will it stay the way it is now? I ask because many web sites contain malformed HTML tags, and oftentimes they are table tags.


Tijn

--
09 Mar 2009 08:25 PM
You have a point about the malformed tags; the problem is deciding what to accept and what not.

A combination of actions like Copy Text Between with a wild card, or Replace Text, can always solve the problem once it is identified. Perhaps the Get Table Content action should only work on 'valid' tables. Otherwise, how should an invalid table be transferred to the Grid: which cells are left empty, and which columns are respected when tags are missing?
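As an illustration of the Replace Text route (a Python regex sketch, not a Djuggler action), the source can be repaired before the table is extracted: drop the `</TD>` tags that do exist, then close every cell just before the next `<TD` or `</TR>`. The function name `close_cells` is hypothetical, and the sketch assumes non-nested tables whose rows always end with `</TR>`.

```python
import re

# Row copied from the page source quoted earlier in this thread.
SAMPLE = """<TR class=ipctitle>
<TD class=ipctitle><A name=tA01></A></TD>
<TD class=ipctitle>
<TD class=ipctitle><SPAN class=ipctitletext>Landwirtschaft</SPAN></TD></TR>"""


def close_cells(source):
    """Regex repair that gives every table cell exactly one closing tag.

    Hypothetical helper; assumes no nested tables and rows ending in </TR>.
    """
    flags = re.IGNORECASE | re.DOTALL
    # 1. Remove the closing tags that exist so every cell is treated alike.
    text = re.sub(r"</TD\s*>", "", source, flags=flags)
    # 2. Close each cell just before the next <TD ...> or </TR>.
    return re.sub(r"(<TD\b[^>]*>.*?)(?=<TD\b|</TR>)", r"\1</TD>", text, flags=flags)


print(close_cells(SAMPLE))
print(close_cells("<table><tr><td>ABCDE<td>EFGH</tr></table>"))
# <table><tr><td>ABCDE</TD><td>EFGH</TD></tr></table>
```

The inserted tags are upper case regardless of the input; HTML tag names are case-insensitive, so a later table extraction should not care.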

In short, we are still thinking about how to handle invalid table structures, so we cannot guarantee a fix in a future release.


noviceUser

--
13 Mar 2009 06:25 PM
Thank you for replying. The only thing is that I don't want to go through each table cell to extract data, because it is very expensive. You also agree that there need to be certain rules for dealing with malformed tags.

I do not know how the software handles table tags from the DOM (especially from IE, because I work more with FF or Chrome). If it works from the raw HTML tags, here are a couple of suggestions for fixing malformed table tags:

1. If there is an open tag but no close tag, close the tag for it.
   e.g. <table><tr><td>ABCDE<td>EFGH</tr></table>
   Here the first cell should be closed automatically when the next <td> is seen.
2. If there is a close tag but no open tag, ignore it.

I am not sure how the DOM sees this.
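The two rules above can be sketched with Python's standard `html.parser` module, which reports tags as it reads them instead of building a DOM. This is a minimal sketch under stated assumptions (a single, non-nested table; `LenientTableParser` and `parse_table` are hypothetical names), not how Djuggler or any browser actually handles the page. For comparison, browsers that follow the HTML5 parsing rules also close an open cell implicitly when a new `<td>` starts.

```python
from html.parser import HTMLParser


class LenientTableParser(HTMLParser):
    """Collects rows (lists of cell strings) from possibly malformed table HTML.

    Rule 1: an open cell or row is closed automatically when the next one
            starts, or when its enclosing row/table ends.
    Rule 2: a closing tag with no matching open tag is ignored.
    Assumes a single, non-nested table.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows
        self.row = None   # cells of the row being built, or None
        self.cell = None  # text fragments of the cell being built, or None

    def _close_cell(self):
        if self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None

    def _close_row(self):
        self._close_cell()
        if self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()      # rule 1: implicit </tr>
            self.row = []
        elif tag in ("td", "th"):
            self._close_cell()     # rule 1: implicit </td>
            if self.row is None:   # cell outside any row: start one
                self.row = []
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()     # no-op when no cell is open (rule 2)
        elif tag in ("tr", "table"):
            self._close_row()      # no-op when no row is open (rule 2)

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def close(self):
        super().close()
        self._close_row()          # flush a row left open at end of input


def parse_table(html):
    parser = LenientTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


print(parse_table("<table><tr><td>ABCDE<td>EFGH</tr></table>"))
# [['ABCDE', 'EFGH']]
```

Because stray closing tags simply fall through as no-ops, rule 2 costs nothing extra; the harder question raised in this thread (which column a cell belongs to when tags are missing) is not addressed by this sketch.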

Taywin


Tijn

--
16 Mar 2009 08:32 AM
Hi Taywin,
Thank you for your thoughts on this specific problem. Currently we are working hard on Unicode support for Djuggler, but we have put your suggestion on a list so we won't forget it.

Thanks.
Tijn
