
Important Concepts

Djuggler is a hands-on scripting tool that simplifies collecting and integrating data from various sources into the output you require. To achieve a particular task in Djuggler, you create a script. In this script, actions are carried out one by one. Actions can import data into sources, for example from a text file or a web page. The source is then filled with the content of the file or page. From these sources, you can dissect the bits of information or data you need into variables. You can further use actions to change or adapt the content of these variables. When the information you are after is thus cleaned, you can store these variables in the format you need, for example an Excel sheet, a structured text file (CSV), HTML or a database.

Text Manipulation

Text manipulation is at the heart of Djuggler's functionality. In many tasks where data sources are combined or manipulated, you need to interpret, compare or adjust text. Two sets of text functions are available, through the Text Manipulation and Source Text Manipulation action categories.

Often, you would use a Source variable (like a Text File or Web Page) to import data into Djuggler. You would then scan the Source content and transfer the data you need into variables. Maybe then you would like to clean the data, compare it to other data and convert it into the right format. At the end you export the data in the desired format, again using sources or, sometimes, variables.

Introduction

Text you want to manipulate in Djuggler can be contained in Text variables, Text File sources or Web Page sources. To get text into a Text variable, use the Set Variable action. To get text from a text file, use the Open Text File action in combination with a TextFile source. To get the source of a web page, use the Open Web Page action with a Web Page source.

Obtaining data from semi-structured sources like HTML

Djuggler is a scripting engine that simplifies collecting and integrating data from various sources into the output you require. To mold data into the form you need, Djuggler uses its text manipulation abilities. Input sources (like text files and web pages) are often text based. These sources can be structured (like a database), unstructured (a Word document) or semi-structured (like a plain-text file carrying addresses, an XML file or an HTML table). It is the semi-structured type of input source that we are mainly concerned with here. Many text documents and all web pages are semi-structured in nature. Djuggler is built to simplify the task of making use of such sources. With it, you can draw on a vast amount of both offline and online information that could enhance your business process and information.

To get a grasp of this semi-structured nature, think of an HTML table. The web page source is plain text, but an HTML table is structured nonetheless. It could look like this:

<table>
<tr><td>Name</td><td>John</td></tr>
<tr><td>Name</td><td>Annie</td></tr>
</table>


Djuggler's engine makes it easy to extract information structured in this way. To get at the name "John" for example you could use the action

Copy Text Between WebPage1 "<tr><td>Name</td><td>" "</td>" TextVar

This would copy "John" into the variable TextVar. This is a basic way to get data from a semi-structured source.
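
The idea behind this action can be sketched outside Djuggler as well. The Python below is only an illustration of the technique, not Djuggler's implementation; the function name copy_text_between and its return value on failure are assumptions made for this example.

```python
def copy_text_between(text, start, end):
    """Return the text between the first `start` marker and the next `end` marker.

    Hypothetical helper for illustration; returns None when a marker is missing.
    """
    begin = text.find(start)
    if begin == -1:
        return None  # start marker not found
    begin += len(start)  # skip past the start marker itself
    finish = text.find(end, begin)
    if finish == -1:
        return None  # end marker not found after the start marker
    return text[begin:finish]


html = (
    "<table>\n"
    "<tr><td>Name</td><td>John</td></tr>\n"
    "<tr><td>Name</td><td>Annie</td></tr>\n"
    "</table>\n"
)

name = copy_text_between(html, "<tr><td>Name</td><td>", "</td>")
print(name)  # prints: John
```

Only the first match is returned, which is why "Annie" is not reached in a single call.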

Sources, variables and cursor position

If you work with a Source in Djuggler, it remembers the cursor position within the source text that you are manipulating. The first action is applied from the beginning of the source's content. Each following action is then applied from the position where the previous action left the cursor. With Variables, no cursor position is remembered; every action is applied from the beginning of the text.

Say we have the following text in a Source: [CP] one two one, where [CP] indicates the cursor position. Now we apply the same action twice to the source. Watch where the cursor position goes.

[CP]one two one
apply action Find Text in Source "one"
one[CP] two one
apply action Find Text in Source "one"
one two one[CP]


Now we repeat the same actions, but we use a variable instead. You see we would never reach the second "one".

[CP]one two one
apply action Find Text "one"
[CP]one two one
apply action Find Text "one"
[CP]one two one

This is why the Find Text action has a Starting Position property, which enables you to search further in the text. You will probably have to keep a cursor position variable manually for such a task. That is why, for repetitive tasks in large texts, it often pays to use the Source Text Manipulation actions.
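
The difference can be modelled in a few lines of Python. This is a toy model under stated assumptions, not Djuggler code: the Source class, its cursor attribute and the starting_position parameter are names invented for this sketch.

```python
class Source:
    """Toy model of a source: text plus a remembered cursor position."""

    def __init__(self, text):
        self.text = text
        self.cursor = 0  # starts at the beginning of the content

    def find_text(self, needle):
        """Search from the cursor; on success, move the cursor past the match."""
        index = self.text.find(needle, self.cursor)
        if index == -1:
            return -1  # not found; the cursor stays where it was
        self.cursor = index + len(needle)
        return index


def find_text(text, needle, starting_position=0):
    """Stateless search, as with a variable: the caller tracks the position."""
    return text.find(needle, starting_position)


source = Source("one two one")
first = source.find_text("one")   # finds the first "one"
second = source.find_text("one")  # continues after it and finds the second

variable = "one two one"
again_a = find_text(variable, "one")  # finds the first "one"
again_b = find_text(variable, "one")  # finds the first "one" again
manual = find_text(variable, "one", again_a + len("one"))  # position kept by hand

print(first, second, again_a, again_b, manual)  # prints: 0 8 0 0 8
```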

Text manipulation actions compared

There is a division between actions that work with sources exclusively, and actions that work with sources and variables. Both versions might try to achieve a similar goal (for example, to copy text), but work differently to get the result.
 
The following actions work exclusively with Sources. They work with the internal cursor position that is changed according to actions and kept with the Source object. That means that you do not have to keep a cursor position manually when working with large texts.

Find Text in Source
Copy Text from Source
Copy Text from Source Between
Insert Text in Source
Match Text in Source

The actions below work with both Sources and Variables. When using the actions below, you have to indicate a cursor position (and keep it) to tell where to insert, copy and so on. When using a Source (Text File or Web Page) the internally kept cursor position is not changed.

Find Text from Position
Copy Text from Position
Copy Text Between
Insert Text at Position
Replace Text
Match Text
Match and Replace Text

Using Wild cards

You can use the "*" asterisk wild card to indicate a text of zero or more characters, containing any character. Suppose you have the following HTML and are interested in copying the price data ($120):

<td width="100" valign="left">$120</td>

We could copy that by using the following action:

Copy Text Between WebPage1 "<td*>" "</td>" TextVar

In fact, you could copy every cell in an HTML table by using the line above repeatedly, in a loop for example. The asterisk works in the following actions: Find Text in Source, Find Text from Position, If (contains text) and Loop While (contains text). A few functions work with regular expressions, and there the asterisk will not work in this fashion: Loop Matched Text, Match Text, Match Text in Source and Match Text and Replace.
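
A sketch of how such a wild card and a repeated copy could behave, written in Python. It assumes that "*" matches as little text as possible, which the description above does not state; the helper names and the second table cell ($95) are hypothetical additions for the example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern in which '*' means 'any text' into a regular expression.

    Assumption for this sketch: '*' matches as few characters as possible.
    """
    return ".*?".join(re.escape(part) for part in pattern.split("*"))


def copy_text_between(text, start, end, position=0):
    """Return (copied_text, new_position), or (None, position) if nothing matches."""
    regex = re.compile(
        wildcard_to_regex(start) + "(.*?)" + wildcard_to_regex(end), re.DOTALL
    )
    match = regex.search(text, position)
    if match is None:
        return None, position
    return match.group(1), match.end()


# The first cell comes from the example above; the second cell is made up.
row = '<tr><td width="100" valign="left">$120</td><td>$95</td></tr>'

cells = []
position = 0
while True:
    cell, position = copy_text_between(row, "<td*>", "</td>", position)
    if cell is None:
        break  # no further cells
    cells.append(cell)

print(cells)  # prints: ['$120', '$95']
```

Carrying the position from one call to the next plays the role of the cursor that a Source keeps internally.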

Using Regular Expressions

Regular expressions can be a very powerful way of matching text patterns. Djuggler supports regular expressions in a number of functions: Loop Matched Text, Match Text, Match Text in Source and Match Text and Replace. If you want to find out more about regular expressions, have a look at www.regular-expressions.info or try a Google search.

Using special characters

Sometimes you will want to search for a special character, like a TAB, end-of-line or ENTER (CR) character. You can do so by inserting "#<ASCII character code>#" into a text. For example, an enter character can be found using "#13#", a TAB character using "#9#". Replacements work only in variables of the type Text. For example, in the action Find Text, the first property is a Text variable (called "Text"). Typing in "#9#" in the edit box next to the property would have you search for a TAB character.
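
The "#<ASCII character code>#" notation can be read as a simple substitution. The Python sketch below shows that reading; the function name expand_special_characters and the limit of one to three digits are assumptions for the example, not documented Djuggler behavior.

```python
import re


def expand_special_characters(text):
    """Replace each #<code># marker with the character that has that ASCII code."""
    return re.sub(r"#(\d{1,3})#", lambda m: chr(int(m.group(1))), text)


line = "name\tprice\r\n"

tab_at = line.find(expand_special_characters("#9#"))     # position of the TAB
enter_at = line.find(expand_special_characters("#13#"))  # position of the CR

print(tab_at, enter_at)  # prints: 4 10
```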

Loop Structures

The basic loop structure is the action called Loop. Any loop ends with an End Loop action further down in the script. Between the Loop and End Loop you place the actions you want to be looped. This loop keeps on repeating until it is exited by an Exit Loop action. The way to do this is to use an If action. So a basic loop structure looks like this:



The other loop types you will find in the Loop category have the condition built into them, so they will not run indefinitely. Take for example the Loop While action. In the next example, the loop will continue until variable N reaches 10.
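
For readers who know a general-purpose language, the two loop styles can be compared with the following Python. This is an analogy only and not Djuggler script syntax; the variable names n and m are chosen for the example.

```python
# Style 1: an unconditional loop that is left through an explicit exit,
# comparable to Loop / If / Exit Loop / End Loop.
n = 0
while True:
    n += 1
    if n == 10:
        break  # the counterpart of Exit Loop

# Style 2: the condition is part of the loop itself,
# comparable to Loop While.
m = 0
while m < 10:
    m += 1

print(n, m)  # prints: 10 10
```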



Both ways bring the same result. It is up to you to choose the one that fits your purpose best. A related action that lets you jump through a script is the Goto Label action. This action works together with the Label action. What the Goto Label action does is simply jump to the label with the proper name. You can use Goto Label to jump in and out of loops. If you jump back within a loop that was already running, the loop continues from the point where it was left. If you use Goto Label to jump before a loop, when entering the loop this loop is reset. Consider the following script:



This piece of script will show you a message displaying "1" every time. As the Goto Label jumps back to the BEFORE_LOOP Label, it resets the loop back to the start of the range (1). In the next script piece, you will see messages displayed from 1 to 5. This is because you jump back into the loop, and the status of the loop is remembered from where it was left.

IF Structures

The If structure lets you make a decision in your script: for example, to stop a loop, or to decide whether or not to store a variable. The example below shows a message if variable A is equal to variable B.



You can also include an Else action that is carried out if the condition was not met:

Working with Web pages

If you want to get data from a web page, there are a few preferred ways of working with Djuggler. They are explained below.

If you are going to be working with web pages, you will need at least one WebPage source. The WebPage source stores the state of the web page, its HTML source and its URL. You can use the text actions from the Source Text Manipulation category to edit the HTML source of a web page. As with all sources, the cursor position is retained when using these actions.

Browsing to a web page

The easiest way to navigate to a certain page is to use the Open Web Page action. Just type the URL (www.yahoo.com for example) in the URL property, and that page will be loaded into the WebPage source.

The most versatile way to navigate to a web page and obtain its source is to use the Run Web Macro action. Insert a Run Web Macro action, and double-click it to bring up the Properties window. Click the [...] button next to the Recorder property to open a special browser window.

In this browser window you can do a couple of things. You can navigate to a web page by typing in a URL such as "http://www.yahoo.com" and clicking Go. You can also click and browse to other pages, just as you would in a normal browser. At the top you have the Record/Stop, Play, and Save and Exit Recording buttons. Once you click the Record button, the browser records your clicks and navigation actions. As events are recorded, you will see them appear in the event box. Stop stops the recorder, and Play lets you replay the actions you have just recorded.



You can use a web macro to start a new web page navigation; for example, navigate to mail.yahoo.com and log in to your mail account. Recording this whole process gives you a macro that will navigate to your mailbox, including logging in. You can also use a web macro to perform a mouse or keyboard event on the current web page. The current web page depends on the WebPage source used in the Run Web Macro action. If you want to start with a fresh web page, always start with a NAVIGATION event (type in the URL and click Go).

Looping web pages

Often, data on web pages is divided over a range of successive pages. Search results in Google are an example of this. The first result page displays the first ten results found, the second page holds results 11 to 20, and so on. At the bottom of such a page, you will find links to further result pages, often looking like: 1|2|3 >Next. Clicking the Next link gives the next page of results. We call this Next link the Next Page Button. The Loop Web Pages action uses that link to loop through the set of all result pages so the data on them can be collected.
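
The control flow of following a Next link until none is left can be sketched in Python as follows. Everything here is hypothetical: the PAGES dictionary stands in for real web pages, and collect_all_results is a name invented for the example, not a Djuggler action.

```python
# Hypothetical result pages: each page holds some results and, except for
# the last one, the address of the page its Next link points to.
PAGES = {
    "results?page=1": {"results": ["result 1", "result 2"], "next": "results?page=2"},
    "results?page=2": {"results": ["result 3", "result 4"], "next": "results?page=3"},
    "results?page=3": {"results": ["result 5"], "next": None},
}


def collect_all_results(first_page):
    """Follow the Next link from page to page, gathering every result."""
    collected = []
    current = first_page
    while current is not None:
        page = PAGES[current]
        collected.extend(page["results"])  # collect the data on this page
        current = page["next"]  # None when the page has no Next link
    return collected


all_results = collect_all_results("results?page=1")
print(len(all_results))  # prints: 5
```

The loop ends by itself on the last page, because that page has no Next link to follow.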

A Loop Web Pages loop should always be preceded by an Open Web Page or Run Web Macro action, as explained above. This is necessary because the Loop Web Pages action only performs a single click (on a link) in a web page. The click is performed on the current page of the Web Page source.

Insert a Loop Web Pages action in a script and double-click it to open up the special browser window. In this browser window you can record the click on a link (the "next button"). The Next Page Button Recorder is a special web browser with which you can very easily set the Next Page Button used by the loop. With this browser, first browse to the page containing the first set of results. Then, click on the "next" link to indicate what link the loop should use to reach a following page. The Next Page Button can be either a link or a form button. Open up the special browser by clicking on the small button by the Next Page Button Recorder.



Go to www.google.com if the browser isn't already there, and search for "cars". Then click the Record button and, in the web page, click the Next link (see picture). You will see a pop-up telling you that Djuggler has recognized that you have clicked a link. This link will be used as the Next Page Button. Click Stop and then click Save Recording and Exit.

Once the next button link is set, the loop will continue until no more pages are encountered. The basic loop for getting data from multiple web pages looks like this: