net.art generator



Nine Steps to Dada

Conversation between Cornelia Sollfrank and Richard Leopold Forchheim, April 15, 2003

C.S.: Some weeks ago, I told you about my net.art generators and asked whether you, as a programmer and system developer, might want to take a closer look at this artistic concept. "Taking a closer look" also meant, of course, asking whether you might want to create a new net.art generator. The only directions I gave were that it should be easy to operate via a web interface and that it should somehow vary Andrew Bulhak's Dada-engine [1]. For some time now there has been a net.art generator (nag_02), programmed by Luka Frelih, which integrated Bulhak's program. I particularly liked this version of the generator because of its special way of using language. It didn't treat language simply as material to be appropriated 1:1, but rather created new words and terms and wrote whole paragraphs of text. Have you warmed up to the idea of the net.art generator in the meantime?

R.L.: Yes, absolutely. First of all, I studied Dada art and tried to learn what it's actually about. I got an idea of it, and currently I'm working on a new version of the generator.

C.S.: What is the basic idea of the new generator?

R.L.: The basic idea is not only to recombine the material found in HTML documents, that is, text and images, within a new website with a given structure, but also to make the structure of the new document itself a dynamically processed result, using the HTML tags of the found sites.

C.S.: Meaning you don't define a layout which is filled with random contents, as in Ryan Johnston's nag_01, for example, but that each newly generated page looks completely different and depends on the HTML tags [2] contained in the found sites.
R.L.: The content of a document is logically structured through HTML tags. That means, on the one hand, I can extract and process the contents, and on the other, I can take the structuring elements and deduce a new structure from them. So, first of all, I extract the content from the structure and re-use it as my Dada content by recombining it and placing it in the new structure. And the new structure is also derived from the found structures.

C.S.: So the new structure could be called "Dada"-HTML or "Dada structure"!

R.L.: Exactly. And I hope that it'll be legible, that is, that a browser can display it.

C.S.: Wow, that's great! I think this takes up Bulhak's idea in a pretty honorable way. Could we have a look at it, and could you explain how it works on the level of the code?

R.L.: We can go through the program step by step.

STEP 1: Input of a title for the new "piece of Net art"

First of all, we have to give it a search term or a title. Let's make a test with the term "witch." Next:

STEP 2: Feeding the backends of the search engine [3] with the title

A request using "witch" as a search term is executed. If you use only one word and not a combination of terms or a whole sentence, the result is fairly broad.

C.S.: What search engine do we use?

R.L.: We use the backends of those I have interface libraries for. From that pool of interfaces, I am currently using HotBot and AltaVista. The pool also contains a Google interface, but it requires a password, and you need to register in order to receive permission to use the interface. The password is only free for "Personal Use" and not available for "Automated Querying". There are also various commercial licenses, but I have tried to avoid those for the moment. And of course, it depends on which interfaces currently work. As a result, we will, hopefully, get a number of URLs which point to the related documents.
C.S.: Did you somehow limit what material should be considered for further processing?

R.L.: Yes, this happens in

STEP 3: Selection from a number of URLs

The program only fetches HTML documents, or more precisely, only files with the content type text/html, which means no PDFs, for example. The reason for using only HTML is that we need HTML tags for the further processing of our new file.

C.S.: How do you limit the search results apart from the format? Surely there are hundreds of sites in the result?

R.L.: The selection for the new site depends only on number and size, not at all on content. For example, only the first 30 results are used for the new site. But it doesn't have to stay that number forever; the number can be configured. And there are other parameters which limit the selection.

C.S.: What is the current preset for the parameters?

R.L.: The files are not allowed to be bigger than 200KB (content size). The size limit is one filter which ensures that the machine doesn't get swamped by oversized documents. The number of documents, 30, multiplied by the maximum size of 200KB comes to 6MB of possible storage space, and that's just for loading the documents.

C.S.: And what happens to the remaining search results?

R.L.: The remaining URLs lead us directly to

STEP 4: Download of HTML files

Now we are downloading the complete files from their servers, and the actual process starts with

STEP 5: Parsing the HTML files

C.S.: What does "parse" mean in this case?

R.L.: To parse means that the classic HTML file is interpreted and organized as a tree of knots (nodes) in the computer's RAM. So each HTML element corresponds to a knot in the tree and can have one or several "child knots" and a "parent knot". As for the content, we can have text, image, link and many other knots, which represent the original HTML document in this abstraction. Once all HTML files have been parsed, we come to

STEP 6: Evaluation of the single knots
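The tree of knots described in STEP 5 can be illustrated with a small sketch. It is written in Python with the standard library's `html.parser`, not with the Perl module the generator itself uses, and the names `Knot`, `TreeBuilder`, `parse_html` and `VOID_TAGS` are hypothetical ones chosen for this illustration only.

```python
from html.parser import HTMLParser

# Elements that never have children (a partial list, assumed for this sketch).
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Knot:
    """One node ("knot") of the tree: an HTML element or a piece of text."""

    def __init__(self, name, attrs=None, text="", parent=None):
        self.name = name                # tag name, "#text" or "#document"
        self.attrs = dict(attrs or [])  # e.g. {"href": "x.html"}
        self.text = text                # only used by "#text" knots
        self.parent = parent            # the "parent knot"
        self.children = []              # the "child knots"
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    """Builds a Knot tree while the parser walks through the HTML source."""

    def __init__(self):
        super().__init__()
        self.root = Knot("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        knot = Knot(tag, attrs, parent=self.current)
        if tag not in VOID_TAGS:
            self.current = knot  # descend: later content becomes its child

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name; ignore stray tags.
        knot = self.current
        while knot is not self.root and knot.name != tag:
            knot = knot.parent
        if knot is not self.root:
            self.current = knot.parent

    def handle_data(self, data):
        # Whitespace-only text is skipped here to keep the sketch short.
        if data.strip():
            Knot("#text", text=data.strip(), parent=self.current)


def parse_html(source):
    """Return the document root knot for an HTML string."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root
```

Calling `parse_html('<p>Hello <a href="x.html">witch</a></p>')` yields a root knot whose single child is a `p` knot containing a text knot and an `a` knot; the link target stays available in `attrs`.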
The content of the text knots gets extracted and fed into the Dada-text-machine. Each knot is put into the Dada-knot-machine. At this point, certain knots are filtered out, for example, empty text knots, which make no sense as a feed. Other knots are modified so that image files, for example, can still be found and displayed independently of the knot's original document.

C.S.: Feeding the Dada-text-machine, Dada-knot-machine? What does that mean?

R.L.: I think I'll have to go way back, then. The Dada-engine does not follow the principle of a static language description à la Bulhak. Bulhak's description language already contains a series of words from the text it is going to generate. In our case, we do not yet know any of the words we are going to use. New texts will be generated from certain given texts. The interesting thing is that the description language also has to be generated. Since I don't know what language the found text will be in, I cannot use a formal description that only applies to one particular language. Because all languages differ in grammar, we first have to examine the language formally. The principle I'm using builds so-called Markov-Chains: I analyze what comes after word "A", what comes after word "B" or "C", which words come how often, and so on. Then I build chains and count the relations. The result is another tree structure, and I can see: aha, after "A", it's not just word "B" that's been used, but also "D" and "X". That's how I find out which words can come after "A". The same goes for the knots, only there we just use the names of the knots. The Dada-machines I mentioned have been turned into libraries which make the functions of the Dada-machine available within the program: 'feed()' and 'get Dada()'. The functions are implemented for the two remaining Dada categories: text and knot. Building the Markov-Chains is the feeding.

C.S.: How is the emerging structure currently generated?
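The feeding with Markov-Chains described in this exchange can be sketched roughly as follows. This is an illustration in Python, not the generator's Perl code; the class name `DadaMachine` and the method name `get_dada` are assumptions modeled on the 'feed()' and 'get Dada()' functions mentioned in the conversation, and the `start` and `length` parameters are invented for the example.

```python
import random
from collections import defaultdict


class DadaMachine:
    """Counts which token follows which, then produces new chains from the counts."""

    def __init__(self, rng=None):
        # successors["A"]["B"] == how often "B" came directly after "A"
        self.successors = defaultdict(lambda: defaultdict(int))
        self.rng = rng or random.Random()

    def feed(self, tokens):
        """Feeding: count every pair of neighboring tokens."""
        for current, following in zip(tokens, tokens[1:]):
            self.successors[current][following] += 1

    def get_dada(self, start, length):
        """Walk the chain from `start`, choosing successors by their frequency."""
        result = [start]
        current = start
        while len(result) < length:
            options = self.successors.get(current)
            if not options:
                break  # nothing ever followed this token in the feed
            words = list(options)
            weights = [options[word] for word in words]
            current = self.rng.choices(words, weights=weights)[0]
            result.append(current)
        return result
```

Because the machine only looks at which token follows which, it needs no grammar for any particular language. The same class would work for knot names instead of words, for instance `machine.feed(["html", "body", "p", "a"])`.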
R.L.: This will be 'give Dada' in

STEP 7: Generating a new HTML document as the target document of the Dada-knot-machine

Now we are generating a random document structure, starting with the document root knot.

C.S.: What is the role of "chance" in all this?

R.L.: The random generator selects one of the possible "child knots" located in the document root knot, and from there, the possible relations to other knots are also randomly determined. I get from knot "A" to knot "C" via a particular path. The path results from the relations between the knots that have been fed in. Once a relation has been used, the program reduces its count by one, so that knot "C", if it occurred only once in the source material, cannot be used again. The new document structure is also built by Markov-Chains.

C.S.: And what are the rules for gathering the knots?

R.L.: In the beginning, I had some pretty huge documents. Since then, I have limited the number of knots in the target document. Currently the limit is 777. One knot always equals one HTML tag. Let's have a look at the source code to clarify. The tags of the document are considered as knots within the object tree. The complexity, that is, the iteration depth of the tree, is fixed. It's currently 13. In our case, an iteration depth of 13 means the 13th branching when you move along the tree from its trunk to the branches. Of course, I never know whether the resulting document structure will be able to function. It is entirely possible that the outcome is dysfunctional, that it cannot be interpreted or displayed. But with the help of the Markov-Chains, I get much more functional structures than I would by simply mixing the knots. So we come to

STEP 8: Filling text fragments from the Dada-text-machine into the text knots of the target file

All text knots of the target file are filled with "get Dada" results from the Dada-text-machine.
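How a structure might be grown under the limits mentioned in STEP 7 (777 knots, depth 13, each relation used up as it is spent) can be sketched like this. It is a Python illustration under stated assumptions, not the generator's actual code: trees are represented as plain `(name, children)` tuples, all function names are hypothetical, and the rule that a knot keeps receiving children until its relations or the budget run out is a simplification chosen for the example.

```python
import random
from collections import Counter

MAX_KNOTS = 777  # limit on knots in the target document, from the conversation
MAX_DEPTH = 13   # fixed depth of the tree, from the conversation


def feed_structure(relations, tree):
    """Count parent -> child name relations of a (name, children) tree."""
    name, children = tree
    for child in children:
        relations[(name, child[0])] += 1
        feed_structure(relations, child)


def get_dada_structure(relations, root="#document", rng=None,
                       max_knots=MAX_KNOTS, max_depth=MAX_DEPTH):
    """Grow a random tree; every relation used is counted down by one."""
    rng = rng or random.Random()
    remaining = Counter(relations)  # work on a copy of the fed-in counts
    budget = [max_knots - 1]        # the root knot already counts as one

    def grow(name, depth):
        children = []
        while budget[0] > 0 and depth < max_depth:
            options = [(child, count)
                       for (parent, child), count in remaining.items()
                       if parent == name and count > 0]
            if not options:
                break  # no unused relation starts at this knot
            names, weights = zip(*options)
            child = rng.choices(names, weights=weights)[0]
            remaining[(name, child)] -= 1  # this relation is now used up
            budget[0] -= 1
            children.append(grow(child, depth + 1))
        return (name, children)

    return grow(root, 0)


def count_knots(tree):
    """Total number of knots in a (name, children) tree."""
    return 1 + sum(count_knots(child) for child in tree[1])
```

Since a relation can only be spent as often as it was fed in, the result never contains more `body -> p` links, say, than the source documents did, which is one plausible reading of why such structures tend to be more functional than a plain shuffle of knots.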
"Get Dada" for text follows the same principle as the one for knots. Finally, our artwork is finished. It manifests itself in

STEP 9: Saving the artwork as an HTML file within the file system

... where it becomes visible to the user.

C.S.: We should take a closer look at this. The document we have received as the result of our first attempt looks like it contains only German text. Is that pure chance, or is the program in any way restricted to one language?

R.L.: No, not at all. The generator is completely independent of language. It can handle all characters known in our language. The trouble starts with different writing systems, for example, Japanese.

C.S.: What was the case with the original Dada engine?

R.L.: Actually, there are different Dada engines. The one you refer to, the one by Bulhak, generates the Dada texts through the use of a formal description language which already contains the complete vocabulary.

C.S.: The preliminary result we've got searching for the term "witch" also contains images. Where do they come from? Because up to now, images have not been potential content.

R.L.: As I copy the knots, I automatically copy the "image knots", meaning that I insert the "image knots" randomly while creating the structure of the file. I do not have a choice as to which images to use or where to place them. They simply appear as a result of copying the knots.

C.S.: That's quite banal compared to the way the program treats texts.

R.L.: From a structural point of view, it is just as banal as with the texts. As you can see, there definitely is material in different languages in the memory. We have Japanese symbols, and there are also special characters. But mostly, it's English text, because the backends send their queries to English-language search engines.

C.S.: In our resulting document, we have a whole lot of links. Where do they point to?

R.L.: They point to the same destination as the original link in the original text does.
The information of the link remains untouched. So does the formatting, with regard to fonts and font sizes. Basically, only the text contained in this knot is replaced by a new text fragment. This also explains why we can see so many different kinds of fonts.

C.S.: What do you still have to work on?

R.L.: At the moment, my main concern is checking out the basic idea and firming it up. Several modules will be designed as libraries which will later be equipped with a front end. What the front end looks like can be decided later. You also have to figure out what makes sense, because generating new pages still takes some time. I think it should be separated from the act of actually looking at the result, which means you'll give the command and return a bit later to take a look at it. Depending on the number of results, it could definitely take several minutes. And it depends on the parametrization of the generator.

C.S.: When you know about all the processes that happen during that time, it does not appear to take too long...

R.L.: In principle, what happens is quite simple: get, take apart, rebuild, give out... You could also think about visualizing this process in the front end, as a way of passing the waiting period.

C.S.: Yes, simply writing all the material to the hard disk is already quite time-consuming.

R.L.: The process only takes place in the RAM until the result is written to the hard disk; this requires finding a file name which can refer to the result as a URL. And all the written material is one HTML file.

C.S.: After delivering the first result file, a whole lot of unused material stays in the RAM. Could we try to create another file out of it?

R.L.: Of course, let's try it! No, it doesn't work. Error message. Avoiding this error still has to be integrated into the program.

C.S.: Well, I think it looks quite good already. Are you also happy with the result so far?

R.L.: Basically, yes.
We're absolutely up to speed with Luka's generator regarding the complexity of the result.

C.S.: One difference would be that Luka's generator didn't just produce one HTML file, but a whole conglomeration of new pages, an actual website with links pointing within and outside the file. Of course, it took up to 30 minutes until you got a result. Unfortunately, his generator seemed to be too complex to keep running for a longer period of time. Only the archive is accessible at the moment, which shows websites produced in the past, but the generator itself is no longer functioning. [4]

R.L.: That's why I'm trying to find a compromise between complexity, speed and what's needed to keep it functioning. It's very possible, given the amount of data which is there anyway, to make, for example, another "link Dada", a whole website, as you said. To do so, I would not just copy the "link knots", but treat them as material and save them as a link pool, and for each of the link pool knots, I could produce another file with the material from what's been found. Absolutely possible.

C.S.: But this would prolong the production time enormously.

R.L.: Yes, absolutely. Because, depending on the number of internal links, the program would have to produce a corresponding number of files.

C.S.: The question would also be whether a viewer would be at all able to comprehend and click through all the pages.

R.L.: Usually, it's hard to maintain an overview with more than three levels of navigation. Maybe it'd be much more interesting for the user to produce a new page using a new title.

C.S.: Or by using the same title! And comparing the results! I very much like this kind of serial production.

R.L.: Meanwhile, I've generated a new page, but nothing is being displayed... It seems to be an empty page.

C.S.: What's written in the source code? Is the page really empty?

R.L.: No, it's not empty, but it contains something that we cannot see [laughing].
Sure, if I had a browser now which could NOT display frames, we would be able to see the content... Very interesting. We can go for a test.

C.S.: Like Andrew Bulhak and the other programmers of the net.art generators, you're using Perl as the programming language. Would you like to explain briefly why you think this programming language is especially suitable for the task at hand?

R.L.: It would have been entirely possible to write the net.art generator in a different programming language; some of them would offer the option of using existing libraries. But my decision to use Perl was made way before this project came up. The reason I use Perl is that it's fun to work with Perl! Even in the documentation, you'll find little jokes here and there. System programmers and web developers discovered this language early on, which is one reason for its widespread use. For solving a problem like the net.art generator, the Comprehensive Perl Archive Network (CPAN) offers a few modules which have become basic elements of the generator. For example, you can find the module for tapping the search engines there as "WWW::Search". As a client for downloading, you can use "LWP::UserAgent". The parser I am using is found in the library "XML::LibXML", which also offers the "Document-Object-Model" (DOM), that is, the tree. Perl and the modules are free software [5], or at least compatible. This has a lot to do with freedom, but explaining this would go beyond the scope of this conversation. Plus, I always need a little Perl training.

C.S.: In what context do you work with comparable techniques in your professional life as a programmer?

R.L.: Taking apart documents and rebuilding them with the help of software happens a lot in the newspaper world. Usually, the software used there is called a content management system (CMS), in this case in a particular variant, a publishing system.
Just recently, I worked as co-developer of a new concept for a big Hamburg-based publishing house and also contributed to the realization of the concept. The big difference, though, is that, within the business world, you get very clear instructions concerning the purpose of your work, and also about what sort of result is expected.

C.S.: Yes, indeed, a big difference.

R.L.: Usually, we also get very clear design guidelines in the form of templates, etc. Nevertheless, the basic principles of programming are very similar. We search for certain contents in the database, and then these contents are given their ultimate appearance. The nice thing about the net.art generator is that we first search for content, then we remove the logical structure from the content, for example, the semantic relations. After that, we create completely new content according to random principles using the elements of the logical structure. We do the same to the structure of the file that contains the content. Finally, the viewer gets involved and discovers some sort of sense, or not.

C.S.: I'd like to go back to this surprising phenomenon: after all these complex processes have taken place, we have received a file which cannot be displayed. Theoretically, such a thing may happen. But my question would be, how can we communicate that to the user? Actually, it would be necessary to bring up a message which indicates the difference between a simple dysfunctionality and a result which cannot be displayed, which is a completely different sort of dysfunctionality.

R.L.: Almost a philosophical problem! To get a result (and a non-result is also a kind of result) means being able to display something; what's being displayed simply is not visible in this particular case. It is very likely that a user who has typed in a certain term, who has triggered certain actions by doing so, and who expects a 'piece of art' as a result, will become highly irritated by this particular kind of art.
In such cases, it would be great to have a wiki [6] website which contains so-called "frequently asked questions". This would definitely also contribute to creating a cult around the generators.

C.S.: Not a bad idea! Do you think there are enough fools who would spend time on such a thing?

R.L.: Why not? The idea of the net.art generator has enormous potential. Imagine an extremely high number of automatically generated websites getting indexed by the search engines. The wheel would come full circle.

C.S.: Yes, great: feedback! And this goes on until the whole Web consists only of "Dada sites"!

R.L.: Assuming that the websites and the generators were located on many different servers, the search engines would have a hard time telling "Dada websites" apart from the regular ones. And if the "Dada sites" were ultimately registered under their corresponding search terms, nobody would be able to get any reasonable content any longer. As a kind of "worst case scenario", of course. But there are plenty of different steps along the way...

C.S.: In any case, a very nice scenario. We're working on it.

(Translation: Cornelia Sollfrank; edited by David Hudson)

Footnotes

[1] The Dada-engine by Andrew Bulhak is a system for generating text using specific parameters. The parameters consist of a set of rules which forms a grammar. The rules are loaded into the Dada engine, where they are processed and result in new texts. By using the name "Dada", Bulhak refers to the random and collage principles introduced by the "Dada" artists.

[2] "Tags" are HTML page description commands written between angle brackets.

[3] "Backends" are downstream systems; in this context, the term refers to search engines.

[4] Archives of nag_02, http://nag.ljudmila.org/?view

[5] What is Free Software? http://www.fsfeurope.org/documents/freesoftware.de.html

[6] Wiki is the short form of WikiWikiWeb, an open authoring system for websites: http://de.wikipedia.org/wiki/Wiki