Australian Scholarly Editions Centre Projects (ASEC)

Further Work

This chapter looks at one possible way in which the JITM paradigm can be further developed using relational database technologies. It then concentrates on methods by which the JITM paradigm might be able to gain user acceptance in the academic community and how current developments in the technology can be used to best advantage.

7.1 Management & Manipulation of Tag Sets

7.1.1 Storing Tag Sets in a Database

As mentioned in section 4.5.2, "Data Storage for a JITM Electronic Edition", the tagRecords of a JITM electronic edition are well-defined data structures and could be generated as output from a database prior to being incorporated into a tag set for use in a JITM system. Further work should be done on which type of database (e.g. relational, hierarchical or perhaps object-oriented) would be best suited for this purpose.
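As a rough illustration of the relational option, the sketch below stores tagRecords in a single SQLite table and exports one tag set in document order. The field names (tag_set, tag_name, start_offset, end_offset, content) and the sample values are assumptions made for this example only. They are not the tagRecord format defined in section 4.5.2.

```python
import sqlite3

# Hypothetical tagRecord layout: these columns are illustrative assumptions,
# not the record format defined elsewhere in this report.
SCHEMA = """
CREATE TABLE tag_record (
    id           INTEGER PRIMARY KEY,
    tag_set      TEXT    NOT NULL,
    tag_name     TEXT    NOT NULL,
    start_offset INTEGER NOT NULL,
    end_offset   INTEGER NOT NULL,
    content      TEXT    NOT NULL DEFAULT ''
)
"""


def build_database(records):
    """Load (tag_set, tag_name, start, end, content) tuples into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO tag_record "
        "(tag_set, tag_name, start_offset, end_offset, content) "
        "VALUES (?, ?, ?, ?, ?)",
        records,
    )
    conn.commit()
    return conn


def export_tag_set(conn, tag_set):
    """Return the records of one tag set, ordered by position in the transcription."""
    rows = conn.execute(
        "SELECT tag_name, start_offset, end_offset, content "
        "FROM tag_record WHERE tag_set = ? "
        "ORDER BY start_offset, end_offset DESC",
        (tag_set,),
    )
    return rows.fetchall()


# Invented sample data for the sketch.
SAMPLE = [
    ("dialogue", "q", 120, 180, "who=Marston"),
    ("layout", "p", 0, 200, ""),
    ("dialogue", "q", 10, 60, "who=Jane"),
]
```

Calling export_tag_set(build_database(SAMPLE), "dialogue") would return the two dialogue records sorted by their starting offset, ready to be written out as a tag set.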

7.1.2 Manipulation of Tag Sets

In the prototype JITM system developed for this report, the selected tag sets of a perspective are processed to see if they define a valid SGML hierarchical structure. If the structure is invalid, the conflicting tags are indicated so the user can hopefully define a non-conflicting set which suits their purpose. This is done without any reference to the transcription file, and is an example of the way tag sets can be manipulated by themselves. Similarly, sets could be filtered prior to use to remove tagRecords that do not match a specified criterion. For example, if a tag set contains all the tagRecords for the dialogue of a whole chapter, but the user is only interested in looking at the dialogue of a particular character, then tagRecords could be selected based on the character’s name being part of the tagRecord’s string content.
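Both operations described above can be sketched without touching the transcription file. The example assumes, purely for illustration, that each tagRecord carries a tag name, a start and end character offset, and a string content. It is not the prototype's HyperCard implementation. A tag set defines a valid hierarchy when every pair of spans is either nested or disjoint, so the check reports any span that overlaps an open span without being contained in it.

```python
from typing import NamedTuple


class TagRecord(NamedTuple):
    """Hypothetical tagRecord: the fields are assumptions for this sketch."""
    name: str
    start: int
    end: int
    content: str = ""


def find_conflicts(records):
    """Return (open_tag, offending_tag) pairs whose spans overlap without nesting.

    An empty result means the spans form a proper hierarchy.
    """
    conflicts = []
    open_tags = []  # stack of spans that are still open at the current position
    # Outer spans sort first when two spans start at the same offset.
    for rec in sorted(records, key=lambda r: (r.start, -r.end)):
        # Discard spans that have already closed before this one starts.
        while open_tags and open_tags[-1].end <= rec.start:
            open_tags.pop()
        if open_tags and rec.end > open_tags[-1].end:
            # rec starts inside the open span but ends outside it.
            conflicts.append((open_tags[-1], rec))
        else:
            open_tags.append(rec)
    return conflicts


def filter_by_content(records, text):
    """Keep only the records whose string content mentions the given text."""
    return [r for r in records if text in r.content]
```

For the dialogue example in the text, filter_by_content(records, "Jane") would keep only the tagRecords whose content names that (invented) character.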

7.2 Strategies for Further Development

Further development on JITM-based electronic editions would be best undertaken by the implementation of a JITM system working with real data and being called upon to do the real work of an electronic scholarly edition. The JITM paradigm is well suited for incremental development. As pointed out earlier in this report, the first step in creating a JITM-based edition is the publication of the transcription files for the contributing states. The process of creating and authenticating these transcription files will allow us to capture the physical layout of the states as the first sets of tagRecords. Once the transcription files are available, further applications of meta-data can be done incrementally and in parallel. This would allow a number of scholars to explore the potential of the JITM paradigm simultaneously, without affecting each other or the basic resource of the transcription files.

As mentioned in the section "Problems with the Paradigm Shift", creating a JITM-based electronic edition is not likely to be attractive to traditional publishing houses. Therefore, it is likely that the editor of the edition will be responsible for both the preparation and distribution costs. For this reason, low-cost methods would obviously be preferable. As the edition is digital in nature, electronic distribution and presentation over the Internet would seem to be the best and cheapest solution, particularly as access to these technologies is prevalent in tertiary institutions, which are the source of the major group of users for a scholarly edition.

Publishing an edition as a research site would help to develop the as yet undiscovered capabilities of the JITM system by exposing it to the user requirements of a large number of potential users. The JITM paradigm gives people with specific interests in the edition the capability of creating their own tag sets (i.e. perspectives on the edition), which would have the added benefit of helping the paradigm gain user acceptance. The next section looks at ways in which an Internet server could be implemented.

7.2.1 Internet-based Solutions

The easiest way of mounting the transcription files onto a server would be to make them available for anonymous ftp access. Since the transcription files include only minimal TEI conformant SGML tagging, it would be an easy task to convert them for use by other text processing systems using text filtering utilities. This would at least allow the transcription files to be made available as a service to the academic community.
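A text filter of the kind meant here could be very small. The sketch below simply removes anything that looks like a tag. It assumes that the minimal tagging contains no ">" characters inside attribute values and it leaves entity references untouched, so it is a starting point rather than a general SGML processor.

```python
import re

# Matches a start tag, end tag or empty tag such as <p>, </p> or <lb>.
TAG_PATTERN = re.compile(r"<[^<>]+>")


def strip_tags(text):
    """Return the transcription text with all tag-like markup removed."""
    return TAG_PATTERN.sub("", text)
```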

It would also be easy to include the TEI header file for the edition on the site, so that users of web-capable SGML browsers could access the files over the Internet using the capabilities built into their browsers. However, this means of access is not compatible with the JITM paradigm, as the creation of a perspective document from the transcription file needs to occur before the document is passed to the SGML browser.

A web-based JITM edition would require a front-end, where users would specify the features they wanted in their perspective, so that a background task (or CGI) on the server could create the perspective file and transfer it to the user. The user would then read the file using their SGML-capable browser. Currently this interface would likely be based on the use of HTML to act as the front-end of the system. This means that, with the addition of a CGI to create the perspective files of the edition, an editor could turn an existing World Wide Web site into a JITM electronic edition server.

The client-based side of this system is already available. Electronic Book Technologies® make DynaWeb™, which is a web-capable SGML browser, and Panorama™ from SoftQuad® has been available as a Netscape™ helper application for many years. However, both of these applications are relatively expensive. SoftQuad has just released a cheaper version of Panorama that works as a plug-in for both Netscape Navigator™ and Internet Explorer™ on all major hardware platforms. The plug-in is activated when the MIME type for a downloaded file indicates it is an SGML file. This ties in nicely with the delivery of the JITM perspective files from an HTTP server.

7.2.2 The CGI and Beyond

The JITM CGI's main processing activity is the parsing and embedding of the SGML tags into the body of the transcription file. Although processor intensive, this should be easy to implement as a background CGI. The main features important to a JITM-based web server will be the organisation and maintenance of the data files (i.e. transcription files and their associated tag sets) and the provision of a user interface to provide full access to the design goals of the JITM paradigm. This section looks at how some of these ideas may be further developed.
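The embedding step can be sketched as follows. The example assumes that each tagRecord is reduced to a (name, start, end) triple of character offsets into the plain transcription text, that every span is non-empty (start < end), and that the tag set has already been checked for conflicts. These are assumptions for illustration, not the file format used by the prototype, and milestone tags are not handled.

```python
def embed_tags(text, records):
    """Insert start and end tags into text at the offsets given by records.

    records is a sequence of (name, start, end) triples with start < end,
    assumed to be properly nested.
    """
    events = []
    for index, (name, start, end) in enumerate(records):
        # Sort key: position, then end tags (0) before start tags (1).
        # Start tags at the same position: outer (longer) span first.
        events.append((start, 1, -end, index, f"<{name}>"))
        # End tags at the same position: inner (later-starting) span first.
        events.append((end, 0, -start, -index, f"</{name}>"))
    events.sort()

    pieces = []
    position = 0
    for offset, _, _, _, markup in events:
        pieces.append(text[position:offset])  # untagged text up to this tag
        pieces.append(markup)
        position = offset
    pieces.append(text[position:])  # remainder after the last tag
    return "".join(pieces)
```

For example, embed_tags("He said hello.", [("p", 0, 14), ("q", 8, 13)]) wraps the whole sentence in a p element and the word "hello" in a q element.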

There are a number of development systems available for developing the server CGI. HyperCard, the prototype development environment, can be used to create CGIs for Macintosh-based web servers such as WebStar™ using AppleScript™ as the inter-process communication mechanism. This would allow scripting already done for the prototype to be reused and speed up the developmental aspect of the project by allowing a quick prototype to be available for web access.

For a production system, a cross-platform solution would be best so that a JITM server could be hosted on a number of different platforms. Having a cross-platform solution would help improve the acceptance of the JITM paradigm within the academic community by allowing the system to be trialed on existing web sites. Perl, which is available for UNIX, MacOS, and Windows, would be a good choice for this because of its excellent text handling capabilities. It is a known and well-respected development environment and is already used widely for the development of CGIs for web servers.

Another possible development environment which needs to be looked at closely is Java™. Although it is a relatively new technology, having one executable file that will run on any hardware has decided advantages for a JITM system. Apart from increasing the acceptance of the JITM system, having one version of the executable would guarantee that the results delivered by the JITM CGI would be the same on all platforms. Java also contains some features that would speed the development of the production CGI. Firstly, it is multi-threaded, so the CGI can handle a number of requests simultaneously. Secondly, the base language contains a hashing function which could be used to calculate the Manipulation Detection Code values used for authentication.
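The authentication idea can be illustrated with any library hashing function. The sketch below uses SHA-256 from Python's standard hashlib module as a stand-in. The choice of algorithm, the UTF-8 encoding, and the function names are assumptions for this example and are not necessarily those used for the Manipulation Detection Code in the JITM prototype.

```python
import hashlib
import hmac


def compute_mdc(transcription):
    """Return a hexadecimal digest of the transcription text.

    SHA-256 is an assumed stand-in for the MDC algorithm.
    """
    return hashlib.sha256(transcription.encode("utf-8")).hexdigest()


def is_authentic(transcription, expected_mdc):
    """Check that the transcription still matches its recorded digest."""
    return hmac.compare_digest(compute_mdc(transcription), expected_mdc)
```

A tag set would record the digest of the transcription file it was created against, and the server would refuse to build a perspective if is_authentic returned False.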

As mentioned in Section 7.1, "Management and Manipulation of Tag Sets", the use of a database for storing the data files for an edition has great appeal. There are a number of databases which could provide the data management capabilities required for a server-based JITM system. Further study would be required to see how well they interface with the JITM system and whether a cross-platform solution is available. Similarly, there are high-powered SGML-based document management systems available which could no doubt be used with a JITM system. These applications are very powerful and typically very expensive. It should be noted here that tying the JITM system to external proprietary software has the potential danger of reducing its long-term usability. In the end, keeping the JITM system as simple as possible may turn out to be the best long-term solution.

7.2.3 Support for XML

The eXtensible Markup Language (XML) is being touted as the replacement for HTML, and is also supposed to bring some of the power of SGML to the World Wide Web. A long-term JITM system would need to be able to support the use of an XML-based delivery system to ensure its continued use. Further work would need to be done in this area. Fortunately, there are a few things in favour of the two systems being compatible, and it is possible that an XML browser could replace the SGML browser currently required for a JITM system.

An XML file, while not needing to have a specific DTD, must have a logically complete hierarchy tree, and for this reason tag minimisation is not allowed in an XML document. The JITM file format specification also does not allow tag minimisation. The XML specification allows for the use of empty tags (i.e. milestone tags such as line breaks), but they have a new formal representation as shown below:

This is an example of an XML <milestone/> tag.

This representation means that they will be immediately recognisable for what they are by an XML browser. Since the JITM system uses an abstracted encoding scheme, new tag sets could be created to incorporate this new type of tag for delivery to XML systems while still maintaining the old milestone tag sets for delivery to SGML browsers.
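Because the markup is generated from tag sets rather than stored in the transcription, the choice between the two forms could be made at the moment a milestone is written out. The following sketch shows one way this might look. The function name, the target parameter, and the attribute dictionary are hypothetical and chosen only for this example.

```python
from xml.sax.saxutils import quoteattr


def render_milestone(name, target="sgml", attributes=None):
    """Render a milestone (empty) tag for an SGML or XML delivery system.

    target is "sgml" for the traditional form (<lb>) or "xml" for the
    XML empty-element form (<lb/>).
    """
    # quoteattr adds the surrounding quotes and escapes special characters.
    attrs = "".join(
        f" {key}={quoteattr(str(value))}"
        for key, value in (attributes or {}).items()
    )
    if target == "xml":
        return f"<{name}{attrs}/>"
    if target == "sgml":
        return f"<{name}{attrs}>"
    raise ValueError(f"unknown target: {target!r}")
```

With this approach the same milestone record yields <lb> for an SGML browser and <lb/> for an XML browser, without any change to the transcription file.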

An XML file will also be able to include with it a format specification which will tell the XML browser how to represent the contents of the file. This format specification will also be able to contain applets to provide the XML browser with capabilities it might not normally have. It is possible that this facility could be used to give an XML browser all the capabilities required to be able to present and control a JITM perspective document, without the XML browser having to be a fully TEI compliant SGML browser.
