INNOVATIONS

A promising new application for LaTeX is generating sophisticated PDF documents. In addition to producing value-added e-books, LaTeX can be used for on-the-fly document production.

Why LaTeX?

- LaTeX is a stable and robust program. It offers the best available typesetting of mathematical text, along with tables, cross-referencing, and other structures, and it can build a table of contents, index, and bibliography on the fly.

- LaTeX is a programming language whose features include loops, conditionals, on-the-fly construction of definitions, the ability to write information to external files and to bring external files into the document, text measurement, arithmetic operations, and much more.

- LaTeX can parse form results, so either PDF forms or HTML forms, as well as database information, may be used as input.

- Importantly, it can embed PostScript and PDFmark information within LaTeX commands, allowing dynamic graphics generation and hypertext linking as the LaTeX document is produced. Any of the other features that can be added to a PDF document can also be written into the LaTeX macro set and be customized depending on the input to the LaTeX file.
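
As a rough illustration of the programming features listed above, here is a minimal sketch that measures text, loops with a conditional, does arithmetic, and writes to an external file. The counter, length, and stream names and the .lst file extension are made up for this example.

```latex
\documentclass{article}
\newcounter{unit}        % loop counter (hypothetical name)
\newlength{\titlewidth}  % holds a measured width
\newwrite\unitlist       % output stream for an external file

\begin{document}
\immediate\openout\unitlist=\jobname.lst

% Measure a piece of text and report its width.
\settowidth{\titlewidth}{Architectural Specification}
The title is \the\titlewidth\ wide.\par

% Loop with a conditional and some arithmetic; each pass also
% writes one line to the external file.
\loop
  \stepcounter{unit}%
  Unit \theunit\ is \ifodd\value{unit}odd\else even\fi;
  three times its number is \the\numexpr 3*\value{unit}\relax.\par
  \immediate\write\unitlist{unit \theunit}%
\ifnum\value{unit}<4 \repeat

\immediate\closeout\unitlist
\end{document}
```

The loop here runs a fixed four times; in a real report the bound and the text would presumably come from the form or database input.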

Real Life Example, On-the-fly Document Production

Database publishing:
Architectural Specifications Example

PDF forms produce output that may be parsed by LaTeX. This discovery was made while building a proof of concept for an architectural specifications company that routinely produces documents as long as 20,000 pages. Each document is built from pre-existing units, selected according to the requirements of the particular project: a high rise or a shopping center, for example.

A PDF form was designed so that choices may easily be made: Sample Form

When processed with Acrobat, the form results are presented as text in a new file (FDF): Sample FDF Data. The form file may then be "cleared" and reused.

The FDF file is now available for parsing by LaTeX, changing the form results into LaTeX commands.

Here is the SpecCheckList form, used twice, and producing two new PDF documents:
First use of Spec Check List Form ==> Parsed ==> Document Produced

Second use of Spec Check List Form ==> Parsed ==> Document Produced

(Tech note: the FDF file is input and parsed, and the results are sent to an .inf file, which is stamped with the client and date so that it is unique and may be reused. That file is then input back into the .tex file, where the new definitions are used in the prepared fields below. This is what is happening here: Sample use of parsed FDF data)
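
The round trip described in the tech note might look roughly like the following sketch. It is not the production code: it assumes each field arrives as a simplified one-line entry of the form << /T (name) /V (value) >>, places the entries inline instead of reading a real FDF file, and names the output \jobname.inf rather than stamping it with client and date. The macros \SpecSet and \parsefield and the field names are hypothetical.

```latex
\documentclass{article}

% Store one form field as a macro: \SpecSet{Client}{...} defines \SpecClient.
\newcommand\SpecSet[2]{\expandafter\def\csname Spec#1\endcsname{#2}}

% Hypothetical parser for one simplified FDF field entry of the form
%   << /T (name) /V (value) >>
% It writes a \SpecSet line to the .inf file so the result can be reused.
\newwrite\inffile
\def\parsefield<< /T (#1) /V (#2) >>{%
  \immediate\write\inffile{\string\SpecSet{#1}{#2}}}

\immediate\openout\inffile=\jobname.inf
\parsefield<< /T (Client) /V (Harbor Tower Associates) >>
\parsefield<< /T (Building) /V (High rise) >>
\immediate\closeout\inffile

\begin{document}
\input{\jobname.inf}% read the generated definitions back in
Client: \SpecClient\par
Building type: \SpecBuilding
\end{document}
```

A value containing parentheses or TeX special characters would need extra handling that this sketch leaves out.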

As well as populating an existing document with the information gathered from the form data, as shown in the example above, the newly generated LaTeX commands may be used to input the appropriate subdocuments. In this example, they could be used to build an entire architectural specification and turn it into a PDF to be presented to the architectural client and distributed to subcontractors.
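
One way the generated commands could pull in subdocuments is sketched below. It assumes the parsed form data has already defined a comma-separated list of selected units, and that each unit lives in a file under a units/ directory; the list contents, macro names, and directory layout are all invented for illustration.

```latex
\documentclass{article}

% Assumed to have been defined from the parsed form data.
\newcommand\SpecUnits{concrete,masonry,roofing}

\makeatletter
% Input units/<name>.tex for every selected unit, noting any that are missing.
\newcommand\InputSpecUnits{%
  \@for\spec@unit:=\SpecUnits\do{%
    \IfFileExists{units/\spec@unit.tex}%
      {\input{units/\spec@unit.tex}}%
      {\typeout{No file for unit \spec@unit}}}}
\makeatother

\begin{document}
\section*{Project Specification}
\InputSpecUnits
\end{document}
```

Missing units are only reported in the log here; a production setup might prefer to stop with an error instead.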

OPPORTUNITIES

Starting with an Acrobat or HTML form is useful for database publishing or on-the-fly report generation of any variety. Here are only some of the possibilities:

- Automate the building of large custom documents, which can also have a hyperlinked table of contents, cross-referencing, links to on-line material, and an automatically generated index.

- Automate the building of graphical data representation. Input to the form may be numbers or math that can then be used to generate PostScript graphics on the fly. An example might be medical reports that show lab results graphically and give custom advice to the patient based on those results.

- Automate on-line data mining and the presentation of the results.

Consider, for example, the uses of this technology in bioinformatics and in sequence and genome analysis. Genome research projects typically involve a variety of data (sequences, annotations, analysis results, database links, graphical images, etc.) that may be distributed over multiple storage locations and networks. Management, analysis, and communication of this information may be greatly helped by this automated report generation tool.

An on-line search of a genetics database yields information that may then be represented in PostScript in a way that helps the researcher evaluate the results quickly. This information would be presented in a PDF file which may have links to further information as well.

- Publishing: track article or book submissions. Authors fill out a form and a report is built, with the PDF file kept as a record to refer to later.

What uses can you imagine for this technology?
***

Background:
How LaTeX, PostScript and PDF Work Together
LaTeX output is normally converted to PDF before it is printed. As an intermediate step, the LaTeX output is converted to PostScript, which Acrobat Distiller then translates into PDF.

Because of this sequence of steps (LaTeX => PostScript => PDF), and because PostScript code and PDFmark commands may be added to a LaTeX file, we can write LaTeX commands that process the text and then automatically generate PostScript code using the information that LaTeX has captured.

The PostScript code may then be passed through verbatim when the LaTeX output is changed to PostScript, allowing us to use any of the features available in PostScript, combined with the results of the LaTeX commands.
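
To make this concrete, here is a small sketch in which LaTeX collects a list of numbers and hands them to PostScript, which draws one bar per value. It assumes the output is converted with dvips, whose \special{" ...} form places literal PostScript at the current point in PostScript units. The macro \psbars, the bar width, and the sample values are made up for this example.

```latex
\documentclass{article}
\makeatletter
% \psbars{v1,v2,...}: hypothetical macro that draws one bar per value.
% Each value is a bar height in PostScript points.
\newcommand\psbars[1]{%
  \def\ps@values{}%
  % Turn the comma list into a space-separated list for PostScript.
  \@for\ps@v:=#1\do{\edef\ps@values{\ps@values\space\ps@v}}%
  \leavevmode
  \special{" 0.4 setgray
    0 [\ps@values\space]
    { 1 index 0 10 4 -1 roll rectfill 14 add } forall pop}%
  % Reserve room on the page for the drawing (sizes chosen by hand).
  \rule{0pt}{50bp}\hspace{60bp}}
\makeatother

\begin{document}
Lab results: \psbars{12,30,22,41}
\end{document}
```

The PostScript loop keeps the current x position on the stack, fills a 10-point-wide bar of the given height, and then moves 14 points to the right for the next one.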

Some trivial examples of LaTeX/PostScript interaction, which nevertheless demonstrate passing information from LaTeX to PostScript:

- The first example positions a variable-sized PostScript screen behind LaTeX text. The size of the text is measured with a LaTeX command and the result is passed to the PostScript code. There are also "cutouts" in the screen: their size is determined with a LaTeX macro and passed to PostScript, which actually draws the screens: PostScript Cut Out Screen.

- The second example shows PostScript color tabs on the side of chapter opening pages, moved down the page with each new chapter, in a sample from MatLab documentation: PostScript Side Tabs.
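
A much simplified version of the first example, without the cutouts, might be sketched as follows. It again assumes conversion with dvips; the macro \screened and the gray level are invented here and are not taken from the original macros.

```latex
\documentclass{article}
\makeatletter
\newlength\scr@wd \newlength\scr@ht \newlength\scr@dp
% \screened{text}: hypothetical macro that measures its text and passes
% the dimensions to PostScript, which paints a gray screen behind it.
\newcommand\screened[1]{%
  \leavevmode
  \settowidth\scr@wd{#1}%
  \settoheight\scr@ht{#1}%
  \settodepth\scr@dp{#1}%
  % Scale so that the numbers below can be given in TeX points,
  % then fill a rectangle from the text's depth to its height.
  \special{" 72 72.27 div dup scale
    0.85 setgray
    0 \strip@pt\scr@dp\space neg
    \strip@pt\scr@wd\space
    \strip@pt\scr@ht\space \strip@pt\scr@dp\space add
    rectfill}%
  #1}
\makeatother

\begin{document}
\screened{Chapter 1: General Requirements}
\end{document}
```

Because the \special comes before the text in the output, the screen is painted first and the text is set on top of it.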

Prelinked PDF Generation

In the same way that PostScript code is included, we can also include pdfmark commands in the body of LaTeX commands, a feature that allows hypertext links to be generated based on the information in the text. PostScript and pdfmark commands can include color and any other capability found in Acrobat. These commands are passed through unchanged when the LaTeX output is converted to PostScript, and are recognized when Distiller turns that PostScript into PDF.

Since LaTeX commands can generate custom PostScript code, based on LaTeX's processing of the text, and since custom PDFmark information may also be generated with LaTeX code, LaTeX makes an ideal text processing program for automated report generation, with the report to be presented as a sophisticated PDF document. The possibilities for the content and presentation of the PDF document abound, and we look forward to exploring them.
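
As a hedged sketch of what such a generated link could look like, the macro below measures its text and emits a pdfmark link annotation of the same size. It assumes conversion with dvips followed by Distiller (or another pdfmark-aware converter), and it assumes the annotation rectangle is interpreted in the coordinate system in effect at that point. The macro \pdflink and the URL are placeholders.

```latex
\documentclass{article}
\makeatletter
\newlength\lnk@wd \newlength\lnk@ht \newlength\lnk@dp
% \pdflink{URL}{text}: hypothetical macro that wraps the text in a
% pdfmark link annotation sized from the measured text.
\newcommand\pdflink[2]{%
  \leavevmode
  \settowidth\lnk@wd{#2}%
  \settoheight\lnk@ht{#2}%
  \settodepth\lnk@dp{#2}%
  \special{" 72 72.27 div dup scale
    [ /Rect [ 0 \strip@pt\lnk@dp\space neg
              \strip@pt\lnk@wd\space \strip@pt\lnk@ht\space ]
      /Border [ 0 0 1 ]
      /Action << /Subtype /URI /URI (#1) >>
      /Subtype /Link
    /ANN pdfmark}%
  #2}
\makeatother

\begin{document}
% Make pdfmark a harmless no-op on interpreters that do not define it.
\special{! /pdfmark where {pop}
  {userdict /pdfmark /cleartomark load put} ifelse}
See the \pdflink{http://www.example.com/specs}{project specifications}.
\end{document}
```

URLs containing characters that are special to TeX or PostScript, such as percent signs or parentheses, would need escaping that this sketch does not attempt.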

Let us know how these tools might be useful to you!

-- Amy Hendrickson

info@TeXnology.com
617 738-8029

TeXnology Inc.
Amy Hendrickson
57 Longwood Avenue
Brookline, MA 02446
USA