Comparing PDF pages automatically

I needed to check whether two PDF documents generated from the same TeX sources were different, since the generation method was slightly different for each.

Trying to compare it visually would be crazy, since the document contains more than 1400 pages. So I needed an automatic way to do it.

Albert Astals Cid from the poppler mailing list suggested using pdftoppm and diff.

The way to automatically compare PDF pages turned out to be extremely easy: convert all pages to images, compare each pair of pages and be warned only about the ones that differ.

After that, all that remains is to check visually what the differences are on the pages whose images don’t match.
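The steps above can be sketched in Python. This is only a minimal sketch, not the exact commands I used: it assumes pdftoppm is on the PATH, and the function names, the 72 dpi resolution and the page prefix are illustrative choices of mine. Pages are compared byte by byte, which plays the role of diff.

```python
import filecmp
import subprocess
from pathlib import Path


def render_pages(pdf_path, out_dir, resolution=72):
    """Render every page of a PDF to a PPM image with pdftoppm."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pdftoppm writes one file per page, named <root>-<page number>.ppm.
    subprocess.run(
        ["pdftoppm", "-r", str(resolution), str(pdf_path), str(out_dir / "page")],
        check=True,
    )


def differing_pages(dir_a, dir_b):
    """Return the names of page images that differ or exist on one side only."""
    names_a = {p.name for p in Path(dir_a).glob("*.ppm")}
    names_b = {p.name for p in Path(dir_b).glob("*.ppm")}
    different = list(names_a ^ names_b)  # pages missing from one document
    for name in names_a & names_b:
        # shallow=False forces a byte-by-byte comparison of the two images.
        if not filecmp.cmp(Path(dir_a) / name, Path(dir_b) / name, shallow=False):
            different.append(name)
    return sorted(different)


def compare_pdfs(pdf_a, pdf_b, work_dir):
    """Render both documents and report only the pages that do not match."""
    dir_a, dir_b = Path(work_dir) / "a", Path(work_dir) / "b"
    render_pages(pdf_a, dir_a)
    render_pages(pdf_b, dir_b)
    return differing_pages(dir_a, dir_b)
```

Calling compare_pdfs("old.pdf", "new.pdf", "pages") would return the list of page images worth opening by hand (the file names here are placeholders).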

Posted in Digital typography, PDF, TeX. Comments Off on Comparing PDF pages automatically

Doing arithmetic in TeX

I have just discovered accidentally how to perform simple arithmetic operations in TeX:

\number\numexpr 5+5\relax

This prints the result (10), not the operands. \numexpr is an e-TeX primitive, so an e-TeX-based engine is required.

Posted in Personal technotes, TeX. Comments Off on Doing arithmetic in TeX

\include and \input

Thanks to this comment, I have realized the main difference between \include and \input in LaTeX: \include issues a page break (\clearpage) before and after the file, whereas \input doesn’t.

Posted in Personal technotes, TeX. Comments Off on \include and \input

Magnifying with XeTeX

When magnifying with XeTeX using the geometry package, it is important not to set the page size in the \documentclass options; set it with the corresponding geometry option instead.

Posted in Personal technotes, TeX. Comments Off on Magnifying with XeTeX

How to generate a booklet from a PDF file

Using ConTeXt (taken from the imposition explanation):

\definepapersize	[filius][width=136mm, height=232mm]
\setuppapersize		[filius][A4,landscape]
\setuparranging		[2UP,doublesided]
\setuplayout [backspace=0pt,
    topspace=0pt,
       width=middle,
      height=middle,
    location=middle,
      header=0pt,
      footer=0pt,
      grid=no, marking=off]
\starttext
\insertpages
  [document.pdf][width=0pt]
\stoptext

Replace document.pdf with the real file name and set the filius dimensions to the original paper size.

Posted in Personal technotes, TeX. Comments Off on How to generate a booklet from a PDF file

On presentation technologies (what Lessig might need)

Reading Lawrence Lessig’s Experiments in presentation technology, I became extremely interested in his efforts when I read:

My hope is to put every presentation I’ve made, with audio and the source files, up for anyone to do with as they wish. That turns out to be harder than it should be. Any advice or help would be greatly appreciated.

It sounds promising, but there is an issue that makes the task harder than it seems:

The only difficult part about this was listening to myself again (and again) as I built this.

If I understand the issue correctly, the problem is building the timeline that syncs the audio with each slide. It is difficult to guess how long each slide should take, and if you have many slides, the task becomes tedious. A solution would be to let the computer do the counting for you.

Computers are mainly counting machines. It should not be difficult to implement a multiplatform program (using wxWidgets or something similar) that detects user-defined keystrokes to start the timeline, to mark each slide transition and to finish the timeline, and that can export this timeline to a text file.

An example of this would be a (Keynote/PowerPoint/Impress/PDF) presentation shown outside fullscreen mode so that you can see what comes next (or in a mode that lets you see the previous and next slides). You start recording the audio and start the timeline. Each slide transition is detected by the program, so the syncing will be perfect. You finish the timeline and stop recording. The output file would be:

00:00:05
00:00:10
00:00:12
00:00:18.25
00:00:23

(Of course, the program could have other features, but this is only a basic sketch.)
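To make the idea concrete, here is a minimal Python sketch of the counting part only. Keystroke detection and the user interface are left out; the Timeline class, its method names and format_timestamp are hypothetical names of mine, and mark() is what a key handler would call on each slide transition.

```python
import time


def format_timestamp(seconds):
    """Format elapsed seconds as HH:MM:SS, adding hundredths only when needed."""
    hundredths = round(seconds * 100)
    whole, frac = divmod(hundredths, 100)
    hours, rest = divmod(whole, 3600)
    minutes, secs = divmod(rest, 60)
    stamp = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{stamp}.{frac:02d}" if frac else stamp


class Timeline:
    """Records the elapsed time of each slide transition."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so that the class can be tested without waiting.
        self._clock = clock
        self._start = None
        self.marks = []

    def start(self):
        """Start (or restart) the timeline; call it when the recording begins."""
        self._start = self._clock()
        self.marks = []

    def mark(self):
        """Store the time elapsed since start(); call it on every transition."""
        if self._start is None:
            raise RuntimeError("call start() before mark()")
        self.marks.append(self._clock() - self._start)

    def export(self):
        """Return the timeline as text, one timestamp per line."""
        return "".join(format_timestamp(m) + "\n" for m in self.marks)
```

Writing the string returned by export() to a file gives a text timeline in the same shape as the example above.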

If I’m not wrong, Lawrence Lessig could even generate the timeline when giving the presentation. This would be the first step to generate the presentation with audio in a PDF file (as suggested here).

Posted in Digital typography, Presentation technology. Comments Off on On presentation technologies (what Lessig might need)

On presentation technologies (PDF)

After reading Lessig’s Experiments in presentation technology, I have been searching for a proper way of syncing audio and slides. As far as I know (and I’m not a programmer, only a user), vectors are smaller than bitmaps and scale better when zoomed. Formats that allow vectors and multimedia are PDF and Flash.

I think PDF is the right answer (my experiences with Flash will be described in a following post). It is possible both to set the duration of an automatic page display and to embed audio and video files in PDF files. Both features are described in the PDF Reference Version 1.6. The /Dur entry of the page object is described on page 121 (and it is rather tricky, since Acrobat/Adobe Reader only advances automatically to the next page when /PageMode /FullScreen is set in the catalog dictionary [although this might not be mandatory]). “Multimedia” is described in section 9.1 of the PDF Reference and I guess that section 9.2 (“Sounds”) is not applicable to mp3/ogg files. (Sorry, but it is too technical for me.)

The question is then which tools can modify already existing PDF documents to set the display duration of each page and to add the background mp3/ogg audio file to the complete presentation. Apart from Adobe’s tools themselves, I’m afraid that no tool released under an open source license is able to achieve this right now, but I guess there is one that could be easily extended to do the job.

pdftk is a tool to manipulate PDF documents and it is released under the GNU GPL. It is a command line tool and it runs under Windows, GNU/Linux, Mac OS X, FreeBSD and Solaris. It could be extended in the following way:

  • Adding an option pagedisplay times.txt, where times.txt has the following content:

    00:05
    00:10.5
    00:13.2

    Time intervals could also be given in seconds only or as hours:minutes:seconds.

  • Switching /PageMode to /FullScreen in the catalog dictionary (it is required for the automatic display to work).

  • Adding an option audiosync audio.mp3 n, where n stands for the page number where the audio file should be inserted. It would probably be a good idea to allow different audio files to be inserted on different pages.
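As an illustration of the first item, reading times.txt could look like the Python sketch below. Note that pagedisplay is only the option proposed here, not an existing pdftk feature, and the function names are hypothetical. The sketch assumes each line holds the display interval for one page, in any of the three formats mentioned, and it does not write anything to a PDF file.

```python
def parse_time(text):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' (fractions allowed) to seconds."""
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognised time: {text!r}")
    seconds = 0.0
    for part in parts:
        # Each field to the left is worth 60 times the field to its right.
        seconds = seconds * 60 + float(part)
    return seconds


def read_times(lines):
    """Parse every non-blank line; the result has one interval per page."""
    return [parse_time(line) for line in lines if line.strip()]


def page_durations(lines):
    """Map page numbers (starting at 1) to the value for that page's /Dur entry."""
    return {page: secs for page, secs in enumerate(read_times(lines), start=1)}
```

With the three lines of the example file, page_durations would assign 5, 10.5 and 13.2 seconds to pages 1, 2 and 3.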

If you feel more comfortable with Python, I guess pyPDF could be extended to do the task, but this tool is at an earlier stage of development and more work would probably be required. Since pyPDF is actually a library and requires an interpreter to run, it is harder for end users to deploy.

On the display side, Adobe Reader 7 is able to deal with both features, but multimedia playback doesn’t work on UNIX. And what about non-Adobe PDF viewers? poppler handles neither the page object entry /Dur (see issue) nor embedded multimedia (see issue) yet.

And eventually the question is: any takers?