Using Linux to convert output to PDF format

Legacy systems are still out there! Some still generate reams of paper, which is easily avoided by using a variety of Linux tools to convert legacy output into PDF format.


This article discusses some simple Linux tools that can be used to change print output into more useful PDF files.  Let’s begin with a quick look at some simple Linux tools that can be used to convert text/HTML to PDF format.

Converting to PDF format

Perhaps the easiest way to get your legacy print jobs into PDF format is to pipe your text (or LaserJet-coded) print file through one of the many Linux utilities available in most distributions (or easily installed).  Let’s take a very quick look at a few of your options.


Postscript output and Ghostscript

Most Linux distributions pre-install Ghostscript (the binary is ‘gs’) or make it readily available via their package management systems.  If you prefer you can also install it yourself from source obtained from the ghostscript web site.


Ghostscript accepts PostScript input and writes PDF output.  If you’ve been printing to PostScript printers, you’ll be able to preserve all formatting.


Most Linux print queues pipe their output to a spooler. To get a PDF instead, pipe the job to gs; something like this (samplePS.ps is a placeholder name):

cat samplePS.ps | gs -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=OutputFromPS.pdf -

or, if you’re dealing with a finished file

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=OutputFromPS.pdf samplePS.ps
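
When a legacy job leaves a whole directory of PostScript files behind, the single-file command can be wrapped in a small loop. The following is only a sketch: the function name ps_to_pdf is made up for this article, and it assumes input files end in ‘.ps’. It uses the gs flags shown above, plus -dBATCH so gs exits after each file and -q to keep it quiet.

```shell
# ps_to_pdf FILE...
# Convert each PostScript file to a PDF with the same base name
# (report.ps -> report.pdf). Hypothetical helper, not part of Ghostscript.
ps_to_pdf() {
    for ps in "$@"; do
        pdf="${ps%.ps}.pdf"     # drop a trailing .ps if present, add .pdf
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
           -sOutputFile="$pdf" "$ps" || return 1
        echo "$pdf"             # report the file that was written
    done
}
```

Called as ‘ps_to_pdf *.ps’, it prints the name of each PDF as it is written and stops at the first file gs rejects.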

PCL (laserjet) Output and GhostPCL (pcl6)

GhostPCL is developed by the same project that publishes the familiar Ghostscript.  The great feature of GhostPCL is that it accepts LaserJet print codes.  This makes it simple to take your print jobs formatted for LaserJet printers and convert them to PDF without losing the formatting, graphics or other special print features.

GhostPCL is available from the Ghostscript website.  The binary is 'pcl6'.

Redefine your printer to pipe its output through pcl6 rather than to a printer:

cat samplePCL.prn | pcl6 -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=OutputFromPCL.pdf -

or, if you’re dealing with a finished file

pcl6 -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=OutputFromPCL.pdf samplePCL.prn


PJL commands in the file seem to cause pcl6 some issues.  There is a command line flag for PJL commands.


Text Output and enscript and PS2PDF

In legacy systems you’ll no doubt encounter many print jobs that are plain text, not PostScript or PCL (LaserJet). Linux has a number of tools to convert text to PDF. One approach is to let enscript turn the text into PostScript and pipe the result to gs:

enscript -p - -r -f Courier7 -M Letter -B -L 60 -c sampleTEXT.txt | gs -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=OutputFromTXT.pdf -
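
The three cases so far (PostScript, PCL and plain text) differ only in which converter runs first, so a spooler script could peek at the start of the job and pick one. The sketch below rests on an assumption: PostScript jobs begin with ‘%!’ and PCL or PJL jobs begin with an ESC byte, which is the usual convention but not guaranteed for every legacy system. The function names are invented for this example.

```shell
# detect_print_format FILE
# Prints "ps", "pcl" or "text" based on the first two bytes of FILE.
# Assumption: "%!" means PostScript, a leading ESC (octal 033) means PCL/PJL,
# and anything else is treated as plain text.
detect_print_format() {
    # od dumps the first two bytes in octal, e.g. "045 041" for "%!"
    first=$(od -A n -t o1 -N 2 "$1" | tr -d '[:space:]')
    case "$first" in
        045041) echo ps ;;
        033*)   echo pcl ;;
        *)      echo text ;;
    esac
}

# print_to_pdf INPUT OUTPUT
# Runs the converter from this article that matches the detected format.
print_to_pdf() {
    case "$(detect_print_format "$1")" in
        ps)  gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$2" "$1" ;;
        pcl) pcl6 -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile="$2" "$1" ;;
        *)   enscript -p - -r -f Courier7 -M Letter -B -L 60 -c "$1" |
                 gs -q -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile="$2" - ;;
    esac
}
```

For example, ‘print_to_pdf samplePCL.prn OutputFromPCL.pdf’ would go through pcl6, while the same call on sampleTEXT.txt would go through enscript and gs.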


Libreoffice to convert to PDF

libreoffice  --headless --convert-to pdf sampleDocForPDFArticle.odt


This creates a file ‘sampleDocForPDFArticle.pdf’ in the current directory.  A different output directory can be specified with the ‘--outdir’ option.

Note that LibreOffice can convert to a number of other formats, including HTML.

In most Linux distributions the ‘headless’ version of LibreOffice is a separate installation from the more common version. If you find this command doesn’t work, search for ‘libreoffice headless’ in your distribution's repository.
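
To collect converted documents somewhere other than the current directory, the same command can be given an output directory. A minimal sketch follows; the helper name docs_to_pdf is made up here, and it assumes the binary is installed under the name ‘libreoffice’ as in the command above.

```shell
# docs_to_pdf OUTDIR FILE...
# Converts each FILE to PDF and writes the results into OUTDIR.
# Hypothetical helper; --outdir is LibreOffice's output-directory option.
docs_to_pdf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: docs_to_pdf OUTDIR FILE..." >&2
        return 2
    fi
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1      # create the target directory if needed
    libreoffice --headless --convert-to pdf --outdir "$outdir" "$@"
}
```

For example, ‘docs_to_pdf pdf-out sampleDocForPDFArticle.odt’ should leave the PDF in the pdf-out directory.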

Joining PDF files

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=singleCombinedMultipagePdfFile.pdf -dBATCH  tmp_1,1.pdf tmp_1,2.pdf


See also: ‘pdftk’ which can join PDFs and much more, though I’ve found using ‘gs’ to be much more robust.
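
One easy mistake when joining is to name the output file among the inputs, in which case gs would overwrite a file it still has to read. The sketch below wraps the gs join command with that one check. The helper name join_pdfs is invented for this article, and the check only compares the names as typed.

```shell
# join_pdfs OUTPUT INPUT...
# Concatenates the INPUT PDFs, in the order given, into OUTPUT.
# Hypothetical helper; refuses to run if OUTPUT is also listed as an input.
join_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: join_pdfs OUTPUT INPUT..." >&2
        return 2
    fi
    out=$1
    shift
    for f in "$@"; do
        if [ "$f" = "$out" ]; then
            echo "join_pdfs: $out is also an input" >&2
            return 2
        fi
    done
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
}
```

Used as ‘join_pdfs finished.pdf file1.pdf file2.pdf’, it runs the same gs command as above.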


Splitting PDFs

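In a script, where $1 is the input file, $2 and $3 are the first and last pages to keep, and $4 is the output file:

gs -dNOPAUSE -dBATCH -sOutputFile="$4" -dFirstPage="$2" -dLastPage="$3" -sDEVICE=pdfwrite "$1" > /dev/null 2>&1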

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER   -dFirstPage=22 -dLastPage=36  -sOutputFile=outfile_p22-p36.pdf 100p-inputfile.pdf

See also: ‘pdftk’ which can split PDFs (and much more), though I’ve found ‘gs’ to be much more robust.
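
The page-range command is easy to get wrong when the page numbers come from another program, so a wrapper that validates them first can save some confusion. This is a sketch only: split_pdf is a made-up name, and the argument order (input, first page, last page, output) is simply the one used in the script form above.

```shell
# split_pdf INPUT FIRST LAST OUTPUT
# Extracts pages FIRST..LAST of INPUT into OUTPUT using gs.
# Hypothetical helper; checks the page numbers before calling gs.
split_pdf() {
    if [ "$#" -ne 4 ]; then
        echo "usage: split_pdf INPUT FIRST LAST OUTPUT" >&2
        return 2
    fi
    # Reject page numbers that are empty or contain a non-digit.
    for n in "$2" "$3"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "split_pdf: page numbers must be positive integers" >&2
                return 2 ;;
        esac
    done
    if [ "$2" -lt 1 ] || [ "$2" -gt "$3" ]; then
        echo "split_pdf: need 1 <= FIRST <= LAST" >&2
        return 2
    fi
    gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite \
       -dFirstPage="$2" -dLastPage="$3" -sOutputFile="$4" "$1"
}
```

For example, ‘split_pdf 100p-inputfile.pdf 22 36 outfile_p22-p36.pdf’ corresponds to the second gs command above.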

Rotating PDF files

This can be useful if you find your PDFs aren't being oriented properly.

It can also rotate PDFs made from PCL (by converting to PostScript first).

The desired gs command argument ‘AutoRotatePages’ requires PostScript input.  The method is to convert the original file to PostScript and let gs make the rotation as it writes the PDF.

pcl6 -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=- original.pcl | gs -dNOPAUSE -dBATCH -dAutoRotatePages=/All -q -sDEVICE=pdfwrite -sOutputFile=new_file.pdf -

Also consider ‘pdftk’, available in most distributions, which can easily rotate PDF files.

Note that the pdftk method for rotating is faster:

pdftk original_wrong.pdf cat 1-endeast output new_orientation.pdf

Extract text from PDF

pdftotext -layout OutputFromTXT.pdf

Produces a file ‘OutputFromTXT.txt’

pdftotext -layout OutputFromTXT.pdf test.txt

Produces the file ‘test.txt’ in the current directory.


I’ve come to prefer using ‘gs’ (and its sibling ‘pcl6’) over ‘pdftk’ as it’s likely to already be installed, seems more reliable, and produces a better PDF.


However, ‘pdftk’ can do many useful things not covered in this article.  ‘pdftk’ is worth considering if you need to:

  • manipulate PDF metadata,  

  • set file encryption passwords,

  • compress/uncompress,

  • attach files to your PDF (not all PDF readers will recognize the attached files),

  • include X/FDF Data, and

  • set ‘stamps’ on the background of your PDF.  


We’ll save ‘pdftk’ for a future article.

