pdf.js: A Complete Usage Guide

Mozilla's pdf.js is the go-to component for viewing PDF files in the browser.

Example

Official demo: https://mozilla.github.io/pdf.js/web/viewer.html

This is a full-featured PDF viewer demo with zooming, paging, search, and more.

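Download

Official download page: https://mozilla.github.io/pdf.js/getting_started/#download

There are three download options:

Prebuilt (modern browsers). Firefox, Chrome.
Prebuilt (older browsers). Firefox ESR+, Chrome 92+, Opera, Edge, Safari 15.4+, Node.js 18+
Source.

Going by this list, "modern browsers" refers only to Firefox and Chrome; Edge and Safari are not included. In most cases you should therefore use the older-browsers build.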

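Official notes on browser support: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support

Note that the JS files in the prebuilt packages are not minified, so you have to handle that yourself. According to the official documentation, you need to download the source and run gulp minified to produce the minified files. You can also minify the JS files directly, but only with Terser; other minifiers may break the library.

Official notes on minification: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#minified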

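You can also install pdf.js through npm. The package contains both the modern-browser and the older-browser builds, but its web directory only contains the pdf_viewer.js component, not the complete viewer.html (the full-featured viewer shown in the demo).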

npm install pdfjs-dist --save

The npm package does ship minified JS files, but there is still no minified version of pdf_viewer.js in the web directory.

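Components

A complete PDF viewer is clearly a complex system. Besides the core work of parsing and rendering PDFs, it includes UI for search, paging, zooming in and out, and even performance optimizations for large files.

pdf.js keeps these parts separate. The main ones are:

Core functionality. Located in the build directory; the main files are pdf.js and pdf.worker.js.
Extended functionality. Located in the web directory of the npm package; the main files are pdf_viewer.js and pdf_viewer.css. They wrap the core and provide ready-made paging, zoom-in, and zoom-out features, and are specifically optimized for large files.
Viewer functionality. Located in the web directory of the prebuilt package; the main files are viewer.html, viewer.js, and viewer.css. Together they provide the complete PDF viewing experience.

Oddly, the npm package has no viewer.html, viewer.js, or viewer.css, while the prebuilt package does not ship pdf_viewer.js and pdf_viewer.css as separate files (although the corresponding code is included in viewer.js and viewer.css).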

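Using only the core functionality

Using only the core gives you maximum control over the component and the most room for customization. The downside is more development work, and large files need special handling; otherwise the viewer becomes very sluggish or even unusable. This approach suits scenarios with heavy customization requirements.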

<script src="//mozilla.github.io/pdf.js/build/pdf.js"></script>
<h1>PDF.js 'Hello, world!' example</h1>
<canvas id="the-canvas"></canvas>

<script>
// If absolute URL from the remote server is provided, configure the CORS
// header on that server.
var url = 'https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/examples/learning/helloworld.pdf';

// Loaded via <script> tag, create shortcut to access PDF.js exports.
var pdfjsLib = window['pdfjs-dist/build/pdf'];

// The workerSrc property shall be specified.
pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';

// Asynchronous download of PDF
var loadingTask = pdfjsLib.getDocument(url);
loadingTask.promise.then(function(pdf) {
  console.log('PDF loaded');
  console.log('Total pages: ' + pdf.numPages);

  // Fetch the first page
  var pageNumber = 1;
  pdf.getPage(pageNumber).then(function(page) {
    console.log('Page loaded');

    var scale = 1.5;
    var viewport = page.getViewport({scale: scale});

    // Prepare canvas using PDF page dimensions
    var canvas = document.getElementById('the-canvas');
    var context = canvas.getContext('2d');
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    // Render PDF page into canvas context
    var renderContext = {
      canvasContext: context,
      viewport: viewport
    };
    var renderTask = page.render(renderContext);
    renderTask.promise.then(function () {
      console.log('Page rendered');
    });
  });
}, function (reason) {
  // PDF loading error
  console.error(reason);
});
</script>

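This example only displays the first page of the PDF. To display every page, read the page count from pdf.numPages and loop over the pages, rendering each one. This brute-force approach, however, causes performance problems when the PDF is large.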
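The loop described above could look roughly like the following sketch. It uses only the calls already shown in the example (pdf.numPages, pdf.getPage, page.getViewport, page.render). The createCanvas callback and the "pages" element id in the usage comment are assumptions made for illustration, not part of pdf.js.

```javascript
// Render every page of a loaded PDF document, one after another.
// `pdf` is the object resolved by pdfjsLib.getDocument(url).promise.
// `createCanvas` is a hypothetical callback that returns a fresh <canvas>
// for the given page number (for example, one appended to a container).
async function renderAllPages(pdf, createCanvas, scale = 1.5) {
  const canvases = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const viewport = page.getViewport({ scale });

    // Size the canvas to the page dimensions at the chosen scale.
    const canvas = createCanvas(pageNumber);
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    // Wait for each page to finish before starting the next one.
    await page.render({
      canvasContext: canvas.getContext("2d"),
      viewport,
    }).promise;
    canvases.push(canvas);
  }
  return canvases;
}

// Browser usage (assumes an element with id "pages" exists):
// pdfjsLib.getDocument(url).promise.then((pdf) =>
//   renderAllPages(pdf, () => {
//     const canvas = document.createElement("canvas");
//     document.getElementById("pages").appendChild(canvas);
//     return canvas;
//   })
// );
```

Every page keeps its own canvas alive, so memory use grows with the page count; that is the cost of this simple approach.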

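To improve performance, pages should be loaded dynamically as they come into view and cleared once they are no longer shown. Implementing this yourself obviously takes a fair amount of code.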
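To give a sense of what that bookkeeping involves, here is a minimal sketch of the decision logic only: given the scroll position, it works out which pages should stay rendered and which can be cleared. It is not how pdf.js implements this internally. The function names, the buffer parameter, and the assumption that pages are stacked vertically with no gaps between them are all illustrative.

```javascript
// Decide which pages should currently hold a rendered canvas.
// pageHeights: height in CSS pixels of each page, in order (index 0 = page 1).
// scrollTop / viewHeight: scroll offset and height of the scroll container.
// buffer: extra pages to keep rendered on each side of the visible range.
function pagesToKeep(pageHeights, scrollTop, viewHeight, buffer = 1) {
  const viewBottom = scrollTop + viewHeight;
  let first = -1;
  let last = -1;
  let top = 0;
  pageHeights.forEach((height, index) => {
    const bottom = top + height;
    // A page is visible when it overlaps the [scrollTop, viewBottom) window.
    if (bottom > scrollTop && top < viewBottom) {
      if (first === -1) first = index;
      last = index;
    }
    top = bottom;
  });
  if (first === -1) return [];

  const start = Math.max(0, first - buffer);
  const end = Math.min(pageHeights.length - 1, last + buffer);
  const keep = [];
  for (let i = start; i <= end; i++) keep.push(i + 1); // 1-based page numbers
  return keep;
}

// Compare the pages that should be kept with the pages rendered so far.
function diffRenderedPages(rendered, keep) {
  const keepSet = new Set(keep);
  const renderedSet = new Set(rendered);
  return {
    toRender: keep.filter((n) => !renderedSet.has(n)),
    toClear: rendered.filter((n) => !keepSet.has(n)),
  };
}
```

A scroll handler would call pagesToKeep, pass the result to diffRenderedPages, render the pages in toRender, and release the canvases of the pages in toClear. Debouncing the handler and cancelling in-flight render tasks are further details a real implementation would need.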

Official examples: https://mozilla.github.io/pdf.js/examples/

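Using the extended functionality

The web directory of the npm package contains pdf_viewer.js and pdf_viewer.css, which wrap a number of commonly needed features. In particular, they optimize performance for large files by dynamically loading the pages that need to be shown and clearing the ones that do not, which greatly reduces the amount of code you have to write.

The prebuilt packages do not include these two files, so you must install through npm and then copy the files into your application directory.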

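The main drawback is that the PDF content must sit in a fixed-height container with position: absolute. The content therefore cannot simply flow within the page: in addition to the browser's own scrollbar, the PDF container gets a scrollbar of its own. A scrollbar inside a scrollbar is not very user-friendly.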
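One common way to satisfy this constraint when the viewer shares the page with other content is to put the absolutely positioned container inside a relatively positioned wrapper of fixed height. The sketch below only sets inline styles; the function name, the wrapper element, and the height value are assumptions for illustration, and the nested scrollbar described above still remains.

```javascript
// Apply a layout that satisfies the container requirement described above.
// wrapper: a normal block element placed anywhere in the page flow.
// container: the element passed to PDFViewer as `container`; it is the
//            wrapper's child and holds <div class="pdfViewer">.
// heightPx: hypothetical fixed height chosen for this page.
function layoutViewerContainer(wrapper, container, heightPx) {
  // The wrapper reserves space in the page and anchors the absolute child.
  wrapper.style.position = "relative";
  wrapper.style.height = heightPx + "px";

  // The container fills the wrapper and scrolls on its own.
  container.style.position = "absolute";
  container.style.top = "0";
  container.style.left = "0";
  container.style.width = "100%";
  container.style.height = "100%";
  container.style.overflow = "auto";
}

// Browser usage (element ids are assumptions):
// layoutViewerContainer(
//   document.getElementById("viewerWrapper"),
//   document.getElementById("viewerContainer"),
//   600
// );
```

The same rules can of course be written in a stylesheet instead of being set from script.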

This problem does not arise when the whole page is dedicated to displaying the PDF, as in the official demo. The code is as follows:

simpleviewer.html

<!DOCTYPE html>
<html dir="ltr" mozdisallowselectionprint>
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
  <meta name="google" content="notranslate">
  <title>PDF.js viewer using built components</title>

  <style>
    body {
      background-color: #808080;
      margin: 0;
      padding: 0;
    }
    #viewerContainer {
      overflow: auto;
      position: absolute;
      width: 100%;
      height: 100%;
    }
  </style>

  <link rel="stylesheet" href="../../node_modules/pdfjs-dist/web/pdf_viewer.css">

  <script src="../../node_modules/pdfjs-dist/build/pdf.js"></script>
  <script src="../../node_modules/pdfjs-dist/web/pdf_viewer.js"></script>
</head>

<body tabindex="1">
  <div id="viewerContainer">
    <div id="viewer" class="pdfViewer"></div>
  </div>

  <script src="simpleviewer.js"></script>
</body>
</html>

simpleviewer.js

"use strict";

if (!pdfjsLib.getDocument || !pdfjsViewer.PDFViewer) {
  // eslint-disable-next-line no-alert
  alert("Please build the pdfjs-dist library using\n  `gulp dist-install`");
}

// The workerSrc property shall be specified.
//
pdfjsLib.GlobalWorkerOptions.workerSrc =
  "../../node_modules/pdfjs-dist/build/pdf.worker.js";

// Some PDFs need external cmaps.
//
const CMAP_URL = "../../node_modules/pdfjs-dist/cmaps/";
const CMAP_PACKED = true;

const DEFAULT_URL = "../../web/compressed.tracemonkey-pldi-09.pdf";
// To test the AcroForm and/or scripting functionality, try e.g. this file:
// "../../test/pdfs/160F-2019.pdf"

const ENABLE_XFA = true;
const SEARCH_FOR = ""; // try "Mozilla";

const SANDBOX_BUNDLE_SRC = "../../node_modules/pdfjs-dist/build/pdf.sandbox.js";

const container = document.getElementById("viewerContainer");

const eventBus = new pdfjsViewer.EventBus();

// (Optionally) enable hyperlinks within PDF files.
const pdfLinkService = new pdfjsViewer.PDFLinkService({
  eventBus,
});

// (Optionally) enable find controller.
const pdfFindController = new pdfjsViewer.PDFFindController({
  eventBus,
  linkService: pdfLinkService,
});

// (Optionally) enable scripting support.
const pdfScriptingManager = new pdfjsViewer.PDFScriptingManager({
  eventBus,
  sandboxBundleSrc: SANDBOX_BUNDLE_SRC,
});

const pdfViewer = new pdfjsViewer.PDFViewer({
  container,
  eventBus,
  linkService: pdfLinkService,
  findController: pdfFindController,
  scriptingManager: pdfScriptingManager,
});
pdfLinkService.setViewer(pdfViewer);
pdfScriptingManager.setViewer(pdfViewer);

eventBus.on("pagesinit", function () {
  // We can use pdfViewer now, e.g. let's change default scale.
  pdfViewer.currentScaleValue = "page-width";

  // We can try searching for things.
  if (SEARCH_FOR) {
    eventBus.dispatch("find", { type: "", query: SEARCH_FOR });
  }
});

// Loading document.
const loadingTask = pdfjsLib.getDocument({
  url: DEFAULT_URL,
  cMapUrl: CMAP_URL,
  cMapPacked: CMAP_PACKED,
  enableXfa: ENABLE_XFA,
});
(async function () {
  const pdfDocument = await loadingTask.promise;
  // Document loaded, specifying document for the viewer and
  // the (optional) linkService.
  pdfViewer.setDocument(pdfDocument);

  pdfLinkService.setDocument(pdfDocument, null);
})();

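Using the full viewer

Using the full viewer requires the prebuilt package, which provides viewer.html, viewer.js, and viewer.css.

The full viewer is delivered as an HTML page, so it can only be embedded in a web page through an iframe. The PDF file to display is passed through the file parameter in the URL.

For example: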

<iframe src="http://mozilla.github.com/pdf.js/web/viewer.html?file=compressed.tracemonkey-pldi-09.pdf"></iframe>

Official documentation of the file parameter: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#can-i-specify-a-different-pdf-in-the-default-viewer
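When the PDF location is only known at runtime, the iframe src has to be assembled in script. The helper below is a small sketch of that: buildViewerSrc and the example.com address are hypothetical, and only the file parameter itself comes from the viewer.

```javascript
// Build the iframe src for the default viewer.
// viewerUrl: absolute URL of viewer.html (absolute so that URL can parse it).
// pdfUrl: location of the PDF, relative to viewer.html or absolute.
function buildViewerSrc(viewerUrl, pdfUrl) {
  const url = new URL(viewerUrl);
  // searchParams.set() percent-encodes the value, so "&", "?" and "=" inside
  // pdfUrl cannot be mistaken for separate viewer parameters.
  url.searchParams.set("file", pdfUrl);
  return url.toString();
}

// Browser usage (the address and element id are assumptions):
// const src = buildViewerSrc(
//   "https://example.com/pdfjs/web/viewer.html",
//   "/files/report.pdf?version=2"
// );
// document.getElementById("pdfFrame").src = src;
```

If the PDF is served from a different origin than viewer.html, that server also needs to send suitable CORS headers, as the comment in the core example points out.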

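Real-world example

The document library (wenku) feature of the open-source ujcms project uses the second approach, the extended functionality, and depends on pdf_viewer.js and pdf_viewer.css.

It adds fullscreen, zoom-in, and zoom-out controls, which is enough to cover basic needs.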
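Controls of this kind can be built on top of the PDFViewer instance with very little code. The following is a sketch under stated assumptions, not the ujcms implementation: the scale limits, the step factor, and the function names are made up for illustration. It relies on the currentScale property of PDFViewer and on the browser's standard Fullscreen API.

```javascript
const MIN_SCALE = 0.25; // hypothetical limits chosen for this sketch
const MAX_SCALE = 4;
const SCALE_STEP = 1.25;

// Return the next zoom factor, clamped to [MIN_SCALE, MAX_SCALE].
function nextScale(current, zoomIn) {
  const next = zoomIn ? current * SCALE_STEP : current / SCALE_STEP;
  return Math.min(MAX_SCALE, Math.max(MIN_SCALE, next));
}

// pdfViewer is a pdfjsViewer.PDFViewer instance, as in simpleviewer.js.
function zoom(pdfViewer, zoomIn) {
  pdfViewer.currentScale = nextScale(pdfViewer.currentScale, zoomIn);
}

// Toggle fullscreen for the element that wraps the viewer.
// doc is the page's document object; element is the wrapper to show fullscreen.
function toggleFullscreen(doc, element) {
  if (doc.fullscreenElement) {
    return doc.exitFullscreen();
  }
  return element.requestFullscreen();
}

// Browser usage (button and element ids are assumptions):
// document.getElementById("zoomIn").onclick = () => zoom(pdfViewer, true);
// document.getElementById("zoomOut").onclick = () => zoom(pdfViewer, false);
// document.getElementById("fullscreen").onclick = () =>
//   toggleFullscreen(document, document.getElementById("viewerContainer"));
```

Clamping the scale keeps repeated clicks from producing an unusably small or memory-hungry rendering.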

Demo: https://demo.ujcms.com/article/53
Official site: https://www.ujcms.com
Source repository: https://gitee.com/ujcms/ujcms
Relevant code: https://gitee.com/ujcms/ujcms/blob/master/src/main/webapp/templates/1/default/article_wenku.html

Other references

https://pdfjs.express/blog/how-to-use-pdf-js