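Paperless-ngx Document Management System (Docker)
- I had long been looking for a system that can quickly search the contents of files on a file server by keyword. I recently came across Paperless-ngx, which also has OCR, so it can extract the text even from scanned PDFs. That matches my intended use case very well.
- Installation environment:
- VM: 4 vCores / 8 GB RAM / 32 GB (SSD) + 500 GB (HDD)
- Layout: the 500 GB disk is mounted at /data and used for data storage
Installation
- Download docker-compose.env and docker-compose.yml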
wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/dev/docker/compose/docker-compose.env -O docker-compose.env
wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/dev/docker/compose/docker-compose.postgres.yml -O docker-compose.yml
- Edit docker-compose.env
vi docker-compose.env
- Add Traditional Chinese OCR support (installs the additional Tesseract language packs)
PAPERLESS_OCR_LANGUAGES=chi-tra chi-tra-vert
- Set the URL, e.g. docs.my.ichiayi.com
PAPERLESS_URL=https://docs.my.ichiayi.com
- Set the time zone
PAPERLESS_TIME_ZONE=Asia/Taipei
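The three settings above can also be applied without opening an editor. The following is only a minimal sketch: set_env_var is a hypothetical helper (not part of Paperless-ngx), and it assumes the values contain no newlines. It removes any existing line for the key, including a commented-out default, and appends the new value at the end of the file.

```shell
# Hypothetical helper (an assumption, not shipped with Paperless-ngx):
# set KEY=VALUE in an env file, replacing any existing or commented-out line.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # Keep every line except "KEY=..." and "#KEY=..."; grep exits non-zero
  # when nothing is left to print, which is not an error here.
  grep -vE "^#?${key}=" "$file" > "$tmp" 2>/dev/null || true
  # Append the new setting at the end of the file.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through redirection so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example calls (commented out so that sourcing this snippet changes nothing):
# set_env_var docker-compose.env PAPERLESS_OCR_LANGUAGES "chi-tra chi-tra-vert"
# set_env_var docker-compose.env PAPERLESS_URL "https://docs.my.ichiayi.com"
# set_env_var docker-compose.env PAPERLESS_TIME_ZONE "Asia/Taipei"
```

Because the helper rewrites the key at the end of the file, the order of settings may change, which should not matter for an env file.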
- Set up a reverse proxy (optional), e.g. docs.my.ichiayi.com → http://172.16.0.220:8000
- Edit docker-compose.yml to support Office formats, increase the timeout, and store data under /data
vi docker-compose.yml
version: "3.4"
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:15
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_OCR_LANGUAGES: chi-tra chi-tra-vert
      PAPERLESS_OCR_LANGUAGE: chi_tra+eng

  gotenberg:
    image: docker.io/gotenberg/gotenberg:7.10
    restart: unless-stopped
    # The gotenberg chromium route is used to convert .eml files. We do not
    # want to allow external content like tracking pixels or even javascript.
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
      - "--uno-listener-start-timeout=90s"
      - "--api-timeout=900s"

  tika:
    image: ghcr.io/paperless-ngx/tika:latest
    restart: unless-stopped

volumes:
  data:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/data/web-data'
  media:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/data/web-media'
  pgdata:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/data/db-data'
  redisdata:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/data/broker-data'
- Create the data directories under /data
mkdir -p /data/web-data
mkdir -p /data/web-media
mkdir -p /data/db-data
mkdir -p /data/broker-data
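The named volumes in docker-compose.yml are bind mounts onto these four directories, so each one has to exist before the containers start. As a rough pre-flight check, a sketch like the one below could be run first. check_data_dirs is a hypothetical helper name; the directory names are taken from the compose file above, and the base directory defaults to /data.

```shell
# Hypothetical pre-flight check (an assumption, not part of Paperless-ngx):
# report which bind-mount target directories are missing under a base dir.
check_data_dirs() {
  local base="${1:-/data}" missing=0 d
  for d in web-data web-media db-data broker-data; do
    if [ ! -d "$base/$d" ]; then
      echo "missing: $base/$d" >&2
      missing=1
    fi
  done
  # Returns 0 when all four directories exist, 1 otherwise.
  return "$missing"
}

# Example call (commented out so that sourcing this snippet does nothing):
# check_data_dirs /data && echo "all data directories are present"
```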
- Pull the Docker images for the first time
docker compose pull
- Create the first Paperless administrator account
docker compose run --rm webserver createsuperuser
- Start the Paperless services
docker compose up -d
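After the containers start, the web interface may take a little while to respond. One way to wait for it is a small retry loop, sketched below. wait_until is a hypothetical helper, and the URL in the example assumes the 8000:8000 port mapping from docker-compose.yml and that curl is installed on the host.

```shell
# Hypothetical helper (an assumption, not part of Docker or Paperless-ngx):
# run a command repeatedly until it succeeds or the attempts are used up.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (commented out so that sourcing this snippet has no side effects):
# docker compose ps
# wait_until 30 2 curl -fsS -o /dev/null http://localhost:8000 \
#   && echo "Paperless-ngx is answering on port 8000"
```

If the check keeps failing, docker compose logs webserver is the usual place to look for the cause.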