====== Paperless-ngx Document Management System (Docker) ======

  * I had long been looking for a system that can quickly search the contents of files on a file server by keyword. I recently came across Paperless-ngx, which also includes OCR, so it can extract text even from PDFs produced by a scanner. That matches the use case I had in mind.
  * Installation environment:
    * VM: 4 vCores / 8 GB RAM / 32 GB (SSD) + 500 GB (HDD)
    * OS: [[tech/alpine_docker|Alpine3 + Docker Compose]]
    * Layout: the 500 GB disk is mounted at /data and used for data storage

===== Installation =====

  - Download docker-compose.env and docker-compose.yml <code>
wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/dev/docker/compose/docker-compose.env -O docker-compose.env
wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/dev/docker/compose/docker-compose.postgres.yml -O docker-compose.yml
</code>
  - Edit docker-compose.env <code>
vi docker-compose.env
</code>
    * Add the Traditional Chinese OCR language packs (package names use hyphens) <code>
PAPERLESS_OCR_LANGUAGES=chi-tra chi-tra-vert
</code>
    * Set the site URL, e.g. docs.my.ichiayi.com <code>
PAPERLESS_URL=https://docs.my.ichiayi.com
</code>
    * Set the time zone <code>
PAPERLESS_TIME_ZONE=Asia/Taipei
</code>
    * Set the default OCR languages to Traditional Chinese + English (language codes use underscores) <code>
PAPERLESS_OCR_LANGUAGE=chi_tra+eng
</code>
  - Configure a reverse proxy (optional), e.g. docs.my.ichiayi.com -> http 172.16.0.220 8000
  - Edit docker-compose.yml to add Office format support, lengthen the Gotenberg timeouts, and store data under /data <code>
vi docker-compose.yml
</code> <code>
services:
  broker:
    container_name: broker
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    container_name: db
    image: docker.io/library/postgres:15
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    container_name: webserver
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    container_name: gotenberg
    image: docker.io/gotenberg/gotenberg:7.10
    restart: unless-stopped
    # The gotenberg chromium route is used to convert .eml files. We do not
    # want to allow external content like tracking pixels or even javascript.
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
      - "--uno-listener-start-timeout=90s"
      - "--api-timeout=900s"

  tika:
    container_name: tika
    image: ghcr.io/paperless-ngx/tika:latest
    restart: unless-stopped

volumes:
  data:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/data/web-data'
  media:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/data/web-media'
  pgdata:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/data/db-data'
  redisdata:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/data/broker-data'
</code>
  - Create the data directories under /data <code>
mkdir -p /data/web-data
mkdir -p /data/web-media
mkdir -p /data/db-data
mkdir -p /data/broker-data
</code>
  - Pull the Docker images for the first time <code>
docker compose pull
</code>
  - Create the first Paperless administrator account <code>
docker compose run --rm webserver createsuperuser
</code>
  - Start the Paperless services <code>
docker compose up -d
</code>

===== References =====

  * https://docs.paperless-ngx.com/setup/
  * https://docs.paperless-ngx.com/configuration/#PAPERLESS_OCR_LANGUAGE
  * [[https://github.com/paperless-ngx/paperless-ngx/blob/main/docker/compose/docker-compose.sqlite-tika.yml | To support Office formats, add the gotenberg and tika services to docker-compose.yml]]
  * [[https://github.com/paperless-ngx/paperless-ngx/discussions/4627 | If a 503 error appears while a document is being analyzed, raise the gotenberg timeouts and allocate more CPU and RAM]]

{{tag>docs 檔案管理 ocr docker}}
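
A supplementary note on the data-directory step above: the named volumes in docker-compose.yml are bind mounts, so the containers will typically fail to start if any of the ''device'' paths is missing on the host. The sketch below wraps the four ''mkdir -p'' calls in one function. The function name ''make_data_dirs'' and its optional root argument are illustrative assumptions, not part of Paperless-ngx or Docker.

```shell
#!/bin/sh
# Create the host directories that back the bind-mounted volumes.
# make_data_dirs is a hypothetical helper; the directory names are taken
# from the device paths in docker-compose.yml in this guide.
make_data_dirs() {
    root="${1:-/data}"   # defaults to the /data mount used in this guide
    for name in web-data web-media db-data broker-data; do
        # Stop at the first failure so a missing directory is not overlooked
        mkdir -p "$root/$name" || return 1
    done
}
```

Calling ''make_data_dirs'' with no argument creates the directories under /data. Passing another path, for example ''make_data_dirs /tmp/paperless-test'', lets you try it in a harmless location first.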