Installing Crayfish

Needs Maintenance

The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see Contributing.

In this section, we will install:

  • Islandora/Crayfish, the suite of microservices that power the backend of Islandora 2.0
  • Individual microservices underneath Crayfish

Crayfish 2.0

Installing Prerequisites

Some packages need to be installed before we can proceed with installing Crayfish; these packages are used by the microservices within Crayfish. These include:

  • ImageMagick, which will be used for image processing. We'll be using the LYRASIS build of ImageMagick here, which supports JP2 files.
  • Tesseract, which will be used for optical character recognition. By default, Tesseract can only understand English; additional language packs can be installed individually using apt-get, and a list of available packs can be obtained with sudo apt-cache search tesseract-ocr
  • FFmpeg, which will be used for audio and video processing
  • Poppler, which will be used for working with PDFs, such as extracting their text with pdftotext
sudo add-apt-repository -y ppa:lyrasis/imagemagick-jp2
sudo apt-get update
sudo apt-get -y install imagemagick tesseract-ocr ffmpeg poppler-utils

NOTICE: If you get the error sudo: add-apt-repository: command not found, run sudo apt-get install software-properties-common to make the command available.
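
Before moving on, it can help to confirm that the executables the microservices call are on the PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical, and it assumes the packages above provide the convert, tesseract, ffmpeg, and pdftotext commands.

```shell
# Hypothetical helper: print each named command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# No output means all four executables were found.
missing_tools convert tesseract ffmpeg pdftotext
```

The Houdini configuration later in this guide points at /usr/local/bin/convert; running command -v convert shows where the executable actually lives on your system, in case the two differ.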

Cloning and Installing Crayfish

We’re going to clone Crayfish to /opt, and individually run composer install against each of the microservice subdirectories.

cd /opt
sudo git clone https://github.com/Islandora/Crayfish.git crayfish
sudo chown -R www-data:www-data crayfish
sudo -u www-data composer install -d crayfish/Homarus
sudo -u www-data composer install -d crayfish/Houdini
sudo -u www-data composer install -d crayfish/Hypercube
sudo -u www-data composer install -d crayfish/Milliner
sudo -u www-data composer install -d crayfish/Recast
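
Composer places each project's dependencies in a vendor directory by default, so one quick sanity check is to confirm that every microservice now has one. A minimal sketch, in which the helper name missing_vendor_dirs is hypothetical:

```shell
# Hypothetical helper: given a base directory and a list of subdirectories,
# print each subdirectory that has no vendor/ directory.
missing_vendor_dirs() {
  base=$1
  shift
  for svc in "$@"; do
    [ -d "$base/$svc/vendor" ] || printf '%s\n' "$svc"
  done
}

# No output means composer install populated all five microservices.
missing_vendor_dirs /opt/crayfish Homarus Houdini Hypercube Milliner Recast
```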

Preparing Logging

Not much needs to happen here; Crayfish opts for a simple logging approach, with one .log file for each component. We’ll create a folder where each logfile can live.

sudo mkdir /var/log/islandora
sudo chown www-data:www-data /var/log/islandora

Configuring Crayfish Components

Each Crayfish component requires one or more YAML configuration files to ensure everything is wired up correctly.
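
Each file path below is followed by the owner, group, and permission mode the file should have (for example, www-data:www-data/644). As a sketch of one way to apply them, the hypothetical helper below wraps chown and chmod; it assumes it is run from a root shell, since changing ownership to www-data requires elevated privileges.

```shell
# Hypothetical helper: set the owner:group and mode of one file.
# Usage: set_owner_mode OWNER:GROUP MODE FILE
set_owner_mode() {
  chown "$1" "$3" && chmod "$2" "$3"
}

# Example, run as root after creating the file:
# set_owner_mode www-data:www-data 644 /opt/crayfish/Homarus/cfg/config.yaml
```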

NOTICE

The following configuration files represent somewhat sensible defaults. Review the logging levels in use, as the appropriate level varies from installation to installation. Also note that http URLs are used in all cases, as this guide does not cover setting up https support; a production installation should use https. These files also assume a connection to a PostgreSQL database; if using MySQL, use the pdo_mysql driver and port 3306.

Homarus (Audio/Video derivatives)

/opt/crayfish/Homarus/cfg/config.yaml | www-data:www-data/644

---
homarus:
  executable: ffmpeg
  mime_types:
    valid:
      - video/mp4
      - video/x-msvideo
      - video/ogg
      - audio/x-wav
      - audio/mpeg
      - audio/aac
      - image/jpeg
      - image/png
    default: video/mp4
  mime_to_format:
    valid:
      - video/mp4_mp4
      - video/x-msvideo_avi
      - video/ogg_ogg
      - audio/x-wav_wav
      - audio/mpeg_mp3
      - audio/aac_m4a
      - image/jpeg_image2pipe
      - image/png_image2pipe
    default: mp4
fedora_resource:
  base_url: http://localhost:8080/fcrepo/rest
log:
  level: NOTICE
  file: /var/log/islandora/homarus.log
syn:
  enable: true
  config: /opt/fcrepo/config/syn-settings.xml

Houdini (Image derivatives)

Currently, the Houdini microservice is built on a different framework (Symfony) than the other microservices, so it requires a different configuration.

/opt/crayfish/Houdini/config/services.yaml | www-data:www-data/644

# This file is the entry point to configure your own services.
# Files in the packages/ subdirectory configure your dependencies.
# Put parameters here that don't need to change on each machine where the app is deployed
# https://symfony.com/doc/current/best_practices/configuration.html#application-related-configuration
parameters:
    app.executable: /usr/local/bin/convert
    app.formats.valid:
        - image/jpeg
        - image/png
        - image/tiff
        - image/jp2
    app.formats.default: image/jpeg

services:
    # default configuration for services in *this* file
    _defaults:
        autowire: true      # Automatically injects dependencies in your services.
        autoconfigure: true # Automatically registers your services as commands, event subscribers, etc.

    # makes classes in src/ available to be used as services
    # this creates a service per class whose id is the fully-qualified class name
    App\Islandora\Houdini\:
        resource: '../src/*'
        exclude: '../src/{DependencyInjection,Entity,Migrations,Tests,Kernel.php}'

    # controllers are imported separately to make sure services can be injected
    # as action arguments even if you don't extend any base controller class
    App\Islandora\Houdini\Controller\HoudiniController:
        public: false
        bind:
            $formats: '%app.formats.valid%'
            $default_format: '%app.formats.default%'
            $executable: '%app.executable%'
        tags: ['controller.service_arguments']

    # add more service definitions when explicit configuration is needed
    # please note that last definitions always *replace* previous ones

/opt/crayfish/Houdini/config/packages/crayfish_commons.yml | www-data:www-data/644

crayfish_commons:
  fedora_base_uri: 'http://localhost:8080/fcrepo/rest'
  syn_config: '/opt/fcrepo/config/syn-settings.xml'

/opt/crayfish/Houdini/config/packages/monolog.yml | www-data:www-data/644

monolog:

  handlers:

    houdini:
      type: rotating_file
      path: /var/log/islandora/Houdini.log
      level: DEBUG
      max_files: 1

The following are two versions of the same file: one enables JWT authentication and the other disables it.

/opt/crayfish/Houdini/config/packages/security.yml | www-data:www-data/644

JWT authentication enabled:

security:

    # https://symfony.com/doc/current/security.html#where-do-users-come-from-user-providers
    providers:
        jwt_user_provider:
            id: Islandora\Crayfish\Commons\Syn\JwtUserProvider

    firewalls:
        dev:
            pattern: ^/(_(profiler|wdt)|css|images|js)/
            security: false
        main:
            anonymous: false
            # Need stateless or it reloads the User based on a token.
            stateless: true

            provider: jwt_user_provider
            guard:
                authenticators:
                    - Islandora\Crayfish\Commons\Syn\JwtAuthenticator

            # activate different ways to authenticate
            # https://symfony.com/doc/current/security.html#firewalls-authentication

            # https://symfony.com/doc/current/security/impersonating_user.html
            # switch_user: true


    # Easy way to control access for large sections of your site
    # Note: Only the *first* access control that matches will be used
    access_control:
        # - { path: ^/admin, roles: ROLE_ADMIN }
        # - { path: ^/profile, roles: ROLE_USER }

JWT authentication disabled:

security:

    # https://symfony.com/doc/current/security.html#where-do-users-come-from-user-providers
    providers:
        jwt_user_provider:
            id: Islandora\Crayfish\Commons\Syn\JwtUserProvider

    firewalls:
        dev:
            pattern: ^/(_(profiler|wdt)|css|images|js)/
            security: false
        main:
            anonymous: true
            # Need stateless or it reloads the User based on a token.
            stateless: true

Hypercube (OCR)

/opt/crayfish/Hypercube/cfg/config.yaml | www-data:www-data/644

---
hypercube:
  tesseract_executable: tesseract
  pdftotext_executable: pdftotext
fedora_resource:
  base_url: http://localhost:8080/fcrepo/rest
log:
  level: NOTICE
  file: /var/log/islandora/hypercube.log
syn:
  enable: true
  config: /opt/fcrepo/config/syn-settings.xml

Milliner (Fedora indexing)

/opt/crayfish/Milliner/cfg/config.yaml | www-data:www-data/644

---
fedora_base_url: http://localhost:8080/fcrepo/rest
drupal_base_url: http://localhost
modified_date_predicate: http://schema.org/dateModified
strip_format_jsonld: true
debug: false
db.options:
  driver: pdo_pgsql
  host: 127.0.0.1
  port: 5432
  dbname: CRAYFISH_DB
  user: CRAYFISH_DB_USER
  password: CRAYFISH_DB_PASSWORD
log:
  level: NOTICE
  file: /var/log/islandora/milliner.log
syn:
  enable: true
  config: /opt/fcrepo/config/syn-settings.xml
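
As the notice at the top of this section mentions, the db.options block above targets PostgreSQL; for MySQL, the driver and port need to change. A minimal sketch using sed, where the helper name to_mysql is hypothetical and the substitutions assume the file still contains the exact pdo_pgsql and port: 5432 values shown above:

```shell
# Hypothetical helper: read a Milliner config on stdin and write a copy on
# stdout with the PostgreSQL driver and port swapped for the MySQL ones.
to_mysql() {
  sed -e 's/driver: pdo_pgsql/driver: pdo_mysql/' \
      -e 's/port: 5432/port: 3306/'
}

# Example: preview the converted db.options lines without touching a file.
printf '  driver: pdo_pgsql\n  port: 5432\n' | to_mysql
```

To convert the real file, redirect the output to a temporary file and move it into place once you have reviewed it.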

Recast (Drupal to Fedora URI re-writing)

/opt/crayfish/Recast/cfg/config.yaml | www-data:www-data/644

---
fedora_resource:
  base_url: http://localhost:8080/fcrepo/rest
drupal_base_url: http://localhost
debug: false
log:
  level: NOTICE
  file: /var/log/islandora/recast.log
syn:
  enable: true
  config: /opt/fcrepo/config/syn-settings.xml
namespaces:
-
  acl: "http://www.w3.org/ns/auth/acl#"
  fedora: "http://fedora.info/definitions/v4/repository#"
  ldp: "http://www.w3.org/ns/ldp#"
  memento: "http://mementoweb.org/ns#"
  pcdm: "http://pcdm.org/models#"
  pcdmuse: "http://pcdm.org/use#"
  webac: "http://fedora.info/definitions/v4/webac#"
  vcard: "http://www.w3.org/2006/vcard/ns#"

Creating Apache Configurations for Crayfish Components

Next, we need Apache configurations for Crayfish; these will allow other services to connect to Crayfish components via their HTTP endpoints.

Each endpoint gets its own .conf file, which we will then enable.

NOTICE

These configurations could collide with Drupal routes if any are created in Drupal with the same paths (for example, /houdini). If this is a concern, it would likely be better to reserve a subdomain or a separate port specifically for Crayfish. For the purposes of this installation guide, these endpoints will suffice.

/etc/apache2/conf-available/Homarus.conf | root:root/644

Alias "/homarus" "/opt/crayfish/Homarus/src"
<Directory "/opt/crayfish/Homarus/src">
  FallbackResource /homarus/index.php
  Require all granted
  DirectoryIndex index.php
  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>

/etc/apache2/conf-available/Houdini.conf | root:root/644

Alias "/houdini" "/opt/crayfish/Houdini/public"
<Directory "/opt/crayfish/Houdini/public">
  FallbackResource /houdini/index.php
  Require all granted
  DirectoryIndex index.php
  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>

/etc/apache2/conf-available/Hypercube.conf | root:root/644

Alias "/hypercube" "/opt/crayfish/Hypercube/src"
<Directory "/opt/crayfish/Hypercube/src">
  FallbackResource /hypercube/index.php
  Require all granted
  DirectoryIndex index.php
  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>

/etc/apache2/conf-available/Milliner.conf | root:root/644

Alias "/milliner" "/opt/crayfish/Milliner/src"
<Directory "/opt/crayfish/Milliner/src">
  FallbackResource /milliner/index.php
  Require all granted
  DirectoryIndex index.php
  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>

/etc/apache2/conf-available/Recast.conf | root:root/644

Alias "/recast" "/opt/crayfish/Recast/src"
<Directory "/opt/crayfish/Recast/src">
  FallbackResource /recast/index.php
  Require all granted
  DirectoryIndex index.php
  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>

Enabling Each Crayfish Component Apache Configuration

Enabling each of these configurations involves creating a symlink to it in the conf-enabled directory; the standard way to do this on Debian-based systems is with a2enconf.

sudo a2enconf Homarus Houdini Hypercube Milliner Recast
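
Since a2enconf works by creating symlinks, one way to confirm the step worked is to look for them. A minimal sketch, where the helper name missing_confs is hypothetical and /etc/apache2/conf-enabled is assumed to be the enabled-configuration directory, matching the conf-available paths above:

```shell
# Hypothetical helper: given a directory and a list of configuration names,
# print each name that has no usable NAME.conf entry in that directory.
# A dangling symlink counts as missing because -e follows the link.
missing_confs() {
  dir=$1
  shift
  for name in "$@"; do
    [ -e "$dir/$name.conf" ] || printf '%s\n' "$name"
  done
}

# No output means all five configurations are enabled.
missing_confs /etc/apache2/conf-enabled Homarus Houdini Hypercube Milliner Recast
```

Running sudo apache2ctl configtest before restarting additionally checks the combined configuration for syntax errors.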

Restarting the Apache Service

Finally, to get these new endpoints up and running, we need to restart the Apache service.

sudo systemctl restart apache2
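
Once Apache is back up, a quick request to each alias shows whether the endpoints are being served. A minimal sketch, assuming curl is installed and that Apache listens on http://localhost as in the configurations above; the helper names crayfish_urls and check_crayfish are hypothetical:

```shell
# Hypothetical helper: print the URL of each Crayfish alias under a base URL.
crayfish_urls() {
  for svc in homarus houdini hypercube milliner recast; do
    printf '%s/%s/\n' "${1%/}" "$svc"
  done
}

# Hypothetical helper: print the HTTP status code returned for each URL.
# curl reports 000 when it gets no HTTP response at all.
check_crayfish() {
  crayfish_urls "$1" | while read -r url; do
    printf '%s ' "$url"
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  done
}

# Example: check_crayfish http://localhost
```

Any status other than 000 indicates that Apache answered the request. The exact code depends on the service and on whether JWT authentication is enabled, so treat this as a connectivity check only.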

Last update: October 11, 2023