Advanced configuration of the local installation

Once we have the application installed and running on our machine, we can start adapting the installation to our specific needs. This section covers some of the most important configurations:

Change the File and Database Paths

By default, as we have already seen, our local installation has the following structure configured:

├── local-machine
│   ├── archihub
│   │   ├── frontend
│   │   ├── backend
│   │   ├── mongo_db
│   ├── webfiles
│   ├── userfiles
│   ├── temporal
│   ├── original
│   ├── data
│   │   ├── mongodb
│   │   ├── elastic

In ArchiHUB, the folders webfiles, userfiles, temporal, original, and data are essential for the operation of the system since they contain all the documents and data generated by the application. However, it is possible to configure these folders to be located in different paths, either on an external drive or on a network drive, allowing greater flexibility in the organization of your files. Let’s see how to do it.

To change the paths, we are going to edit the .env file that we configured at the beginning of the guide. The paths are set in the following environment variables:

USERFILES_PATH='../userfiles'
WEBFILE_PATH='../webfiles'
UPLOAD_PATH='../original'
TEMPORAL_PATH='../temporal'
DATA_PATH='../data'

From here, you can change the paths of the essential folders. In the example the paths are relative, but you can also use absolute paths that lead directly to your content. Keep in mind that, for the application to work correctly, neither the location of these folders nor the files inside them should be moved or modified by hand once the system is in use.
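
To make the difference between relative and absolute values concrete, here is a minimal sketch of how each kind of value would resolve. It assumes that relative paths are interpreted from the archihub folder (which matches the default structure shown above, where ../userfiles sits next to archihub); the base folder /home/user/local-machine/archihub and the helper name resolve_folder are purely illustrative.

```python
import posixpath


def resolve_folder(base_dir, value):
    """Resolve a path value from .env against the folder it is read from.

    Assumption: relative values are interpreted from base_dir.
    An absolute value is returned unchanged (apart from normalization).
    """
    return posixpath.normpath(posixpath.join(base_dir, value))


# Hypothetical location of the archihub folder on the local machine.
base = "/home/user/local-machine/archihub"

relative_result = resolve_folder(base, "../userfiles")
absolute_result = resolve_folder(base, "/mnt/disco_externo/webfiles")
```

With these inputs, the relative value resolves to a folder next to archihub, while the absolute value is used exactly as written.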

You can add or modify the following variables in the .env file to point to the new paths. Keep the variable names exactly as they appear in your own .env file; only the values change. In this case we want to use folders located on an external disk:

WEBFILES_PATH=/mnt/disco_externo/webfiles
USERFILES_PATH=/mnt/disco_externo/userfiles
TEMPORAL_PATH=/mnt/disco_externo/temporal
ORIGINAL_PATH=/mnt/disco_externo/original
DATA_PATH=/mnt/disco_externo/data

Important Considerations

Path Consistency: Make sure that the specified paths are always available and accessible by the system where ArchiHUB is running.
Access Permissions: Verify that ArchiHUB has the necessary permissions to read and write to the new paths.
Avoid Frequent Changes: Changing the paths or the contents of these folders frequently may cause errors in the operation of the application. Define these paths once, during the initial configuration, and keep them stable.
Application Restart: After making these changes, you need to restart ArchiHUB for the new settings to take effect. You can run docker compose up --build -d from the root folder, as we saw earlier, to update the application.
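
Before restarting, it can help to confirm that every configured folder exists and is accessible. The following is a minimal sketch, not part of ArchiHUB: it reads every variable whose name ends in _PATH from a .env file and checks the folder it points to. The function names are hypothetical, relative values are assumed to be resolved from the folder passed as base_dir, and the check runs with the permissions of the current user, which may differ from the user inside the containers.

```python
import os
from pathlib import Path


def read_path_variables(env_file):
    """Return {NAME: value} for every NAME ending in _PATH in a .env file."""
    variables = {}
    for raw_line in Path(env_file).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not NAME=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        if name.endswith("_PATH"):
            # Drop surrounding single or double quotes around the value.
            variables[name] = value.strip().strip("'\"")
    return variables


def check_folders(variables, base_dir):
    """Map each variable to 'ok', 'missing', or 'no read/write access'."""
    report = {}
    for name, value in variables.items():
        # An absolute value replaces base_dir; a relative one is joined to it.
        folder = Path(base_dir, value)
        if not folder.is_dir():
            report[name] = "missing"
        elif not os.access(folder, os.R_OK | os.W_OK):
            report[name] = "no read/write access"
        else:
            report[name] = "ok"
    return report
```

For example, check_folders(read_path_variables(".env"), ".") run from the archihub folder returns one status per variable, so a disconnected external disk shows up as "missing" before the containers are restarted.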

Enable ElasticSearch for Searching

By default, the Elasticsearch container is downloaded and installed along with ArchiHUB. This functionality is essential for keyword searches, both in the metadata of the resources and in the files you process. However, it is not enabled automatically, because it consumes a considerable amount of machine resources as you catalog and index information. Let’s see how to activate it and how to use it.

Activating Elasticsearch in ArchiHUB

The first step is to go to the system configuration and in the Index Management section activate the first option.

After activating Elasticsearch in the system configuration, it is necessary to restart the backend container to apply the changes. This can be done in two ways:

From console: Navigate to the local-machine/archihub folder and run the following command to restart the backend container: docker compose up -d --no-deps archihub_flask_backend.
From desktop application: This is a little simpler. Go to the containers submenu, look for the container named archihub_flask_backend, stop it, and start it again.

To validate that the Elasticsearch index has started correctly, follow these steps:

Head back to the system configuration in the ArchiHUB interface. Look for the “Index Management” section and check if the first option is active. If this option is active, congratulations! You now have your archive connected to the index, which means that Elasticsearch is working properly and you can take advantage of ArchiHUB’s advanced search capabilities.

If the first option is not active, it is possible that a problem occurred when starting the index. In this case, try to solve the problem by consulting our frequently asked questions section, where some of the common problems and their solutions are addressed. If you do not find the answer there, we recommend that you ask in the forum of the application, where the community and developers can offer help and guidance to solve any problem.
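
If the option does not become active, it can be useful to check whether Elasticsearch itself is responding before looking elsewhere. The sketch below queries the standard _cluster/health endpoint of Elasticsearch. It assumes that the Elasticsearch container publishes port 9200 on localhost and that no authentication is required; both are assumptions about your setup, so adjust base_url if your docker-compose.yml differs. The function names are illustrative.

```python
import json
import urllib.error
import urllib.request


def fetch_cluster_health(base_url="http://localhost:9200", timeout=5):
    """Return the parsed _cluster/health response, or None if unreachable."""
    url = f"{base_url}/_cluster/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return None


def describe_health(health):
    """Turn a health payload (or None) into a short human-readable message."""
    if health is None:
        return "Elasticsearch is not reachable"
    status = health.get("status", "unknown")
    if status in ("green", "yellow"):
        return f"Elasticsearch is up (status: {status})"
    return f"Elasticsearch responded but reports status: {status}"
```

Calling describe_health(fetch_cluster_health()) gives a one-line summary. A yellow status is common on a single-node installation, since replica shards have no second node to be allocated to.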

Start indexing

For Elasticsearch to work correctly with ArchiHUB, it is necessary to generate a mapping for the index, which defines the structure of the data to be indexed. Fortunately, ArchiHUB takes care of this automatically using the metadata standards you have defined. Here’s how to perform these steps:

First, access the system settings in the ArchiHUB interface and look for the section called Index Management. Here, click on the “Regenerate index” option. This generates the necessary mapping based on your metadata standards. This action will appear in the “My Processes” section of your profile, where you can track its progress.

After regenerating the index, you must index the resources, that is, load them into the index so that you can start searching. To do this, in the same Index Management section, select the “Index the resources” option. This process can also be followed from the “My Processes” section in your profile, where you will be able to see the status and progress of the indexing.

Once the index is generated and the resources are indexed, ArchiHUB automatically sends database changes to Elasticsearch whenever content is updated. This keeps the information in the index up to date with the changes made to the archive.
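
To confirm from outside the interface that indexing actually loaded documents, you can ask Elasticsearch how many documents each index holds. This is a minimal sketch using the standard _cat/indices endpoint with JSON output. As before, it assumes Elasticsearch is reachable at http://localhost:9200 without authentication, and it lists every index because the name ArchiHUB gives to its index is not assumed here.

```python
import json
import urllib.request


def parse_index_counts(cat_indices_payload):
    """Turn the JSON list from _cat/indices into {index_name: document_count}."""
    counts = {}
    for entry in cat_indices_payload:
        raw = entry.get("docs.count")
        # The cat API reports counts as strings; closed indices may report null.
        counts[entry["index"]] = int(raw) if raw not in (None, "") else 0
    return counts


def fetch_index_counts(base_url="http://localhost:9200", timeout=5):
    """Query Elasticsearch and return the document count of every index."""
    url = f"{base_url}/_cat/indices?format=json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_index_counts(json.load(response))
```

Running fetch_index_counts() before and after “Index the resources” should show the document count growing as the process advances.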

Processing Node Configuration

By default, our docker-compose.yml file starts a single processing node dedicated to all tasks that are not in a specific processing queue. You can learn more about processing queues in the corresponding section of the documentation.

However, if we want to install new plugins that require more intensive processing, we need to modify that file.

To do this, open the docker-compose.yml file in a text editor and go to the CELERY QUEUE SERVICE section. You will notice that it looks a lot like CELERY WORKER SERVICE, but it is commented out and the command it executes is different, because it specifies the processing queues. Uncomment the whole block, then open the terminal and restart the containers with docker compose up --build -d.
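
A quick way to confirm that the uncommented block was picked up is to compare the list of services Docker Compose sees before and after the edit. The sketch below relies on the docker compose config --services command, which prints one service name per line. The helper names are illustrative, and the name of the new service depends on what your docker-compose.yml defines.

```python
import subprocess


def parse_service_names(output):
    """Split the command output into a sorted list of service names."""
    return sorted(line.strip() for line in output.splitlines() if line.strip())


def list_compose_services(project_dir):
    """Return the service names Docker Compose finds in project_dir."""
    result = subprocess.run(
        ["docker", "compose", "config", "--services"],
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if the compose file is invalid after editing
    )
    return parse_service_names(result.stdout)


def added_services(before, after):
    """Return the services present in 'after' but not in 'before'."""
    return sorted(set(after) - set(before))
```

Call list_compose_services on the archihub folder before editing the file and again afterwards; added_services should then return the name of the newly enabled queue service. If the command fails, the uncommented block most likely has an indentation problem.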