
Technology stack and requirements

ArchiHUB is a web platform built on a varied technology stack that lets you perform a wide range of actions on your documents.

Technology stack

[Figure: technology stack]

  • Database: We rely on MongoDB, a highly flexible non-relational database. This choice allows us to adapt to your changing needs in terms of metadata schemas.

  • Index: The index, built on Elasticsearch, allows us to retrieve information quickly and efficiently. ArchiHUB takes care of organizing and adapting the Elasticsearch mappings.

  • In-memory database: We leverage Redis to implement a caching system that helps alleviate the load on our main database. In addition, we use Redis to manage a queue of processes with Celery, which allows us to handle tasks in an efficient and scalable manner.

  • Backend: The backend of our application benefits from several open source projects. We use:
       - FFmpeg: for file processing
       - Celery: for the management of the processing queue, allowing a distributed and asynchronous execution of tasks.
       - Flask and Gunicorn: to run several instances of the backend in parallel, so that it scales and performs well under load.

  • Frontend: Our application has a frontend developed using React.js.
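
The caching role described for Redis above can be illustrated with a cache-aside lookup. This is a minimal sketch under stated assumptions, not ArchiHUB's actual code: `get_document`, `load_from_db`, the `doc:` key prefix and the 300-second expiry are hypothetical names and values, and `InMemoryCache` is a stand-in that mimics only the `get` and `setex` calls of a redis-py client so the example runs without a Redis server.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in exposing the same get/setex calls as a redis-py client."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        # Store the value together with its expiry time.
        self._data[key] = (time.monotonic() + ttl_seconds, value)


def get_document(
    doc_id: str,
    cache: Any,
    load_from_db: Callable[[str], dict],
    ttl_seconds: int = 300,  # assumed expiry, not an ArchiHUB setting
) -> dict:
    """Return a document, asking the main database only on a cache miss."""
    key = f"doc:{doc_id}"  # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: the main database is not touched.
        return json.loads(cached)
    # Cache miss: read from the main database, then store a copy with an expiry.
    doc = load_from_db(doc_id)
    cache.setex(key, ttl_seconds, json.dumps(doc))
    return doc
```

With a real deployment, `cache` would be a `redis.Redis(...)` instance and `load_from_db` a MongoDB query; repeated requests for the same document within the expiry window are then served from Redis instead of the main database.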

Requirements

Local machine

  • Requirements:
       - 8 GB of RAM
       - Disk size according to the content. ArchiHUB generates multiple versions of each file in addition to the original, so the disk space required varies with the versions generated and the size of the originals.

  • Comments: In this installation, some plugins, such as the automatic transcription plugin, may require additional resources to run correctly. The cataloging, retrieval and file organization functionalities will run without problems thanks to the optimization work we have done.

Installation in a multi-machine infrastructure

  • Requirements:
       - Two (2) machines for the MongoDB cluster: minimum 16 GB of RAM and 8 CPU cores each.
       - Two (2) machines for the Elasticsearch cluster: minimum 32 GB of RAM and 16 CPU cores each. Ideally 64 GB of RAM.
       - One (1) machine for the application: 64 GB of RAM and 32 CPU cores. This machine not only runs the application itself, but also handles the cache and manages the process queue. Since we run multiple instances of the backend in parallel, it needs enough RAM and CPU power to handle all requests efficiently.
       - Two (2) machines for processing: ArchiHUB lets you separate task queues according to the intensity of the tasks to be executed.

  • Comments: This configuration provides the minimum necessary for a rapid deployment that can adjust to changing application query requirements. If necessary, new machines can be added to each of the clusters to improve performance. This flexibility lets the installation scale on demand so that you always have the resources you need.
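
Separating tasks by intensity across the processing machines can be sketched with a Celery router function. This is an illustration under stated assumptions rather than ArchiHUB's real configuration: the task names and the `high` and `low` queue names are hypothetical. The function follows the signature Celery expects for routers, but it is plain Python so it can be read and run without Celery installed.

```python
from typing import Any, Dict, Optional

# Hypothetical task names, grouped by how heavy the work is.
HEAVY_TASKS = {"plugins.transcribe_audio", "plugins.transcode_video"}
LIGHT_TASKS = {"plugins.generate_thumbnail", "plugins.index_record"}


def route_task(
    name: str,
    args: Any = None,
    kwargs: Any = None,
    options: Any = None,
    task: Any = None,
    **kw: Any,
) -> Optional[Dict[str, str]]:
    """Pick a queue for a task based on its (assumed) intensity."""
    if name in HEAVY_TASKS:
        # Resource-intensive work goes to the queue served by the larger machine.
        return {"queue": "high"}
    if name in LIGHT_TASKS:
        # Short tasks get their own queue so they are not stuck behind long ones.
        return {"queue": "low"}
    # Returning None lets Celery fall back to its default routing.
    return None
```

In a Celery application such a router would be registered with `app.conf.task_routes = (route_task,)`, and each processing machine would start its worker with the `-Q` option to consume only its own queue, for example `-Q high` on one machine and `-Q low` on the other.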