Version: 2.14.0

FLOWX Engine setup guide

Introduction

This guide will provide instructions on how to set up and configure the FLOWX Engine to meet your specific requirements.

Infrastructure prerequisites

The FLOWX Engine requires the following components to be set up before it can be started:

  • Docker engine - version 17.06 or higher
  • Kafka - version 2.8 or higher
  • Elasticsearch - version 7.11.0 or higher
  • DB instance

Dependencies

In a microservices architecture, some microservices hold their data individually, using separate databases.

Database

A basic Postgres configuration can be set up using a helm values.yaml file as follows:

  • helm values.yaml:

    onboardingdb:
      existingSecret: {{secretName}}
      metrics:
        enabled: true
        service:
          annotations:
            prometheus.io/port: {{prometheus port}}
            prometheus.io/scrape: "true"
          type: ClusterIP
        serviceMonitor:
          additionalLabels:
            release: prometheus-operator
          enabled: true
          interval: 30s
          scrapeTimeout: 10s
      persistence:
        enabled: true
        size: 1Gi
      postgresqlDatabase: onboarding
      postgresqlExtendedConf:
        maxConnections: 200
        sharedBuffers: 128MB
      postgresqlUsername: postgres
      resources:
        limits:
          cpu: 6000m
          memory: 2048Mi
        requests:
          cpu: 200m
          memory: 512Mi
  • Redis server - a Redis cluster is required for the engine to cache process definitions, compiled scripts, and Kafka responses

  • Kafka cluster - Kafka is the backbone of the engine and all plugins and integrations are accessed via the Kafka broker

  • Additional dependencies - details about how to set up logging via Elasticsearch, and monitoring and tracing via Jaeger, can be found here

Configuration

Configuring Kafka

Kafka handles all communication between the FLOWX Engine and external plugins and integrations. It is also used for notifying running process instances when certain events occur.

Both a producer and a consumer must be configured. The following Kafka-related configurations can be set by using environment variables:

  • SPRING_KAFKA_BOOTSTRAP_SERVERS - the address of the Kafka server; it should be in the format "host:port"

  • KAFKA_AUTH_EXCEPTION_RETRY_INTERVAL - the interval between retries after an AuthorizationException is thrown by the KafkaConsumer

  • KAFKA_MESSAGE_MAX_BYTES - the maximum size of a message that the broker will accept from a producer
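
As a sketch, these variables could be set in a deployment environment like this. The host names and values below are illustrative placeholders, not FLOWX defaults:

```shell
# Illustrative values only - replace with your own cluster details.
export SPRING_KAFKA_BOOTSTRAP_SERVERS="kafka-broker-1:9092,kafka-broker-2:9092"
# Retry interval after an AuthorizationException (the unit is an assumption here;
# check the expected unit for your deployment).
export KAFKA_AUTH_EXCEPTION_RETRY_INTERVAL="10"
# Allow messages up to ~50 MB (value in bytes).
export KAFKA_MESSAGE_MAX_BYTES="52428800"
```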

Consumer groups & consumer threads

In Kafka a consumer group is a group of consumers that jointly consume and process messages from one or more Kafka topics. Each consumer group has a unique identifier called a group ID, which is used by Kafka to manage message consumption and distribution among the members of the group.

Thread numbers, on the other hand, refer to the number of threads that a consumer application uses to process messages from Kafka. By default, each consumer instance runs in a single thread, which can limit the throughput of message processing. Increasing the number of consumer threads can help to improve the parallelism and efficiency of message consumption, especially when dealing with high message volumes.

Both group IDs and thread numbers can be configured in Kafka to optimize the processing of messages according to specific requirements, such as message volume, message type, and processing latency.

The configuration related to consumers (group IDs and thread numbers) can be set separately for each message type as follows:

  • KAFKA_CONSUMER_GROUP_ID_NOTIFY_ADVANCE - related to a Kafka consumer group that receives messages related to notifying advance actions, it is used to configure the group ID for this consumer group

  • KAFKA_CONSUMER_GROUP_ID_NOTIFY_PARENT - related to a Kafka consumer group that receives messages related to notifying when a subprocess is blocked and it sends a notification to the parent process, it is used to configure the group ID for this consumer group

  • KAFKA_CONSUMER_GROUP_ID_ADAPTERS - related to a Kafka consumer group that receives messages related to adapters, it is used to configure the group ID for this consumer group

  • KAFKA_CONSUMER_GROUP_ID_SCHEDULER_RUN_ACTION - related to a Kafka consumer group that receives messages related to requests to run scheduled actions, it is used to configure the group ID for this consumer group

  • KAFKA_CONSUMER_GROUP_ID_SCHEDULER_ADVANCING - related to a Kafka consumer group that receives messages sent by the scheduler when a timeout expires, indicating that advancing should continue (parallel advancing); it is used to configure the group ID for this consumer group

  • KAFKA_CONSUMER_GROUP_ID_PROCESS_START - related to a Kafka consumer group that receives messages related to starting processes, it is used to configure the group ID for this consumer group

  • KAFKA_CONSUMER_GROUP_ID_PROCESS_EXPIRE - related to expiring processes, it is used to configure the group ID for this consumer group

  • KAFKA_CONSUMER_GROUP_ID_PROCESS_OPERATIONS - related to a Kafka consumer group that receives messages related to processing operations from task management, it is used to configure the group ID for this consumer group

  • KAFKA_CONSUMER_THREADS_NOTIFY_ADVANCE - the number of threads used by the Kafka consumer that processes notify-advance messages

  • KAFKA_CONSUMER_THREADS_NOTIFY_PARENT - the number of threads used by the Kafka consumer that notifies the parent process when a subprocess is blocked

  • KAFKA_CONSUMER_THREADS_ADAPTERS - the number of threads used by the Kafka consumer related to adapters

  • KAFKA_CONSUMER_THREADS_SCHEDULER_RUN_ACTION - the number of threads used by a Kafka consumer application, related to requests to run scheduled actions

  • KAFKA_CONSUMER_THREADS_SCHEDULER_ADVANCING - the number of threads used by the Kafka consumer that processes messages sent by the scheduler when a timeout expires, indicating that advancing should continue (parallel advancing)

  • KAFKA_CONSUMER_THREADS_PROCESS_START - the number of threads used by a Kafka consumer application, related to starting processes

  • KAFKA_CONSUMER_THREADS_PROCESS_EXPIRE - the number of threads used by a Kafka consumer application, related to expiring processes

  • KAFKA_CONSUMER_THREADS_PROCESS_OPERATIONS - the number of threads used by a Kafka consumer application, related to processing operations from task management

Default parameter (env var)                    Default FLOWX.AI value (can be overwritten)
KAFKA_CONSUMER_GROUP_ID_NOTIFY_ADVANCE         notif123-preview
KAFKA_CONSUMER_GROUP_ID_NOTIFY_PARENT          notif123-preview
KAFKA_CONSUMER_GROUP_ID_ADAPTERS               notif123-preview
KAFKA_CONSUMER_GROUP_ID_SCHEDULER_RUN_ACTION   notif123-preview
KAFKA_CONSUMER_GROUP_ID_PROCESS_START          notif123-preview
KAFKA_CONSUMER_GROUP_ID_PROCESS_EXPIRE         notif123-preview
KAFKA_CONSUMER_GROUP_ID_PROCESS_OPERATIONS     notif123-preview
KAFKA_CONSUMER_THREADS_NOTIFY_ADVANCE          6
KAFKA_CONSUMER_THREADS_NOTIFY_PARENT           6
KAFKA_CONSUMER_THREADS_ADAPTERS                6
KAFKA_CONSUMER_THREADS_SCHEDULER_RUN_ACTION    6
KAFKA_CONSUMER_THREADS_PROCESS_START           6
KAFKA_CONSUMER_THREADS_PROCESS_EXPIRE          6
KAFKA_CONSUMER_THREADS_PROCESS_OPERATIONS      6
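
For instance, overriding a group ID and raising a thread count for a high-volume deployment might look like this. The values are examples, not recommendations:

```shell
# Illustrative overrides - variable names match the list above, values are examples.
export KAFKA_CONSUMER_GROUP_ID_PROCESS_START="flowx-engine-process-start"
# Raise the thread count from the default of 6 to handle a higher message volume.
export KAFKA_CONSUMER_THREADS_PROCESS_START="12"
```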

It is important to know that all the events that start with a configured pattern will be consumed by the engine. This makes it possible to create a new integration and connect it to the engine without changing the configuration of the engine.
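
As a sketch, with a pattern like the one below (an illustrative value, not a default), any topic whose name matches the pattern is picked up automatically:

```shell
# Illustrative pattern - the engine consumes every topic whose name matches it,
# so a new integration only needs to publish to a matching topic; no engine
# reconfiguration is required.
export KAFKA_TOPIC_PATTERN="ai.flowx.dev.engine.receive.*"
# e.g. a new integration could publish to a topic such as:
#   ai.flowx.dev.engine.receive.my-new-integration.v1   (hypothetical name)
```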

  • KAFKA_TOPIC_PROCESS_NOTIFY_ADVANCE - Kafka topic used internally by the engine

  • KAFKA_TOPIC_PROCESS_NOTIFY_PARENT - topic used for sub-processes to notify parent process when finished

  • KAFKA_TOPIC_PATTERN - the topic name pattern that the Engine listens on for incoming Kafka events

  • KAFKA_TOPIC_LICENSE_OUT - the topic name used by the Engine to generate licensing-related details

Default parameter (env var)          Default FLOWX.AI value (can be overwritten)
KAFKA_TOPIC_PROCESS_NOTIFY_ADVANCE   paperflow-process-notify
KAFKA_TOPIC_PROCESS_NOTIFY_PARENT    flowx-process-parent-notify
KAFKA_TOPIC_PATTERN                  ro.flowx.updates.qa-.*
KAFKA_TOPIC_LICENSE_OUT              ai.flowx.license
  • KAFKA_TOPIC_TASK_OUT - used for sending notifications to the plugin

  • KAFKA_TOPIC_PROCESS_OPERATIONS_IN - used for receiving calls from the task management plugin

Default parameter (env var)       Default FLOWX.AI value (can be overwritten)
KAFKA_TOPIC_TASK_OUT              ai.flowx.task.in
KAFKA_TOPIC_PROCESS_OPERATIONS_IN ai.flowx.process.operations
» Scheduler
  • KAFKA_TOPIC_PROCESS_EXPIRE_IN - the topic name that the Engine listens on for requests to expire processes

  • KAFKA_TOPIC_PROCESS_SCHEDULE_OUT_SET - the topic name used by the Engine to schedule a process expiration

  • KAFKA_TOPIC_PROCESS_SCHEDULE_OUT_STOP - the topic name used by the Engine to stop a process expiration

  • KAFKA_TOPIC_PROCESS_SCHEDULE_IN_RUN_ACTION - the topic name that the Engine listens on for requests to run scheduled actions

  • KAFKA_TOPIC_PROCESS_SCHEDULE_IN_ADVANCE - the topic name that the Engine listens on for messages sent by the scheduler related to continuing the advancement of processes

Default parameter (env var)                  Default FLOWX.AI value (can be overwritten)
KAFKA_TOPIC_PROCESS_EXPIRE_IN                ai.flowx.process.expire
KAFKA_TOPIC_PROCESS_SCHEDULE_OUT_SET         ai.flowx.in.schedule.message.v1
KAFKA_TOPIC_PROCESS_SCHEDULE_OUT_STOP        ai.flowx.in.stop.scheduled.message.v1
KAFKA_TOPIC_PROCESS_SCHEDULE_IN_RUN_ACTION   ai.flowx.action.run
KAFKA_TOPIC_PROCESS_SCHEDULE_IN_ADVANCE      ai.flowx.schedule.advancing
» Using the scheduler
  • KAFKA_TOPIC_DATA_SEARCH_IN - the topic name that the Engine listens on for requests to search for processes

  • KAFKA_TOPIC_DATA_SEARCH_OUT - the topic name used by the Engine to reply after finding a process

Default parameter (env var)   Default FLOWX.AI value (can be overwritten)
KAFKA_TOPIC_DATA_SEARCH_IN    ai.flowx.dev.core.trigger.search.data.v1
KAFKA_TOPIC_DATA_SEARCH_OUT   ai.flowx.dev.engine.receive.core.search.data.results.v1
  • KAFKA_TOPIC_AUDIT_OUT - topic key for sending audit logs

Default parameter (env var)   Default FLOWX.AI value (can be overwritten)
KAFKA_TOPIC_AUDIT_OUT         ai.flowx.audit.log

Processes that can be started by sending messages to a Kafka topic

  • KAFKA_TOPIC_PROCESS_START_IN - the Engine listens on this topic for requests to start a new process instance

  • KAFKA_TOPIC_PROCESS_START_OUT - used for sending out the reply after starting a new process instance

Default parameter (env var)     Default FLOWX.AI value (can be overwritten)
KAFKA_TOPIC_PROCESS_START_IN    ai.flowx.in.start.process
KAFKA_TOPIC_PROCESS_START_OUT   ai.flowx.out.start.process

Default parameter (env var)     Default FLOWX.AI value (can be overwritten)
KAFKA_TOPIC_PROCESS_INDEX_OUT   ai.flowx.dev.core.index.process.v1
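
As a hypothetical sketch, a request could be published to the default start topic with the standard Kafka console producer. The broker address is a placeholder, and the exact message payload expected on this topic is not documented here, so consult the FLOWX.AI docs for the real schema:

```shell
# Hypothetical example - broker address is a placeholder; the payload schema
# must match what the FLOWX Engine expects on this topic.
kafka-console-producer.sh \
  --bootstrap-server kafka-broker-1:9092 \
  --topic ai.flowx.in.start.process
```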

Configuring WebSockets

The engine communicates with the frontend application via WebSockets. The following environment variables need to be configured for the socket server connection details:

  • WEB_SOCKET_SERVER_URL_EXTERNAL - the external URL of the WebSocket server

  • WEB_SOCKET_SERVER_PORT - the port on which the WebSocket server is running

  • WEB_SOCKET_SERVER_PATH - the WebSocket server path
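
A minimal sketch of these settings, with placeholder URL, port, and path values (adjust to your own ingress and deployment):

```shell
# Illustrative values only - not FLOWX defaults.
export WEB_SOCKET_SERVER_URL_EXTERNAL="wss://flowx.example.com"
export WEB_SOCKET_SERVER_PORT="8080"
export WEB_SOCKET_SERVER_PATH="/socket"
```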

Configuring file upload size

The maximum file size allowed for uploads can be set by using the following environment variables:

  • SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE - maximum file size allowed for uploads

  • SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE - maximum request size allowed for uploads
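
For example, limits might be set like this, using Spring's data-size format. The numbers are illustrative; note the request limit should be at least as large as the file limit, since a request contains the file plus any other form data:

```shell
# Illustrative limits: single files up to 50 MB, whole requests up to 55 MB.
export SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE="50MB"
export SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE="55MB"
```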

Configuring Advancing controller

To use the Advancing controller, the following environment variables are needed for the process engine to connect to the Advancing Postgres database:

  • ADVANCING_DATASOURCE_JDBC_URL - environment variable used to configure a JDBC (Java Database Connectivity) data source; it specifies the connection URL for a particular database, including the server, port, database name, and any other necessary connection parameters

  • ADVANCING_DATASOURCE_USERNAME - environment variable used to authenticate the user access to the data source

  • ADVANCING_DATASOURCE_PASSWORD - environment variable used to set the password for a data source connection
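
A minimal sketch of these settings, using the standard JDBC URL format for PostgreSQL. The host, database name, and credentials below are placeholders:

```shell
# Illustrative values - host, database name, and credentials are placeholders.
export ADVANCING_DATASOURCE_JDBC_URL="jdbc:postgresql://postgres-advancing:5432/advancing"
export ADVANCING_DATASOURCE_USERNAME="flowx"
export ADVANCING_DATASOURCE_PASSWORD="change-me"
```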

» Advancing controller setup

Configuring Scheduler

Below you can find a configuration .yaml to use scheduler service together with FLOWX Engine:

scheduler:
  processCleanup:
    enabled: false
    cronExpression: 0 */5 0-5 * * ? # every day during the night, every 5 minutes, at the start of the minute
    batchSize: 1000
  masterElection:
    cronExpression: 30 */3 * * * ? # master election every 3 minutes
  websocket:
    namespace:
      cronExpression: 0 * * * * *
      expireMinutes: 30

Each of the configuration options above is described below:

  • processCleanup - a configuration for cleaning up processes:
      • enabled - specifies whether this feature is turned on or off.
      • cronExpression - a schedule expression that determines when the cleanup process runs. In this case, it runs every day during the night (between 12:00 AM and 5:59 AM), every 5 minutes, at the start of the minute.
      • batchSize - the number of processes to be cleaned up in one batch.
  • masterElection - a configuration for electing a master.
  • websocket - a configuration for WebSocket connections:
      • expireMinutes - how long the WebSocket namespace is valid for (30 minutes in this case).
