FLOWX Engine setup guide
Introduction
This guide will provide instructions on how to set up and configure the FLOWX Engine to meet your specific requirements.
Infrastructure prerequisites
The FLOWX Engine requires the following components to be set up before it can be started:
- Docker engine - version 17.06 or higher
- Kafka - version 2.8 or higher
- Elasticsearch - version 7.11.0 or higher
- DB instance
Dependencies
In a microservices architecture, some microservices hold their data individually, using separate databases.
Database
A basic Postgres configuration can be set up using a Helm values.yaml file, as follows:
helm values.yaml:
```yaml
onboardingdb:
  existingSecret: {{secretName}}
  metrics:
    enabled: true
    service:
      annotations:
        prometheus.io/port: {{prometheus port}}
        prometheus.io/scrape: "true"
      type: ClusterIP
    serviceMonitor:
      additionalLabels:
        release: prometheus-operator
      enabled: true
      interval: 30s
      scrapeTimeout: 10s
  persistence:
    enabled: true
    size: 1Gi
  postgresqlDatabase: onboarding
  postgresqlExtendedConf:
    maxConnections: 200
    sharedBuffers: 128MB
  postgresqlUsername: postgres
  resources:
    limits:
      cpu: 6000m
      memory: 2048Mi
    requests:
      cpu: 200m
      memory: 512Mi
```
Redis server
A Redis cluster is required for the engine to cache process definitions, compiled scripts, and Kafka responses.
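For the Redis dependency, a similar Helm values sketch can be used. This is a minimal sketch assuming the Bitnami Redis chart; the secret name and sizing are placeholders to adapt to your environment:
```yaml
redis:
  architecture: replication         # replicated setup, since the engine expects a Redis cluster
  auth:
    enabled: true
    existingSecret: {{secretName}}  # placeholder, same convention as the database example above
  master:
    persistence:
      enabled: true
      size: 1Gi                     # placeholder sizing
  replica:
    replicaCount: 2                 # placeholder replica count
```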
Kafka cluster
Kafka is the backbone of the engine; all plugins and integrations are accessed via the Kafka broker.
Additional dependencies
Details about how to set up logging via Elasticsearch, and monitoring and tracing via Jaeger, can be found here.
Configuration
- Datasource configuration
- Redis configuration
- Logging
- Authorization & access roles
- Configuring access roles for processes
- Kafka configuration
Configuring Kafka
Kafka handles all communication between the FLOWX Engine and external plugins and integrations. It is also used for notifying running process instances when certain events occur.
Both a producer and a consumer must be configured. The following Kafka-related configurations can be set by using environment variables:
- SPRING_KAFKA_BOOTSTRAP_SERVERS - the address of the Kafka server; it should be in the format "host:port"
- KAFKA_AUTH_EXCEPTION_RETRY_INTERVAL - the interval between retries after an AuthorizationException is thrown by the KafkaConsumer
- KAFKA_MESSAGE_MAX_BYTES - the largest size of a message that can be received by the broker from a producer
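These variables can be supplied through the deployment configuration. A minimal sketch, assuming a Kubernetes-style env block, with placeholder broker address and sizes:
```yaml
env:
  - name: SPRING_KAFKA_BOOTSTRAP_SERVERS
    value: "kafka-broker:9092"   # placeholder address in host:port format
  - name: KAFKA_AUTH_EXCEPTION_RETRY_INTERVAL
    value: "10"                  # placeholder retry interval; the unit is an assumption, check your engine version
  - name: KAFKA_MESSAGE_MAX_BYTES
    value: "52428800"            # placeholder: 50 MB maximum message size
```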
Consumer groups & consumer threads
In Kafka a consumer group is a group of consumers that jointly consume and process messages from one or more Kafka topics. Each consumer group has a unique identifier called a group ID, which is used by Kafka to manage message consumption and distribution among the members of the group.
Thread numbers, on the other hand, refer to the number of threads that a consumer application uses to process messages from Kafka. By default, each consumer instance runs in a single thread, which can limit the throughput of message processing. Increasing the number of consumer threads can help to improve the parallelism and efficiency of message consumption, especially when dealing with high message volumes.
Both group IDs and thread numbers can be configured in Kafka to optimize the processing of messages according to specific requirements, such as message volume, message type, and processing latency.
The consumer-related configuration (group IDs and thread numbers) can be set separately for each message type, as follows:
- KAFKA_CONSUMER_GROUP_ID_NOTIFY_ADVANCE - configures the group ID for the consumer group that receives messages related to notifying advance actions
- KAFKA_CONSUMER_GROUP_ID_NOTIFY_PARENT - configures the group ID for the consumer group that receives messages sent when a subprocess is blocked and the parent process must be notified
- KAFKA_CONSUMER_GROUP_ID_ADAPTERS - configures the group ID for the consumer group that receives messages related to adapters
- KAFKA_CONSUMER_GROUP_ID_SCHEDULER_RUN_ACTION - configures the group ID for the consumer group that receives requests to run scheduled actions
- KAFKA_CONSUMER_GROUP_ID_SCHEDULER_ADVANCING - configures the group ID for the consumer group that receives messages sent by the scheduler when a timeout expires, indicating that advancing should continue (parallel advancing)
- KAFKA_CONSUMER_GROUP_ID_PROCESS_START - configures the group ID for the consumer group that receives messages related to starting processes
- KAFKA_CONSUMER_GROUP_ID_PROCESS_EXPIRE - configures the group ID for the consumer group that receives messages related to expiring processes
- KAFKA_CONSUMER_GROUP_ID_PROCESS_OPERATIONS - configures the group ID for the consumer group that receives messages related to processing operations from task management
- KAFKA_CONSUMER_THREADS_NOTIFY_ADVANCE - the number of consumer threads used for messages related to notifying advance actions
- KAFKA_CONSUMER_THREADS_NOTIFY_PARENT - the number of consumer threads used for notifications sent when a process is blocked
- KAFKA_CONSUMER_THREADS_ADAPTERS - the number of consumer threads used for messages related to adapters
- KAFKA_CONSUMER_THREADS_SCHEDULER_RUN_ACTION - the number of consumer threads used for requests to run scheduled actions
- KAFKA_CONSUMER_THREADS_SCHEDULER_ADVANCING - the number of consumer threads used for messages sent by the scheduler when a timeout expires, indicating that advancing should continue (parallel advancing)
- KAFKA_CONSUMER_THREADS_PROCESS_START - the number of consumer threads used for messages related to starting processes
- KAFKA_CONSUMER_THREADS_PROCESS_EXPIRE - the number of consumer threads used for messages related to expiring processes
- KAFKA_CONSUMER_THREADS_PROCESS_OPERATIONS - the number of consumer threads used for messages related to processing operations from task management
| Default parameter (env var) | Default FLOWX.AI value (can be overwritten) |
|---|---|
| KAFKA_CONSUMER_GROUP_ID_NOTIFY_ADVANCE | notif123-preview |
| KAFKA_CONSUMER_GROUP_ID_NOTIFY_PARENT | notif123-preview |
| KAFKA_CONSUMER_GROUP_ID_ADAPTERS | notif123-preview |
| KAFKA_CONSUMER_GROUP_ID_SCHEDULER_RUN_ACTION | notif123-preview |
| KAFKA_CONSUMER_GROUP_ID_PROCESS_START | notif123-preview |
| KAFKA_CONSUMER_GROUP_ID_PROCESS_EXPIRE | notif123-preview |
| KAFKA_CONSUMER_GROUP_ID_PROCESS_OPERATIONS | notif123-preview |
| KAFKA_CONSUMER_THREADS_NOTIFY_ADVANCE | 6 |
| KAFKA_CONSUMER_THREADS_NOTIFY_PARENT | 6 |
| KAFKA_CONSUMER_THREADS_ADAPTERS | 6 |
| KAFKA_CONSUMER_THREADS_SCHEDULER_RUN_ACTION | 6 |
| KAFKA_CONSUMER_THREADS_PROCESS_START | 6 |
| KAFKA_CONSUMER_THREADS_PROCESS_EXPIRE | 6 |
| KAFKA_CONSUMER_THREADS_PROCESS_OPERATIONS | 6 |
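As an illustration, a dedicated group ID and a higher thread count could be set for the process-start consumer when start requests dominate the traffic. A minimal sketch with placeholder values, assuming a Kubernetes-style env block:
```yaml
env:
  - name: KAFKA_CONSUMER_GROUP_ID_PROCESS_START
    value: "flowx-engine-process-start"   # placeholder group ID
  - name: KAFKA_CONSUMER_THREADS_PROCESS_START
    value: "12"                           # raised from the default of 6 to increase parallelism
```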
It is important to know that all events on topics matching the configured pattern will be consumed by the engine. This makes it possible to create a new integration and connect it to the engine without changing the engine's configuration.
- KAFKA_TOPIC_PROCESS_NOTIFY_ADVANCE - Kafka topic used internally by the engine
- KAFKA_TOPIC_PROCESS_NOTIFY_PARENT - topic used by sub-processes to notify the parent process when finished
- KAFKA_TOPIC_PATTERN - the topic name pattern that the Engine listens on for incoming Kafka events
- KAFKA_TOPIC_LICENSE_OUT - the topic name used by the Engine to generate licensing-related details
| Default parameter (env var) | Default FLOWX.AI value (can be overwritten) |
|---|---|
| KAFKA_TOPIC_PROCESS_NOTIFY_ADVANCE | paperflow-process-notify |
| KAFKA_TOPIC_PROCESS_NOTIFY_PARENT | flowx-process-parent-notify |
| KAFKA_TOPIC_PATTERN | ro.flowx.updates.qa-.* |
| KAFKA_TOPIC_LICENSE_OUT | ai.flowx.license |
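As an illustration of the pattern matching, consider a hypothetical pattern value; any topic whose name matches it is consumed automatically, so a new integration only needs to publish to a matching topic:
```yaml
env:
  - name: KAFKA_TOPIC_PATTERN
    value: "ai.flowx.updates.*"  # hypothetical pattern value
# With this pattern:
#   ai.flowx.updates.my-new-adapter -> consumed by the engine (no engine config change needed)
#   ai.flowx.updates.kyc-check      -> consumed by the engine
#   ai.flowx.license                -> ignored (does not match the pattern)
```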
Topics related to the Task Management plugin
- KAFKA_TOPIC_TASK_OUT - used for sending notifications to the plugin
- KAFKA_TOPIC_PROCESS_OPERATIONS_IN - used for receiving calls from the task management plugin
| Default parameter (env var) | Default FLOWX.AI value (can be overwritten) |
|---|---|
| KAFKA_TOPIC_TASK_OUT | ai.flowx.task.in |
| KAFKA_TOPIC_PROCESS_OPERATIONS_IN | ai.flowx.process.operations |
Topics related to the scheduler extension
- KAFKA_TOPIC_PROCESS_EXPIRE_IN - the topic name that the Engine listens on for requests to expire processes
- KAFKA_TOPIC_PROCESS_SCHEDULE_OUT_SET - the topic name used by the Engine to schedule a process expiration
- KAFKA_TOPIC_PROCESS_SCHEDULE_OUT_STOP - the topic name used by the Engine to stop a process expiration
- KAFKA_TOPIC_PROCESS_SCHEDULE_IN_RUN_ACTION - the topic name that the Engine listens on for requests to run scheduled actions
- KAFKA_TOPIC_PROCESS_SCHEDULE_IN_ADVANCE - the topic name that the Engine listens on for events sent by the scheduler related to advancing through the database
| Default parameter (env var) | Default FLOWX.AI value (can be overwritten) |
|---|---|
| KAFKA_TOPIC_PROCESS_EXPIRE_IN | ai.flowx.process.expire |
| KAFKA_TOPIC_PROCESS_SCHEDULE_OUT_SET | ai.flowx.in.schedule.message.v1 |
| KAFKA_TOPIC_PROCESS_SCHEDULE_OUT_STOP | ai.flowx.in.stop.scheduled.message.v1 |
| KAFKA_TOPIC_PROCESS_SCHEDULE_IN_RUN_ACTION | ai.flowx.action.run |
| KAFKA_TOPIC_PROCESS_SCHEDULE_IN_ADVANCE | ai.flowx.schedule.advancing |
Topics related to the Search Data service
- KAFKA_TOPIC_DATA_SEARCH_IN - the topic name that the Engine listens on for requests to search for processes
- KAFKA_TOPIC_DATA_SEARCH_OUT - the topic name used by the Engine to reply after finding a process
| Default parameter (env var) | Default FLOWX.AI value (can be overwritten) |
|---|---|
| KAFKA_TOPIC_DATA_SEARCH_IN | ai.flowx.dev.core.trigger.search.data.v1 |
| KAFKA_TOPIC_DATA_SEARCH_OUT | ai.flowx.dev.engine.receive.core.search.data.results.v1 |
Topics related to the Audit service
- KAFKA_TOPIC_AUDIT_OUT - topic key for sending audit logs. Default value: ai.flowx.audit.log
| Default parameter (env var) | Default FLOWX.AI value (can be overwritten) |
|---|---|
| KAFKA_TOPIC_AUDIT_OUT | ai.flowx.audit.log |
Processes that can be started by sending messages to a Kafka topic
- KAFKA_TOPIC_PROCESS_START_IN - the Engine listens on this topic for requests to start a new process instance
- KAFKA_TOPIC_PROCESS_START_OUT - used for sending out the reply after starting a new process instance
| Default parameter (env var) | Default FLOWX.AI value (can be overwritten) |
|---|---|
| KAFKA_TOPIC_PROCESS_START_IN | ai.flowx.in.start.process |
| KAFKA_TOPIC_PROCESS_START_OUT | ai.flowx.out.start.process |
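The two topics form a request/reply pair: an external system publishes a start request on the IN topic and the engine answers on the OUT topic. A minimal sketch of overriding the pair per environment (topic names are placeholders; the payload schema of the start message depends on your FLOWX version and is not shown here):
```yaml
env:
  - name: KAFKA_TOPIC_PROCESS_START_IN
    value: "ai.flowx.dev.in.start.process"   # placeholder; the engine consumes start requests here
  - name: KAFKA_TOPIC_PROCESS_START_OUT
    value: "ai.flowx.dev.out.start.process"  # placeholder; the engine publishes the start reply here
```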
Topics related to process event messages
| Default parameter (env var) | Default FLOWX.AI value (can be overwritten) |
|---|---|
| KAFKA_TOPIC_PROCESS_INDEX_OUT | ai.flowx.dev.core.index.process.v1 |
Configuring WebSockets
The engine communicates with the frontend application via WebSockets. The following environment variables need to be configured for the socket server connection details:
- WEB_SOCKET_SERVER_URL_EXTERNAL - the external URL of the WebSocket server
- WEB_SOCKET_SERVER_PORT - the port on which the WebSocket server is running
- WEB_SOCKET_SERVER_PATH - the WebSocket server path
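A minimal sketch with placeholder values, assuming the frontend reaches the socket server through an external gateway URL:
```yaml
env:
  - name: WEB_SOCKET_SERVER_URL_EXTERNAL
    value: "wss://flowx.example.com"  # placeholder external URL
  - name: WEB_SOCKET_SERVER_PORT
    value: "8080"                     # placeholder port
  - name: WEB_SOCKET_SERVER_PATH
    value: "/ws"                      # placeholder path
```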
Configuring file upload size
The maximum file size allowed for uploads can be set by using the following environment variables:
- SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE - the maximum file size allowed for uploads
- SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE - the maximum request size allowed for uploads
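These map to the standard Spring multipart settings, so the request size limit should be at least as large as the file size limit. A minimal sketch with placeholder limits:
```yaml
env:
  - name: SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE
    value: "50MB"   # placeholder per-file limit
  - name: SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE
    value: "50MB"   # placeholder per-request limit; keep >= the max file size
```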
Configuring Advancing controller
To use the Advancing controller, the following environment variables are needed for the process-engine to connect to the Advancing Postgres database:
- ADVANCING_DATASOURCE_JDBC_URL - configures the JDBC (Java Database Connectivity) data source; it specifies the connection URL for the database, including the server, port, database name, and any other necessary connection parameters
- ADVANCING_DATASOURCE_USERNAME - used to authenticate the user's access to the data source
- ADVANCING_DATASOURCE_PASSWORD - sets the password for the data source connection
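A minimal sketch, assuming a Postgres JDBC URL; the host, database name, and secret name are placeholders, and the password is best taken from a secret rather than a literal value:
```yaml
env:
  - name: ADVANCING_DATASOURCE_JDBC_URL
    value: "jdbc:postgresql://postgres-host:5432/advancing"  # placeholder server, port, and database name
  - name: ADVANCING_DATASOURCE_USERNAME
    value: "flowx"                                           # placeholder username
  - name: ADVANCING_DATASOURCE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: advancing-db-secret                            # hypothetical secret name
        key: password
```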
Configuring Scheduler
Below you can find a configuration .yaml for using the scheduler service together with the FLOWX Engine:
```yaml
scheduler:
  processCleanup:
    enabled: false
    cronExpression: 0 */5 0-5 * * ? # every day during the night, every 5 minutes, at the start of the minute
    batchSize: 1000
  masterElection:
    cronExpression: 30 */3 * * * ? # master election every 3 minutes
  websocket:
    namespace:
      cronExpression: 0 * * * * *
      expireMinutes: 30
```
The configuration options are explained below:
- processCleanup: A configuration for cleaning up processes.
- enabled specifies whether this feature is turned on or off.
- cronExpression is a schedule expression that determines when the cleanup process runs. In this case, it runs every day during the night (between 12:00 AM and 5:59 AM), every 5 minutes, at the start of the minute (see the field-by-field breakdown after this list).
- batchSize specifies the number of processes to be cleaned up in one batch.
- masterElection: A configuration for electing a master.
- websocket: A configuration for WebSocket connections.
- expireMinutes specifies how long the WebSocket namespace is valid for (30 minutes in this case).
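For reference, the processCleanup cronExpression above uses the Quartz six-field format (second, minute, hour, day-of-month, month, day-of-week), read field by field:
```yaml
# 0 */5 0-5 * * ?
# | |   |   | | '- ?   : no specific day of the week
# | |   |   | '--- *   : every month
# | |   |   '----- *   : every day of the month
# | |   '--------- 0-5 : between 00:00 and 05:59 (during the night)
# | '------------- */5 : every 5 minutes
# '--------------- 0   : at second 0 (the start of the minute)
```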