Configuring
Function Operation Instructions
1. Parameter Meanings
When configuring a MongoDB data synchronization task, here is the detailed meaning of each parameter:
-
workName:
- Meaning: Task name
- Description: Used to identify the name of the data synchronization task. If not provided, it defaults to "workNameDefault".
-
sourceDsUrl:
- Meaning: Source MongoDB connection URL
- Description: Specifies the connection URL of the source MongoDB database, which can be a single node, a replica set, or a sharded cluster.
-
targetDsUrl:
- Meaning: Target MongoDB connection URL
- Description: Specifies the connection URL of the target MongoDB database, which can be a single node, a replica set, or a sharded cluster.
-
syncMode:
- Meaning: Synchronization mode
- Description: Specifies the mode of data synchronization, which can be one of the following options:
- "all": Full mode, sync all tables, excluding operations on source tables during synchronization.
- "allAndRealTime": Full plus real-time mode, performs full sync first and then starts real-time sync.
- "allAndIncrement": Full plus incremental mode, performs full sync first and then syncs only operations on source tables during synchronization.
- "realTime": Real-time mode, syncs based on configured start and end times.
-
realTimeType:
- Meaning: Real-time task type
- Description: Selects the type of real-time task, which can be "oplog" or "changestream".
- Additional Information:
- "oplog": Uses MongoDB's oplog for real-time synchronization, suitable for source replica sets, supports DDL operations, and is faster.
- "changestream": Uses MongoDB's changestream for real-time synchronization, suitable for source replica sets or mongos, does not support DDL operations, and has moderate speed.
-
fullType:
- Meaning: Full task type
- Description: Selects the type of full task, which can be "sync" or "reactive".
- Additional Information:
- "sync": Uses a stable transmission method for full synchronization.
- "reactive": Uses a faster transmission method for full synchronization.
-
dbTableWhite:
- Meaning: Tables to synchronize
- Description: Specifies tables to synchronize using regular expressions. For example, to sync all tables under the mongodb database: mongodb\..+, the default is to sync all tables.
-
ddlFilterSet:
- Meaning: DDL operations to synchronize
- Description: Specifies DDL operations to synchronize, separated by commas. The default is *, meaning sync all DDL operations.
-
sourceThreadNum:
- Meaning: Source task thread number (full mode)
- Description: Specifies the number of threads to read source tasks in full synchronization.
-
targetThreadNum:
- Meaning: Target task thread number (full mode)
- Description: Specifies the number of threads to write target tasks in full synchronization.
... (Continues with the rest of the parameter explanations)
2. Parameter Usage Scope
| Parameter | Real-Time Task | Full Task | Full + Increment Task | Full + Real-Time Task |
|--------------------|--------------|----------|----------------------|-----------------------|
| workName | ✔️ | ✔️ | ✔️ | ✔️ |
| sourceDsUrl | ✔️ | ✔️ | ✔️ | ✔️ |
| targetDsUrl | ✔️ | ✔️ | ✔️ | ✔️ |
| syncMode | ✔️ | ✔️ | ✔️ | ✔️ |
| realTimeType | ✔️ | | ✔️ | ✔️ |
| fullType | | ✔️ | ✔️ | ✔️ |
| dbTableWhite | ✔️ | ✔️ | ✔️ | ✔️ |
| ddlFilterSet | ✔️ | | ✔️ | ✔️ |
| batchSize | ✔️ | ✔️ | ✔️ | ✔️ |
| bucketNum | ✔️ | ✔️ | ✔️ | ✔️ |
| bucketSize | ✔️ | ✔️ | ✔️ | ✔️ |
| startOplogTime | ✔️ | | | |
| endOplogTime | ✔️ | | ✔️ | ✔️ |
| delayTime | ✔️ | | | |
| nsBucketThreadNum | ✔️ | | | |
| writeThreadNum | ✔️ | | | |
| ddlWait | ✔️ | ✔️ | ✔️ | ✔️ |
| clusterInfoSet | ✔️ | ✔️ | ✔️ | ✔️ |
| bind_ip | ✔️ | ✔️ | ✔️ | ✔️ |
3. Data Validation
# Data validation script
# 0: Multi-threaded validation: Configure 1-8 validation methods after 0, which can be processed concurrently
# 1: Estimate count validation for libraries and tables (may be inaccurate)
# 2: Accurate count validation for libraries and tables
# 3: Library and table dbHash validation (locks the library, use with caution)
# 4: Validate 100 randomly selected data from libraries and tables, source side randomly selects 100 data, check if they exist on the target side
# 5: Validate 100 data of each data type from libraries and tables, extract 100 data of each data type for _id (first 50 and last 50), check if they exist on the target side
# 6: Check missing index information in libraries and tables
# 7: Check missing index information in libraries and tables and create missing indexes
# 8: Library dbHash validation (locks the library, use with caution)
# 9: Output detailed validation log information. When not specified, the log only records abnormal validation information
# Can be used in combination, e.g., 123456 123457 1237. If not specified, the default is combination 16
checkData=12456