When configuring an ETL job, several settings determine how it operates:
1. Job Name
- What It Is: A descriptive name for the ETL job.
- Why It Matters: Helps identify the job among others, especially when managing multiple jobs.
- Best Practices:
- Use clear, descriptive names (e.g., “Daily SCCM Computer Import”).
- Include the data source and purpose in the name.
2. Active Status
- What It Is: Determines if the job is enabled and will run as scheduled.
- Why It Matters: Inactive jobs won’t execute, allowing you to control which jobs are operational.
- Best Practices:
- Set to Active when the job is ready for production.
- Use Inactive status for jobs under development or testing.
3. Allow Empty Tables
- What It Is: Decides whether to create a staging table if the data source returns no data.
- Why It Matters:
- Yes: Creates an empty table, which may prevent errors in downstream processes expecting the table.
- No: Skips table creation, which can save resources if the absence of data is acceptable.
- Best Practices:
- Set to Yes if downstream processes require the table’s existence.
- Set to No to avoid unnecessary empty tables.
4. Destination Table
- What It Is: The name of the staging table where data will be loaded in ReadyWorks.
- Why It Matters: It’s the table you’ll reference in data mappings and other processes.
- Best Practices:
- Ensure the name is unique and descriptive.
- Match the naming convention used in your organization.
5. Data Identity
- What It Is: An identifier used when dealing with multiple data sources or environments.
- Why It Matters: Helps distinguish between data from different sources or environments (e.g., production vs. test).
- Best Practices:
- Keep it the same as the destination table unless you have multiple data sources.
6. Job Schedule
- What It Is: Specifies when the job runs—Hourly, Daily, or Weekly.
- Why It Matters: Aligns data imports with your organization’s data refresh needs.
- Best Practices:
- Hourly: For frequently changing data requiring constant updates.
- Daily: For data that changes daily or doesn’t need immediate updates.
- Weekly: For static data or data that doesn’t change often.
7. Order
- What It Is: Defines the execution order of jobs with the same schedule.
- Why It Matters: Ensures jobs run in a specific sequence, especially when one job depends on the output of another.
- Best Practices:
- Assign lower numbers to higher-priority jobs.
- Adjust order numbers to manage dependencies.
8. Fields to Index
- What It Is: Lists columns to index in the staging table for faster data retrieval.
- Why It Matters: Improves performance during data mapping and querying.
- Best Practices:
- Index columns frequently used in joins or filters.
- Separate multiple columns with commas (e.g., “ComputerName, SerialNumber”).
9. Data Selection (SQL Query)
- What It Is: The SQL query or command that extracts data from the source.
- Why It Matters: Defines the actual data retrieved and loaded into ReadyWorks.
- Best Practices:
- Write efficient queries that only select necessary data.
- Test queries in a SQL editor before using them in the ETL job.