Key Components of an ETL Job

Thursday, November 21, 2024 By: Thomas SweeneyIn: FAQ's Print

When configuring an ETL job, several settings determine how it operates:

1. Job Name

What It Is: A descriptive name for the ETL job.
Why It Matters: Helps identify the job among others, especially when managing multiple jobs.
Best Practices:
- Use clear, descriptive names (e.g., “Daily SCCM Computer Import”).
- Include the data source and purpose in the name.

2. Active Status

What It Is: Determines if the job is enabled and will run as scheduled.
Why It Matters: Inactive jobs won’t execute, allowing you to control which jobs are operational.
Best Practices:
- Set to Active when the job is ready for production.
- Use Inactive status for jobs under development or testing.

3. Allow Empty Tables

What It Is: Decides whether to create a staging table if the data source returns no data.
Why It Matters:
- Yes: Creates an empty table, which may prevent errors in downstream processes expecting the table.
- No: Skips table creation, which can save resources if the absence of data is acceptable.
Best Practices:
- Set to Yes if downstream processes require the table’s existence.
- Set to No to avoid unnecessary empty tables.

4. Destination Table

What It Is: The name of the staging table where data will be loaded in ReadyWorks.
Why It Matters: It’s the table you’ll reference in data mappings and other processes.
Best Practices:
- Ensure the name is unique and descriptive.
- Match the naming convention used in your organization.

5. Data Identity

What It Is: An identifier used when dealing with multiple data sources or environments.
Why It Matters: Helps distinguish between data from different sources or environments (e.g., production vs. test).
Best Practices:
- Keep it the same as the destination table unless you have multiple data sources.

6. Job Schedule

What It Is: Specifies when the job runs—Hourly, Daily, or Weekly.
Why It Matters: Aligns data imports with your organization’s data refresh needs.
Best Practices:
- Hourly: For frequently changing data requiring constant updates.
- Daily: For data that changes daily or doesn’t need immediate updates.
- Weekly: For static data or data that doesn’t change often.

7. Order

What It Is: Defines the execution order of jobs with the same schedule.
Why It Matters: Ensures jobs run in a specific sequence, especially when one job depends on the output of another.
Best Practices:
- Assign lower numbers to higher-priority jobs.
- Adjust order numbers to manage dependencies.

8. Fields to Index

What It Is: Lists columns to index in the staging table for faster data retrieval.
Why It Matters: Improves performance during data mapping and querying.
Best Practices:
- Index columns frequently used in joins or filters.
- Separate multiple columns with commas (e.g., “ComputerName, SerialNumber”).

9. Data Selection (SQL Query)

What It Is: The SQL query or command that extracts data from the source.
Why It Matters: Defines the actual data retrieved and loaded into ReadyWorks.
Best Practices:
- Write efficient queries that only select necessary data.
- Test queries in a SQL editor before using them in the ETL job.

How helpful do you find this article?

Likes:

Comments

No records to display