KXEN Event Log (KEL) is a solution for aggregating continuous attributes. It automatically creates event aggregates for multiple time periods and categories.
KEL provides a quick and easy way to create a wide variety of aggregates from continuous attributes. Multiple time periods can be defined, such as weeks, months, quarters, or years. Automated pivoting and filtering capabilities allow the creation of aggregates per event category. Within each defined time period and category, KEL computes counts, sums, minimums, maximums, averages, or differences between time periods, as specified. In addition, a "relative" date can be specified as the starting point for the aggregations, meaning that dates such as the "day of first purchase" or "date of cancellation" can be used to create meaningful aggregates for data mining. KEL works with any of the allowable KAF data sources and is not restricted to relational databases. The aggregates are created "on the fly" without making any changes to the underlying files or database tables.
Benefits: KEL improves the quality of predictive models by allowing quick investigation of the value of event aggregates for continuous attributes. For example, counts, sums, and averages of purchase history by product category, by month over 24 months and by quarter over 8 quarters, can be automatically rolled up (1920 new attributes with 20 categories) and added as inputs to a cross-sell model. Alternatively, the count of calls and sum of minutes broken out by peak, evening, and weekend times for the 12 months leading up to a cancelled contract can be rolled up (144 new attributes) and added as inputs to a mobile phone churn model. Attributes that show significant additional value for the model can then be added to the production analytic record definition through the KXEN Data Manipulation interface.
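The size of the cross-sell rollup can be checked with a few lines of arithmetic. The snippet below is only an illustrative sketch: the function name `aggregate_column_count` is hypothetical and is not part of KEL. It assumes that one new attribute is produced for every combination of aggregate, category, and period.

```python
def aggregate_column_count(n_aggregates, n_categories, periods_per_granularity):
    """Return aggregates x categories x total number of periods.

    Hypothetical helper for illustration; not a KEL function.
    """
    return n_aggregates * n_categories * sum(periods_per_granularity)


# Cross-sell example from the text: count, sum, and average (3 aggregates),
# 20 product categories, 24 monthly periods plus 8 quarterly periods.
cross_sell_columns = aggregate_column_count(3, 20, [24, 8])
print(cross_sell_columns)  # 1920
```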
What: KEL is a data manipulation component that builds a “mineable” representation of an event history. It merges static information from a static table with dynamic information from history tables, which is aggregated automatically per period of time.
Why: The information necessary to build predictive models is often spread across a table containing “static” information such as customer demographics or equipment specifications and a log of transactions such as purchase history, service call history or equipment alarms. To build predictive models, this data must be compressed and combined into a single row, representing both the static reference information and the event history.
How: KEL creates aggregates over user-defined periods. The period length can be a day, week, month, etc. Periods are computed from a reference date that can be fixed or specific to each of the reference cases (e.g., date of first purchase for a customer). KEL is programmable, and you can specify the aggregates (min, max, sum, count, etc.).
Benefits for the business user: KEL does not require programming to perform this sophisticated aggregation. Due to the speed of KEL, several aggregation options can be tested ad hoc to find the most meaningful solution.
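The idea of period-based aggregation from a per-case reference date can be sketched in plain Python. This is not KEL's implementation or interface: the function `aggregate_events`, the tuple layout of the events, the fixed-length periods in days, and the generated column names are all assumptions made for illustration. The sketch reads the event log once and produces one wide row per entity.

```python
from collections import defaultdict
from datetime import date


def aggregate_events(events, reference_dates, period_days, n_periods):
    """Roll up (entity_id, event_date, category, amount) events per entity.

    Period k covers days [k * period_days, (k + 1) * period_days) counted
    from the entity's own reference date. Events before the reference date
    or beyond the last period are ignored. Hypothetical helper, not KEL.
    """
    rows = defaultdict(dict)
    for entity_id, event_date, category, amount in events:  # single pass
        ref = reference_dates.get(entity_id)
        if ref is None:
            continue  # no static record for this entity
        period = (event_date - ref).days // period_days
        if not 0 <= period < n_periods:
            continue
        key = f"{category}_p{period}"
        row = rows[entity_id]
        row[key + "_count"] = row.get(key + "_count", 0) + 1
        row[key + "_sum"] = row.get(key + "_sum", 0.0) + amount
        row[key + "_max"] = max(row.get(key + "_max", amount), amount)
    return dict(rows)


events = [
    ("c1", date(2001, 1, 5), "books", 20.0),
    ("c1", date(2001, 1, 20), "books", 5.0),
    ("c1", date(2001, 2, 15), "music", 12.0),
    ("c2", date(2001, 3, 1), "books", 7.5),
]
# Each customer has a different reference date (e.g., date of first purchase).
reference_dates = {"c1": date(2001, 1, 1), "c2": date(2001, 2, 20)}
rows = aggregate_events(events, reference_dates, period_days=30, n_periods=3)
print(rows["c1"]["books_p0_count"], rows["c1"]["books_p0_sum"])  # 2 25.0
```

Because the period index is computed from each entity's own reference date, the same calendar day can fall into different periods for different entities, which is what makes the aggregates comparable across cases.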
Benefits for the Data Mining expert: KEL enables the Data Mining professional to include additional historical data in the analysis process, resulting in better models. KEL is fast and can handle very large data sets.
Benefits for the Integration specialist and IT: Only one pass of the log table is required, using an efficient internal data representation. Building transactional aggregates can be done in minutes instead of days, and can be used to prototype permanent ETL processes. No changes to the underlying schema are required.
Example: For CRM, the most valuable information is how a customer has interacted with a company and its products. This information is typically stored as a purchase history, or call center log. When performing an analysis to predict customer churn, a customer’s actions with respect to the time they left can be critical for maximizing model quality. This requires an event aggregation based on the churn date. Customers churn at different times, so aggregating on a fixed date, such as January 2001, is not necessarily meaningful for the analysis. In this case, the count of purchases and complaint calls, and the sum of purchases could be automatically aggregated for each month in the year before the churn date. Once this is done by KEL, K2R could be used to predict churn.
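As a rough illustration of the churn scenario, the following sketch builds monthly look-back features for a single customer relative to that customer's churn date. It is an assumption-laden example rather than KEL output: `months_before`, `churn_lookback`, the event layout, and the column names are hypothetical, periods are taken to be calendar months, and the month containing the churn date itself is excluded.

```python
from datetime import date


def months_before(reference, event_date):
    """Whole calendar months between event_date and reference (0 = same month)."""
    return (reference.year - event_date.year) * 12 + (reference.month - event_date.month)


def churn_lookback(events, churn_date, n_months=12):
    """Aggregate one customer's (event_date, kind, amount) events by month.

    kind is "purchase" or "complaint". Month 1 is the calendar month just
    before the churn month. Every column is pre-filled with zero so that
    all customers get the same set of attributes.
    """
    features = {}
    for m in range(1, n_months + 1):
        features[f"purchase_count_m{m}"] = 0
        features[f"purchase_sum_m{m}"] = 0.0
        features[f"complaint_count_m{m}"] = 0
    for event_date, kind, amount in events:
        m = months_before(churn_date, event_date)
        if not 1 <= m <= n_months:
            continue  # outside the look-back window
        if kind == "purchase":
            features[f"purchase_count_m{m}"] += 1
            features[f"purchase_sum_m{m}"] += amount
        elif kind == "complaint":
            features[f"complaint_count_m{m}"] += 1
    return features


history = [
    (date(2001, 5, 3), "purchase", 40.0),
    (date(2001, 5, 28), "purchase", 10.0),
    (date(2001, 4, 10), "complaint", 0.0),
    (date(2000, 6, 30), "purchase", 15.0),  # 12 months back: included
    (date(2000, 5, 1), "purchase", 99.0),   # 13 months back: ignored
]
features = churn_lookback(history, churn_date=date(2001, 6, 15))
print(features["purchase_count_m1"], features["purchase_sum_m1"])  # 2 50.0
```

The resulting dictionary is one "mineable" row for the customer; joined with the static attributes, rows like this would be the input to a churn model.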
Example: In a different scenario, when predicting machine part failure, the static information about a particular piece of equipment (lot number, manufacture date, etc.) is not nearly as important as how the equipment has been used. The operating logs, with conditions such as temperature and pressure, can be utilized by KEL. Again, a relative date, such as the day the equipment entered service, is the appropriate point for aggregation. A series of alarms in a new machine can mean something very different from the same set of alarms in a ten-year-old machine. Alarm counts, along with maximum pressure and temperature for each quarter over the first five years of service life, could be automatically created by KEL. In this case, K2S might additionally be used to create segments of equipment with high and low risk of failure.