We are sometimes asked, “Why did KXEN develop its own algorithms?” Since that question goes to the very core of why we founded KXEN, I thought it would be useful to share the history and reasoning behind this.
The reason why KXEN developed its own algorithms is, as is often the case with inventions, a matter of knowledge, luck, serendipity, and … a problem that needed solving!
- Knowledge: the core team of KXEN has been in machine learning for over 25 years. The team is a blend of scientists, engineers, and consultants who have been concocting machine learning algorithms, implementing these mathematical constructs as efficient computer programs, and putting them into operation.
- Luck: Back in 1995, Vladimir Vapnik came to France to give some seminars, and some of us saw the light! Vladimir changed the machine learning game with his new ‘Statistical Learning Theory’: he gave machine learning practitioners a mathematical framework that finally allowed them to understand, at the core, why some techniques worked and others did not. All of a sudden, a new realm of algorithms could be written that would use mathematical equations instead of engineered data science tricks (don’t get me wrong here: I am an engineer at heart and I know the value of “tricks,” but tricks cannot overcome the drawbacks of a bad mathematical framework). Here was a foundation for automated data mining techniques that could perform as well as the best data scientists deploying those tricks. Luck is not enough, though; it was because we knew a lot about statistics and machine learning that we were able to decipher the nuggets of gold in Vladimir’s theory.
- Serendipity: In September 1998, Michel Bera (our Chief Scientific Officer, and the man who concocted the core of our classification engine) and I met and agreed that all these companies that had been collecting zillions of bytes of data would soon need to analyze that data to finally get a return on their investment. We later discovered that we were a little early to the game back in 1998, but with over 400 global customers today, KXEN has thrived, and the market has never been so ripe for simple, automated, advanced analytics as it is today.
- A problem to solve: the market needed a system able to perform classification and regression (we later added clustering/segmentation, time series analysis, association rules, and social network analysis), with the following characteristics:
- Non-parametric: little user intervention and tuning should be required — it should work well out of the box.
- Independent of the data and target distribution:
- Target: the classification system should be able to handle positive rates as low as 0.1% (as in fraud detection, for example), or be able to forecast a continuous value with only 1% non-zero values.
- Data: it should automatically mix, match, and compare the influence of ordinal, nominal, continuous, and textual variables without any user intervention.
- Scalable in number of rows: the training time should be linear with the number of rows, and the quality of the models should increase with the number of rows.
- Scalable in number of columns: the training time should be close to linear in the number of columns, and the quality of the models should increase with the number of columns. Most algorithms are well known to over-fit in high dimensions; it is quite ironic that companies spend billions of dollars collecting data, yet often cannot take advantage of it because most first-generation analytical workbenches collapse when handling the high dimensionality inherent in all this data.
- Descriptive: a good predictive analytics package must be able to present its findings in a way that a business user can understand. We have always believed that there is a continuum between predictive and descriptive analytics: predictive models should be descriptive enough and descriptive models should be usable in a predictive manner to make decisions.
- Deployable: the scoring equations should be simple enough to be deployed in any operational environment: SQL for databases, Java code for the web (or even for smartphones), etc.
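The low-response-rate requirement above hinges on the choice of evaluation metric. Here is a minimal sketch of the underlying issue, using toy data and a hand-rolled rank-based AUC; the data, scores, and code are purely illustrative and are not KXEN's actual implementation. On a rare target, accuracy rewards a model that predicts "negative" for everyone, while a rank-based metric such as AUC, which is independent of the target distribution, correctly flags it as no better than chance:

```python
import random

random.seed(0)

# Toy data: 1,000 examples with a rare positive class (1% here, so the
# sample actually contains some positives). Labels and scores are illustrative.
n = 1000
labels = [1 if i < 10 else 0 for i in range(n)]            # 1% positives
useless_scores = [0.0] * n                                 # "always negative" model
informative_scores = [random.random() + 0.5 * y            # better than chance
                      for y in labels]

def accuracy(labels, scores, threshold=0.5):
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def auc(labels, scores):
    """Rank-based AUC: probability a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# The useless model looks excellent on accuracy but is exactly chance on AUC.
print(accuracy(labels, useless_scores))   # 0.99
print(auc(labels, useless_scores))        # 0.5
```

This is why a learning system aimed at 0.1%-positive problems must be optimized with metrics that do not depend on the target distribution.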
Guess what? None of the dozens of algorithms in the first-generation tools (such as SAS Enterprise Miner or IBM SPSS/Clementine) provided a solution to all of this back in 1998, and they still don’t in 2012!
I know what you’re thinking: “This guy is so pretentious! This is a daunting task which data scientists have worked on for years and they still have not found a solution!” As a matter of fact, it was not vanity but naivete (the 5th element required to create a company based on invention!): we simply assumed that we could in fact build a software package that did all this. That innocent assumption, mixed with knowledge, luck, and serendipity, led us to the solution we have today. Vapnik’s theory provided us with a mathematical framework for capabilities 1, 2, and 4 above; what remained was 3, 5, and 6, which we solved with a well-known pattern in machine learning: using linear systems in a properly encoded space (the trick is finding the right encoded space). The low response rate problem (described above in capability 2) requires careful choice of the metrics used to optimize the learning algorithms: metrics that should be independent of the target distribution.
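The “linear system in a properly encoded space” pattern can be sketched roughly as follows. The variables, encoding scheme, and weights below are purely illustrative assumptions, not KXEN’s actual scheme: a nominal variable is one-hot encoded, a continuous variable is binned, and the score is then a plain dot product over the encoded features, whose weights any linear learner can fit (they are set by hand here to keep the example self-contained):

```python
REGIONS = ["north", "south", "east", "west"]   # hypothetical nominal variable
AGE_BINS = [(0, 30), (30, 60), (60, 200)]      # hypothetical continuous variable, binned

def encode(region, age):
    """Map a raw record to a 0/1 feature vector in the encoded space."""
    onehot = [1.0 if region == r else 0.0 for r in REGIONS]
    binned = [1.0 if lo <= age < hi else 0.0 for lo, hi in AGE_BINS]
    return onehot + binned

# Hypothetical weights, as a linear model would learn them.
weights = [0.2, -0.1, 0.4, 0.0,   # region effects
           -0.3, 0.1, 0.5]        # age-bin effects

def score(region, age):
    """The scoring equation: a weighted sum over encoded features."""
    return sum(w * x for w, x in zip(weights, encode(region, age)))

print(round(score("east", 45), 2))   # 0.5  (0.4 for "east" + 0.1 for the 30-60 bin)
```

Because the score is just a weighted sum of 0/1 indicators, it translates directly into a single SQL or Java expression, which is one reason linear scoring equations meet the deployability requirement above.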
And so we started KXEN!
13 years later, we have hundreds of customers throughout the world, and nearly 99% of them bought KXEN technology (called InfiniteInsight®) after benchmarking it against the first generation tools used by their data scientists (and if they bought the product, then you know KXEN’s InfiniteInsight® came out the winner of the benchmarks!).
Have fun using InfiniteInsight®!