TY - GEN
T1 - Deploying analytics with the portable format for analytics (PFA)
AU - Pivarski, Jim
AU - Bennett, Collin
AU - Grossman, Robert L.
PY - 2016/8/13
Y1 - 2016/8/13
N2 - We introduce a new language for deploying analytic models into products, services and operational systems called the Portable Format for Analytics (PFA). PFA is an example of what is sometimes called a model interchange format, a language for describing analytic models that is independent of specific tools, applications or systems. Model interchange formats allow one application (the model producer) to export models and another application (the model consumer or scoring engine) to import models. The core idea behind PFA is to support the safe execution of statistical functions, mathematical functions, and machine learning algo- rithms and their compositions within a safe execution environment. With this approach, the common analytic models used in data science can be implemented, as well as the data transformations and data aggregations required for pre- and post-processing data. PFA compliant scoring engines can be extended by adding new user defined functions described in PFA. We describe the design of PFA. A Data Mining Group (DMG) Working Group is developing the PFA standard. The current version is 0.8.1 and contains many of the commonly used statistical and machine learning models, including regression, clustering, support vector machines, neural networks, etc. We also describe two implementations of Hadrian, one in Scala and one in Python. We discuss four case studies that use PFA and Hadrian to specify analytic models, including two that are deployed in operations at client sites.
AB - We introduce a new language for deploying analytic models into products, services and operational systems called the Portable Format for Analytics (PFA). PFA is an example of what is sometimes called a model interchange format, a language for describing analytic models that is independent of specific tools, applications or systems. Model interchange formats allow one application (the model producer) to export models and another application (the model consumer or scoring engine) to import models. The core idea behind PFA is to support the safe execution of statistical functions, mathematical functions, and machine learning algo- rithms and their compositions within a safe execution environment. With this approach, the common analytic models used in data science can be implemented, as well as the data transformations and data aggregations required for pre- and post-processing data. PFA compliant scoring engines can be extended by adding new user defined functions described in PFA. We describe the design of PFA. A Data Mining Group (DMG) Working Group is developing the PFA standard. The current version is 0.8.1 and contains many of the commonly used statistical and machine learning models, including regression, clustering, support vector machines, neural networks, etc. We also describe two implementations of Hadrian, one in Scala and one in Python. We discuss four case studies that use PFA and Hadrian to specify analytic models, including two that are deployed in operations at client sites.
KW - Deploying analytics
KW - Model producers
KW - PFA
KW - PMML
KW - Portable Format for Analytics
KW - Scoring engines
UR - http://www.scopus.com/inward/record.url?scp=84984949425&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84984949425&partnerID=8YFLogxK
U2 - 10.1145/2939672.2939731
DO - 10.1145/2939672.2939731
M3 - Conference contribution
AN - SCOPUS:84984949425
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 579
EP - 588
BT - KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Y2 - 13 August 2016 through 17 August 2016
ER -