ML Pipelines at Odnoklassniki
This talk covers the main architecture of the Spark ML library, as well as several peculiarities you may encounter when using it to solve real problems that involve processing large amounts of data. We'll focus on a number of limitations that make the library harder to use, and describe the extensions to its standard elements that had to be developed to work around these limitations and fully unleash the potential of massive distributed machine learning. We'll demonstrate how the standard library and its extensions work, using news feed ranking in the Odnoklassniki social network as an example. The talk will be helpful for developers, data engineers, and analysts who use ML methods and distributed data processing platforms.
Dmitry graduated from St. Petersburg State University in 2004 and received a PhD in formal logical methods in 2007. He spent almost 9 years in outsourcing without losing contact with the university and the research community. Big data analysis at Odnoklassniki gave him a unique chance to combine theoretical knowledge and a scientific foundation with the development of real, popular products, and he gladly took that chance by joining the company in 2011.