The Evolution of the Analytic Environment and Analytic Professional

The maturation of the analytic environment has accelerated significantly, driven predominantly by low-cost hardware and open source technologies such as R, Python, Julia, Hadoop, and more recently Spark.

Analytic solutions such as R & Python have been around for a long time, but they were not built to handle large data.  Around eight years ago the adoption of R & Python started to accelerate within corporations.  Large digital firms like Google & Netflix were using those solutions much earlier, but it took time for other corporations to start using open source analytics in production environments.  Integrations with big data solutions such as Hadoop came next, but they required significant work to set up and maintain and were not optimal.  Still, they made it possible to run sophisticated analytics on large data environments using open source technology, which had not been possible before, so this was a big step forward.  Finally, Spark became mainstream, with R & Python integrations that are easy to use and quickly evolving, such as SparkR, PySpark, and most recently sparklyr.

In years past, the cost of entry for an analytic environment was significant: it required large servers (originally mainframes, then Unix) along with proprietary and expensive software.  Now the cost of entry is limited only by skills.  An open source analytic environment with R, Python & Spark will do the job, and many processes can now run on a personal laptop, but it takes skill to set up and optimize that environment.  And if a large cluster of computers is needed, solutions such as Amazon EMR, Google Dataproc and Microsoft Azure HDInsight allow us to set up environments cheaply, scaling them out and paying for them only when they are needed.

Analytic professionals have had to evolve quickly as well to keep up.  The title Data Scientist came into common use in 2014.  A Data Scientist is more technical than the Analytic professionals of years past.  Data Scientists must have the skills to set up an environment on their personal systems, should be well versed in at least one cloud environment (AWS, DigitalOcean, Google, etc.), and need to be comfortable with Linux.

Data Scientists also need to be business domain experts.  They should focus on providing actionable insight that directly drives incremental revenue (or reduces costs), and they need the skills to communicate those insights to all business audiences.  Combining deep technical and analytic skills with business acumen and communication skills is a challenge, but it should be the goal of all Data Scientists.

This evolution also makes it easier to streamline and productionize processes into self-sustaining solutions, so that Data Scientists and technology experts (Data Engineers) can continue to focus on pushing the boundaries of insight development.  Digital examples include modules that automatically identify potential customer interests based on past behavior and serve tailored banner and video ads in real time.

It certainly is an exciting time for analytics … sorry, Data Science.  Perhaps I feel a bit more excitement than some, since I started with the pain of mainframes and expensive proprietary software.