This blog post is a summary of the following book:

Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. Published in June 2013. Link: https://www.oreilly.com/data/free/analyzing-the-analyzers.csp

This book offers a good introduction for people who want to become data scientists and for employers who want to incorporate data science into their businesses. It’s easy to read and understand, and it’s worth rereading at different stages of your data science career. Here I summarize the key points that I believe are insightful and meaningful to those interested in data science.

Despite the popularity of the buzzword ‘data science’, data scientist is still a new profession, with no clear expectations about what a practitioner is able to do and no well-defined educational or career paths. As a result, there is much misunderstanding about data science among both employers and employees. For example, when looking for data scientists, employers may be looking for rockstar programmers who have developed highly sophisticated machine learning algorithms, people who have built a distributed back-end big data platform, or people who can dig into hundreds of gigabytes of data and deliver an effective business solution within a couple of hours. At the same time, people already in data scientist positions may be confused about their role within their organizations and about how to advance their careers. Finding the right people for a task is all about communication, and without an appropriate shared vocabulary, data science problems and data science talent are too often kept apart.

The book summarizes five major skills of data scientists and four types of data scientists, as follows:

Five major skills of data scientists:

  • business
  • machine learning/big data
  • math/operational research
  • programming
  • statistics

Four different types of data scientists:

  • data businessperson (best at business and statistics)
  • data creative (best at big data and statistics)
  • data developer (best at programming and big data)
  • data researcher (best at statistics)

Among these types, data researchers often start out in academic research in the physical or social sciences, or in statistics. Many organizations have realized the value of deep academic training in using data to understand complex processes. Three quarters of data researchers have published in peer-reviewed journals, and more than half hold a PhD.

While the authors emphasize the distinctions between the types of data scientists and their technical skills, they also envision that the most successful data scientists are T-shaped. The “T” represents breadth of skills across the top, with depth in one area (usually domain knowledge) represented by the vertical bar. T-shaped data scientists have substantial, deep expertise in at least one aspect of data science, plus enough breadth to single-handedly build at least prototype-level versions of all the steps needed to derive new insights or build data products. Because data science is an inherently collaborative and creative field, T-shaped data scientists can not only work successfully with database administrators, business people, and others with overlapping skill sets to complete data projects in innovative ways, but also solve individual data problems more effectively than those without such depth.

The authors state that data scientists are most effective when they work in teams with overlapping skills, and that the organization must provide a platform and opportunities for individuals to succeed. Data science teams need direct access to both raw data and decision-makers, and a diversity of skills to make the best use of that access. They also need the support of management that has a process for adopting and using their results.