Big Data IV – Unicorn Or Safety In Numbers?

“…The CEO calmly responds, ‘I want GOD! I want a rockstar programmer who has… built a … big data platform, and has started a company!’

Dmitri respectfully responds back: ‘I wish you the best of luck finding that person.’ ”

That’s the punchline of an entertaining and eye-opening 2015 article by Harlan Harris, “Analyzing the Analyzers” [1]. It goes on to say,

“In his next interview with a small but exciting startup… he is informed that the next step in the process is a fun project that should only consume a few hours:

  1. Find a publicly available data set at least several hundred gigabytes in size
  2. Pose and answer an interesting question about the data
  3. Detail all steps, assumptions, and conclusions, including code.

If this request is coming from the technically savvy CTO, Dmitri wonders what other unrealistic expectations this position may entail.”

1. Unicorns

Dmitri’s not alone. A growing number of employers want the ever-elusive ‘data science unicorn’: a first-class machine learning guru with software design and hardware expertise, a visualisation genius with exceptional business acumen and a wealth of experience.

Sure, such creatures do exist, but they are few and far between. And they’re a funny breed…

Goutam Chakraborty, Professor of Marketing at Oklahoma State University, offered his insight during a 2014 KDnuggets discussion [2]:

“I feel a data scientist (wanted by a company) is someone with a ‘multiple personality disorder’ who can still function well!”

Abhi Yadav from insidebigdata.com went further [3]:

“…A kind of over-achieving, unapologetically ambitious, narcissistic, nerd-on-A.C.I.D, who eats, drinks, sleeps, and is married to data science.”

That was his summary of an excellent article [4] by Chief A.I. Officer at ZIFF, Ben Taylor, on practical insights for how to separate the unicorns from the rest of the herd:

“[They should] launch you ahead of the competition, give you appropriate intellectual property to buffer your imitators, explode the scope of your potential data problems beyond your original hopes, and break through every damn obstacle put in their path.”

Taylor offers six criteria for identifying a potential unicorn:

  1. 30,000 Hour Expert

“You need to find an individual where data science is not a career, curriculum, or hobby, but an addiction… We are talking 10 years of serious daily hacking.”

  2. Reality Distortion On High

“…has such a high degree of unstoppability that they begin to bleed into the category of reality distortion… They will succeed and break through any barrier in their path. So many data scientists out there are constrained to similar problems, classes, and types they have seen before.”

  3. High On Crazy

“The people who are crazy enough to think they can change the world are the ones who do.” – Steve Jobs.

  4. Previous Intellectual Property Experience

“If you are in a market where you have imitators and you need intellectual property development… I wouldn’t use this as a hard filter, but a bonus.”

  5. Large Social Footprint/Skills

“Most likely after you hire this individual you will need them to build out a data science team of top data scientists in the future. That will be hard for them to do if they don’t know what they are looking for, or if they don’t have strong ties with the data science community.”

  6. Volatile Ambition

“A type-E individual doesn’t settle anywhere. If you ask an individual where do you see yourself in 5 years and they respond ‘Not working here’ you have found a real winner.”

That’s unicorns, but what about the rest of us mortals?

Turns out his last point, volatile ambition, is many companies’ undoing when hiring a unicorn.

Jeff Bertolucci from informationweek.com interviewed Shashi Upadhyay, CEO of big data applications provider Lattice, in 2013 [5] and was told:

“Before you know it, because the supply for this talent group is so far behind demand, they have lost this person [who] has gone to the next company. And all of a sudden, all that good work is lost. And you ask yourself, ‘Why did that happen? And how can I manage against it?'”

To put it bluntly, “Many companies need to stop looking for a unicorn and start building a data science team…It doesn’t make sense for organizations to hire a single data scientist, for a variety of reasons. If your budget can swing it, a data science team is the way to go. If not, data science apps may be the next best thing.”

2. Data Science Teams

Saša Baškarada and Andy Koronios from the University of South Australia published a 2016 paper, “Unicorn Data Scientist: The Rarest Of Breeds” [6].

They conducted, “Semi-structured interviews with managers/directors from nine Australian state and federal government agencies with relatively mature data science functions.”

And, “Failed to find evidence of unicorn data scientists.”

Instead they identified, “Six key roles that are considered to be required for an effective data science team:

  1. Domain Expert
  2. Data Engineer
  3. Statistician
  4. Computer Scientist
  5. Communicator
  6. Team Leader”

Each role was associated with primary and secondary skill sets, and three Australian university master’s degrees were evaluated for how effectively they train each skill set. The courses focused mainly on data preparation, statistics, computer science and communication; however, there was also some training in domain expertise and management.

The report concluded, “These universities aim to produce quasi-unicorns. Given the limited opportunity for specialization… [and] without deep expertise in any of the roles identified in our framework, such graduates may not be able to effectively contribute to multidisciplinary data science teams. Nevertheless, they may prove valuable to smaller agencies and firms with limited resources who may have to rely on such quasi-unicorns.”

It should be noted that some companies are starting to headhunt, hire and train these graduate ‘quasi-unicorns’ via jobs like ‘Junior Data Scientist’. Many, however, find themselves inside an analytics or data science team, where they develop expertise in one or two key areas rather than pursuing a quasi-unicorn approach.

DataIQ research analyst Toni Sekinah had similar sentiments in her 2017 article, “In Search Of The Data Science Unicorn” [7].

She asks, “Is the data science industry asking too much of its new recruits?”

She observes that, “The acute shortage of data science professionals has been well documented for a number of years now and the problem shows no signs of abating… By [companies] using different job titles as synonyms for data scientist, it makes it difficult for prospective data professionals to know what they need to do to join the ranks.”

Sekinah gives five industry synonyms for data scientist:

  1. Data Architect
  2. Data Engineer
  3. Machine Learner
  4. Big Data/Data Science Engineer
  5. Analyst

She also describes the key differences between the roles.

JustGiving chief analytics officer Mike Bugembe told Sekinah:

“Most people describe a data scientist as all five of those things in one person. It’s a lot to ask… a company that is able to find a person with the skills for the first three roles alone, would have found ‘the unicorn’. ”

And warned:

“You have organisations who don’t really know what they’re looking for, so they’ll hire people that don’t really fit the mould then it goes through that cycle where you’re getting a lot of organisations hiring people and being unsuccessful.”

AltexSoft, a US IT consulting company, outlines the following key members of a data science team [8]:

  1. Chief Analytics Officer (CAO)/Chief Data Officer

“A ‘business translator’, bridges the gap between data science and domain expertise acting both as a visionary and a technical lead.”

  2. Data Analyst

“Ensures that collected data is relevant and exhaustive while also interpreting the analytics results.”

  3. Business Analyst

“Converting business expectations into data analysis. If your core data scientist lacks domain expertise, a business analyst bridges this gulf.”

  4. Data Scientist (not a data science unicorn)

“A person who solves business tasks using machine learning and data mining techniques. If this is too fuzzy, the role can be narrowed down to data preparation and cleaning with further model training and evaluation.”

  5. Data Architect

“[Works with Big Data]… warehouse the data, define database architecture, centralize data, and ensure integrity across different sources.”

  6. Data Engineer

“Implement, test, and maintain infrastructural components that data architects design. Realistically, the role of an engineer and the role of an architect can be combined in one person.”

  7. Application/data visualization engineer

“… only necessary for a specialized data science model. In other cases, software engineers come from IT units to deliver data science results in applications that end-users face.”

Martijn Theuwissen from DataCamp posted a 2015 article in KDnuggets on “The Different Data Science Roles In The Industry” [9].

His list was similar:

  1. Data Scientist
  2. Data Analyst
  3. Data Architect
  4. Data Engineer
  5. Statistician
  6. Database Administrator
  7. Business Analyst
  8. Data & Analytics Manager

As was Maloy Manna’s list at Data Science Central in 2015 [10]:

  1. Data Scientist
  2. Data Engineer/Software Developer
  3. Data Solutions Architect
  4. Data Platform Administrator
  5. Full-stack Developer
  6. Designer
  7. Product Manager
  8. Project Manager

For a more in-depth discussion and comparison of these roles, see the 2018 article “What Kind of Data Scientist Are You?” [11] by Alex Woodie, or the section “Different Types of Data Scientists” in a 2017 Data Science Central article [12] by Vincent Granville.

Lastly, there was a great presentation [13] by data scientist Jessica Kirkpatrick at the recent Kaggle CareerCon 2018 discussing the different data science job titles and their skill sets. She covered:

  1. Data Analyst
  2. Data [Analytics] Engineer
  3. Data Architect/ETL Engineer
  4. Domain Expert
  5. Management

3. Why Not Both?

What if you’ve got a good team and then you recruit a unicorn? How best to use and integrate them for maximum short and long term benefit?

Booz Allen Hamilton, a US management and IT consulting firm, has an excellent publication, “The Field Guide to Data Science” [14], which describes data science in a business framework and gives insight into data science teams and unicorns.

They propose three main data science skill areas, namely

  1. Domain Expertise
  2. Computer Science
  3. Mathematics/Statistics

And define data science unicorns as, “People with expertise across all three of the skill areas.”

They note that, “Most of the time, you will not be able to find [them]… If you’re ever lucky enough to find one they should be treated carefully:

  1. Encourage them to lead your team, but not manage it. Don’t bog them down with responsibilities of management that could be done by other staff.
  2. Put extra effort into managing their careers and interests within your organization. Build opportunities for promotion… that allow them to focus on mentoring other Data Scientists and progressing the state of the art while also advancing their careers.
  3. Make sure that they have the opportunity to present and spread their ideas in many different forums.”

However, the thrust of the publication is not unicorns but how to choose a data science team properly and create an environment in which it can flourish.

Conclusion

For data scientists-in-training, an intelligently diverse set of skills and experience is very important, but not at the expense of real competence and expertise in at least one field: there are more job opportunities for a specialist within an analytics or data science team than for someone going solo. A common suggestion for aspiring data scientists is to enter the workforce as an overqualified data analyst and work up from there.

For recruiters, becoming familiar with the different data science roles and their uses is key to navigating the hype and confusion surrounding data science. For low-to-medium level results, a quasi-unicorn jack-of-all-trades may work, but for medium and large companies a specialised data science team is a safer, smarter and widely accepted alternative.

References

[1] https://www.oreilly.com/ideas/analyzing-the-analyzers

[2] https://www.kdnuggets.com/2014/06/masters-degree-become-data-scientist.html

[3] https://insidebigdata.com/2018/02/12/data-scientists-wasting-time/

[4] https://www.linkedin.com/pulse/why-your-data-scientist-sucks-benjamin/

[5] https://www.informationweek.com/big-data/big-data-analytics/are-you-recruiting-a-data-scientist-or-unicorn/d/d-id/899843

[6] https://www.emeraldinsight.com/doi/pdfplus/10.1108/PROG-07-2016-0053

[7] https://www.dataiq.co.uk/article/search-data-science-unicorn

[8] https://www.altexsoft.com/blog/datascience/how-to-structure-data-science-team-key-models-and-roles/

[9] https://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html

[10] https://www.datasciencecentral.com/profiles/blogs/what-roles-do-you-need-in-your-data-science-team

[11] https://www.datanami.com/2018/03/15/what-kind-of-data-scientist-are-you/

[12] https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning

[13] “Am I a Good Fit? Identifying Your Best Data Science Job Opportunities” https://www.youtube.com/watch?v=0W0Zrc-m5r8

[14] https://www.boozallen.com/s/insight/publication/field-guide-to-data-science.html
