Big Data V – Learning The Lingo

50% of any job is just being able to talk the talk. What does that mean for data science?

Firstly, recall that data science is an umbrella term housing several different disciplines:

  1. Statistics
  2. Computer Science
  3. Data Storage
  4. Machine Learning/Artificial Intelligence
  5. Business Analytics/Business Intelligence

1. Data Science Key Terms

With the recent hype around data science, the fields of data science, statistics and machine learning are often lumped together. For in-depth glossaries on data science and machine learning, check out:

  1. [1]
  2. Google’s machine learning glossary (with pictures) [2]
  3. Analytics Vidhya’s data science glossary (with pictures) [3]
  4. KDnuggets, “277 Data Science Key Terms, Explained” sorted by topics [4]
    • 20 Big Data Key Terms, Explained
    • 12 Machine Learning Key Terms, Explained
    • 10 Clustering Key Terms, Explained
    • 14 Deep Learning Key Terms, Explained
    • 16 Database Key Terms, Explained
    • 15 Descriptive Statistics Key Terms, Explained
    • 11 Prescriptive Statistics Key Terms, Explained
    • 20 Cloud Computing Key Terms, Explained
    • 16 Hadoop Key Terms, Explained
    • 13 Apache Spark Key Terms, Explained
    • 12 Internet of Things (IoT) Key Terms, Explained
    • 18 Natural Language Processing (NLP) Key Terms, Explained

Or, if you’d rather have a quick overview of the most important terms, check out:

  1. Data Science Central’s “Key Vocabulary Everyone Should Understand” [5]
  2. Dataconomy’s “25 Big Data Terms Everyone Should Know” [6]
  3. Business Broadway’s “25 Data Science Terms Every Customer Professional Needs to Know” [7]

2. Business Intelligence Key Terms

Data scientists aren’t just good with numbers and coding; they know how to capitalise on data to cut costs, discover new opportunities and optimise current procedures. This requires business acumen and domain knowledge, as well as curiosity, creativity and communication skills.

In short, data science needs business intelligence.

  1. Business intelligence company Phocus has an excellent “A to Z of Business Intelligence (glossary)” [8]
  2. Business intelligence resource portal has a similar glossary [9]
  3. So does BA Times [10]

Learning the lingo alone won’t cut it for business intelligence and management. This is even more true than on the computational side of data science, where increasingly sophisticated packages let programmers apply a myriad of machine learning algorithms to their data sets just by knowing the algorithms’ names and writing a few lines of code. Management and business skills, in contrast, require practice and experience, yet they are often overlooked in data science degree programs.

Data scientist and author Randy Bartlett, interviewed by content marketer Daniel Levine in 2015 [11], “criticizes university programs for often leaving these skills out all together: ‘There’s no real training about how to talk to clients, how to organize teams, or how to lead an analytics group.’”

Similar sentiments appear in a 2016 research paper by Saša Baškarada and Andy Koronios, “Unicorn Data Scientist: The Rarest Of Breeds” [12], which concluded:

“These universities aim to produce quasi-unicorns. Given the limited opportunity for specialization… [and] without deep expertise in any of the roles identified in our framework, such graduates may not be able to effectively contribute to multidisciplinary data science teams.”

The same is true of most MOOCs. One notable exception is Springboard [13], whose course focuses on three main skill sets: business knowledge, statistics and programming.

“They need to understand business fundamentals in order to communicate their findings and drive other teams to action based on their insights.” [13]

Springboard also offers a 70-page “Guide To Data Science Jobs”, which gives “an outline of what business knowledge you need to be a successful data scientist” [13] through “70 pages of original research, interviews with real data scientists, and checklists” [14].

Conclusion: Take the business intelligence side of data science seriously, and try to find a proper course or subject that covers it in detail.

3. Computer Science and IT

If you come from a maths/stats or a business analyst background, chances are your IT knowledge isn’t all it could be for a career in data science.

You could start with the KDnuggets key terms articles on databases [15], cloud computing [16] and the Internet of Things (IoT) [17]. Alternatively, here are some relevant and readable IT glossaries suitable for business intelligence and data science:

  1. IT glossary (long) [18]
  2. Quickbase computer science glossary (short) [19]
  3. University of Idaho computer science glossary (short) [20]
  4. A great article by Kelli Smith, “99 Terms You Need To Know When You’re New To Tech” (long) [21]

In particular, data science projects often lead to the development of a new software tool or app, either for internal use within the business or as an external platform to commercialise the analytics.

Here are some good resources for software development lingo:

  1. TechTarget “Software Development Glossary” (short) [22]
  2. SolutionsIQ, “Agile Glossary” (long) [23]
  3., “Scrum Developer Glossary” (short) [24]

For an overview and comparison of the four main methodologies used in software development (Agile, Kanban, Scrum and Waterfall), check out [25][26]; for a more general overview of these and other software development methodologies, see [27].

Another very important IT concept for data science is algorithm (time) complexity. Time complexity describes how an algorithm’s running time grows as the size of its input grows, and it is usually expressed in big O notation. It therefore indicates how well an algorithm will scale to large data sets, which makes it critically important for efficient big data analytics.
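As a rough illustration (a minimal sketch of my own, not taken from the articles referenced here), the two hypothetical Python functions below answer the same question, “does this list contain a duplicate?”, with different time complexities. The nested-loop version compares every pair of items, which is O(n²); the set-based version remembers values it has already seen, which is O(n) on average because Python set lookups take constant time on average. Each function also returns a count of the basic steps it performed, so the growth is visible without relying on wall-clock timings.

```python
def has_duplicates_quadratic(items):
    """O(n^2): compare every pair of items using nested loops.

    Returns (found_duplicate, number_of_comparisons).
    """
    comparisons = 0
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons


def has_duplicates_linear(items):
    """O(n) on average: track values already seen in a hash set.

    Returns (found_duplicate, number_of_set_lookups).
    """
    seen = set()
    lookups = 0
    for item in items:
        lookups += 1
        if item in seen:
            return True, lookups
        seen.add(item)
    return False, lookups


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Worst case: no duplicates, so neither function can stop early.
        data = list(range(n))
        _, quad_steps = has_duplicates_quadratic(data)
        _, lin_steps = has_duplicates_linear(data)
        print(f"n={n:>5}  nested loops: {quad_steps:>7} comparisons  "
              f"set: {lin_steps:>5} lookups")
```

In the worst case the nested loops make n(n − 1)/2 comparisons, so growing the input tenfold (10, 100, 1,000 items) multiplies the work roughly a hundredfold (45, 4,950, 499,500 comparisons), while the set-based version grows only tenfold (10, 100, 1,000 lookups). The trade-off is that the set uses extra memory proportional to the number of distinct items.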

There is an excellent visual introduction to algorithm complexity at [28].

Here are two good articles explaining big O notation for algorithm complexity [29][30]. For more information, check out the Wikipedia article on “Time Complexity” [31].

4. Job Interview Qs

Now you’ve got the data science lingo down pat, how do you think you’d go in a job interview?


Andrew Fogg, CDO of, wrote a very popular article in 2016 entitled, “20 Questions To Detect Fake Data Scientists” [32] which aimed to help employers separate ‘fake’ data scientists from the real deal.

For those of you who just want the answers to his 20 questions, check out the KDnuggets article [33].

Data Science Central is compiling a comprehensive list of data science interview questions [34], currently up to 91. There are sample answers and resources for most of the questions as well.

Springboard analysed hundreds of data science interviews and identified four main lines of questioning and six key strategies for preparing properly for an interview [35]. They also produced a 90-page guide [36] covering all the details.

Lastly, the 2018 Kaggle CareerCon featured two one-hour panels on data science job interviews:

  1. “Overview of the Data Science Interview Process” with Michelle Casbon, Senior Engineer at Google, and Steve Greenberg, Engineering Manager at Google. [37]
  2. “Live Breakdown of Common Data Science Interview Questions” with Kaggle Data Scientists Walter Reade and Rachael Tatman. [38]