26 Most Important Big Data Terms You Should Know About

Let’s demystify the most used terms and jargon in the big data space.

How would you define ‘data science’? How about big data, AI, or data culture? These are just a few of the many terms commonly tossed around in data-speak. You might wonder what people really mean when they use one of these buzzwords.

As data went mainstream, so did a dense vocabulary of jargon. It’s a real challenge to understand and agree on the definitions of these terms. Perhaps this is an even tougher roadblock than getting business value from data!

This article will demystify the 26 most frequently used terms in the data space.

We’ve grouped these phrases into four broad categories: Data Engineering, Business Intelligence, Data Science, and Decision Intelligence. These are in the same logical sequence that an organization typically follows to get business value from data.

A Glossary of Big Data Terms

Data Engineering Related Big Data Terms

Data Engineering is a discipline that focuses on identifying data sources and on collecting, curating, and storing data. It is a precursor to all the other disciplines that help get value from data.

Data Governance is a framework and a set of practices to help all stakeholders across an organization identify and meet their information needs. (Ref: Data Governance Institute)

Data Warehouse is a central repository of information that can be used to analyze and make more informed decisions. (Ref: Amazon)

Data Fabric is an architecture and set of data services that provide consistent capabilities, integrating data management across the cloud and on-premises to accelerate digital transformation. Gartner says that data fabric enables frictionless access and sharing of data in a distributed data environment. (Ref: NetApp, Gartner)

Business Intelligence Related Big Data Terms

Business Intelligence is the discipline of analyzing and transforming data to extract valuable business insights to enable decision-making. Today, BI is typically used to refer to descriptive analysis and reporting.

Data mining is a process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. This term was coined around 1990 and quickly became a buzzword. (Ref: Wikipedia)
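To make "discovering patterns" a little more concrete, here is a minimal sketch of one classic data mining task: finding items that are frequently bought together. The basket data, the `frequent_pairs` helper, and the `min_support` threshold are all made up for illustration; real projects usually rely on dedicated libraries and far larger data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
    {"milk", "bread"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```

With this toy data, only bread and butter appear together in at least three baskets, which is the kind of pattern a retailer might act on.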

MIS Reporting (Management Information Systems) is the process of providing essential information to run the day-to-day business activities and monitor an organization’s progress. This usually refers to descriptive and operational reporting.

Data Science Related Terminology

Data Science is the discipline of applying advanced analytics techniques to extract valuable information from data for business decision-making and strategic planning. It brings together fields such as data mining, statistics, mathematics, machine learning, data visualization, and software programming.

Artificial Intelligence (AI) refers to the ability of a machine to mimic the capabilities of the human mind, such as learning from examples and experience, recognizing objects, understanding and responding to language, making decisions, and solving problems. (Ref: IBM)

Machine learning is a subset of the artificial intelligence (AI) discipline that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. (Ref: Expert AI)

Deep learning is a technique that falls into the machine learning discipline. It is based on artificial neural networks that are inspired by the structure of the human brain. It learns from vast amounts of data and is particularly good at finding patterns in unstructured data such as text and images.
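As a rough illustration of "learning from experience without being explicitly programmed", the sketch below uses a one-nearest-neighbor classifier: instead of hand-writing rules, it labels a new data point by copying the label of the most similar example it has seen. The training examples and the `predict` helper are hypothetical, and this is only one of many machine learning techniques.

```python
import math

# Hypothetical labelled examples: (height_cm, weight_kg) -> animal.
training_data = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((70.0, 35.0), "dog"),
]


def predict(features, examples):
    """Label a new point with the label of its closest training example."""
    nearest = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]


print(predict((22.0, 4.5), training_data))
print(predict((65.0, 32.0), training_data))
```

Note that no rule such as "heavier than 10 kg means dog" is written anywhere; adding more examples changes the predictions without changing the code.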

Read: A simple English explanation of GANs or Dueling neural-nets

Augmented intelligence refers to a human-centered partnership that brings people and AI together to enhance cognitive performance, including learning, decision making, and new experiences. (Ref: Gartner, Forbes)

Collective intelligence refers to a group’s combined capability to perform various tasks and solve diverse problems. Businesses can enable it through the collaboration, collective effort, and competition of many individuals in consensus decision-making. (Ref: Wikipedia)

Read: Data science: 3 scenarios CIOs could see in 2030

Descriptive Analytics is the examination of data or content to answer the question “What happened?” It is typically characterized by traditional business intelligence (BI) and data visualization. (Ref: Gartner)
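A descriptive question can often be answered with simple summaries. The sketch below assumes a small, made-up set of monthly sales figures and reports the total, the average, and the best month; the variable names are illustrative only.

```python
from statistics import mean

# Hypothetical monthly sales figures (units sold).
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 110}

total = sum(monthly_sales.values())
average = mean(list(monthly_sales.values()))
best_month = max(monthly_sales, key=monthly_sales.get)

print(f"Total: {total}, average: {average}, best month: {best_month}")
```

These numbers describe what happened; they say nothing yet about why it happened or what comes next.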

Diagnostic Analytics is a form of advanced analytics that examines data to answer the question “Why did it happen?” You can achieve it with the help of techniques such as data mining, statistics, and machine learning. (Ref: Gartner)
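One simple statistical starting point for a "why" question is to check whether two measures move together. The sketch below computes a Pearson correlation between hypothetical advertising spend and sales figures. The data and the `pearson` helper are assumptions for illustration, and a strong correlation is only a hint for further investigation, not proof of cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly figures: advertising spend and units sold.
ad_spend = [10, 12, 15, 9]
sales = [120, 135, 150, 110]

r = pearson(ad_spend, sales)
print(f"Correlation between ad spend and sales: {r:.2f}")
```

A value close to 1 suggests that months with higher ad spend also had higher sales, which would be worth examining more closely.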

Predictive Analytics is a form of advanced analytics that examines data to answer the question “What is likely to happen?” You can achieve it with the help of techniques such as machine learning and Artificial Intelligence (AI). (Ref: Gartner)
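As a minimal sketch of answering "What is likely to happen?", the example below fits a straight line to past sales with ordinary least squares and extends it one month ahead. The sales figures and the `fit_line` helper are hypothetical; real forecasts typically use richer models and account for uncertainty.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [100, 110, 120, 130]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # projected sales for month 5
print(f"Forecast for month 5: {forecast:.0f}")
```

Because the toy data grows by exactly 10 units a month, the fitted line projects 140 units for month 5.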

Information Design is the practice of presenting information in a way that fosters an efficient and effective understanding of the information. (Ref: Wikipedia)

Data Visualization falls into the discipline of information design. It refers to the graphical representation of information using visual elements such as charts, graphs, and maps. The intent is to enable decision-making with the appropriate representation of insights.

Read: 72 Types of visualizations for Data Storytelling

Data Consumption refers to the presentation of insights in a form that aids understanding and action. It is often achieved by adopting analytics techniques to identify insights and data visualization techniques to present the insights.

Data Storytelling is the practice of building a narrative around data and its accompanying visualizations to help convey context and the meaning of data in a powerful and compelling fashion. (Ref: TDWI)

Read: How to create data stories in 4 easy steps

Decision Intelligence Related Big Data Terms

Decision Intelligence is the discipline of turning information into organizational decisions at scale. Organizations and individuals can achieve it by applying data science within the context of a business problem and bringing together the management science and social science disciplines. (Ref: Enterprisers Project)

Management Science is the broad interdisciplinary study of problem-solving and decision-making in human organizations. It has strong linkages to fields such as management, economics, business, and management consulting. (Ref: Wikipedia)

Social Science is the branch of science devoted to the study of societies and the relationships among individuals within those societies. This field is gaining relevance in the data space since it helps gain insights into people’s behavioral aspects. (Ref: Wikipedia)

A Decision Support System (DSS) is an information system that supports organizational decision-making activities. The field saw a lot of research in the 1970s and grew rapidly over the next few decades. (Ref: Wikipedia)

Data literacy is the ability to read, write and communicate data in context. It includes an understanding of data sources, analytical techniques, business applications, and resulting value. (Ref: Gartner)

Data Culture refers to values, behavior, and norms shared by most individuals within an organization regarding data-related issues. Broadly, it refers to the ability of an organization to use data for informed decision-making. (Ref: Wikipedia)

Mastering the Semantic Knowledge Tree

As with any of the big data terms, remember that the precise definition is less important than understanding what it broadly means or how you can interpret it in an application context.

As you can see, several disciplines need to come together to get value from data. For example, it requires technical fields such as machine learning, organizational disciplines such as management science, and arts disciplines such as social science.

How should one go about mastering such disparate areas of expertise?

We can take inspiration from Elon Musk, who says that it is important to view knowledge as a semantic tree. First, make sure you understand the fundamental principles, which are like the tree’s trunk. Then master the different areas of expertise, which you can equate to the branches.

Only then should you move on to the next-level concepts and applications, which are like the leaves. Without the trunk and branches, there is nothing for the leaves to hang on to.

Excel in Data Science Journey with Gramener

At Gramener, we work closely with top executives and help them transform their companies into data-driven organizations. Over the past decade, we have worked with top organizations and helped them climb the ladder of data science maturity.

The aim of our advisory consulting, which includes executive education and a variety of data advisory workshops, is to lay out a successful data science roadmap by assessing the organization’s level of data maturity. Take a free assessment and find out where your organization stands in the levels of data maturity.
