The history of data science

Author: Raj Tattapure
Posted: Aug 24, 2019
Early days

Data science was formally introduced and has since evolved into what has been called the sexiest job in the world. Several mathematicians, scientists and international organizations played key roles in this, directly or indirectly. Interestingly, their contributions were not always related to data science itself; however, they defined a few building blocks that proved very important for the discipline.

The International Federation for Information Processing (IFIP) was established in 1960 under UNESCO and set out key guidelines and concepts on data: how it should be processed and what standards should be maintained. This was not data science at all, but it defined a systematic way of processing and presenting data. Before those guidelines, data presentation and processing were limited to individual domains and were very difficult to interpret in other domains. The term Datalogy was introduced in 1968 to formalize this data analysis practice.

John W. Tukey was an American mathematician famous for the development of the FFT algorithm and the box plot. He published the paper ‘The Future of Data Analysis’ in 1962. He was the first to bring up the relationship between statistics and analysis or, more precisely, data analysis. Earlier, data analysis was considered an "applied" discipline of statistics, which kept its scope very limited and confined to the business area. In this paper he writes:

"For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt…

I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier…

Large parts of data analysis are inferential in the sample-to-population sense, but these are only parts, not the whole… Data analysis is a larger and more varied field than inference, or incisive procedures, or allocation."

This work has been cited in many research papers as the formal introduction of data analysis outside the statistical discipline. Later, researchers came up with several hypotheses to derive other dimensions from the same data, resulting in better decision making.

One important thing to notice: Tukey was trained as a mathematician rather than as a statistician, and he blended statistical analysis and mathematics to make data analysis more "scientific" and acceptable.

In 1977 Tukey published another major work, Exploratory Data Analysis. He brought in another major idea on how "Exploratory" and "Confirmatory" data analysis should be done, and he stressed a "side by side" approach. That means new or revised hypotheses are needed while the two kinds of analysis proceed side by side. But why was this idea so important? As mentioned before, data analysis at the time was a statistical discipline limited to specific domains. For example, a particular hypothesis may be useful for finding a type of health issue in a population, but the same hypothesis might not be applicable to another area, such as identifying the quality of a particular crop.

Another important name that came after Tukey is Peter Naur. In 1974 he published "Concise Survey of Computer Methods" in Sweden and the United States. The book is a collection of modern data processing methods from various domains, used worldwide in a variety of applications. Another fundamental aspect of the book was its use of the data standards and guidelines defined by the International Federation for Information Processing, which made those ideas more acceptable and interpretable to a wide range of audiences. In fact, each idea detailed in the book comes with a short survey or an example of data processing. In this book, he used the term "data science" several times. Later, Naur produced a formal definition of data science: "The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences." From this time on the term ‘data science’ was used more frequently, but it took a long time to really catch on. After his work, data science was pushed further and further.

In 1977, the International Association for Statistical Computing (IASC) was founded as a section of the International Statistical Institute (ISI). The main aim of the IASC is to connect statisticians, computer professionals, educational institutes, researchers and governments worldwide and to let them exchange knowledge on statistical computing across various subjects and domains. It started publishing a monthly journal named "Computational Statistics & Data Analysis". This was a tremendous move, as it helped with sharing knowledge and new ideas on computational statistics and data analysis. Notice that by this time data analysis had become, and had been accepted as, an important discipline.

In 1989, the first ‘Knowledge Discovery in Databases’ workshop, also known as KDD-89, was organized by Gregory Piatetsky-Shapiro. KDD-89 discussed these areas:

  • Expert Database Systems
  • Scientific Discovery
  • Fuzzy Rules
  • Using Domain Knowledge
  • Learning from Relational (Structured) Data
  • Dealing with Text and other Complex Data
  • Discovery Tools
  • Better Presentation Methods
  • Integrated Systems
  • Privacy

KDD-89 has since been cited in several research papers and is considered a pioneer in the formal improvement of data representation. Following this workshop, scientists and researchers started exploring these options for data representation and data storage, which in turn helped DBMS provide better storage, retrieval, and presentation of data. In fact, today’s "data science" or data analysis extends across various areas, including health, retail, manufacturing, services and government organizations. However, this wide acceptance became possible only with the improvement of DBMS as a general solution for data management rather than a mere tool. KDD-89 later became the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining.

Over the next couple of years, data science gained another dimension with the improvement of the database management system (DBMS), or simply "database". DBMS changed the way we store, view and review data. Following KDD-89, researchers came up with easier ways to represent data, and DBMS technology allowed data to be stored and shared more easily and effectively.

By this time it had become clear that more computational power was needed to continue with this kind of data analysis. As researchers improved computer processing power, data analysis became easier, allowed more and more "deep dives", and kept improving day by day.

Both of these contributed significantly, if indirectly, to the shape of "data science" today. The term "data science" had still not been formally accepted, but everyone was discussing data analysis and management.

About the Author

Kausal Vikash, as the name suggests, believes in the skill (Kausal) upgradation (Vikash) of every individual in this community in order to stay relevant and competitive in this environment.
