Modern instruments and equipment allow us to record and capture far more information than ever before. Computers are faster, hard drives are bigger and today's processing power would have seemed alien less than 50 years ago. However, all this newly harvested data needs to be analysed in a way that enables users such as scientists and business analysts to manage and use it correctly.
Enter data science, the modern way to collect and analyse data to achieve efficient and reliable results. Its uses are varied: it can, for example, estimate how likely an individual is to buy a certain product based on their age, or predict where a newly discovered species of bird is most likely to migrate based on similar species and its food source.
As shown in the diagram opposite, data science combines computer science, mathematics, statistics and knowledge of a business and its domain. More specifically, the purpose of data science is to build mathematical models and use them to describe current outcomes and/or predict future results. By combining mathematics and computing, data can be analysed and studied in ways that were never before possible.
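To make "building a mathematical model" concrete, here is a minimal Python sketch that fits a straight line y = a + bx to some observations by ordinary least squares and then uses it to predict a new value. The choice of a linear model, the helper names `fit_line` and `predict`, and the advertising figures are illustrative assumptions rather than anything from the text; real projects would usually reach for a library such as NumPy or scikit-learn.

```python
# Minimal least-squares sketch: fit y = intercept + slope * x, then predict.
# The data below is hypothetical and chosen only for illustration.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Use a fitted (intercept, slope) model to estimate y for a new x."""
    intercept, slope = model
    return intercept + slope * x

# Hypothetical data: advertising spend (x) against units sold (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]
model = fit_line(spend, sales)
print(model)                # (1.0, 2.0): describes the current outcomes
print(predict(model, 6.0))  # 13.0: a predicted future result
```

The same fitted model serves both purposes named above: its coefficients describe the data already collected, and `predict` extrapolates to a value not yet observed.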
Data science overlaps heavily with statistics: it uses computer algorithms to discover trends or anomalies in collected data, analyse them, provide solutions to problems and then identify similar patterns in the future. It is also often described as data analytics or data-driven science. In business, for example, data science can inform key decisions about company growth or expansion in a certain field, so it is important that the data collected is specific to the requirement and is exactly what is needed.
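As a simple illustration of how an algorithm can spot an anomaly, the Python sketch below flags values whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The helper name `find_anomalies`, the threshold of 2 and the sales figures are hypothetical. Z-scores are only one of many anomaly-detection techniques; note that a single extreme value also inflates the standard deviation, which is why a modest threshold is used on such a small sample.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

# Hypothetical daily sales with one suspicious spike.
daily_sales = [120, 118, 125, 122, 119, 121, 410, 123]
print(find_anomalies(daily_sales))  # [(6, 410)]
```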
The system does this by studying the data that has been entered and generating its own algorithms to arrive at possible solutions. For example, take the following sequence:
1 2 4 8 16 ?
You would probably say that the next number in the sequence is 32, and you would be correct. Simply telling a computer the answer may help in this case, but what if you were not able to solve the next sequence of numbers?
1 3 9 27 ?
A computer that has only been given the answer wouldn’t be able to help, as it hasn’t been trained to solve problems itself. This is where machine learning comes in: it teaches the computer to solve the problem on its own. This is beneficial because the computer doesn’t need to follow a fixed set of instructions; it generates solutions using processes it has developed and learnt by itself, and it can then apply this newly learned knowledge to other problems. Machine learning is closely related to computational statistics, which also uses computers to solve problems, and to data mining, which focuses on summarising data and finding shared characteristics within it, known as exploratory data analysis (EDA). Analysts also benefit from machine learning because it allows tests of a hypothesis to be repeated, and the more often results are reproduced, the more reliable they are.
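The idea of a computer working out a rule from examples, rather than being told the answer, can be shown with a toy Python sketch. It is not machine learning in the full sense: it simply tests two candidate rules (a constant difference and a constant ratio) against the examples it is given, keeps whichever fits, and applies it to predict the next term. The function name `learn_rule` and the two candidate rules are illustrative assumptions.

```python
from fractions import Fraction

def learn_rule(seq):
    """Infer a simple rule from the examples: constant difference or constant ratio.

    Returns a function mapping the last term to the next one, or None if
    neither candidate rule fits every example.
    """
    if len(seq) < 3:
        raise ValueError("need at least three terms to check a rule")
    diffs = {b - a for a, b in zip(seq, seq[1:])}
    if len(diffs) == 1:  # arithmetic: every step adds the same amount
        d = diffs.pop()
        return lambda last: last + d
    if 0 not in seq[:-1]:  # ratios are only defined for non-zero terms
        ratios = {Fraction(b, a) for a, b in zip(seq, seq[1:])}
        if len(ratios) == 1:  # geometric: every step multiplies by the same amount
            r = ratios.pop()
            return lambda last: last * r
    return None

for seq in ([1, 2, 4, 8, 16], [1, 3, 9, 27]):
    rule = learn_rule(seq)
    print(seq, "->", rule(seq[-1]))
# [1, 2, 4, 8, 16] -> 32
# [1, 3, 9, 27] -> 81
```

Nothing in the code mentions "double" or "triple": the same procedure solves both sequences because the rule is derived from the data each time.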
The steps below summarise how a data scientist generates solutions using data science techniques:
First, hypotheses, questions and parameters are set to define what data is to be gathered. They need to be concise and specific to the problem.
The data is then gathered and prepared by presenting it in a clear and structured way, such as in a table or chart. It is checked for anomalies, inconsistencies or data that simply isn’t needed for a specific test. Inconsistent values are either corrected to show what they should or removed completely. Once this has been completed, the data can be used for analysis.
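The preparation step can be sketched in Python on a small table of records held as dictionaries. The sketch keeps only the fields needed for the test, adjusts inconsistent text entries (such as stray spaces or mixed capitalisation), and removes rows that are incomplete or contain impossible values. The field names, the valid age range and the `clean` helper are hypothetical; in practice a library such as pandas is commonly used for this kind of work.

```python
def clean(records, needed_fields, valid_ranges):
    """Keep needed fields, normalise text, drop incomplete or out-of-range rows."""
    cleaned = []
    for row in records:
        # Drop data that isn't needed for this test.
        subset = {field: row.get(field) for field in needed_fields}
        # Adjust inconsistent text entries, e.g. " North" vs "north".
        subset = {f: v.strip().lower() if isinstance(v, str) else v
                  for f, v in subset.items()}
        if any(value is None for value in subset.values()):
            continue  # incomplete row: remove it
        if any(not (lo <= subset[f] <= hi) for f, (lo, hi) in valid_ranges.items()):
            continue  # impossible value, e.g. a typing error at entry time
        cleaned.append(subset)
    return cleaned

# Hypothetical raw survey data.
raw = [
    {"id": 1, "age": 34, "region": " North", "notes": "n/a"},
    {"id": 2, "age": None, "region": "south", "notes": ""},
    {"id": 3, "age": 340, "region": "North", "notes": "typo?"},
    {"id": 4, "age": 27, "region": "SOUTH ", "notes": ""},
]
result = clean(raw, ["id", "age", "region"], {"age": (0, 120)})
print(result)
# [{'id': 1, 'age': 34, 'region': 'north'}, {'id': 4, 'age': 27, 'region': 'south'}]
```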
Next, techniques are chosen to identify patterns or relationships in the collected data. Examples of these techniques include statistical functions, data visualisation graphs or tables. If more data turns out to be required once the results are obtained, the whole process is repeated: more questions and parameters are set, and the new data is prepared and analysed.
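One common statistical function for measuring a relationship between two variables is the Pearson correlation coefficient. The Python sketch below computes it directly from its definition; the helper name `pearson_r` and the age and spending figures are hypothetical. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect positive linear relationship,
    -1 a perfect negative one, and values near 0 indicate no linear relationship."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical survey data: customer age vs average monthly spend.
ages = [22, 30, 38, 45, 53, 61]
monthly_spend = [40, 52, 61, 70, 82, 90]
r = pearson_r(ages, monthly_spend)
print(f"r = {r:.3f}")  # close to +1: spend rises with age in this sample
```

A strong correlation only shows that the two variables move together; it does not by itself show that one causes the other.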
Finally, the analysed data should be able to answer the question that was the focus of the experiment. If the data supports the initial prediction, it can be concluded that the hypothesis is supported by the evidence; if not, the hypothesis can be rejected. The results can then be used to make the key strategic decisions required.
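Deciding whether the data supports a prediction is often done with a significance test. The Python sketch below uses a permutation test, one of several possible choices: it asks how often randomly relabelling the pooled data produces a difference in group means at least as large as the one observed. Here the prediction is that the two groups differ, so the "null hypothesis" is that they do not. The helper names, the 0.05 significance level and the group scores are all assumptions made for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an approximate p-value: the share of random relabellings of the
    pooled data whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical scores for a control group and a treatment group.
control = [12, 14, 11, 13, 12, 15, 13, 14]
treatment = [18, 20, 19, 21, 17, 22, 20, 19]

alpha = 0.05  # assumed significance level; a common convention
p = permutation_test(control, treatment)
if p < alpha:
    print(f"p = {p:.4f}: reject the null hypothesis; the data supports a difference")
else:
    print(f"p = {p:.4f}: no evidence of a difference; the prediction is not supported")
```

A small p-value means the observed difference would rarely arise by chance alone, which is evidence for the prediction rather than proof that it is true.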
The importance of data science and data analysis is unquestionable. Technology will continue to advance and computers will get smarter, so it is vital that businesses, researchers and analysts alike keep up with the evolving ways in which data is used.
Science as we know it is changing because of the impact of computers. Traditional methods of science (experimental, theoretical and computational) are all being affected. Data science is emerging as the next core method in scientific research and study.
Data science should be integral to all current and future projects. Decisions should be made with data-driven evidence, problems should be solved with data-driven solutions and opportunities should be sought out using data-driven information.