Developers
July 13, 2020

How Data Science and ML Can Help Amid COVID-19

Kaggle is a data science community that uses machine learning and competitions to analyze Covid-19 data and forecast its spread.

Today we will talk about how data scientists are helping during the pandemic. Kaggle, a large community with over 5 million users, is hosting challenges to help the medical field tackle Covid-19. Its members believe AI can beat the pandemic.

The Kaggle community has forecast the number of Covid-19 deaths and shares its work as open source. Today we will take a closer look at what Kaggle does and how it makes an impact.

Kaggle is no small player. The White House itself asked Kaggle to host a natural language processing challenge to filter data from publications. The volume of research on Covid-19 has grown immensely: in February 2020, 16 scientific papers were published per day; by May, that number had risen to 257 per day.

Challenges that help the world

While the pandemic continues, millions of scientists worldwide are trying to figure out how to help. A community that can connect millions of people and give them concrete challenges to solve speeds things up and brings order to the effort.

The Kaggle community is now trying to answer 9 specific questions that could move the research effort forward. The questions came from the National Academies of Sciences, Engineering, and Medicine.

To answer these questions, the community draws on 139,000 papers, which it processes with transformer language models, specifically SciBERT and BioBERT.

The community is analyzing specific transmission factors, such as temperature and humidity. From the first results, preliminary tables are built. These tables work like a filter, extracting the most relevant information. The results are then reviewed by a medical team.

A specialist built a semantic similarity index over the data. It lets researchers find data by topic and by keyword. The specialist argues that the context of the data matters more than the data itself.
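To make the idea of a similarity index concrete, here is a minimal sketch in Python. It is not the specialist's implementation: it swaps in TF-IDF weighting with cosine similarity, which matches shared words rather than meaning, as a simple stand-in for transformer embeddings. The paper titles and all function names are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens, idf):
    """Build a sparse TF-IDF vector, ignoring terms the index has never seen."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_index(documents):
    """Return (idf, vectors): IDF weights plus one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(documents)
    # Smoothed IDF: terms found in every document keep a small positive weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = [to_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, documents, idf, vectors, top_k=3):
    """Return up to top_k (document, score) pairs, best match first."""
    query_vector = to_vector(tokenize(query), idf)
    scored = sorted(
        ((cosine(query_vector, vec), i) for i, vec in enumerate(vectors)),
        reverse=True,
    )
    return [(documents[i], score) for score, i in scored[:top_k] if score > 0]


# Hypothetical paper titles, used only to exercise the index.
papers = [
    "Effect of temperature and humidity on virus transmission",
    "Smoking rates and outcomes in hospitalized patients",
    "Lockdown dates and regional case growth",
]
idf, vectors = build_index(papers)
for title, score in search("humidity and transmission", papers, idf, vectors):
    print(f"{score:.3f}  {title}")
```

A real index over scientific papers would likely replace the TF-IDF vectors with embeddings from a language model, so that papers on the same topic match even when they use different words.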

Efficient data analysis

A community can have a million papers, but if they cannot be browsed efficiently, there is little point in having that much data. The specialist has contributed by using NLP (natural language processing) to extract metadata.

Kaggle also predicts infections and fatalities by region through a global transmission forecasting competition. The forecasts can then be checked against the actual numbers: people in hospitals, patients, infected, and recovered.

Forecasting is no easy task: if the forecasts don't match the actual numbers, time and money are wasted. So far, the community has been successfully helping medical researchers.

Submissions are scored with the RMSLE (Root Mean Squared Logarithmic Error), and the winning solution is the one that scores best on it. The RMSLE measures the difference between the log of the predicted values and the log of the actual values. It is used to evaluate predicted deaths in all states over a sample of 29 days.
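As a rough illustration of the metric, the sketch below computes the RMSLE in plain Python. It assumes the common definition that adds 1 to each value before taking the log, so that zero counts are allowed; the article does not say which exact variant the competition used. The daily counts are made-up numbers, not real data.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error of two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one value")
    # log1p(x) computes log(1 + x), which avoids log(0) for zero counts.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Hypothetical cumulative death counts for one region over five days.
actual_deaths = [10, 12, 15, 19, 24]
forecast_deaths = [10, 13, 15, 18, 26]
print(round(rmsle(forecast_deaths, actual_deaths), 4))
```

Because the error is taken on a log scale, being off by a factor of two costs the same whether a region has 10 deaths or 10,000, which keeps large regions from dominating the score. Lower values are better, and a perfect forecast scores 0.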

Machine learning is one of the technologies helping the community's competitors. ML models help process data and compare short-term forecasts with long-term ones.

With more data and more knowledge of how the virus spreads, the community will be able to add more features and produce better-optimized forecasts. Its members have already realized that what matters is not how much data they have but how efficiently they manage it.

Some of the factors being tested in the models are population size, population density, age distribution, smoking rates, economic indicators, and lockdown dates. With these factors included, the predictions become more accurate.

Another challenge Kaggle has hosted focuses on dataset curation, that is, on managing data in the form of datasets. There have been several winning submissions. One describes patterns based on demographics. Another bases its data on predictions of when the pandemic will slow down. The last one gives information on the cause and way each person got infected.

On Kaggle, publishers can carry out tasks on a self-service basis. Some publishers are well-known scientific or science-related organizations. The platform allows all users to upload datasets, browse the challenges, and keep up to date with the winning submissions.

In conclusion, Kaggle is a community that uses data science and machine learning to help predict and gather Covid-19 data. It has produced many winning models that help medical teams. It draws on more than 139,000 scientific papers and filters them with the latest technology. Kaggle's focus is on spending time and resources as efficiently as possible, which is why filtering plays such a central role. The platform has 5 million users and has received challenge requests from important institutions such as the National Academies and the White House.

Tags: Kaggle, Machine Learning, AI
Lucas Bonder
Technical Writer
Lucas is an Entrepreneur, Web Developer, and Article Writer about Technology.

