Students of EXPLORE Data Science Academy (EDSA), Africa’s largest institution devoted to data science, are using the current COVID-19 lockdown to build a comprehensive database capable of storing all pandemic-related data in South Africa.
“Our intention in building the database is to ensure that it is open source and can be maintained by the broader community,” says Academy co-founder Shaun Dippnall.
“The aim is to centralise data coming from a host of available resources, all of which are useful in making beneficial analyses. These resources include GitHub repositories such as the one from the University of Pretoria, the NICD and global data sources,” he adds.
All data will be fed into a dashboard, and the resulting statistics and COVID-19 information will be made available to all, including those who want to use the data for their own projects.
The database will comprise a large central repository of publicly available data and will be released in stages.
Stage 1 of the database will contain:
- daily testing numbers
- case numbers by country/province (confirmed, deaths, active and recovered)
- government countermeasure data (measure, date of implementation, duration of implementation)
Further features are that it will be:
- professionally structured, adhering to industry best practices, resulting in high data quality
- automated, with near real-time updates
- delivered to the community through APIs (application programming interfaces)
The first set of APIs (Stage 1) is expected to be available online within the next week and a half.
Stage 2 will include:
- patient-level data (time to symptom onset/death, travel history, age, gender, etc.)
- hospital-level data (ICU beds, critical care, patients on ventilators)
- flight data (for modelling the seed risk of the pandemic in each country based on flight volumes/schedules)
Dippnall stresses that the development of the database was born out of the lockdown regulations.
“Data science is about solving real-world problems and our learners, who now work remotely, wanted to tackle a project that could contribute positively to the COVID-19 crisis,” he says.
“We may not have the resources to get into the level of modelling that governments or universities are achieving. However, by having a solid foundation of data that others can work off, we can leverage the skills of students and the community and encourage others to add to our work,” Dippnall says.
Collaboration is key. Inputs and partnerships with both the public and private sectors will be crucial in centralising the database. Speed is essential, and the plan is to have the framework available within the next two weeks.
What makes EDSA's database different from others is that it will allow its data scientists to manipulate the data using, for example, artificial intelligence and machine learning.
“These powerful techniques will add a new dimension to the practical use of this data by discovering relationships and trends that normal analyses may miss,” Dippnall says.
Treating the project as an internal learning experience, EDSA is involving its faculty, alumni and current learners in building the database, working purely online because of the lockdown regulations. Among the participants are matriculants with just a few months’ work experience.
Motivation levels are high.
“Everybody involved in the project is enthralled by the prospect of building something that can make a tangible impact towards understanding the pandemic and helping South Africa fight its spread,” Dippnall says.
“We also believe that, once complete, it will strengthen the country’s broader data science capability,” he concludes.
The EXPLORE Data Science Academy is the largest data science academy in Africa. It is led by top South African academics and practitioners with decades of experience in teaching and real-world problem-solving.