Opportunities for AI & Machine Learning in Scientific Discovery
--
Scientific communities are again working to understand how to benefit from AI & ML, and examples are easy to find. Today, AI drives cross-disciplinary scientific discovery in domains such as climate change, healthcare, space science and materials science, and a growing number of projects apply AI to drug discovery.
Global interest in how AI and ML can improve the process of scientific discovery is growing again. One of the more visible examples is DeepMind’s AlphaFold project, in which a multidisciplinary team used AI techniques to accelerate scientific discovery. DeepMind produced 3D models of protein structures at such a pace and scale that the work had a profound impact on one of the hardest problems in biology.
What specific challenges are driving the scientific community to explore AI now? Several stand out:
Running scientific experiments can be time-consuming. It can take hundreds of hours to prepare, conduct and evaluate the results of experiments, especially when there is a broad space of variables to explore.
Difficulties handling large quantities of data. The volume of data produced for scientific discovery is vast, and many teams have not fully developed the capacity to analyze it. Examples include astronomy, where telescope images can capture millions of stars, and biology, where microscopes can capture molecular-scale processes and details.
Complex data. In addition to the volume of data, scientific communities often deal with complex types of data. Research teams may need to capture many parameters, such as color, shape, size and relationships, from advanced sensing devices and instruments.
Lack of metadata. Once all this data is captured, it is often hard to use because the accompanying metadata was never recorded. Missing or incomplete metadata about the experimental conditions (e.g. temperature, pressure, sample composition and orientation) makes it hard to train neural networks on the data.
Cost and time for data acquisition. Producing data with laboratory or other advanced instruments is costly and time-consuming. Many data acquisition instruments are specialized and require significant training, setup and maintenance.
Compute power. Many teams lack the computational power needed for complex analysis.
Collaborative science. Scientific communities also need to support research collaborations, which means sharing data and experiments across teams.
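To make the metadata challenge above more concrete, here is a minimal sketch of recording experimental conditions alongside a measurement and flagging gaps before the data is used for training. The class name, field names, units and sample values are all hypothetical, chosen only to mirror the conditions listed above (temperature, pressure, sample composition and orientation). A real lab would follow its own schema.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class ExperimentMetadata:
    """Hypothetical metadata record; field names and units are illustrative."""
    sample_id: str
    temperature_kelvin: Optional[float] = None
    pressure_pascal: Optional[float] = None
    sample_composition: Optional[str] = None
    orientation_degrees: Optional[float] = None


def missing_fields(meta: ExperimentMetadata) -> List[str]:
    """List the conditions that were never recorded (still None)."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]


def to_sidecar_json(meta: ExperimentMetadata) -> str:
    """Serialize the record so it can be stored next to the raw data file."""
    return json.dumps(asdict(meta), sort_keys=True)


# Made-up example values, for illustration only.
record = ExperimentMetadata(
    sample_id="sample-001",
    temperature_kelvin=293.15,
    sample_composition="Fe2O3",
)
gaps = missing_fields(record)
sidecar = to_sidecar_json(record)
```

Checking `gaps` at capture time, rather than at training time, is one way to catch incomplete metadata while the experiment can still be annotated.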
For almost all of these challenges, tooling and technology are being developed by both the open source community and enterprise data science platform providers. For example:
Scientific tools. AI can enhance how powerful scientific tools work, regardless of the scale of the subject, whether a solar system or a molecule.
Data processing. AI could be used to automate data processing steps such as ingesting, cleaning and joining large data sets.
Experimentation. AI can transform the process of experimentation. It can help scientists improve measurement strategies, essentially pinpointing which samples to explore and which details to capture. Several research projects apply generative AI/ML techniques to help reduce the problem or solution space.
Collaboration. Improvements in these areas can also help distributed teams of scientists collaborate on data and experiments.
Automation. One final opportunity where AI can help the scientific process is automation using robotics. One university lab created a robotic “lab assistant” that automated many mundane laboratory tasks. This assistant even operated during the COVID-19 pandemic, when social distancing prevented in-person lab work.
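As an illustration of the idea above that AI can pinpoint which samples to explore, here is a minimal sketch of one common approach: measure next wherever a small ensemble of models disagrees most. The article does not name a specific method, so this technique (uncertainty sampling by ensemble disagreement) is an assumption. The function name, candidate values and predictions are also hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def pick_next_sample(
    candidates: Sequence[float],
    ensemble_predictions: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return the candidate with the highest prediction variance, and that variance.

    ensemble_predictions[i] holds each model's prediction for candidates[i].
    """
    if len(candidates) != len(ensemble_predictions):
        raise ValueError("need one list of predictions per candidate")
    # Disagreement between models is used as a proxy for uncertainty.
    scores: List[float] = [pvariance(preds) for preds in ensemble_predictions]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


# Hypothetical candidate temperatures (kelvin) and made-up model outputs.
candidates = [250.0, 300.0, 350.0]
ensemble_predictions = [
    [1.0, 1.1, 0.9],  # models roughly agree
    [2.0, 3.5, 0.5],  # models disagree strongly
    [1.5, 1.5, 1.6],  # models roughly agree
]
choice, score = pick_next_sample(candidates, ensemble_predictions)
```

Here the second candidate is selected because its predictions are the most spread out. In practice the predictions would come from trained models rather than hard-coded lists.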
In Conclusion
These are just a few examples of how the scientific community can benefit from AI/ML. There is a lot that the data science community can do to help adoption in this relatively nascent but potentially impactful area.