Identify climate change documents on websites using Machine Learning and Big Data


Project scope
Categories
Data analysis, Information technology
Skills
Machine learning, data analytics, deep learning, AI, NLP, big data analytics
The Goal
We are in the design and planning phase of a new climate change impact planning software product (related to "UN Sustainable Development Goal 13: Take urgent action to combat climate change and its impacts"). The product will recommend a set of actions that ordinary citizens can take to avoid, mitigate, adapt to, or rebuild from climate change disasters. A wealth of guidance is available in PDFs and on websites. Rather than go through those sources one by one, we have created a machine learning Climate Change Action Extractor that automatically extracts that kind of guidance from a document. However, we currently have to find relevant PDF documents manually, through Google Search and trial and error, and we cannot quickly find content that sits on a web page rather than in a PDF document. We wish to determine whether Big Data and AI techniques can identify web content that either contains a link to a relevant PDF file or contains the actual text that we can process (for instance, a web page that says "10 Steps to avoid climate change disasters").
Your Contribution
We would like you to explore Big Data and ML techniques on the Common Crawl open data set (https://registry.opendata.aws/commoncrawl/) and identify which pages either have relevant "climate change action" content or contain a link to a relevant PDF. This work is exploratory, so we anticipate a few iterations in which you produce some outputs, we examine them, and together we work out how to map the results onto our proposed action structure. At the start of the project we will provide a set of example PDF files, some suggested sentence structures and keyword examples, a link to a prototype website where you can upload content and test it against our Climate Change Action Extractor, and anything else you need.
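As a rough illustration of the page-level check described above, here is a minimal Python sketch. It assumes the HTML of a page has already been pulled out of a Common Crawl record (reading the crawl archives themselves is not shown), and it uses plain keyword matching as a stand-in for whatever ML technique the project ends up using. The names `PageScanner`, `scan_page`, `TOPIC_TERMS` and `ACTION_TERMS` are hypothetical, and the term lists are placeholders for the sentence structures and keyword examples that will be supplied at the start of the project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Placeholder term lists (assumptions, not the project's real keywords).
TOPIC_TERMS = ("climate change", "flood", "wildfire", "heat wave")
ACTION_TERMS = ("steps to", "how to prepare", "what you can do", "checklist")


class PageScanner(HTMLParser):
    """Collects visible text and PDF link targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and test the URL path, ignoring case.
                absolute = urljoin(self.base_url, href)
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.pdf_links.append(absolute)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def scan_page(html, base_url):
    """Return keyword hits, PDF links, and a candidate flag for one page."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    # Collapse whitespace and lowercase so matching is case-insensitive.
    text = " ".join(" ".join(scanner.text_parts).split()).lower()
    topic_hits = [t for t in TOPIC_TERMS if t in text]
    action_hits = [t for t in ACTION_TERMS if t in text]
    # A page is a candidate if it is on topic and either reads like
    # guidance or links to at least one PDF worth sending to the extractor.
    candidate = bool(topic_hits) and bool(action_hits or scanner.pdf_links)
    return {
        "url": base_url,
        "topic_hits": topic_hits,
        "action_hits": action_hits,
        "pdf_links": scanner.pdf_links,
        "candidate": candidate,
    }
```

Calling `scan_page(html, url)` on each crawled page gives a small record that can be filtered on `candidate`. The on-page text or the linked PDFs of the surviving pages could then be tested against the Action Extractor prototype. A keyword filter of this kind would probably serve only as a cheap first pass before a trained classifier.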
About the company
Deploy Solutions connects Earth observation data with the needs of organisations and businesses by providing innovative Space Apps — software applications which use data from space.
We are streamlining the software development process using a software factory: a standardised approach with unique outcomes. This allows us to reduce development costs, risks, and the total cost of ownership, empowering organisations to take advantage of space data.
Find out how Deploy Solutions can help your organisation benefit from space data using a software factory approach.