“We now have a clearer understanding of how to use LUIS and Cognitive Services” -- Ewan Nicolson | Lead Data Scientist, the BBC
Imagine that students have their own tutor-like study companion at hand, one they can ask any question about what they are learning and interact with at any time. Intelligent tools that retrieve educational content in response to natural human language are becoming increasingly common.
Natural language processing (NLP) focuses on developing efficient algorithms to process text and make the information it contains accessible to computer applications. The goal is to design and build software that can analyze, understand, and generate the languages that humans use naturally, so that eventually people can address computers as though they were addressing another person. This is especially practical in modern education, given the overwhelming volume and variety of formats of educational content.
The British Broadcasting Corporation (BBC) is the national broadcaster of the United Kingdom. Headquartered in London, it is the world's oldest national broadcaster, and the largest broadcaster in the world by number of employees. The BBC is impartial and independent, and creates distinctive, world-class programs and content that inform, educate, and entertain millions of people in the UK and around the world.
In April 2020, the BBC came to the Microsoft AI & IoT Insider Lab (Munich) looking to create a knowledge service infrastructure that would bring all its content (video, text, audio, and so on) together in one place and let people navigate that content in new ways. Knowledge services would also allow people to ask questions against the content and get direct responses, rather than links to content assets.
To start on this journey, the BBC had the idea of implementing a companion app that helps 13- to 16-year-olds prepare for their math exams (GCSEs in the UK) by allowing them to ask questions (e.g. “what is an algorithm?”) and receive answers that depend on their knowledge of the subject.
The BBC has a huge content database, which makes it challenging to aggregate the relevant educational content and to build a system that lets students prepare for math exams by asking questions in natural language and getting more relevant responses.
To turn the BBC’s idea into reality, the Lab spent 6 weeks preparing and shaping the workstream for the project. After careful review, we made some important decisions:
- To not use Azure Cognitive Search as it does not satisfy the requirements for the project;
- To not implement a Bot Framework as the BBC wants to gain more experience with other core components of the project;
- To implement a minimal query-response system that takes queries in human language and returns content to the user.
The Lab decided to deploy Azure LUIS and Cosmos DB, reasoning that conversational language understanding is the next generation of language understanding. Our engineering team constructed a database to store the BBC’s content aggregated across various media, and built a dynamic knowledge mining solution designed to receive and analyze users’ questions and interpret their intent.
To keep the project moving, our engineers split into two teams: one focused on LUIS, the other on implementing the Cosmos DB Graph. Regular meetings were scheduled to ensure that the interface and definitions stayed aligned on both ends. With clear interfaces and detailed documentation for LUIS and the Graph in place, a Python Azure Function was implemented to analyze the LUIS JSON response and use its content to query the Graph.
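The step from a LUIS JSON response to a Graph query could look roughly like the following minimal sketch. It is not the Lab's actual function. It assumes the response follows the LUIS v3 prediction shape (`prediction.topIntent`, `prediction.intents`, `prediction.entities`); the intent names (`GetDefinition`, `GetPage`), the `Topic` entity, the vertex and edge labels, and the confidence threshold are all hypothetical placeholders, since the source does not describe the real schema.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical mapping from LUIS intent names to Gremlin traversals.
# Intent names, vertex label ("term"), edge labels, and property names
# are illustrative assumptions, not the BBC's actual graph schema.
INTENT_TO_TRAVERSAL: Dict[str, str] = {
    "GetDefinition": "g.V().has('term', 'name', name).out('definedBy').values('text')",
    "GetPage": "g.V().has('term', 'name', name).out('explainedOn').values('url')",
}

MIN_SCORE = 0.5  # assumed confidence threshold, to be tuned


def first_entity(entities: Dict[str, Any], key: str) -> Optional[str]:
    """Return the first string value of an entity, or None if it is absent."""
    values = entities.get(key)
    if isinstance(values, list) and values and isinstance(values[0], str):
        return values[0]
    return None


def build_graph_query(
    luis_response: Dict[str, Any],
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Turn a LUIS prediction into a (traversal, bindings) pair.

    Returns None when the intent is unknown, the score is too low,
    or no Topic entity was extracted.
    """
    prediction = luis_response.get("prediction", {})
    top_intent = prediction.get("topIntent")
    score = prediction.get("intents", {}).get(top_intent, {}).get("score", 0.0)
    traversal = INTENT_TO_TRAVERSAL.get(top_intent)
    topic = first_entity(prediction.get("entities", {}), "Topic")
    if traversal is None or topic is None or score < MIN_SCORE:
        return None
    # The user's text travels as a binding rather than being spliced
    # into the query string.
    return traversal, {"name": topic.strip().lower()}


if __name__ == "__main__":
    sample = {
        "query": "What is a hypotenuse?",
        "prediction": {
            "topIntent": "GetDefinition",
            "intents": {"GetDefinition": {"score": 0.97}},
            "entities": {"Topic": ["Hypotenuse"]},
        },
    }
    print(build_graph_query(sample))
```

The returned pair could then be handed to a Gremlin client that accepts parameter bindings. Keeping the intent-to-traversal mapping in one table is one way to reflect the agreed interface between the LUIS team and the Graph team in code.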
Microsoft technologies used:
Azure Language Understanding (LUIS)
Why Azure Language Understanding (LUIS)?
- Language Studio simplifies creation, labeling, and deployment for your custom models
- No machine-learning experience required
- Configurable to return the best response from multiple language applications
- Enterprise-grade security and privacy applied to both your data and trained models
Azure Language Understanding (LUIS) is a feature of Cognitive Service for Language that understands natural language in order to interpret user goals and extract key information from conversational phrases. It lets you create multilingual, customizable intent classification and entity extraction models for your domain-specific keywords or phrases across 96 languages. You only need to train a model in one natural language, and you can then use it in multiple languages without retraining. It comes with state-of-the-art multilingual language models that understand an utterance's meaning and capture word variations, synonyms, and misspellings. It also automatically orchestrates bots powered by conversational language understanding.
Azure Cosmos DB
Why Azure Cosmos DB?
- Guaranteed speed at any scale with instant and limitless elasticity, seamless burst capacity, fast reads, and multi-region writes anywhere in the world
- Fast, flexible app development with SDKs for popular languages plus a native Core (SQL) API, APIs for MongoDB, Cassandra, and Gremlin, and free dev/test options
- Ready for mission-critical applications with 99.999-percent availability, continuous backup with point-in-time restore, enterprise-level security that guarantees business continuity, and no-ETL analytics over real-time data
- Fully managed and cost-effective serverless database that responds to application needs by instantly and automatically scaling up and down, and offers consumption-based pricing options
Azure Cosmos DB is a fully managed, serverless NoSQL database for high-performance applications of any size or scale. Get guaranteed single-digit millisecond performance and 99.999-percent availability, backed by SLAs, automatic and instant scalability, enterprise-grade security, and open-source APIs for NoSQL databases including MongoDB and Cassandra. Enjoy fast writes and reads anywhere in the world with multi-region writes and data replication.
The Microsoft AI & IoT Lab helped the BBC successfully create a prototype that retrieves unique, math-related educational content in response to users’ natural-language requests. In addition, the prototype can analyze users’ intent and deliver more relevant, directly embedded responses rather than links to content.
Before project delivery, we ran testing sessions on the 'companion'. It was able to retrieve webpages and mathematical definitions in response to human requests. For example, when asked "What is a hypotenuse?", the 'companion' came back with a book definition of the term hypotenuse. We also worked with the BBC on stretch goals, such as returning a different page based on the exam board of synthetically created users, which added value to the 'companion'.