Nearly 98% Automated Accurate Match Rate!
Microsoft Azure Accelerates Global Pharmaceutical Data Processing!

"Leveraging the AI capabilities of Microsoft Cloud Services and the commercialization of prototype validation by Microsoft AI and IoT Lab, our collaboration with the Lab expanded the depth of our pharmaceutical data mining." —Pharmcube

The high cost and long timelines of drug development, along with inefficient R&D at biopharmaceutical companies, have long been deep-rooted pain points in the pharmaceutical industry. However, as AI and the pharmaceutical industry have converged in recent years, AI has been found to help pharmaceutical companies optimize their development processes and significantly reduce R&D costs. According to a LEK survey of pharmaceutical industry executives, AI will become standard in the operating models of pharmaceutical companies within the next five to ten years. So far, the world's top 10 pharmaceutical companies have established partnerships with AI companies, eager to seize the market opportunities presented by the development of AI technology.

Pharmcube is a leading one-stop pharmaceutical data service platform in China, providing professional data, media information, and consulting services for pharmaceutical companies and investment institutions. Pharmcube collects, cleans, transforms, and integrates massive volumes of global pharmaceutical data in real time and combines them with AI technologies such as natural language processing (NLP) and machine learning, gradually building an agile big data system that spans data monitoring, data mining, and data application.

During Pharmcube's residency at the Microsoft AI and IoT Lab, we used Azure Form Recognizer and Azure Text Analytics for Health to help Pharmcube with two projects: highly accurate standardized storage of sales data extraction for drug fiscal year reports (SSS) and high-match-rate World Clinical Trial Indication Recognition & Dictionary Matching (WORD). Together, these allowed Pharmcube to expand the depth of its pharmaceutical data mining.

How the Lab Enabled Pharmcube

Pain points

The collaboration project between Pharmcube and the Lab posed two main technical difficulties: standardized storage of sales data extraction for drug fiscal year reports (SSS) and World Clinical Trial Indication Recognition & Dictionary Matching (WORD). The SSS project extracts sales data tables in Chinese and English from each listed pharmaceutical company's financial reports and stores them in a predefined, standardized tabular format. The WORD project identifies indications in clinical trial entry criteria that are synonyms or hyponyms of entries in the indications dictionary and matches them quickly, so that subsequent data cleaning and analysis can proceed without delay.


1. In the SSS project, our engineers used Azure Form Recognizer to run OCR on PDF documents and extract full-text and form data. Accuracy in interface tests exceeded 90%, which greatly reduced the workload of Pharmcube's data team and saved time and labor costs on the project.

Why Azure Form Recognizer?

Simple text extraction

Customized results

Flexible deployment


Azure Form Recognizer is a cloud-based Azure Applied AI Service that uses machine-learning models to extract key-value pairs, text, and tables from your documents. Form Recognizer analyzes your forms and documents, extracts text and data, maps field relationships as key-value pairs, and returns a structured JSON output. You quickly get accurate results that are tailored to your specific content without excessive manual intervention or extensive data science expertise. Use Form Recognizer to automate your data processing in applications and workflows, enhance data-driven strategies, and enrich document search capabilities.
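To make the "structured JSON output" concrete, here is a minimal Python sketch of how a table from such a result could be rebuilt into rows for standardized storage. It is an illustration under stated assumptions, not the Lab's actual pipeline: it assumes each table carries `rowCount`, `columnCount`, and a flat `cells` list whose items have `rowIndex`, `columnIndex`, and `content`, and it ignores merged cells. The helper names and the sample values (product names and figures) are invented for the example.

```python
from typing import Dict, List


def table_to_rows(table: Dict) -> List[List[str]]:
    """Rebuild a rectangular grid from a table's flat list of cells.

    Assumes the field names rowCount, columnCount, cells, rowIndex,
    columnIndex, and content; merged (spanning) cells are not handled.
    """
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"].strip()
    return grid


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as the header and turn each later row into a record."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Illustrative input only: placeholder product names and figures.
sample_table = {
    "rowCount": 3,
    "columnCount": 3,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Product"},
        {"rowIndex": 0, "columnIndex": 1, "content": "FY2020"},
        {"rowIndex": 0, "columnIndex": 2, "content": "FY2021"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Drug A"},
        {"rowIndex": 1, "columnIndex": 1, "content": "100"},
        {"rowIndex": 1, "columnIndex": 2, "content": "120"},
        {"rowIndex": 2, "columnIndex": 0, "content": "Drug B"},
        {"rowIndex": 2, "columnIndex": 1, "content": "80"},
        {"rowIndex": 2, "columnIndex": 2, "content": "95"},
    ],
}

records = rows_to_records(table_to_rows(sample_table))
for record in records:
    print(record)
```

Each record maps a header name to a cell value, a shape that can then be written to whatever predefined tabular format the storage step requires.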

2. In the WORD project, our engineers used the Azure Text Analytics for Health service to automate accurate indication matching for Pharmcube. Our data model tests recorded a match rate of 97.7%.

(Azure Text Analytics for Health)

Why Azure Text Analytics for Health?

Identify the main points in unstructured text

Identify and categorize important concepts

Automate your workflow

Azure Text Analytics for Health is one of the features offered by Azure Cognitive Service for Language, a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. Azure Text Analytics for Health extracts and labels relevant medical information from unstructured texts such as doctor's notes, discharge summaries, clinical documents, and electronic health records.
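Once indication mentions have been extracted from clinical trial text, the dictionary-matching step could look roughly like the Python sketch below. It is a simplified illustration, not the method used in the project: it assumes the mentions are already available as plain strings, and the dictionary entries, synonyms, hyponyms, and function names are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def normalize(term: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


# Hypothetical indications dictionary: each entry lists its synonyms and
# its hyponyms (narrower terms that should roll up to the entry).
INDICATION_DICTIONARY = {
    "lung cancer": {
        "synonyms": ["lung carcinoma"],
        "hyponyms": ["non-small cell lung cancer", "NSCLC"],
    },
    "type 2 diabetes mellitus": {
        "synonyms": ["type 2 diabetes", "T2DM"],
        "hyponyms": [],
    },
}


def build_index(dictionary: Dict[str, Dict]) -> Dict[str, Tuple[str, str]]:
    """Map every normalized surface form to (dictionary entry, relation)."""
    index: Dict[str, Tuple[str, str]] = {}
    for entry, related in dictionary.items():
        index[normalize(entry)] = (entry, "exact")
        for synonym in related["synonyms"]:
            index.setdefault(normalize(synonym), (entry, "synonym"))
        for hyponym in related["hyponyms"]:
            index.setdefault(normalize(hyponym), (entry, "hyponym"))
    return index


def match_indication(
    mention: str, index: Dict[str, Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Return (dictionary entry, relation) for a mention, or None if unmatched."""
    return index.get(normalize(mention))


index = build_index(INDICATION_DICTIONARY)
for mention in ["NSCLC", "Type 2  Diabetes", "Lung Cancer", "asthma"]:
    print(mention, "->", match_indication(mention, index))
```

Building the index once keeps each lookup to a single dictionary access, and mentions that return None can be set aside for manual review.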

AI is increasingly indispensable in the pharmaceutical industry and will continue to drive deep change at pharmaceutical companies, accelerating the move toward smart healthcare and allowing these companies to lead the industry's future. The pharmaceutical industry is also one of the focus areas on the Microsoft AI and IoT Lab's 2022 calendar. We will provide customized AI and IoT solutions based on Microsoft's AI technology to accelerate digital transformation for more pharmaceutical companies. We look forward to more pharmaceutical companies joining us and experiencing Microsoft's cutting-edge AI and IoT technologies. Let Microsoft's industry-leading technologies empower the pharmaceutical industry and ease the burden on patients who suffer from disease.

The AI & IoT landscape is changing fast.

Apply now for Insider Labs to get the opportunity to co-engineer your solutions with Microsoft technology experts.