Pioneering Tech Trends — Tech Pulse's Data & Cloud Computing

Refining Patent Model Searches for Optimal Results

Google Develops Philosophy of Phrases for Enhancing Patent Search Models

, and Administrator

2025 July 28, 3:28 AM


Guide for Developing Patent Query Systems


To improve the accuracy of patent search models, Google has developed a dataset of phrases curated specifically for that purpose. Google does not, however, publicly provide a directly downloadable phrase dataset for training patent search models.

The dataset contains approximately 50,000 phrase-to-phrase pairs and is designed to improve the search capabilities of patent databases. Each pair is labeled with how its two phrases relate to each other: as exact matches, as synonyms, or as unrelated terms.
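A labeled phrase pair of this kind can be pictured as a small record. The sketch below is only an illustration: the class names, field names, and label values are assumptions based on the three relationship types named above, not the dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # Hypothetical label names, taken from the relationship types the article lists.
    EXACT = "exact match"
    SYNONYM = "synonym"
    UNRELATED = "unrelated"


@dataclass(frozen=True)
class PhrasePair:
    anchor: str         # phrase a searcher might type
    target: str         # phrase found in a patent
    relation: Relation  # how the two phrases relate


# Illustrative pairs only; these are not rows from any real dataset.
pairs = [
    PhrasePair("soccer ball", "soccer ball", Relation.EXACT),
    PhrasePair("soccer ball", "spherical recreation device", Relation.SYNONYM),
    PhrasePair("soccer ball", "fuel injector", Relation.UNRELATED),
]


def label_counts(pairs):
    """Count how many pairs carry each relationship label."""
    return Counter(p.relation for p in pairs)


def related_phrases(pairs, phrase):
    """Return other phrases that a search for `phrase` should also match."""
    return sorted(
        p.target
        for p in pairs
        if p.anchor == phrase
        and p.relation is not Relation.UNRELATED
        and p.target != phrase
    )


print(related_phrases(pairs, "soccer ball"))
```

A search model trained on pairs like these could learn to expand a plain query into the less conventional wording that patent text often uses.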

This dataset is particularly useful because many patent owners use non-standard language to describe the subjects of their patents, which produces widely varied and often impractical search results. For example, a patent might describe a soccer ball as a "spherical recreation device."

To access the patent data required to generate this dataset, users can turn to resources like Google Patents or the Global Dossier. Google Patents provides a searchable interface to U.S. and worldwide patents, including full texts, images, and metadata. Users can query titles, abstracts, claims, and descriptions to extract phrases relevant to patent search tasks. The Global Dossier service, offered by the USPTO and cooperating patent offices, provides access to related patent family documents from multiple jurisdictions, including file histories and classifications.

While Google has not released a pre-packaged dataset for public use, the availability of patent data through these resources enables users to generate their own phrase datasets for patent search model training. This process involves collecting patent text data, using text processing techniques to extract and analyze phrases, and then labeling the relationships between these phrases.
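The collect, extract, and label steps might look roughly like the following. This is a minimal sketch under stated assumptions: the two abstracts are invented sample text, the stopword list and the n-gram approach are illustrative choices, and `extract_phrases` and `candidate_pairs` are hypothetical helper names. The labeling step is left as an empty field for a human annotator.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "or", "to", "in", "with", "is", "by"}


def extract_phrases(text, max_len=3):
    """Count word n-grams (1..max_len) that neither start nor end with a stopword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


def candidate_pairs(phrases):
    """Pair up phrases for labeling; the label stays None until someone annotates it."""
    return [
        {"anchor": a, "target": b, "label": None}
        for a, b in combinations(sorted(phrases), 2)
    ]


# Invented sample text standing in for collected patent abstracts.
abstracts = [
    "A spherical recreation device for kicking games.",
    "An inflatable ball with a spherical recreation device cover.",
]

counts = Counter()
for text in abstracts:
    counts.update(extract_phrases(text))

# Keep multi-word phrases that occur at least twice across the sample.
frequent = [phrase for phrase, c in counts.items() if c >= 2 and " " in phrase]
to_label = candidate_pairs(frequent)

for pair in to_label:
    print(pair)
```

On real data the frequency threshold and phrase length would need tuning, and the labeling step is where most of the effort goes.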

For those seeking more open or bulk access, specialized patent search tools like PatentLens allow programmatic or bulk querying of international patent data, which may be useful for compiling phrase datasets.

It's worth noting that Google has patented technologies for phrase-based thematic search that use information gain to identify predictive phrases for indexing and retrieval. This suggests that such phrase extraction is done dynamically from actual patent data rather than via a static dataset.
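To give a feel for how information gain can single out predictive phrases, the sketch below uses the textbook entropy-based definition: the drop in uncertainty about a document's category once you know whether a phrase appears in it. This is a stand-in technique chosen for illustration and is not claimed to be the formulation in Google's patents. The documents, category labels, and function names are all assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, phrase):
    """Information gain of the binary feature 'phrase occurs in the text'.

    docs is a list of (text, label) tuples.
    """
    labels = [label for _, label in docs]
    with_phrase = [label for text, label in docs if phrase in text.lower()]
    without_phrase = [label for text, label in docs if phrase not in text.lower()]
    n = len(docs)
    conditional = (
        (len(with_phrase) / n) * entropy(with_phrase)
        + (len(without_phrase) / n) * entropy(without_phrase)
    )
    return entropy(labels) - conditional


# Invented toy corpus with hypothetical category labels.
docs = [
    ("a spherical recreation device for kicking", "sports"),
    ("an inflatable ball used as a spherical recreation device", "sports"),
    ("a fuel tank with a pressure valve", "automotive"),
    ("a device for metering fuel", "automotive"),
]

for phrase in ("spherical recreation device", "device"):
    print(phrase, round(information_gain(docs, phrase), 3))
```

In this toy corpus the specific phrase separates the two categories perfectly, while the generic word "device" appears in both and so carries much less information.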

In summary, while Google has not released an official, pre-packaged dataset for patent search model training, full patent documents and their metadata are publicly accessible through Google Patents and related services. Users can generate their own phrase datasets by collecting patent text data, using text processing techniques to extract and analyze phrases, and then labeling the relationships between these phrases. This can help improve the accuracy of patent search models and make the patent search process more efficient.

AI-driven patent search models can benefit from datasets like the one Google generated, which help improve the search capabilities of patent databases. Because Google does not publicly provide a direct download, those interested can collect patent text from resources such as Google Patents, the Global Dossier, or PatentLens and build their own training dataset.
