Analysis of Clusters With Indian Patent Data Using Different Word Embedding Techniques

Authors

  • Pankaj Beldar
  • Mohansingh Pardeshi
  • Rahul Rakhade
  • Shilpa Mene

DOI:

https://doi.org/10.53555/sfs.v10i3.2110

Keywords:

K-means, Agglomerative clustering, Word embedding, Patents, Silhouette Score

Abstract

This study applies Unsupervised Machine Learning (UML) techniques, namely K-means and agglomerative clustering, to descriptive Indian patent data. The optimal number of clusters is determined using silhouette score evaluation, the elbow method, and dendrogram analysis. Several word embedding methods, including TF-IDF, Word2Vec, and CountVectorizer, are explored in combination with extensive text preprocessing. Testing with both categorical and numerical features yields a silhouette score of 0.8965 for two clusters with agglomerative clustering, demonstrating the effectiveness of that method on this data. The research highlights the role of UML techniques, word embedding methods, and thorough text preprocessing in revealing complex structure within Indian patent data. Beyond advancing unsupervised learning methodology, this work can help scholars, practitioners, and policymakers understand the Indian patent landscape and support innovation and technological progress.
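As a minimal sketch of the clustering workflow summarised above, the following pure-Python example vectorises a few invented documents with TF-IDF, groups them with average-linkage agglomerative clustering, and picks the number of clusters by the highest mean silhouette score. It is not the authors' pipeline. The toy documents, the whitespace tokenizer, Euclidean distance on L2-normalised vectors, and average linkage are all assumptions, and the sketch does not cover K-means, Word2Vec, CountVectorizer, the elbow method, or dendrograms. In practice one would typically reach for scikit-learn's `TfidfVectorizer`, `AgglomerativeClustering`, and `sklearn.metrics.silhouette_score`; the smoothed IDF used here, ln((1 + n)/(1 + df)) + 1, roughly mirrors `TfidfVectorizer`'s default weighting.

```python
import math
from collections import Counter


def tfidf(docs):
    """Raw term counts weighted by smoothed IDF, then L2-normalised per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors, vocab


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(X, k):
    """Bottom-up clustering with average linkage until k clusters remain."""
    clusters = [[i] for i in range(len(X))]  # start with one cluster per document
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average linkage: mean pairwise distance between the two clusters
                d = sum(dist(X[a], X[b]) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected by the pop
    labels = [0] * len(X)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def silhouette(X, labels):
    """Mean silhouette coefficient; needs at least two distinct labels."""
    scores = []
    for i, xi in enumerate(X):
        own = [dist(xi, X[j]) for j in range(len(X)) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(xi, X[j]) for j in range(len(X)) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy "patent abstracts", invented for illustration only.
docs = [
    "battery electrode lithium cell",
    "lithium battery charging cell",
    "battery cell electrode anode",
    "crop irrigation soil sensor",
    "soil moisture irrigation system",
    "crop harvest soil yield",
]

X, _ = tfidf(docs)
# Pick the cluster count with the highest mean silhouette score.
scores = {k: silhouette(X, agglomerative(X, k)) for k in range(2, len(docs))}
best_k = max(scores, key=scores.get)
labels = agglomerative(X, best_k)
print(best_k, labels)  # expected: 2 [0, 0, 0, 1, 1, 1]
```

On these toy inputs the two invented topics share no terms, so two clusters score highest. The resulting score says nothing about the 0.8965 reported for the real patent data.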

Author Biographies

  • Pankaj Beldar

    K. K. Wagh Institute of Engineering Education and Research

  • Mohansingh Pardeshi

    K. K. Wagh Institute of Engineering Education and Research

  • Rahul Rakhade

    K. K. Wagh Institute of Engineering Education and Research

  • Shilpa Mene

    K. K. Wagh Institute of Engineering Education and Research, Nashik

Published

2023-12-16

Section

Articles