Advancing User Profiling: A Comprehensive Analysis of 200k+ Physicians Using LDA Topic Extraction

Antonio, Agliata; Angelo, Ciaramella; Di Nardo, Emanuel
2026-01-01

Abstract

This research addresses user profiling through a large dataset of more than 200,000 healthcare professionals, considered in both business and sociological settings. Our main objective is to construct detailed profiles from essential data points such as gender, age, place of residence, type of medical facility, and area of specialization. The approach is data-driven. To go beyond the basic correlations typically observed in user activity, we apply Latent Dirichlet Allocation (LDA) to textual data. The technique extracts significant topics that, once integrated with the medical registry, reveal specific interests and trends among physicians. We then use neural network-based clustering to group these professionals into well-defined categories, which helps identify behavior patterns linked to demographic factors and reading preferences. The dataset originates from a relational database and includes records of health-related articles accessed through a web interface over three years. These records support the construction of a term-frequency matrix used in the subsequent analyses. By linking personal data to article consultations through a many-to-many relationship, we reconstruct each physician’s reading habits at a granular level. Throughout, we apply rigorous data control and preprocessing to ensure the integrity of the dataset and the validity of the analyses. This approach both validates the accuracy of our machine learning techniques and demonstrates their practical effectiveness in interpreting and leveraging user profile data in real-world scenarios.
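
The abstract does not give implementation details, so the following is only a minimal sketch of how a many-to-many consultation table could be turned into a per-physician term-frequency matrix. The table contents, the names `articles`, `consultations`, `tokenize`, `term_frequency_by_physician`, and `to_matrix`, and the choice to count repeated consultations each time are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter, defaultdict

# Hypothetical toy data; the real dataset lives in a relational database.
articles = {
    10: "heart failure therapy guidelines",
    11: "tumor immunotherapy trial results",
    12: "heart rhythm monitoring devices",
}
# Link table for the many-to-many relationship: (physician_id, article_id).
# Assumption: a repeated consultation is counted every time it occurs.
consultations = [(1, 10), (1, 12), (2, 11), (2, 10), (1, 10)]


def tokenize(text):
    """Lowercase whitespace tokenizer; real preprocessing would do more."""
    return text.lower().split()


def term_frequency_by_physician(consultations, articles):
    """Sum term counts over every article each physician consulted."""
    counts = defaultdict(Counter)
    for physician_id, article_id in consultations:
        counts[physician_id].update(tokenize(articles[article_id]))
    return counts


def to_matrix(counts):
    """Return (row ids, vocabulary, dense rows) with a stable column order."""
    vocab = sorted({term for c in counts.values() for term in c})
    ids = sorted(counts)
    rows = [[counts[i][term] for term in vocab] for i in ids]
    return ids, vocab, rows


tf = term_frequency_by_physician(consultations, articles)
ids, vocab, matrix = to_matrix(tf)
```

Each row of `matrix` describes one physician's reading history as term counts. A matrix of this shape is the usual input to an LDA implementation (for example scikit-learn's `LatentDirichletAllocation`, although the abstract does not say which library was used).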
Year: 2026
ISBN: 9789819540716
ISBN: 9789819540723

Use this identifier to cite or link to this document: https://hdl.handle.net/11367/161160