Abstract
Academic publication archives often draw from numerous heterogeneous sources whose records follow differing naming conventions. As a result, authorship ambiguities often arise: distinct authors sharing similar names, the use of first names versus initials, or alternate spellings of the same author's name. These ambiguities have plagued research on scientific collaboration and influence. Detecting and correcting them is important for maintaining the archive and for ensuring the correctness and reliability of any subsequent analysis. Existing analytic methods accomplish this with varying degrees of accuracy, but many require fine-tuning or manual categorization. We have developed a visual analytics system that lets the user interactively apply and control several analytic name disambiguation algorithms and presents the results for verification or correction. We demonstrate the efficacy of our system by using it to find and resolve ambiguities in authorship data collected from Cornell University Library's arXiv.org and from the InfoVis 2004 contest dataset, with improved accuracy and speed over existing approaches.
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2016 |
Publisher | Association for Computing Machinery, Inc |
Pages | 52-60 |
Number of pages | 9 |
ISBN (Electronic) | 9781450346177 |
State | Published - Dec 6 2016 |
Event | 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2016 - Shanghai, China; Duration: Dec 6 2016 → Dec 9 2016 |
Keywords
- Bibliographies
- Coauthor graphs
- Name ambiguity
- Visual analytics
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Information Systems