AuthorAuthorities
Author Disambiguation
Introduction

Author authority control requires a lot of hard work. So who wrote what?

Citation Example

Dong, Liang, Jiang, Hongrui. (2007). Autonomous microfluidics with stimuli-responsive hydrogels. Soft Matter, 3(10), 1223-1230.

From this Citation, we know these truths:
BibApp's Author Authority Workflow

I'll use myself as an example:

User name: Eric William Larson

I log in to the BibApp for the first time, and the website asks me to select all the NameStrings I have published under throughout my career. I type in my last name to filter the NameString index. Here's what I see:

Potential NameString variants:
Maybe all the Larson, E* NameStrings are correct (hypothetically, I have published under all these name variants), so I select the check-box next to each NameString to create a new PenName association record, tying my Person record (Person 1: Eric William Larson) to each NameString. Lastly, I select one of these variants as my "preferred" PenName: Larson, E.W. (ETA: 0.9)

Now I have created 6 PenNames for myself (Person 1: Eric William Larson). Any time a new Citation is entered into the BibApp for any of those Larson, E* NameStrings, the BibApp will automatically create a Contributorship row associating the new Citation with me (Person: Eric William Larson).

It's not that simple!

The above example works great if no one else "claims" one of my selected PenNames. When Erica Larson (Person 2: Erica Larson) enters the application and selects "Larson, E" as one of her PenNames, what should we do?

Every Contributorship row, which associates a Person with a Citation, needs to have a "status" and a "score" attribute.

Status options: "calculated", "verified", "denied"
Score: integer representing "how certain we feel the record is valid"

Now to handle Erica's case:
Now two people claiming the "Larson, E" NameString both have Contributorship records for some Citations. This could only be correct if the same NameString appeared twice in the citation, which is unlikely.

Who's the real author?

Contributorship "status" and "score" help us clean up the mess. In the BibApp, there are three ways to create Contributorship records:
If we're adding a targeted batch import or a single citation via a web form, we're simultaneously verifying the Contributorship record connecting the Person to the Citation. If we're generically adding citations to the application, we need to start (smartly) guessing who the real author is.

Take a look at the flow chart below. This picture illustrates our plans for machine scoring the authorships we batch import:

First stab at Algorithm

Based on four data fields:
Calculate contributorships.score in the following way. Get all verified citations for the person. For each citation, calculate the score using the following algorithm:
Author Disambiguation / Authorship Assignment Flow Chart
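To make the workflow described on this page a bit more concrete, here is a minimal Python sketch of PenName claims and automatic Contributorship creation. It is an illustration only, not BibApp's actual code or schema: the `Authority` class, its `claim` and `add_citation` methods, and the `verified_for` parameter are all hypothetical names. The sketch also assumes that other claimants of the same NameString keep a "calculated" record when one Person's record is verified, and it leaves the score at 0 because the scoring algorithm is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Contributorship:
    """Associates a Person with a Citation (hypothetical in-memory model)."""
    person: str
    citation: str
    name_string: str
    status: str = "calculated"  # one of "calculated", "verified", "denied"
    score: int = 0              # "how certain we feel the record is valid"; not computed here


@dataclass
class Authority:
    # NameString -> list of Persons who claimed it as a PenName
    pen_names: dict = field(default_factory=dict)
    contributorships: list = field(default_factory=list)

    def claim(self, person, name_string):
        """Create a PenName: tie a Person to a NameString (idempotent)."""
        claimants = self.pen_names.setdefault(name_string, [])
        if person not in claimants:
            claimants.append(person)

    def add_citation(self, citation, name_strings, verified_for=None):
        """Create one Contributorship per Person claiming a NameString on the citation.

        verified_for is an assumed parameter: the Person on whose behalf the
        citation was entered (targeted batch import or web form). That Person's
        record is marked "verified"; every other claimant stays "calculated".
        """
        created = []
        for name_string in name_strings:
            for person in self.pen_names.get(name_string, []):
                status = "verified" if person == verified_for else "calculated"
                row = Contributorship(person, citation, name_string, status)
                self.contributorships.append(row)
                created.append(row)
        return created


# Eric and Erica both claim "Larson, E".
authority = Authority()
authority.claim("Eric William Larson", "Larson, E")
authority.claim("Eric William Larson", "Larson, E.W.")
authority.claim("Erica Larson", "Larson, E")

# A generic import: both claimants get a "calculated" Contributorship.
generic_rows = authority.add_citation("generic import citation", ["Larson, E"])

# A citation Erica enters herself: her record is verified at the same time.
targeted_rows = authority.add_citation(
    "citation entered by Erica", ["Larson, E"], verified_for="Erica Larson"
)
```

The generic import is exactly the situation where status and score matter: two Persons end up with a record for the same NameString on the same Citation, and something has to decide which one is kept.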

