Question (IR 1)
Informal interpersonal communication is very important among scientists. Describe a retrieval system to identify collaborators. Include the following in your answer:
a. Knowledge representation to enhance matching queries
b. Search features and fields
c. Relevance criteria that scientists might apply
Answer:
0. Introduction
With increasing complexity in scientific work, collaborations spanning discipline, department, institution, and country are ever more important to exploit the expertise available and to make the most scientific progress. Yet it can be complex to identify potential collaborators outside of a scientist’s particular research area even within an institution and particularly internationally. This essay describes a retrieval system to assist scientists in identifying potential collaborators. The essay starts by describing reasons a scientist might be looking for collaborators and what a good system suggestion would look like. It continues by describing the knowledge representation system required to support retrieval. It then describes the required search features and fields. The essay ends by describing relevance criteria scientists might apply.
1. Problem Space
Finding potential collaborators is not a trivial problem, in particular for scientists entering a new field or searching outside of their primary field. This system seeks to represent the user’s collaboration needs as a series of queries, to represent the pool of collaborators as profiles, and to match the two.
Scientists might be looking for collaborators for several reasons. These include looking for similar interests or methods or looking for complementary interests or methods. Looking for similar interests or methods will enable the scientists to form common ground more quickly, and to immediately work on new problems; however, it might be more typical that the collaborator is expected to fill in the missing puzzle piece, and to bring a missing but needed expertise or resources (genes, reagents, equipment, etc.) to the collaboration.
Once scientists identify potential collaborators on topic, they must also assess how well that potential collaborator’s personality, working style, or authority fits in with the existing team. In other words, even if the potential collaborator has the ideal interests or skills, he might not be a good collaborator if he is a jerk. This system will not address these aspects directly; however, the system could link out to social networking sites or social computing sites so that the user can assess personality and work style.
1.1 Similar Interests
A scientist might be looking for another scientist who:
- is interested in precisely the same research problem
- uses the same model organism or chemical
- studies the same system or entity
- comes from the same research paradigm, approach, or school of thought
- or who uses the same experimental equipment
1.2 Similar Skills or Methods
In addition to interests, scientists might be seeking collaborators who have similar skills, who use similar experimental methods, or who use similar mathematical, statistical, or computational methods.
1.3 Complementary Interests
An example of complementary interests comes from ecology. When one scientist is interested in a particular organism and another is interested in the ecosystem as a whole, they might be able to share data or collaborate when that organism is sighted in the ecosystem. Zimmerman and Van House both describe the sharing of biology data between scientists with complementary interests.
1.4 Complementary Skills or Methods
In this case, the scientist might be seeking a collaborator who can bring a necessary expertise to the collaboration. For example, the scientist might have decided that a certain data reduction or computational method is required to deal with the data, so will search for that needed expertise, regardless of the subject area in which the potential collaborator typically works. Statisticians in biomedical research areas typically appear as co-authors on papers with very diverse research participants, diverse diseases or health issues, and so forth.
2. Knowledge Representation
In this system, knowledge representation is how we describe a collaborator so that the system can retrieve relevant collaborator profiles and so that the user can assess relevance using the criteria described in section 4, below. The representation of the user and of his or her query is described below in section 3, Search Features and Fields. Representation of the potential collaborators also includes the derivation of the information to complete the profile.
2.1 Representation of a Potential Collaborator
As described in section 1 above and further described below in section 4, Relevance Criteria, users of the system might be looking for similar or complementary interests or skills. The representation should include fields typically found in a CV and others:
- Name
- Language proficiency
- Current location
- Willingness to travel
- Citizenship, and clearances (if US)
- Education (what degree, from where, with which advisor, at what time)
- Employment (institution, location, lab or research group)
- Model organisms used
- Organisms studied
- Chemicals or systems studied
- Approaches
- Equipment expertise
- Techniques/Skills
- Articles Written (these should be described with the typical citation fields, and also descriptors)
- Conferences attended
- Society memberships
- Classes taught
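The field list above could be captured in a simple record type. The following is only a minimal sketch: the class names, attribute names, and sample values are assumptions made for illustration, and a real system would likely tie most of these fields to controlled vocabularies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Education:
    # One entry per degree: what degree, from where, with which advisor, when.
    degree: str
    institution: str
    advisor: Optional[str] = None
    year: Optional[int] = None


@dataclass
class CollaboratorProfile:
    # Attribute names are hypothetical; they mirror the fields listed above.
    name: str
    languages: List[str] = field(default_factory=list)
    location: Optional[str] = None
    willing_to_travel: bool = False
    citizenship: List[str] = field(default_factory=list)
    education: List[Education] = field(default_factory=list)
    employer: Optional[str] = None
    model_organisms: List[str] = field(default_factory=list)
    organisms_studied: List[str] = field(default_factory=list)
    techniques: List[str] = field(default_factory=list)
    equipment: List[str] = field(default_factory=list)
    articles: List[str] = field(default_factory=list)
    conferences: List[str] = field(default_factory=list)
    societies: List[str] = field(default_factory=list)
    classes_taught: List[str] = field(default_factory=list)


# Invented sample record, for illustration only.
profile = CollaboratorProfile(
    name="A. Researcher",
    languages=["English", "Spanish"],
    location="Baltimore, MD",
    education=[Education("PhD", "Example University", year=2005)],
    model_organisms=["zebrafish"],
    techniques=["confocal microscopy"],
)
```

Multi-valued fields default to empty lists, so a sparse profile from a less experienced scientist is still a valid record.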
2.2 Derivation of the Representation Information
As in literature, characterization can be done through an explicit description of the character, what the character says, or through the character’s actions. In other words, we can derive the representation of the scientist through
- a profile the scientist creates;
- through mining of the articles, data sets, protocols, and other deliverables the scientist creates;
- and by compiling information on previous collaborations, guest lectures given, and conferences and workshops attended.
More experienced scientists will have a lot more information in their profiles than younger or less experienced scientists. For this reason, the system should treat expertise that the user asserts in his or her profile as equally valid to expertise substantiated by published articles. The system should mine the research literature, have user-created profiles, mine society web sites and membership directories (when allowed), and mine university and research lab web sites.
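One way to combine these sources while keeping track of where each piece of information came from is sketched below. It is an illustration only: the function name, the source labels, and the sample records are assumptions, and real mining of literature or web sites would need far more normalization than lowercasing.

```python
from collections import defaultdict


def merge_sources(*sources):
    """Merge (source_name, {field: [values]}) records into one profile.

    Returns {field: {value: [source names]}} so the user can see whether
    a value was self-asserted, mined from the literature, or both.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for source_name, record in sources:
        for field_name, values in record.items():
            for value in values:
                # Lowercasing is a stand-in for real vocabulary mapping.
                merged[field_name][value.lower()].add(source_name)
    return {
        field_name: {value: sorted(names) for value, names in values.items()}
        for field_name, values in merged.items()
    }


# Invented sample inputs.
self_profile = ("self_profile", {"techniques": ["Mass spectrometry", "PCR"]})
literature = ("literature", {"techniques": ["mass spectrometry"],
                             "organisms": ["zebrafish"]})

merged = merge_sources(self_profile, literature)
```

Here "mass spectrometry" is backed by both the profile and the literature, while "pcr" is only self-asserted. Both remain in the merged profile, in line with treating asserted expertise as valid.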
3. Search Features and Fields
The system has to support query formulation in several ways to be effective. The system should allow standard fielded keyword searching, browsing by various fields, matching or recommendations by similarity, and alerting.
3.1 Keyword or Guided Keyword Searching
Most users expect to find a screen into which they can just enter a few keywords and get some results. This system should provide that feature, and if possible should attempt to map input terms to appropriate fields. For example, if someone enters a chemical name, the system should return scientists who study that chemical, scientists who use methods that require that chemical as a matrix or reagent, and so forth. The search results should provide facets to allow the user to further refine the search by other criteria.
Guided keyword searching would allow the user to input a term, and require the system to locate that term within a field. As in other research databases, the user should be able to look up the appropriate term in an index, or just enter any keyword that comes to mind. The system should offer search suggestions and spell checking.
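A rough sketch of unfielded keyword search with facets follows. It assumes profiles are stored as plain dictionaries and uses simple substring matching. The field names and sample data are invented for illustration, and a production system would use an inverted index and controlled vocabularies instead.

```python
from collections import Counter

# Invented sample profiles.
PROFILES = [
    {"name": "Scientist A", "chemicals": ["glycerol"],
     "techniques": ["MALDI"], "location": "Maryland"},
    {"name": "Scientist B", "chemicals": [],
     "techniques": ["glycerol gradient centrifugation"], "location": "Ontario"},
    {"name": "Scientist C", "chemicals": ["ethanol"],
     "techniques": ["PCR"], "location": "Maryland"},
]

SEARCH_FIELDS = ("chemicals", "techniques")


def keyword_search(term, profiles, fields=SEARCH_FIELDS):
    """Return (profile, matched_fields) pairs for a single keyword.

    Passing one field in `fields` turns this into a guided (fielded) search.
    """
    term = term.lower()
    hits = []
    for p in profiles:
        matched = [f for f in fields
                   if any(term in value.lower() for value in p.get(f, []))]
        if matched:
            hits.append((p, matched))
    return hits


def facet_counts(hits, facet):
    """Count the values of one facet across the current result set."""
    return Counter(p[facet] for p, _ in hits)


hits = keyword_search("glycerol", PROFILES)
by_location = facet_counts(hits, "location")
```

Reporting which field matched lets the results distinguish a scientist who studies the chemical from one who only uses it in a method.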
3.2 Browsing
Browsing is underrated by many information retrieval practitioners but it is an important way to search for information. In this system, browsing is particularly important because users might not know how a method is described in a different discipline. Once the user clicks or inputs any facet to browse, the system could provide a list, and then display facets by which the user could narrow. For example, show me only those who are within 100 miles of my current location.
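The distance facet in the example above could be sketched as follows, assuming each profile carries latitude and longitude for the scientist's current location. The haversine great-circle formula is a standard approximation; the function names, the sample coordinates, and the 100-mile default are illustrative assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(profiles, origin, max_miles=100):
    """Keep only profiles located within max_miles of origin (lat, lon)."""
    lat0, lon0 = origin
    return [p for p in profiles
            if haversine_miles(lat0, lon0, p["lat"], p["lon"]) <= max_miles]


# Invented sample data: approximate city coordinates.
candidates = [
    {"name": "Scientist in Washington, DC", "lat": 38.91, "lon": -77.04},
    {"name": "Scientist in Boston", "lat": 42.36, "lon": -71.06},
]
baltimore = (39.29, -76.61)
nearby = within_radius(candidates, baltimore)
```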
Rashmi Sinha described something called pivot browsing, where from a particular record found through any method, you can then start a new browse by clicking on a field. In other words, you could browse other profiles using the same term from the same controlled vocabulary and maybe get recommendations from other vocabularies.
3.3 Matching by Similarity
A scientist who is represented in the system should be able to find another scientist listed in the system by highlighting portions of his or her profile and asking the system to locate similar scientists. There could be some prioritization system, too, if multiple fields are selected. For example, it is most important that they have the same citizenship as me, but it is also important that they study this organism, and it would be nice if they use this method, and I could meet them at an upcoming meeting or workshop because they’ve attended that meeting in the past.
Similarly, a user could point to a research article or protocol, and ask the system to locate a scientist relevant to the methods used or the topic studied. This match might occur by looking at citation of that section of the document, or even better, the system could extract the methods or topics from the document, and then match these in the system.
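The prioritization idea could be sketched as a weighted sum of per-field overlaps. This is one possible scoring scheme and not a recommendation: the use of Jaccard overlap, the weights, the field names, and the sample profiles are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Overlap between two value lists: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_similarity(me, other, weights):
    """Weighted average of per-field overlap; larger weights matter more."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(w * jaccard(me.get(f, []), other.get(f, []))
                for f, w in weights.items())
    return score / total


def rank_similar(me, candidates, weights):
    """Return (score, name) pairs, best match first."""
    scored = [(weighted_similarity(me, c, weights), c["name"])
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented sample data: the user highlights three fields and ranks them.
me = {"citizenship": ["US"], "organisms": ["zebrafish"],
      "methods": ["confocal microscopy"]}
weights = {"citizenship": 3, "organisms": 2, "methods": 1}
candidates = [
    {"name": "Candidate Y", "citizenship": ["CA"],
     "organisms": ["zebrafish"], "methods": ["confocal microscopy"]},
    {"name": "Candidate X", "citizenship": ["US"],
     "organisms": ["zebrafish"], "methods": ["PCR"]},
]
ranked = rank_similar(me, candidates, weights)
```

With these weights, Candidate X (same citizenship and organism) outranks Candidate Y (same organism and method), which reflects the user's stated priorities. A hard requirement such as citizenship could instead be applied as a filter before scoring.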
3.4 Alerting
The system should also be configured to alert users to new profiles matching their search profile. This alerting can happen within the system with a message displayed on login, or e-mails or RSS feeds can be sent out to the user if they prefer.
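A minimal sketch of the alerting step follows, assuming saved searches are stored as required field/term pairs and are checked against each batch of new profiles. The class, function, and channel names are hypothetical, and actual delivery by e-mail or RSS is left out.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class SavedSearch:
    user: str
    required: Dict[str, Set[str]]  # field -> lowercase terms that must all appear
    channel: str = "in_app"        # hypothetical values: in_app, email, rss


def matches(search, profile):
    """True if the profile contains every required term in every field."""
    return all(
        terms <= {value.lower() for value in profile.get(field_name, [])}
        for field_name, terms in search.required.items()
    )


def alerts_for_new_profiles(searches, new_profiles):
    """Return (user, channel, profile name) for each new match."""
    return [
        (s.user, s.channel, p["name"])
        for s in searches
        for p in new_profiles
        if matches(s, p)
    ]


# Invented sample data.
searches = [
    SavedSearch("user1", {"organisms": {"zebrafish"}}, channel="email"),
    SavedSearch("user2", {"techniques": {"pcr"}, "organisms": {"mouse"}}),
]
new_profiles = [
    {"name": "New Scientist", "organisms": ["Zebrafish"], "techniques": ["PCR"]},
]
alerts = alerts_for_new_profiles(searches, new_profiles)
```

Each alert carries the user's preferred channel, so the same match list can feed an on-login message, an e-mail, or an RSS feed.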
4. Relevance Criteria
There are many criteria a scientist might use to assess the relevance of a potential collaborator using their profile in the system.
4.1 Topical Relevance
Topical relevance describes the match of the features of the searcher with the features of the retrieved scientist. A potential collaborator will be directly relevant if they are interested in the same research problem, they use the same approaches, they have the needed skills, they use the same research or analysis methods, and so forth. A potential collaborator will also be relevant if they bring new and necessary skills and expertise to the collaboration – this is similar to the topical relevance idea of novelty.
4.2 Other Relevance Criteria
There are other important relevance criteria in addition to topical matching. These include availability, policy, seniority, and access to resources.
4.2.1 Availability
An ideal collaborator will have time available at the appropriate portions of the project cycle to devote to the issue. While this seems relatively straightforward, the new project might be more important to the scientist than existing projects, so he or she might be willing to oversubscribe or accept projects even when fully scheduled. On the other hand, existing requirements such as teaching a course or serving on a committee might severely restrict the scientist’s ability to travel to equipment or allocate the time required.
4.2.2 Policy
Policy criteria are multidimensional. Depending on the funding source, the scientist’s citizenship and location might be extremely important. For example, a national government might require that all investigators come from that country. Likewise, a funding body might call the research that results from a collaboration export controlled, so team members could not discuss results without export clearances. Other policies that could impact this are requirements for the protection of privacy of participants (health information, student information, etc), the treatment of animals, or the treatment of intellectual property. Potential collaborators with experience and education in the funding body’s requirements for privacy, for example, will be more attractive than collaborators who need to spend a great deal of time in online compliance courses and certifications.
4.2.3 Seniority
Potential collaborators might be very senior, running their own lab, or might be new postdoctoral researchers. More than topic matching on expertise, these criteria rely on mentoring possibilities and project management skills.
4.2.4 Access to Resources
For some “big science” projects, the primary relevance criterion might be access to large, very expensive equipment, but it is also important for smaller projects. Access to resources could mean model organisms, human subjects, astronomical datasets, computing power, ecosystems, or spectrometers, for example. A scientist might be skilled in the use of some equipment, but might only be relevant if he or she can access the equipment for the collaboration. An organism or ecosystem might be of interest, but a scientist might need to find a local scientist who has physical access to go to the field.
5. Conclusion
In this essay I have described desirable features for a system to recommend potential collaborators for a scientist. I described the problem space, how the collaborators would be represented and how that information is obtained, how search should be facilitated, and how the users will most likely judge relevance.
In this system, I have omitted discussion of probably one of the most important things in collaboration, chemistry – as in e-dating sites. In other words, whether the two potential collaborators can get along or if they will have personality and working style clashes that will impede the work. The system could facilitate the scientists chatting online, but meeting outside of the system is probably more important in determining if the collaboration will work. Additionally, some might suggest a rating system for collaborators, but this has so many political and ethical implications it could prevent the system from working or indeed, ever being released for use. Unless someone has committed scientific fraud which is well documented, they should stay in the system and be judged based on their actions and outputs which are less controversial and less likely to cause unpleasantness.

Christina K. Pikas is a science and engineering librarian in a special library as well as a doctoral student in information studies.



Comments
This one works for me:
biosemantics.org/jane/
Posted by: Danny O'Reario | June 4, 2009 5:09 PM