A Blog Around The Clock

Continuing the tradition of the last two years, I will occasionally post interviews with some of the participants of the ScienceOnline2010 conference that was held in Research Triangle Park, NC, back in January. See all the interviews in this series here. You can check out previous years’ interviews as well: 2008 and 2009.

Today, I asked Antony Williams from ChemSpider to answer a few questions.

Welcome to A Blog Around The Clock. Would you, please, tell my readers a little bit more about yourself? Where are you coming from (both geographically and philosophically)? What is your (scientific) background? Tell us a little more about your career trajectory so far: interesting projects past and present?

Hi Bora…thanks for the invitation to connect! Where do I come from? When people meet me they interpret my mongrel accent in many ways, commonly assuming that I am from Australia (especially the Canadians) or from England (which is, of course, the common term for the United Kingdom over here). Well, I am from the UK, but I am Welsh, not English. Earlier in life I was going to be a Welsh teacher, but it’s been almost 30 years since I had a conversation in Welsh! I grew up in a small village in Wales of fewer than a hundred people. From there I went to Liverpool University to do a degree in Chemistry. I found Organic Chemistry very easy but really struggled with Physical Chemistry, especially spectroscopy. I found it very challenging, but something in my personality (my friends call it a defect) makes me prefer a challenge over something that is easy. I tend to take on the things that challenge and push me rather than the things that are easy. So, naturally, I focused on physical chemistry, specifically spectroscopy, did a summer project on NMR in the final year of my degree, and got hooked. From there I went to London University to do my PhD, funded by Shell Oil, looking at the effects of High Pressure on Lubricant Related Systems by Nuclear Magnetic Resonance. I engineered my own High Pressure Vessel from non-magnetic titanium to go into a magnet, apply pressures of up to 5 kbar to liquids and look at the molecular dynamics under pressure. I was also writing software to analyze the data and fit it to specific models. Fun times – engineering, chemistry, computing – the type of diversity I like in a project.

From there I went to Ottawa, Canada, to work at the National Research Council (NRC) labs, switching from Nuclear Magnetic Resonance to Electron Spin Resonance for about 18 months. It was a great place to work and I truly enjoyed the switch to a new type of spectroscopy. However, NMR definitely had more applications, so I switched back to NMR and went to the University of Ottawa to run their NMR Facility, again for about 18 months. Lack of funding and the inability to bring in new equipment to run even some of the more mundane modern NMR experiments made me look for other opportunities, and I moved south to the United States to work at Kodak in Rochester as their NMR Technology Leader. There I was responsible for setting the technology vision for NMR and managing a number of their NMR labs. During that period I focused on developing walk-up technologies that put modern analytical instruments in the hands of chemists in a “walk-up” environment, delivering robotic control, offline data access and processing, and an “analytical LIMS” – a laboratory information management system to track samples, structures and spectra through our lab. We built the first web-based LIMS, called WIMS (Web-based Information Management System), on Netscape Navigator (remember that?) and got a lot of attention and visits from the LIMS vendors. We developed software systems under the simple adage “The Web is the Way”…how right we were. That work was done in 1996.

From Fortune 500 America I joined a small start-up chemistry software company called Advanced Chemistry Development. I joined as their product manager for NMR and over the next few years grew the product line into the industry leader for NMR prediction, for third-party NMR processing and databasing and, one of the best undertakings of my scientific career, a platform for Computer Assisted Structure Elucidation. I had the opportunity to work with some of the best small-molecule NMR jocks in the world and an incredible team of developers and scientists at ACD/Labs, and then to move my skill set outside of NMR. I managed the development of an entire analytical data management system (ADMS) covering Nuclear Magnetic Resonance, Mass Spectrometry, Chromatography, Infrared Spectroscopy and a myriad of other analytical techniques. I managed the structure drawing software, ChemSketch, which has had over a million downloads now that it is freeware, and the nomenclature product line for generating systematic names from structures and converting names to structures. The product lines became so successful that we had to bring in a group of other product managers who could focus on the individual product lines. I became their Chief Science Officer with a major focus on business development, but I always kept my hands in direct product management, marketing and sales. My passion remained the application of software to data handling, manipulation and delivery to scientists, and trying to extract as much information as possible from available data.

A few years ago I floated an idea inside ACD/Labs regarding how it might be possible to index chemical compounds within an organization: not just the ones sitting inside a structure database, but those represented in documents, reports, papers, publications and patents as chemical names and structure images. It would require a combination of multiple technologies, including entity extraction techniques to find chemical identifiers, algorithms and look-up dictionaries to convert names to structures, and software to convert structure images to structures. The intention was to build the index inside a central database and provide a tool to structurally index the network. We never moved the project forward because there was too much going on.

A couple of years later I was working extreme hours, focused a lot on sales, marketing and business. While it was fun, there was a creative part of me not being exercised, and I decided to start a hobby project to stretch that particular muscle. I’d been watching what was going on with PubChem and a number of other online databases such as DrugBank. Web technologies had come a long way and I still implicitly believed that “the web is the way”. The concept of spidering an organization’s network had expanded to spidering the internet. It was admittedly a major undertaking, but a lot of the tools were coming together to allow it to happen. A few of my friends and I got together to create a platform for centrally indexing chemistry on the internet, with the intention of linking chemical compounds to related resources on the web. And so ChemSpider was born.

Once ChemSpider went online as a structure-searchable database of about 10 million chemicals, we expanded the database by adding data from various other sources, added functionality to query the data in various ways, and added services to allow organizations to tap into the resource we were building. Over the next couple of years our target shifted to building a structure-centric community for chemists and, as we started to assemble and index the public chemistry on the internet, it became clear that there was an enormous quality issue in the majority of the public compound databases we wanted to link to. There were so many errors in these databases it was quite shocking. As we assembled our database we were inheriting these errors, and it was clear that we would need to curate these data in both robotic and manual ways. We built a curation platform to allow crowdsourced curation so that users of ChemSpider could help us clean up the data. We added a deposition system for users to deposit their own chemistry, and we added a series of tools to allow users to annotate the data and add supplementary information. The database today holds almost 25 million unique entities assembled from over 300 data sources. We’ve truly built a community of chemists around ChemSpider, with thousands of users coming to the site every day and a number of them curating, annotating and adding data on an ongoing basis.

In June of last year the Royal Society of Chemistry acquired ChemSpider and that is where I am now as the Vice President of Strategic Development.

What is taking up the most of your time and passion these days? What are your goals?

Our focus remains consistent with the original goal of building a central portal for chemists to facilitate traversing the web to find chemistry-related data, information and knowledge. At present we remain focused on linking together structure-based data and resources, but we will eventually expand this out to chemical compounds that cannot be explicitly defined by a chemical structure table…things such as polymers, minerals and mixtures (coal tar, mineral oil, etc.). We are busy building curated disambiguation dictionaries and using them as the basis of chemical name (entity) extraction and recognition so that we can perform semantic markup and linking. We continue to expand the breadth and improve the quality of the data in the database, with the intention of being able to query and link to every structure-based database that can be accessed via the internet. Chemists have different personae – there are synthetic chemists, analytical scientists, medicinal chemists, chemistry students and teachers, to name just a few. While each of these would want to access different types of data for their work and research, a Venn diagram of their needs would show a common set of queries: let them search by chemical name, chemical structure/substructure and properties. From there they would layer on different expectations about what to do with the result set. The goal is simple…make the internet structure-searchable and provide interfaces and services to allow chemists to query and use the results.

What aspect of science communication and/or particular use of the Web in science interests you the most?

One specific area of interest I have right now is to encourage crowdsourced collaboration in chemistry. My bias at present is to provide an environment whereby members of the chemistry community can give/share/contribute/educate/enable/improve chemistry on the internet. In our terms this means allowing them to add their data to the ChemSpider database, annotate what’s already online, and validate and curate out the junk. By applying their skills and contributing, they can build their own professional profile in the community and bring benefit to other chemists. We intend to layer on recognition and rewards systems and allow chemists to form collaborative networks. We ourselves are already immersed in the network of Open Notebook Science, providing access to services and data that allow others to perform their research. One of our areas of focus right now is ChemSpider SyntheticPages, an online database of synthetic procedures built for the community by the community. There is so much chemistry, so many chemical reactions performed in labs across the world, but the synthetic details and associated analytical data never see the light of day and never get published. They might make it into a thesis, but then that will get put on the supervisor’s shelf or in a library somewhere. Even though these could be electronically enabled and discoverable, in reality it hardly happens. If we can get just a fraction of the chemistry community to donate one SyntheticPage a week, the database will explode. As it’s a free resource, chemists have much to gain. The challenge is how to encourage a chemist to invest some of their time in writing up their procedure and putting it online.
Contributors to date have commented that if it’s already in electronic format it might add another 15-30 minutes to their day, but the result is public exposure of the work, a permanent record of value to other chemists, a public profile for the submitter (including a digital object identifier for the resume!), and an opportunity to engage the community, which can provide feedback and comments. Everyone wins.

How does (if it does) blogging figure in your work?

I don’t blog as much as I used to, simply because I don’t have as much time on my hands. When ChemSpider started I was “dragged” into blogging because of some attacks on ChemSpider made by very vocal members of the blogosphere. I couldn’t figure out how to defuse some of the misinformation and accusations being made about our efforts with ChemSpider except by becoming a participant in the blogosphere. I found that blogging became a great way for me to engage ChemSpider users and get their feedback on ideas for improving the service, to communicate new functionality in the system, to express my views on things going on in the community and, generally, to release creative expression again through writing.

The ChemSpider blog remains a way to communicate what we’re up to in terms of new developments on ChemSpider and other Cheminformatics projects internal to RSC. It also gives me a voice to comment on what’s going on in chemistry that interests me and on what’s happening in the world of Open Science, and a way to engage our users in dialog.

How about social networks, e.g., Twitter, FriendFeed and Facebook? Do you find all this online activity to be a net positive (or even a necessity) in what you do?

Facebook for me, at present, is more of a personal tool for interacting with my friends and family in the UK and around the world. I use Twitter quite regularly (as @ChemSpiderman), especially while I am sitting in conferences and seminars. I have found Twitter surprisingly useful, more than I had ever imagined when it first showed up on the scene. My interactions via FriendFeed are certainly useful, and I stay connected to and informed by certain groups of people on there. While each of these takes time, it is definitely a net positive, though, I would clarify, not a necessity for what I do. I am definitely an advocate for LinkedIn and find the networking aspects of that platform in particular very enabling.

When and how did you first discover science blogs? What are some of your favourites? Have you discovered any cool science blogs by the participants at the Conference?

I first discovered science blogs when I was dragged into the blogosphere by some particularly negative commentaries that were being made about ChemSpider. Lots of judgments, the majority of them not fact-based, were made about what we were trying to achieve with ChemSpider. As they say, however, “no press is bad press”, and once the fire was lit I entered the blogosphere to respond to the accusations. Had I not done so, I feel that our reputation would have been badly tarnished. It is one of the downsides of the blogosphere, unfortunately…people get to say whatever they want, whatever they perceive and, in certain cases, have no facts or data to back up their claims. That is when things get very interesting and engaging, though!

My Google Reader follows a number of bloggers from my domain. I have a particular appreciation for the insights of Derek Lowe on his “In the Pipeline” blog. I follow Cameron Neylon, Jean-Claude Bradley, Egon Willighagen, Milkshake’s “Org Prep Daily”, Paul Docherty’s “Totally Synthetic” and many others of a similar nature. I recently had to slim down what was feeding the reader, as following too many people was becoming overly distracting. I didn’t start following any particular blogs after the ScienceOnline conference, but I do watch a lot more people via Twitter now and, when they tweet a post of interest, I navigate over to their blog. Twitter has become another way to link me to blog posts of interest without overpopulating my reader.

What was the best aspect of ScienceOnline2010 for you? Any suggestions for next year? Is there anything that happened at this Conference – a session, something someone said or did or wrote – that will change the way you think about science communication, or something that you will take with you to your job, blog-reading and blog-writing?

ScienceOnline was fun. I attend a lot of conferences in a year, but the energy at ScienceOnline is simply contagious. The level of engagement and contribution far exceeds what I have experienced at any other conference apart from the two SciFoo meetings I have attended. Participants at these types of meetings are there to do more than listen. They want to speak…they want to engage and they want to share their opinions. At many conferences there are blocks of time when I am not in sessions. At ScienceOnline there were too many sessions I wanted to sit in on and couldn’t. A much better situation! I walked out of the meeting with new connections, new collaborations and new possibilities. Definitely worth attending.

My one embarrassing moment was when I stood up to do the Lightning (Ignite) Talk at the dinner without having read the rules of engagement, as it were. It was a pure oversight on my part regarding the flow of the Ignite Talk, but it actually worked for some strange and unknown reason. Keep the Ignite Talk format at the dinner next year…they were great fun.

It was so nice to see you again and thank you for the interview. I hope to see you again next January.

Comments

  1. #1 ChemSpiderman
    May 20, 2010

    Bora…thanks for the chance to chat about ChemSpider and our ongoing work to build a community for chemists. Much appreciate the opportunity to connect! Definitely see you at next year’s ScienceOnline.

  2. #2 Glendon Mellow
    May 20, 2010

    That was one of my very favourite talks of the conference. A real eye opener to how many mistakes can be found online.
