Large expert-curated database for benchmarking document similarity detection in biomedical literature search

Peter Brown, Yaoqi Zhou, Aik-Choon Tan, Mohamed A. El-Esawi, Thomas Liehr, Oliver Blanck, Douglas P. Gladue, Gabriel M. F. Almeida, Tomislav Cernava, Carlos O. Sorzano, Andy W. K. Yeung, Michael S. Engel, Arun Richard Chandrasekaran, Thilo Muth, Martin S. Staege, Swapna V. Daulatabad, Darius Widera, Junpeng Zhang, Adrian Meule, Ken Honjo, Olivier Pourret, Cong-Cong Yin, Zhongheng Zhang, Marco Cascella, Willy A. Flegel, Carl S. Goodyear, Mark J. van Raaij, Zuzanna Bukowy-Bieryllo, Luca G. Campana, Nicholas A. Kurniawan, David Lalaouna, Felix J. Hüttner, Brooke A. Ammerman, Felix Ehret, Paul A. Cobine, Ene-Choo Tan, Hyemin Han, Wenfeng Xia, Christopher McCrum, Ruud P. M. Dings, Francesco Marinello, Henrik Nilsson, Brett Nixon, Konstantinos Voskarides, Long Yang, Vincent D. Costa, Johan Bengtsson-Palme, William Bradshaw, Dominik G. Grimm, Nitin Kumar, Elvis Martis, Daniel Prieto, Sandeep C. Sabnis, Said E. D. R. Amer, Alan W. C. Liew, Paul Perco, Farid Rahimi, Giuseppe Riva, Chongxing Zhang, Hari P. Devkota, Koichi Ogami, Zarrin Basharat, Walter Fierz, Robert Siebers, Kok-Hian Tan, Karen A. Boehme, Peter Brenneisen, James A. L Brown, Brian P. Dalrymple, David J. Harvey, Grace Ng, Sebastiaan Werten, Mark Bleackley, Zhanwu Dai, Raman Dhariwal, Yael Gelfer, Marcus D. Hartmann, Pawel Miotla, Radu Tamaian, Pragashnie Govender, Oliver J. Gurney-Champion, Joonas H. Kauppila, Xiaolei Zhang, Natalia Echeverría, Santhilal Subhash, Hannes Sallmon, Marco Tofani, Taeok Bae, Oliver Bosch, Páraic O. Cuív, Antoine Danchin, Barthelemy Diouf, Tuomas Eerola, Evangelos Evangelou, Fabian V. Filipp, Hannes Klump, Lukasz Kurgan, Simon S. Smith, Olivier Terrier, Neil Tuttle, David B. Ascher, Sarath C. Janga, Leon N. Schulte, Daniel Becker, Christopher Browngardt, Stephen J. Bush, Guillaume Gaullier, Kazuki Ide, Clement Meseko, Gijsbert D. A. Werner, Jan Zaucha, Abd A. Al-Farha, Noah F. Greenwald, Segun I. Popoola, Md Shaifur Rahman, Jialin Xu, Sunny Y. Yang, Noboru Hiroi, Ozgul M.
Alper, Chris I. Baker, Michael Bitzer, George Chacko, Birgit Debrabant, Ray Dixon, Evelyne Forano, Matthew Gilliham, Sarah Kelly, Karl-Heinz Klempnauer, Brett A. Lidbury, Michael Z. Lin, Iseult Lynch, Wujun Ma, Edward W. Maibach, Diane E. Mather, Kutty S. Nandakumar, Robert S. Ohgami, Piero Parchi, Patrizio Tressoldi, Yu Xue, Charles Armitage, Pierre Barraud, Stella Chatzitheochari, Luis P. Coelho, Jiajie Diao, Andrew C. Doxey, Angélique Gobet, Pingzhao Hu, Stefan Kaiser, Kate M. Mitchell, Mohamed F. Salama, Ivan G. Shabalin, Haijun Song, Dejan Stevanovic, Ali Yadollahpour, Erliang Zeng, Katharina Zinke, C. G. Alimba, Tariku J. Beyene, Zehong Cao, Sherwin S. Chan, Michael Gatchell, Andreas Kleppe, Marcin Piotrowski, Gonzalo Torga, Adugna A. Woldesemayat, Mehmet I. Cosacak, Scott Haston, Stephanie A. Ross, Richard Williams, Alvin Wong, Matthew K. Abramowitz, Andem Effiong, Senhong Lee, Muhammad Bilal Abid, Cyrus Agarabi, Cedric Alaux, Dirk R. Albrecht, Gerald J. Atkins, Charles R. Beck, A. M. J. J. Bonvin, Emer Bourke, Thomas Brand, Ralf J. Braun, James A. Bull, Pedro Cardoso, Dee Carter, Robin M. Delahay, Bernard Ducommun, Pascal H. G. Duijf, Trevor Epp, Eeva-Liisa Eskelinen, Mazyar Fallah, Debora B. Farber, Jose Fernandez-Triana, Frank Feyerabend, Tullio Florio, Michael Friebe, Saori Furuta, Mads Gabrielsen, Jens Gruber, Malgorzata Grybos, Qian Han, Michael Heinrich, Heikki Helanterä, Michael Huber, Albert Jeltsch, Fan Jiang, Claire Josse, Giuseppe Jurman, Haruyuki Kamiya, Kim de Keersmaecker, Erik Kristiansson, Frank-Erik de Leeuw, Jiuyong Li, Shide Liang, Jose A. Lopez-Escamez, Francisco J. Lopez-Ruiz, Kevin J. Marchbank, Rolf Marschalek, Carmen S. Martín, Adriana E. Miele, Xavier Montagutelli, Esteban Morcillo, Rosario Nicoletti, Monika Niehof, Ronan O'Toole, Toshihiko Ohtomo, Henrik Oster, Jose-Alberto Palma, Russell Paterson, Mark Peifer, Maribel Portilla, M. C. Portillo, Antonia L. Pritchard, Stefan Pusch, Gajendra P. S. Raghava, Nicola J. 
Roberts, Kehinde Ross, Birgitt Schuele, Kjell Sergeant, Jun Shen, Alessandro Stella, Olga Sukocheva, Vladimir N. Uversky, Sven Vanneste, Martin H. Villet, Miguel Viveiros, Julia A. Vorholt, Christof Weinstock, Masayuki Yamato, Ioannis Zabetakis, Xin Zhao, Andreas Ziegler, Wan M. Aizat, Lauren Atlas, Kristina M. Bridges, Sayan Chakraborty, Mieke Deschodt, Helena S. Domingues, Shabnam S. Esfahlani, Sebastian Falk, J. L. Guisado, Nolan C. Kane, Gray Kueberuwa, Colleen L. Lau, Dai Liang, Enwu Liu, Andreas M. Luu, Chuang Ma, Lisong Ma, Robert Moyer, Adam D. Norris, Suresh Panthee, Jerod R. Parsons, Yousong Peng, Inês Mendes Pinto, Cristina R. Reschke, Elina Sillanpää, Christopher J. Stewart, Florian Uhle, Hui Yang, Kai Zhou, Shu Zhu, Mohamed Ashry, Niels Bergsland, Maximilian Berthold, Chang-Er Chen, Vito Colella, Maarten Cuypers, Evan A. Eskew, Xiao Fan, Maksymilian Gajda, Rayner González-Prendes, Amie Goodin, Emily B. Graham, Ewout J. N. Groen, Alba Gutiérrez-Sacristán, Mohamad Habes, Enrico Heffler, Daniel B. Higginbottom, Thijs Janzen, Jayakumar Jayaraman, Lindsay A. Jibb, Stefan Jongen, Timothy Kinyanjui, Rositsa G. Koleva-Kolarova, Zhixiu Li, Yu-Peng Liu, Bjarte A. Lund, Alexandre A. Lussier, Liping Ma, Pablo Mier, Matthew D. Moore, Katja Nagler, Mark W. Orme, James A. Pearson, Anilkumar S. Prajapati, Yu Saito, Simon E. Tröder, Florence Uchendu, Niklas Verloh, Denitza D. Voutchkova, Ahmed Abu-Zaid, Joaira Bakkach, Philipp Baumert, Marcos Dono, Jack Hanson, Sandrine Herbelet, Emma Hobbs, Ameya Kulkarni, Narendra Kumar, Siqi Liu, Nikolai D. Loft, Tristan Reddan, Thomas Senghore, Howard Vindin, Haotian Xu, Ross Bannon, Branson Chen, Johnny T. K. Cheung, Jeffrey Cooper, Ashwini K. Esnakula, Karine A. Feghali, Emilia Ghelardi, Agostino Gnasso, Jeffrey Horbar, Hei M. Lai, Jian Li, Lan Ma, Ruiyan Ma, Zihang Pan, Marco A. Peres, Raymond Pranata, Esmond Seow, Matthew Sydes, Ines Testoni, Anna L. Westermair, Yongliang Yang, Masoud Afnan, Joan Albiol, Lucia G.
Albuquerque, Eisuke Amiya, Rogerio M. Amorim, Qianli An, Stig U. Andersen, John D. Aplin, Christos Argyropoulos, Yan W. Asmann, Abdulaziz M. Assaeed, Atanas G. Atanasov, David A. Atchison, Simon V. Avery, Paul Avillach, Peter D. Baade, Lars Backman, Christophe Badie, Alfonso Baldi, Elizabeth Ball, Olivier Bardot, Adrian G. Barnett, Mathias Basner, Jyotsna Batra, O. M. Bazanova, Andrew Beale, Travis Beddoe, Melanie L. Bell, Eugene Berezikov, Sue Berners-Price, Peter Bernhardt, Edward Berry, Theolis B. Bessa, Craig Billington, John Birch, Randy D. Blakely, Mark A. T. Blaskovich, Robert Blum, Marleen Boelaert, Dimitrios Bogdanos, Carles Bosch, Thierry Bourgoin, Daniel Bouvard, Laura M. Boykin, Graeme Bradley, Daniel Braun, Jeremy Brownlie, Albert Brühl, Austin Burt, Lisa M. Butler, Siddappa N. Byrareddy, Hugh J. Byrne, Stephanie Cabantous, Sara Calatayud, Eva Candal, Kimberly Carlson, Sònia Casillas, Valter Castelvetro, Patrick T. Caswell, Giacomo Cavalli, Vaclav Cerovsky, Monica Chagoyen, Chang-Shi Chen, Dong F. Chen, Hao Chen, Hui Chen, Jui-Tung Chen, Yinglong Chen, Changxiu Cheng, Jianlin Cheng, Mai Chinapaw, Christos Chinopoulos, William C. S. Cho, Lillian Chong, Debashish Chowdhury, Andre Chwalibog, A. Ciresi, Shamshad Cockcroft, Ana Conesa, Penny A. Cook, David N. Cooper, Olivier Coqueret, Enoka M. Corea, Elisio Costa, Carol Coupland, Stephanie Y. Crawford, Aparecido D. Cruz, Huijuan Cui, Qiang Cui, David C. Culver, Amedeo D’Angiulli, Tanya E. S. Dahms, France Daigle, Raymond Dalgleish, Havard E. Danielsen, Sebastien Darras, Sean M. Davidson, David A. Day, Volkan Degirmenci, Luc Demaison, Koenraad Devriendt, Jiandong Ding, Yunus Dogan, X. C. Dong, Claudio F. Donner, Walter Dressick, Christian A. Drevon, Huiling Duan, Christian Ducho, Nicolas Dumaz, Bilikere S. Dwarakanath, Mark H. Ebell, Steffen Eisenhardt, Naser Elkum, Nadja Engel, Timothy B. Erickson, Michael Fairhead, Marty J. Faville, Marlena S.
Fejzo, Fernanda Festa, Antonio Feteira, Patrick Flood-Page, John Forsayeth, Simon A. Fox, Steven J. Franks, Francesca D. Frentiu, Mikko J. Frilander, Xinmiao Fu, Satoshi Fujita, Ian Galea, Luca Galluzzi, Federica Gani, Arvind P. Ganpule, Antonio García-Alix, Kristene Gedye, Maurizio Giordano, Cecilia Giunta, Paul A. Gleeson, Cyrille Goarant, Haipeng Gong, Diop Gora, Michael J. Gough, Ravinder Goyal, Kathryn E. Graham, Ana Grande-Pérez, Patricia M. Graves, Harm Greidanus, Darren Grice, Christoph Grunau, Yosephine Gumulya, Yabin Guo, Vsevolod V. Gurevich, Oleg Gusev, Elke Hacker, Steffen R. Hage, Guy Hagen, Steven Hahn, Dagmar M. Haller, Sven Hammerschmidt, Jianwei Han, Renzhi Han, Martin Handfield, Hapuarachchige C. Hapuarachchi, Timm Harder, Jennifer E. Hardingham, Michelle Heck, Marcel Heers, Khe F. Hew, Yohei Higuchi, Cynthia St. Hilaire, Rachel Hilton, Enisa Hodzic, Andrew Hone, Yuichi Hongoh, Guoku Hu, Heinz P. Huber, Luis E. Hueso, Judith Huirne, Lisa Hurt, Helena Idborg, Kazuho Ikeo, Evan Ingley, Philip M. Jakeman, Arne Jensen, Hong Jia, Husen Jia, Shuqin Jia, Jianping Jiang, Xingyu Jiang, Yi Jin, Daehyun Jo, Andrew M. Johnson, Marie Johnston, Karen R. Jonscher, Philippe G. Jorens, Jens O. L. Jorgensen, Johan W. Joubert, Sin-Ho Jung, Antonio M. Junior, Thomas Kahan, Sunjeev K. Kamboj, Yong-Kook Kang, Yannis Karamanos, Natasha A. Karp, Ryan Kelly, Ralph Kenna, Jonathan Kennedy, Birgit Kersten, Roy A. Khalaf, Javaria M. Khalid, T. Khatlani, Tarig Khider, Gregor S. Kijanka, Sarah R. B. King, Tomasz Kluz, Paul Knox, Tatsuya Kobayashi, Karl-Wilhelm Koch, Maija R. J. Kohonen-Corish, Xiangpeng Kong, Deborah Konkle-Parker, Kalevi M. Korpela, Leondios G. Kostrikis, Peter Kraiczy, Harald Kratz, Günter Krause, Paul H. Krebsbach, Søren R. Kristensen, Prerna Kumari, Akira Kunimatsu, Hatice Kurdak, Young D. Kwon, Carl Lachat, Malgorzata Lagisz, Brenda Laky, Jan Lammerding, Matthias Lange, Mar Larrosa, Andrew L. Laslett, Elizabeth E. 
LeClair, Kyung-Woo Lee, Ming-Yih Lee, Moon-Soo Lee, Genyuan Li, Jiansheng Li, Klaus Lieb, Yau Y. Lim, Merry L. Lindsey, Paul-Dag Line, Dengcai Liu, Fengbin Liu, Haiyan Liu, Hongde Liu, Vett K. Lloyd, Te-Wen Lo, Emanuela Locci, Josef Loidl, Johan Lorenzen, Stefan Lorkowski, Nigel H. Lovell, Hua Lu, Wei Lu, Zhiyong Lu, Gustavo S. Luengo, Lars-Gunnar Lundh, Philippe A. Lysy, Angela Mabb, Heather G. Mack, David A. Mackey, S. R. Mahdavi, Pamela Maher, Toby Maher, Sankar N. Maity, Brigitte Malgrange, Charalampos Mamoulakis, Arduino A. Mangoni, Thomas Manke, Antony S. R. Manstead, Athanasios Mantalaris, Jan Marsal, Hanns-Ulrich Marschall, Francis L. Martin, Jose Martinez-Raga, Encarnacion Martinez-Salas, Daniel Mathieu, Yoichi Matsui, Elie Maza, James E. McCutcheon, Gareth J. McKay, Brian McMillan, Nigel McMillan, Catherine Meads, Loreta Medina, B. Alex Merrick, Dennis W. Metzger, Frederic A. Meunier, Martin Michaelis, Olivier Micheau, Hisaaki Mihara, Eric M. Mintz, Takuo Mizukami, Yann Moalic, D. P. Mohapatra, Antonia Monteiro, Matthieu Montes, John V. Moran, Sergey Y. Morozov, Matthew Mort, Noriyuki Murai, Denis J. Murphy, Susan K. Murphy, Shauna A. Murray, Shinji Naganawa, Srinivas Nammi, Grigorios Nasios, Roman M. Natoli, Frederique Nguyen, Christine Nicol, Filip van Nieuwerburgh, Erlend B. Nilsen, Clarissa J. Nobile, Margaret O’Mahony, Sophie Ohlsson, Oluremi Olatunbosun, Per Olofsson, Alberto Ortiz, Kostya Ostrikov, Siegmar Otto, Tiago F. Outeiro, Songying Ouyang, Sabrina Paganoni, Andrew Page, Christoph Palm, Yin Paradies, Michael H. Parsons, Nick Parsons, Pigny Pascal, Elisabeth Paul, Michelle Peckham, Nicoletta Pedemonte, Michael A. Pellizzon, M. Petrelli, Alexander Pichugin, Carlos J. C. Pinto, John N. Plevris, Piero Pollesello, Martin Polz, Giovanna Ponti, Piero Porcelli, Martin Prince, Gwendolyn P. Quinn, Terence J. Quinn, Satu Ramula, Juri Rappsilber, Florian Rehfeldt, Jan H. Reiling, Claire Remacle, Mohsen Rezaei, Eric W. Riddick, Uwe Ritter, Neil W. 
Roach, David D. Roberts, Guillermo Robles, Tiago Rodrigues, Cesar Rodriguez, Jo Roislien, Monique J. Roobol, J. Alexandra Rowe, Andreas Ruepp, Jan van Ruitenbeek, Petra Rust, Sonia Saad, George H. Sack, Manuela Santos, Aurore Saudemont, Gianni Sava, Simone Schrading, Alexander Schramm, Martin Schreiber, Sidney Schuler, Joost Schymkowitz, Alexander Sczyrba, Kate L. Seib, Han-Ping Shi, Tomohiro Shimada, Jeon-Soo Shin, Colette Shortt, Patricia Silveyra, Debra Skinner, Ian Small, Paul A. M. Smeets, Po-Wah So, Francisco Solano, Daniel E. Sonenshine, Jiangning Song, Tony Southall, John R. Speakman, Mandyam V. Srinivasan, Laura P. Stabile, Andrzej Stasiak, Kathryn J. Steadman, Nils Stein, Andrew W. Stephens, Douglas I. Stewart, Keith Stine, Curt Storlazzi, Nataliya V. Stoynova, Wojciech Strzalka, Oscar M. Suarez, Taranum Sultana, Anirudha V. Sumant, Mathew J. Summers, Gang Sun, Paul Tacon, Kozo Tanaka, Haixu Tang, Yoshinori Tanino, Paul Targett-Adams, Mourad Tayebi, et al

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)

Abstract

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical science.
Original language: English
Article number: baz085
Pages (from-to): 1-66
Number of pages: 66
Journal: Database
Volume: 2019
Publication status: Published - 2019

Open Access - Access Right Statement

© The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Keywords

  • databases
  • research
