The Semantic Scholar Open Data Platform

Rodney Michael Kinney; Chloe Anastasiades; Russell Authur; Iz Beltagy; Jonathan Bragg; Alexandra Buraczynski; Isabel Cachola; Stefan Candra; Yoganand Chandrasekhar; Arman Cohan; Miles Crawford; Doug Downey; Jason Dunkelberger; Oren Etzioni; Rob Evans; Sergey Feldman; Joseph Gorney; D. Graham; F.Q. Hu; Regan Huff; Daniel King; Sebastian Kohlmeier; Bailey Kuehl; Michael Langan; Daniel Lin; Haokun Liu; Kyle Lo; Jaron Lochner; Kelsey MacMillan; Tyler C. Murray; Christopher Newell; Smita R Rao; Shaurya Rohatgi; Paul Sayre; Shannon Zejiang Shen; Amanpreet Singh; Luca Soldaini; Shivashankar Subramanian; A. Tanaka; Alex D Wade; Linda M. Wagner; Lucy Lu Wang; Christopher Wilhelm; Caroline Wu; Jiangjiang Yang; Angele Zamarron; Madeleine van Zuylen; Daniel S. Weld

DOI:10.48550/arXiv.2301.10140
Corpus ID: 256194545

@article{Kinney2023TheSS,
  title={The Semantic Scholar Open Data Platform},
  author={Rodney Michael Kinney and Chloe Anastasiades and Russell Authur and Iz Beltagy and Jonathan Bragg and Alexandra Buraczynski and Isabel Cachola and Stefan Candra and Yoganand Chandrasekhar and Arman Cohan and Miles Crawford and Doug Downey and Jason Dunkelberger and Oren Etzioni and Rob Evans and Sergey Feldman and Joseph Gorney and David W. Graham and F.Q. Hu and Regan Huff and Daniel King and Sebastian Kohlmeier and Bailey Kuehl and Michael Langan and Daniel Lin and Haokun Liu and Kyle Lo and Jaron Lochner and Kelsey MacMillan and Tyler C. Murray and Christopher Newell and Smita R Rao and Shaurya Rohatgi and Paul Sayre and Shannon Zejiang Shen and Amanpreet Singh and Luca Soldaini and Shivashankar Subramanian and A. Tanaka and Alex D Wade and Linda M. Wagner and Lucy Lu Wang and Christopher Wilhelm and Caroline Wu and Jiangjiang Yang and Angele Zamarron and Madeleine van Zuylen and Daniel S. Weld},
  journal={ArXiv},
  year={2023},
  volume={abs/2301.10140},
  url={https://api.semanticscholar.org/CorpusID:256194545}
}

Published in arXiv.org 2023
Computer Science

This paper combines public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction to build the Semantic Scholar Academic Graph, the largest open scientific literature graph to date.

Topics

Semantic Scholar; Natural Language Summaries; Open Data; Semantic Features; Scientific Output; Application Processing Interface

The Semantic Reader Project

Kyle Lo; Joseph Chee Chang; ...; Daniel S. Weld

Computer Science

Communications of the ACM

2024

The Semantic Reader Project is described, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. A collection of novel reading interfaces is developed and evaluated with study participants and real-world users to show improved reading experiences for scholars.

133 Citations

The Semantic Reader Project

Toward Robust URL Extraction for Open Science: A Study of arXiv File Formats and Temporal Trends

MIR: Methodology Inspiration Retrieval for Scientific Research Problems

FoRC4CL: A Fine-grained Field of Research Classification and Annotated Dataset of NLP Articles

On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts

S2abEL: A Dataset for Entity Linking from Scientific Tables

The OpenCitations Index: description of a database providing open citation data

CS-PaperSum: A Large-Scale Dataset of AI-Generated Summaries for Scientific Papers

Content Aware Analysis of Scholarly Networks: A Case Study on CORD19 Dataset

The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices

19 References

S2ORC: The Semantic Scholar Open Research Corpus

Construction of the Literature Graph in Semantic Scholar

Structural Scaffolds for Citation Intent Classification in Scientific Publications

An Overview of Microsoft Academic Service (MAS) and Applications

S2AND: A Benchmark and Evaluation System for Author Name Disambiguation

Identifying Meaningful Citations

PubLayNet: Largest Dataset Ever for Document Layout Analysis

SciBERT: A Pretrained Language Model for Scientific Text

SPECTER: Document-level Representation Learning using Citation-informed Transformers

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
