scp-api

A dataset of SCP Items, Articles, and Metadata - Updated Daily

SCP Data API

Welcome to the SCP Data API!

This is a static data dump of the SCP Wiki, broken down by article type. The data is crawled and updated on a daily basis.

There are two ways to use this data:

  1. Download it directly from the links below.
  2. Pull it in from this GitHub repository.
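
As a rough illustration of the first option, the sketch below downloads one data file and parses it. The function name fetch_dataset and its base_url parameter are made up for this example; this page does not state the hosting URL, so the caller has to supply it.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_dataset(base_url, relative_path, timeout=30):
    """Download one JSON data file and return it as a Python object.

    base_url is an assumption of this sketch: it is wherever the data
    files are hosted, which this document does not spell out.
    """
    if not base_url.endswith("/"):
        base_url += "/"
    # relative_path is one of the paths listed in this document,
    # for example "data/scp/hubs/index.json".
    with urlopen(urljoin(base_url, relative_path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like fetch_dataset(base_url, "data/scp/hubs/index.json"), with base_url pointing at the published copy of the data.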

Universal fields

Content and Metadata

Each SCP Item and Tale contains both Metadata about the article as well as the content of the article itself. This project splits those into separate files so that the metadata can be used directly without needing to download the entire wiki.

The content is also available for those who want it. Content is additionally broken up into two types: for each page, the content of the #page-content element is stored as raw_content, alongside the raw_source wikitext that generated it. The page content contains the story itself but excludes navigation elements, ads, and header data. Other than running it through a simple beautifier (BeautifulSoup), no changes have been made.

If an item has a content_file field, its content is stored in that file. Otherwise the raw_content and raw_source fields of the item itself hold the article contents.
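
The lookup rule above can be sketched roughly as follows. The helper name get_article_content is hypothetical, and two things are assumed rather than confirmed by this page: that content_file is a path relative to the directory holding the index file, and that a content file is keyed the same way as the index it belongs to.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def get_article_content(key, item, index_dir):
    """Return (raw_content, raw_source) for one entry of a metadata file."""
    content_file = item.get("content_file")
    if content_file:
        # Assumption: content_file is relative to the index directory and
        # the content file uses the same keys as the metadata file.
        item = load_json(Path(index_dir) / content_file)[key]
    # Without content_file, the content sits directly on the item.
    return item.get("raw_content"), item.get("raw_source")
```

Resolving the file lazily like this means the metadata can be scanned first and the larger content files are only read for the articles that are actually needed.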

SCP Main Wiki Data

Hub

The SCP Wiki Hubs group articles together based on theme, canon, subject matter, or just whim. The Hub Dataset contains all of the Hubs with a list of the articles (Items, GOI, and Tales) that are part of that Hub.

Data File - data/scp/hubs/index.json

The Hub data is relatively small, so it all exists in a single file. It is formatted as an object where each key is the link and each value is an object with these fields (in addition to the universal fields):

Items

The SCP Items are perhaps the most well known part of the wiki. As such the Item Dataset is the largest dataset available.

Metadata File - data/scp/items/index.json

This file contains the metadata for all of the SCP Items. It contains an object with the link as the key and the item data as the value.

In addition to the universal fields, it contains:

Content Index File - data/scp/items/content_index.json

The content index file is a key-value object where each key is the name of a series and each value is the name of the file containing the content for that series. The filename is relative to the content_index.json file itself.

The content files themselves are identical to the items in the Metadata File above with the exception that the content_file field is replaced with the raw_content and raw_source fields.
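
A minimal sketch of reading one series through the content index, assuming the files have been downloaded or cloned locally. The function name load_content_group is hypothetical, and the exact spelling of the series keys is not given here, so callers should check the index for the real names.

```python
import json
from pathlib import Path


def load_content_group(content_index_path, group):
    """Load the content file that the content index lists for one key."""
    content_index_path = Path(content_index_path)
    with open(content_index_path, encoding="utf-8") as handle:
        content_index = json.load(handle)
    # Filenames in the index are relative to the index file itself.
    content_path = content_index_path.parent / content_index[group]
    with open(content_path, encoding="utf-8") as handle:
        return json.load(handle)
```

Each entry of the returned object should then carry raw_content and raw_source in place of content_file, as described above.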

Tales

Tales are short stories in the SCP universe. This dataset contains all of the tales that are not part of the GOI dataset.

Tale Index File - data/scp/tales/index.json

Content Index File - data/scp/tales/content_index.json

The content index file is a key-value object where each key is the year the articles were created and each value is the name of the file containing the content for that year. The filename is relative to the content_index.json file itself.

The content files themselves are identical to the items in the Tale Index File above with the exception that the content_file field is replaced with the raw_content and raw_source fields.

GOI

GOI, or “Groups of Interest”, articles are typically created in special formats to match the GOI they are portraying. This dataset contains all of the GOI articles; the specific formats used can be inferred from the tags.

GOI Metadata File - data/scp/goi/index.json

The GOI structure is the same as the Tale structure.

Content GOI File - data/scp/goi/content_goi.json

The GOI items are small enough to fit in a single file. This file is identical to the metadata file with the exception that the content_file field is replaced with the raw_content and raw_source fields.

Licensing

This project is not affiliated with the SCP Wiki or any of its admins.

All content from the wiki is subject to the license of the wiki.