SCP Data API
Welcome to the SCP Data API!
This is a static data dump of the SCP Wiki, broken down by article type. The data is crawled and updated on a daily basis.
There are two ways to use this data:
- Download it directly from the links below.
- Pull it in from this GitHub repository.
Universal fields
history fields contain an ordered list of objects, each containing:
- author - the display name of the author of the revision.
- author_url - a link to the author’s Wikidot page.
- comment - the comment for the revision. This is often blank, but the automated messages can be interesting.
- date - the date and time of the commit in the format “2020-02-20T19:10:00”.
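Since the text does not say whether the list runs oldest-first or newest-first, comparing the parsed dates is the safer way to find the latest revision. A minimal sketch, assuming a history list is already loaded as Python dictionaries; the helper name latest_revision and the sample entries are made up for illustration:

```python
from datetime import datetime


def latest_revision(history):
    """Return the history entry with the most recent date, or None if empty."""
    if not history:
        return None
    # Dates look like "2020-02-20T19:10:00", which datetime.fromisoformat parses.
    return max(history, key=lambda rev: datetime.fromisoformat(rev["date"]))


# Made-up entries shaped like the fields described above.
sample_history = [
    {"author": "User A", "author_url": "https://example.com/user-a",
     "comment": "", "date": "2020-02-20T19:10:00"},
    {"author": "User B", "author_url": "https://example.com/user-b",
     "comment": "Fixed a typo", "date": "2021-03-01T08:30:00"},
]

newest = latest_revision(sample_history)
print(newest["author"], newest["date"])
```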
link fields are the primary keys for the data, generated from the path fragment of the URL.
The remaining universal fields are:
- page_id - the Wikidot ID for the page itself. This can be used to query the Wikidot API for additional information about the page.
- created_at - the creation date of the article in the format “2020-02-20T19:10:00”.
- created_by - the Wikidot username of the author of the first commit.
- url - a direct link to the crawled page.
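The exact normalization the crawler applies when building a link key is not documented here. As a rough sketch, assuming the key is simply the URL path with its surrounding slashes removed (the function name link_from_url and the example URL are hypothetical):

```python
from urllib.parse import urlparse


def link_from_url(url):
    """Derive a link-style key from a page URL (assumed normalization)."""
    # Assumption: the key is the URL path without its surrounding slashes.
    return urlparse(url).path.strip("/")


print(link_from_url("https://example.com/scp-173"))
```

If keys produced this way do not match the keys in the data files, prefer the link values stored in the data.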
Content and Metadata
Each SCP Item and Tale contains both Metadata about the article as well as the content of the article itself. This project splits those into separate files so that the metadata can be used directly without needing to download the entire wiki.
The content is also available for those who want it. Content is additionally broken up into two types. For each page, the #page-content content is stored as raw_content alongside the raw_source wikitext that generated it. raw_content contains the story itself but excludes navigation elements, ads, and header data. Other than running it through a simple beautifier (beautifulsoup), no changes have been made.
If an item has a content_file field, then that file is where you can get the content from. Otherwise, the raw_content and raw_source fields will contain the article contents.
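The lookup rule above can be sketched as follows. This assumes a content file is a JSON object keyed by link, like the index it belongs to; the function name load_article_content and the sample record are hypothetical:

```python
import json
from pathlib import Path


def load_article_content(link, item, index_path):
    """Return (raw_content, raw_source) for one article.

    Assumes the content file is a JSON object keyed by link.
    """
    if "content_file" in item:
        # content_file is relative to the index file that listed the item.
        content_path = Path(index_path).parent / item["content_file"]
        with open(content_path, encoding="utf-8") as f:
            entry = json.load(f)[link]
    else:
        # No content_file: the content lives on the record itself.
        entry = item
    return entry.get("raw_content"), entry.get("raw_source")


# Made-up record that carries its content inline.
inline_item = {"raw_content": "<p>Example</p>", "raw_source": "Example"}
print(load_article_content("example-page", inline_item, "index.json"))
```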
SCP Main Wiki Data
Hub
The SCP Wiki Hubs group articles together based on theme, canon, subject matter, or just whim. The Hub Dataset contains all of the Hubs with a list of the articles (Items, GOI, and Tales) that are part of that Hub.
Data File - data/scp/hubs/index.json
The Hub data is relatively small, so it all exists in a single file. It is formatted as an object where the key is the link and the value is an object with these fields (in addition to the universal fields):
- title - the user-friendly name of the hub.
- references - a list of link strings representing articles and tales that are in this hub.
- tags - the tags for the hub page.
- raw_content - the content of the hub page, as described under Content and Metadata.
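Because each hub lists its members in references, the hub index can be inverted to answer "which hubs contain this article?". A small sketch, assuming the hub index has already been parsed into a dictionary; the function name hubs_by_article and the sample hubs are invented for illustration:

```python
from collections import defaultdict


def hubs_by_article(hub_index):
    """Map each referenced link to the list of hub links that include it."""
    result = defaultdict(list)
    for hub_link, hub in hub_index.items():
        for ref in hub.get("references", []):
            result[ref].append(hub_link)
    return dict(result)


# Made-up hub index with the fields described above.
sample_hubs = {
    "example-hub-one": {"title": "Example Hub One", "references": ["scp-0001", "a-tale"]},
    "example-hub-two": {"title": "Example Hub Two", "references": ["scp-0001"]},
}

print(hubs_by_article(sample_hubs)["scp-0001"])
```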
Items
The SCP Items are perhaps the most well known part of the wiki. As such the Item Dataset is the largest dataset available.
Metadata File - data/scp/items/index.json
This file contains the metadata for all of the SCP Items. It contains an object with the link as the key and the item data as the value.
In addition to the universal fields, it contains:
- content_file - the file, relative to the index.json file, that contains the content for the article.
- references - a list of link strings representing the articles and tales referenced by this item.
- tags - the tags for the item page.
- title - the user-friendly name of the item.
- hubs - a list of link strings for all of the hubs the item is in.
- images - a list of URLs for the images on the page.
- rating - the rating of the article based on Wikidot votes.
- scp - the full SCP label.
- scp_number - the SCP number (SCP-682 would be 682).
- series - the SCP Series that the Item is part of. Includes the numbered series as well as joke, archive, and other categories.
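The metadata alone is enough for simple queries, such as finding the highest-rated items in a series. A sketch under stated assumptions: the index is already parsed into a dictionary, and the series values and records below are made up (the real series labels may look different):

```python
def top_rated(items, series, limit=3):
    """Return up to `limit` items from one series, highest rating first."""
    in_series = [item for item in items.values() if item.get("series") == series]
    return sorted(in_series, key=lambda item: item.get("rating", 0), reverse=True)[:limit]


# Made-up records using the fields described above.
sample_items = {
    "scp-0001": {"scp": "SCP-0001", "scp_number": 1, "series": "series-x", "rating": 120},
    "scp-0002": {"scp": "SCP-0002", "scp_number": 2, "series": "series-x", "rating": 340},
    "scp-0003": {"scp": "SCP-0003", "scp_number": 3, "series": "series-y", "rating": 999},
}

for item in top_rated(sample_items, "series-x"):
    print(item["scp"], item["rating"])
```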
Content Index File - data/scp/items/content_index.json
The content index file is a key-value object where the key is the name of the series and the value is a filename containing the content for that series. The filename is relative to the content_index.json file itself.
The content files themselves are identical to the items in the Metadata File above, with the exception that the content_file field is replaced with the raw_content and raw_source fields.
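Loading the content for one series is then a two-step lookup: read the content index, then open the file it names. A minimal sketch, assuming the content file is a JSON object keyed by link; the function name load_series_content is hypothetical, and the path to content_index.json is passed in by the caller:

```python
import json
from pathlib import Path


def load_series_content(content_index_path, series):
    """Load the content file that the content index lists for one series."""
    content_index_path = Path(content_index_path)
    with open(content_index_path, encoding="utf-8") as f:
        content_index = json.load(f)
    # Filenames in the index are relative to the content index file itself.
    content_path = content_index_path.parent / content_index[series]
    with open(content_path, encoding="utf-8") as f:
        return json.load(f)
```

A KeyError from this function means the requested series name is not a key in the content index.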
Tales
Tales are short stories in the SCP universe. This dataset contains all of the tales that are not part of the GOI dataset.
Tale Index File - data/scp/tales/index.json
This file contains an object with the link as the key and the tale data as the value, with these fields:
- content_file - the file, relative to the index.json file, that contains the content for the article.
- created_at - the creation date of the article in the format “2020-02-20T19:10:00”.
- created_by - the Wikidot username of the page creator (which is normally the author).
- link - this is the same as the key.
- page_id - the Wikidot ID for the page.
- references - a list of link strings representing the articles and tales referenced by this tale.
- tags - the tags for the tale page.
- title - the user-friendly name of the tale.
- url - a direct URL to the tale page.
- hubs - a list of link strings for all of the hubs the tale is in.
- images - a list of URLs for the images on the page.
- rating - the rating of the article based on Wikidot votes.
- history - the revision history, as described under Universal fields.
Content Index File - data/scp/tales/content_index.json
The content index file is a key-value object where the key is the year the article was created and the value is a filename containing the content for that year. The filename is relative to the content_index.json file itself.
The content files themselves are identical to the items in the Tale Index File above, with the exception that the content_file field is replaced with the raw_content and raw_source fields.
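Since tale content is split by creation year, the created_at field shows how the tales are distributed across those files. A short sketch, assuming the tale index is already parsed; the function name tales_per_year and the sample tales are made up, and whether the content index uses plain four-digit strings as keys is an assumption:

```python
from collections import Counter


def tales_per_year(tales):
    """Count tales by the year in their created_at field."""
    # created_at looks like "2020-02-20T19:10:00", so the first four
    # characters are the year.
    return Counter(tale["created_at"][:4] for tale in tales.values())


# Made-up tale records.
sample_tales = {
    "tale-a": {"title": "Tale A", "created_at": "2020-02-20T19:10:00"},
    "tale-b": {"title": "Tale B", "created_at": "2020-11-05T07:00:00"},
    "tale-c": {"title": "Tale C", "created_at": "2018-06-01T12:00:00"},
}

print(tales_per_year(sample_tales))
```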
GOI
GOI, or “Groups of Interest”, articles are typically created in special formats to match the GOI they are portraying. This dataset contains all of the GOI articles; the specific formats used can be inferred from the tags.
GOI Metadata File - data/scp/goi/index.json
The GOI structure is the same as the Tale structure.
Content GOI File - data/scp/goi/content_goi.json
The GOI items are small enough to fit in a single file. It is identical to the metadata file, with the exception that the content_file field is replaced with the raw_content and raw_source fields.
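Since GOI formats are inferred from tags, a tag filter is a natural first query on this dataset. A minimal sketch, assuming the GOI data is already parsed into a dictionary keyed by link; the function name with_tag and the tag values shown are invented, not real wiki tags:

```python
def with_tag(articles, tag):
    """Return the sorted links of all articles whose tags include `tag`."""
    return sorted(
        link for link, article in articles.items()
        if tag in article.get("tags", [])
    )


# Made-up GOI records; real tag names will differ.
sample_goi = {
    "goi-article-one": {"title": "One", "tags": ["format-a", "group-x"]},
    "goi-article-two": {"title": "Two", "tags": ["format-b", "group-x"]},
    "goi-article-three": {"title": "Three", "tags": ["format-a"]},
}

print(with_tag(sample_goi, "format-a"))
```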
Licensing
This project is not affiliated with the SCP Wiki or any of its admins.
All content from the wiki is subject to the license of the wiki.