Ergo documentation¶
Ergo is a Python library for integrating model-based and judgmental forecasting.
Getting Started¶
To get started with a template to work from, load this Colab notebook.
For more information about ergo, see the README.
See the sections below to learn more about using ergo.
To learn about contributing, read our CONTRIBUTING.md.
Metaculus¶
Metaculus¶
class Metaculus(api_domain='www', username=None, password=None)
The main class for interacting with Metaculus.
get_question(id, name=None)
Load a question from Metaculus.
Parameters:
- id (int) – question id (can be read off from the URL)
- name – name to assign to this question (used in models)
Return type: MetaculusQuestion
get_questions(question_status='all', player_status='any', cat=None, pages=1, fail_silent=False, load_detail=True)
Retrieve multiple questions from the Metaculus API.
Return type: List[MetaculusQuestion]
MetaculusQuestion¶
class MetaculusQuestion(id, metaculus, data, name=None)
A forecasting question on Metaculus.
Variables:
activity –
anon_prediction_count –
author –
author_name –
can_use_powers –
close_time – when the question closes
comment_count –
created_time – when the question was created
id – question id
is_continuous – is the question continuous or binary?
last_activity_time –
page_url – url for the question page on Metaculus
possibilities –
prediction_histogram – histogram of the current community prediction
prediction_timeseries – predictions on this question over time
publish_time – when the question was published
resolution –
resolve_time – when the question will resolve
status –
title –
type –
url –
votes –
static get_central_quantiles(df, percent_kept=0.95, side_cut_from='both')
Get the values that bound the central (percent_kept) portion of the sample distribution, i.e., cutting the tails at these values leaves the central portion. If passed a dataframe with multiple variables, the bounds that encompass all variables will be returned.
Returns: lower and upper values bounding the central (percent_kept) portion of the sample distribution
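The central-quantile computation can be sketched with plain numpy and pandas (a simplified stand-in for the method above, assuming an equal tail is cut from each side):

```python
import numpy as np
import pandas as pd

def central_quantiles(samples, percent_kept=0.95):
    # Cut an equal tail from each side, keeping the central percent_kept
    tail = (1 - percent_kept) / 2
    values = pd.DataFrame(samples).to_numpy().ravel()
    return np.quantile(values, tail), np.quantile(values, 1 - tail)

samples = pd.DataFrame({"model_a": range(100), "model_b": range(100)})
low, high = central_quantiles(samples, percent_kept=0.9)
```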
refresh_question()
Refetch the question data from Metaculus; useful when the question data might have changed.
sample_community()
Get one sample from the distribution of the Metaculus community's prediction on this question (the sample is denormalized, i.e. on the true scale of the question).
ContinuousQuestion¶
class ContinuousQuestion(id, metaculus, data, name=None)
A continuous Metaculus question – a question of the form "what's your distribution on this event?"
change_since(since)
Calculate the change in the community prediction median between the given datetime and the most recent prediction.
Parameters: since (datetime)
Returns: change in the median community prediction since the given datetime
community_dist()
Get the community distribution for this question. NB: currently missing the part of the distribution outside the question range.
Return type: PointDensity
Returns: the (true-scale) community distribution as a histogram
community_dist_in_range()
A distribution for the portion of the current normalized community prediction that's within the question's range, i.e. over the integers 0…(len(self.prediction_histogram)-1).
Returns: distribution over integers
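Conceptually, the in-range distribution is the normalized histogram interpreted as probabilities over bin indices; a sketch with a hypothetical helper (not ergo's implementation):

```python
import numpy as np

def dist_in_range(histogram_densities):
    # Normalize bin densities into probabilities over indices 0..len-1
    p = np.asarray(histogram_densities, dtype=float)
    return p / p.sum()

dist_in_range([1.0, 2.0, 1.0])  # probabilities [0.25, 0.5, 0.25]
```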
denormalize_samples(samples)
Map samples from the Metaculus normalized scale to the true scale.
Parameters: samples – samples on the normalized scale
Returns: samples from a distribution answering the prediction question (true scale)
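For a linear-scale question, normalization and denormalization are a simple affine map between the question range and [0, 1]; a minimal sketch (ergo also handles log-scale questions, which this ignores):

```python
def normalize(samples, low, high):
    # True scale -> [0, 1]
    return [(s - low) / (high - low) for s in samples]

def denormalize(samples, low, high):
    # [0, 1] -> true scale
    return [s * (high - low) + low for s in samples]

denormalize(normalize([25.0], 0.0, 100.0), 0.0, 100.0)  # round-trips to [25.0]
```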
property has_predictions
Are there any predictions for this question yet?
property high_open
Are you allowed to place probability mass above the top of this question's range?
property latest_community_percentiles
Returns: some percentiles of the Metaculus community's latest rough prediction. prediction_histogram provides a more fine-grained histogram of the community prediction.
property low_open
Are you allowed to place probability mass below the bottom of this question's range?
normalize_samples(samples)
Map samples from their true scale to the Metaculus normalized scale.
Parameters: samples – samples from a distribution answering the prediction question (true scale)
Returns: samples on the normalized scale
property p_outside
How much probability mass is outside this question's range?
prepare_logistic(normalized_dist)
Transform a single logistic distribution by clipping its parameters and adding scale information as needed for submission to Metaculus. The loc and scale must fall within a certain range for the Metaculus API to accept the prediction.
Parameters: normalized_dist – a (normalized) logistic distribution
Return type: Logistic
Returns: a transformed logistic distribution
prepare_logistic_mixture(normalized_dist)
Transform a (normalized) logistic mixture distribution as needed for submission to Metaculus.
Parameters: normalized_dist (LogisticMixture) – normalized mixture distribution
Return type: LogisticMixture
Returns: the normalized distribution, clipped and formatted for the API
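The clipping step can be sketched as a clamp on each component's parameters; the bounds below are illustrative placeholders, not Metaculus's actual limits:

```python
def clip_params(loc, scale, loc_min=-0.5, loc_max=1.5, scale_min=0.01, scale_max=10.0):
    # Clamp loc and scale into the (illustrative) ranges the API accepts
    loc = min(max(loc, loc_min), loc_max)
    scale = min(max(scale, scale_min), scale_max)
    return loc, scale

clip_params(2.0, 0.001)  # → (1.5, 0.01)
```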
property question_range
The range of answers specified when the question was created.
sample_community()
Sample an approximation of the entire current community prediction, on the true scale of the question. It is only an approximation mainly because we don't know exactly where the probability mass outside of the question range should be, so we place it arbitrarily.
Returns: one sample on the true scale
sample_normalized_community()
Sample an approximation of the entire current community prediction, on the normalized scale. It is only an approximation mainly because we don't know exactly where the probability mass outside of the question range should be, so we place it arbitrarily.
Returns: one sample on the normalized scale
show_community_prediction(percent_kept=0.95, side_cut_from='both', num_samples=1000, **kwargs)
Plot samples from the community prediction on this question.
show_prediction(samples, plot_samples=True, plot_fitted=False, percent_kept=0.95, side_cut_from='both', show_community=False, num_samples=1000, **kwargs)
Plot a prediction on the true question scale from samples or a submission object. Optionally compare the prediction against a sample from the distribution of community predictions.
Parameters:
- samples – samples from a distribution answering the prediction question (true scale). Can either be a 1-d array corresponding to one model's predictions, or a pandas DataFrame with each column corresponding to a distinct model's predictions
- plot_samples (bool) – whether to plot the raw samples
- plot_fitted (bool) – whether to fit logistic mixture params to the samples and plot the resulting fitted distribution. Note this is currently only supported for 1-d samples
- percent_kept (float) – percentage of the sample distribution to keep
- side_cut_from (str) – which side to cut tails from: 'both', 'lower', or 'upper'
- show_community (bool) – whether to compare against community predictions
- num_samples (int) – number of samples from the community
- kwargs – additional plotting parameters
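The kind of plot show_prediction produces can be approximated with plain matplotlib (a sketch, not ergo's actual plotting code; the logistic samples stand in for model predictions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
samples = rng.logistic(loc=5.0, scale=1.0, size=1000)  # stand-in model samples

fig, ax = plt.subplots()
ax.hist(samples, bins=50, density=True, alpha=0.5, label="model samples")
ax.set_xlabel("answer (true scale)")
ax.set_ylabel("density")
ax.legend()
```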
submit_from_samples(samples, verbose=False)
Submit a prediction to Metaculus based on samples from a prediction distribution.
Parameters: samples – samples from a distribution answering the prediction question
Return type: Response
Returns: the fitted logistic mixture params, clipped and formatted for submission to Metaculus
LinearQuestion¶
class LinearQuestion(id, metaculus, data, name=None)
A continuous Metaculus question on a linear (as opposed to a log) scale.
get_true_scale_logistic(normalized_dist)
Convert a normalized logistic distribution to a logistic on the true scale of the question.
Parameters: normalized_dist (Logistic) – normalized logistic distribution
Return type: Logistic
Returns: logistic distribution on the true scale of the question
get_true_scale_mixture(normalized_dist)
Convert a normalized logistic mixture distribution to a mixture on the true scale of the question.
Parameters: normalized_dist (LogisticMixture) – normalized logistic mixture distribution
Return type: LogisticMixture
Returns: the same distribution rescaled to the true scale of the question
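For a linear-scale question, the conversion rescales each component's loc and scale by the question range; a sketch on bare parameter values (hypothetical helper, not ergo's API):

```python
def true_scale_logistic(loc, scale, low, high):
    # Affine map of a normalized logistic's parameters onto [low, high]
    width = high - low
    return low + loc * width, scale * width

true_scale_logistic(0.5, 0.25, 0.0, 200.0)  # → (100.0, 50.0)
```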
LinearDateQuestion¶
class LinearDateQuestion(id, metaculus, data, name=None)
BinaryQuestion¶
class BinaryQuestion(id, metaculus, data, name=None)
A binary Metaculus question – how likely is this event to happen, from 0 to 1?
change_since(since)
Calculate the change in the community prediction between the given datetime and the most recent prediction.
Parameters: since (datetime)
Returns: change in the community prediction since the given datetime
sample_community()
Sample from the Metaculus community distribution (Bernoulli).
score_my_predictions()
Score all of my predictions against the question resolution (or against the current community prediction if the resolution isn't available).
Returns: list of ScoredPredictions with Brier scores
score_prediction(prediction, resolution)
Score a prediction relative to a resolution using a Brier score.
Parameters:
- prediction – how likely is the event to happen, from 0 to 1?
- resolution (float) – how the event turned out (0 if it didn't happen, 1 if it did)
Return type: ScoredPrediction
Returns: a ScoredPrediction with a Brier score (see https://en.wikipedia.org/wiki/Brier_score#Definition; 0 is best, 1 is worst, 0.25 is chance)
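The Brier score itself is just the squared error between the predicted probability and the outcome; a minimal sketch of the formula (not ergo's internal implementation):

```python
def brier_score(prediction, resolution):
    # Squared error between the predicted probability and the 0/1 outcome
    return (prediction - resolution) ** 2

brier_score(0.75, 1.0)  # a confident correct prediction scores 0.0625
```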
Foretold¶
PredictIt¶
This module lets you get question and prediction information from PredictIt via the API (https://predictit.freshdesk.com/support/solutions/articles/12000001878)
PredictIt¶
class PredictIt
The main class for interacting with PredictIt.
get_market(id)
Return the PredictIt market with the given id. A market's id can be found in the url of the market.
Parameters: id (int) – market id
Returns: market
property markets
Generate all of the markets currently in PredictIt.
Returns: iterator of PredictIt markets
PredictItMarket¶
class PredictItMarket(predictit, data)
A PredictIt market.
Variables:
api_url (str) – url of the PredictIt API for the given question
name (str) – name of the market
shortName (str) – shortened name of the market
image (str) – url of the image resource of the market
url (str) – url of the market in PredictIt
status (str) – status of the market. Closed markets aren't included in the API, so this is always "Open"
timeStamp (datetime.datetime) – last time the market was updated. The API updates every minute, but the timestamp can be earlier if the market hasn't been traded recently
get_question(id)
Return the question with the given id.
Parameters: id (int) – question id
Returns: question
property questions
Generate all of the questions in the market.
Returns: generator of questions in the market
PredictItQuestion¶
class PredictItQuestion(market, data)
A single binary question in a PredictIt market.
Parameters:
- market (PredictItMarket) – PredictIt market instance
- data (Dict) – contract JSON retrieved from the PredictIt API
Variables:
market (PredictItMarket) – PredictIt market instance
dateEnd (datetime.datetime) – end-date of a market, usually None
image (str) – url of the image resource for the contract
name (str) – name of the contract
shortName (str) – shortened name of the contract
status (str) – status of the contract. Closed markets aren't included in the API, so this is always "Open"
lastTradePrice (float) – last price the contract was traded at
bestBuyYesCost (float) – cost to buy a single Yes share
bestBuyNoCost (float) – cost to buy a single No share
bestSellYesCost (float) – cost to sell a single Yes share
bestSellNoCost (float) – cost to sell a single No share
lastClosePrice (float) – price the contract closed at the previous day
displayOrder (int) – position of the contract in the PredictIt display; 0 if the contracts are sorted by lastTradePrice
sample_community()
Sample from the PredictIt community distribution (Bernoulli).
Returns: True or False
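Conceptually this is a Bernoulli draw at the contract's last trade price; a sketch with a hypothetical helper (not the library's code):

```python
import random

def sample_community(last_trade_price):
    # Bernoulli draw: True with probability last_trade_price
    return random.random() < last_trade_price
```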
static to_dataframe(questions, columns=None)
Summarize a list of questions in a dataframe.
Parameters:
- questions (List[PredictItQuestion]) – questions to summarize
- columns – list of column names as strings
Return type: DataFrame
Returns: pandas dataframe summarizing the questions
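The result is an ordinary pandas DataFrame; the equivalent can be sketched from plain dicts (the contract records below are hypothetical):

```python
import pandas as pd

# Hypothetical contract records, mirroring PredictIt question attributes
records = [
    {"name": "Candidate A", "lastTradePrice": 0.62},
    {"name": "Candidate B", "lastTradePrice": 0.40},
]
df = pd.DataFrame(records, columns=["name", "lastTradePrice"])
```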
Contribute to Ergo core¶
To get started:
git clone https://github.com/oughtinc/ergo.git
poetry install
poetry shell
poetry¶
Ergo uses poetry to manage its dependencies and environments.
Follow these directions to install poetry if you don’t already have it.
Troubleshooting: if you get Could not find a version that satisfies the requirement jaxlib ... after using poetry to install, your virtual environment probably has an old version of pip, due to how poetry chooses pip versions. Try:
poetry run pip install -U pip
poetry install
again.
Before submitting a PR¶
- Run poetry install to make sure you have the latest dependencies.
- Format code using make format (black, isort).
- Run linting using make lint (flake8, mypy, black check).
- Run tests using make test. To run the tests in test_metaculus.py, you'll need our secret .env file. If you don't have it, you can ask us for it, or rely on Travis CI to run those tests for you.
- Generate docs using make docs, load docs/build/html/index.html, and review the generated docs.
- Or run all of the above using make all.
Contribute to Ergo notebooks¶
How to change a notebook and make a PR¶
1. Open the notebook in JupyterLab or Colab (see Run a notebook in Colab or JupyterLab)
2. Make your changes
3. Follow our Notebook Style
4. Run the notebook in Colab. Save the .ipynb file (with output) in ergo/notebooks
5. Run make scrub. This will produce a scrubbed version of the notebook in ergo/notebooks/scrubbed/. You can git diff the scrubbed version against the previous scrubbed version to more easily see what you changed; you may want to use nbdime for better diffing.
6. You can now make a PR with your changes. If you make a PR in the original ergo repo (not a fork), you can then use the auto-comment from ReviewNB to more thoroughly vet your changes.
Run a notebook in Colab or JupyterLab¶
Colab¶
Click "GitHub" on the "new notebook" dialog, then enter the notebook URL. Or:
Go to "Upload" and upload the notebook's .ipynb file. Or:
Install and use the Open in Colab Chrome extension.
JupyterLab¶
git clone https://github.com/oughtinc/ergo.git
poetry install
poetry shell
jupyter lab
Notebook Style¶
How to clean up a notebook for us to feature:
Make sure that the notebook meets a high standard in general:
high-quality code
illuminating data analysis
clear communication of what you’re doing and your findings
as short as possible, but no shorter
this random style guide I found in a few minutes of Googling seems good, but it’s not our official style guide or anything
Do the following specific things to clean up:
- As much as possible, avoid showing extraneous output from cells:
  - you can use the %%capture magic to suppress all output from a cell (helpful if a function in the cell prints something)
  - you can add a ; at the end of the last line in a cell to suppress printing the return value of the line
  - think about which cells the reader really needs to see vs. which ones just have to be there for setup or whatnot, and collapse the latter
- Use the latest version of ergo
- Make sure that any secrets like passwords are removed from the notebook
- Pull out any code not central to the main point of the model into a module in ergo/contrib/. See Notebook contrib folder for details.
The featured notebooks in our README should be exemplars of the above, so refer to those to see what this looks like in practice.
Notebook contrib folder¶
Adding new packages¶
For modules providing functionality specific to the questions addressed in a notebook, create a new package in contrib (/ergo/contrib/{your_package}) and include an __init__.py file. You can then access it in your notebook with:
from ergo.contrib.{your_package} import {module_you_want}
For modules providing more general functionality useful across notebooks (and perhaps candidates for inclusion in core ergo), use /ergo/contrib/utils. You can either add a new module or extend an existing one. You can then access it with:
from ergo.contrib.utils import {module_you_want}
Adding dependencies¶
Use the usual poetry workflow with the --optional flag:
poetry add pendulum --optional
You can then manually (in pyproject.toml) add it to the 'notebooks' group. (Look for "extras" in pyproject.toml.)
[tool.poetry.extras]
notebooks = [
"pendulum",
"scikit-learn",
"{your_dependency}"
]
(To our knowledge) there is currently no way to do this second step with the CLI.
This allows people to install the additional notebook dependencies with:
poetry install -E notebooks
Loading data from Google Sheets¶
Three methods for loading data from Google Sheets into a Colab notebook:
Method 1 (Public CSV)¶
If you're willing to make your spreadsheet public, you can publish it as a CSV file on Google Sheets: go to File > Publish to the Web and select the CSV format. You can then copy the published url and load it in Python using pandas:
import pandas as pd

url = "..."  # the published CSV url from Google Sheets
df = pd.read_csv(url)
Method 2 (OAuth)¶
This method requires the user of the colab to authorize it every time the colab runs, but it works with non-public sheets.
# Authentication
import gspread
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
google_sheets_credentials = GoogleCredentials.get_application_default()
gc = gspread.authorize(google_sheets_credentials)

# Load spreadsheet
wb = gc.open_by_url(url)  # url of the Google Sheet
sheet = wb.worksheet(sheet_name)  # name of the worksheet tab
values = sheet.get_all_values()
Method 3 (Service Account)¶
This method requires you to follow the instructions at https://gspread.readthedocs.io/en/latest/oauth2.html to create a Google service account. You then need to share the Google Sheet with the service account's email address.
# Need a newer version of gspread than the one included by default in Colab
!pip install --upgrade gspread

import gspread
from google.oauth2.service_account import Credentials

service_account_info = {}  # JSON for the Google service account
scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
credentials = Credentials.from_service_account_info(service_account_info, scopes=scope)
gc = gspread.authorize(credentials)

# Load spreadsheet
wb = gc.open_by_url(url)  # url of the Google Sheet
sheet = wb.worksheet(sheet_name)  # name of the worksheet tab
values = sheet.get_all_values()