We are building the industry's first open-source datasets for investment research. Unlike traditional vendors, who limit trials to institutions, we believe in open access: all datasets are free with a brief delay, and subscribers receive real-time data.
About Me
I'm Derek Snow, founder and researcher at sov.ai. My focus lies in AI & ML in Quantitative Finance, where I develop and curate datasets to advance research and applications in this field.
Connect with Me
LinkedIn | GitHub | Hugging Face
Datasets
The listed price is for commercial use of lagged data. If you are an academic, ignore it and simply download the data from Hugging Face!
| Dataset | Description | Documentation | Price (per month) |
|---|---|---|---|
| sovai/news_sentiment | Two news datasets: one ticker-matched, the other theme-matched. | Documentation | $200 |
| sovai/price_breakout | Daily updated predictions of upward price breakouts for US equities. | Documentation | $220 |
| sovai/insider_flow_prediction | More than 60 insider trading features for machine learning, including a flow prediction value. | Documentation | $465 |
| sovai/institutional_trading | A comprehensive analysis of institutional investment behaviors, strategies, and portfolio dynamics. | Documentation | $580 |
| sovai/lobbying_data | Ticker-matched lobbying data showing fine-grained corporate lobbying behavior. | Documentation | $645 |
| sovai/short_selling | Short-selling datasets for risk analysis. | Documentation | $780 |
| sovai/wikipedia_views | Daily Wikipedia page views and trends for some of the largest firms. | Documentation | $200 |
| sovai/pharma_clinical_trials | Clinical trials tagged with their predicted outcome success. | Documentation | $850 |
| sovai/factor_signals | Traditional accounting factors, alternative financial metrics, and advanced statistical analyses for sophisticated financial modeling. | Documentation | $270 |
| sovai/financial_ratios | More than 80 financial ratios calculated from financial statement and market data. | Documentation | $270 |
| sovai/government_contracts | Government contracts data for publicly traded companies. | Documentation | $580 |
| sovai/corp_risks | Chapter 7 and Chapter 11 bankruptcy predictions for over 13,000 US publicly traded stocks. | Documentation | $270 |
| sovai/risks | Daily updates on global risk perceptions, using leading indicators and advanced models to forecast various types of risk. | Documentation | $270 |
| sovai/cfpb_complaints | A ticker-mapped dataset of consumer financial complaints. | Documentation | $480 |
| sovai/risk_indicators | A comprehensive corporate risk score for US stocks, constructed by analyzing company events. | Documentation | $270 |
| sovai/traffic_agencies | Traffic data for government agency websites. | Documentation | $250 |
| sovai/earnings_surprise | Earnings announcements obtained from external sources, together with estimate information leading up to the actual announcement. | Documentation | $680 |
| sovai/bankruptcy | Chapter 7 and Chapter 11 bankruptcy predictions for over 5,000 US publicly traded stocks. | Documentation | $270 |
Cost Tip: For commercial access to all 30 real-time datasets on docs.sov.ai, I recommend subscribing to the $285 p/m package, which can save as much as 90% of the cost.
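To check whether the package pays off for a given selection, you can add up the individual prices and compare the total with the $285 package. The following is a rough sketch only: the `PRICES` mapping is copied from the table of listed datasets, while the function name `bundle_saving` and the sample selections are assumptions made for illustration.

```python
# Monthly commercial prices in USD, copied from the dataset table.
PRICES = {
    "news_sentiment": 200,
    "price_breakout": 220,
    "insider_flow_prediction": 465,
    "institutional_trading": 580,
    "lobbying_data": 645,
    "short_selling": 780,
    "wikipedia_views": 200,
    "pharma_clinical_trials": 850,
    "factor_signals": 270,
    "financial_ratios": 270,
    "government_contracts": 580,
    "corp_risks": 270,
    "risks": 270,
    "cfpb_complaints": 480,
    "risk_indicators": 270,
    "traffic_agencies": 250,
    "earnings_surprise": 680,
    "bankruptcy": 270,
}
BUNDLE_PRICE = 285  # monthly price of the all-datasets package from the cost tip


def bundle_saving(names):
    """Return (individual_total, saving, saving_fraction) for the chosen datasets.

    A negative saving means the individual subscriptions cost less than the bundle.
    """
    total = sum(PRICES[name] for name in names)
    saving = total - BUNDLE_PRICE
    return total, saving, saving / total


if __name__ == "__main__":
    selections = (
        ["news_sentiment"],
        ["news_sentiment", "price_breakout", "wikipedia_views"],
    )
    for selection in selections:
        total, saving, fraction = bundle_saving(selection)
        print(f"{selection}: ${total} individually, bundle saves ${saving} ({fraction:.0%})")
```

With these prices, a single $200 dataset is cheaper on its own, whereas any two of the listed datasets together already cost more than the package.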
All our datasets are in beta, so you can be part of our development process: submit suggestions or error reports through the issues portal.
Example Use Cases
Below are example code snippets demonstrating how to load each dataset using the Hugging Face datasets library.
- sovai/news_sentiment

      from datasets import load_dataset
      df_news_sentiment = load_dataset("sovai/news_sentiment", split="train").to_pandas()

- sovai/price_breakout

      from datasets import load_dataset
      df_price_breakout = load_dataset("sovai/price_breakout", split="train").to_pandas()

- sovai/insider_flow_prediction

      from datasets import load_dataset
      df_insider_flow = load_dataset("sovai/insider_flow_prediction", split="train").to_pandas()

- sovai/institutional_trading

      from datasets import load_dataset
      df_institutional_trading = load_dataset("sovai/institutional_trading", split="train").to_pandas()

- sovai/lobbying_data

      from datasets import load_dataset
      df_lobbying_data = load_dataset("sovai/lobbying_data", split="train").to_pandas()

- sovai/short_selling

      from datasets import load_dataset
      df_short_selling = load_dataset("sovai/short_selling", split="train").to_pandas()

- sovai/wikipedia_views

      from datasets import load_dataset
      df_wikipedia_views = load_dataset("sovai/wikipedia_views", split="train").to_pandas()

- sovai/pharma_clinical_trials

      from datasets import load_dataset
      df_pharma_trials = load_dataset("sovai/pharma_clinical_trials", split="train").to_pandas()

- sovai/factor_signals

      from datasets import load_dataset
      df_factor_signals = load_dataset("sovai/factor_signals", split="train").to_pandas()

- sovai/financial_ratios

      from datasets import load_dataset
      df_financial_ratios = load_dataset("sovai/financial_ratios", split="train").to_pandas()

- sovai/government_contracts

      from datasets import load_dataset
      df_government_contracts = load_dataset("sovai/government_contracts", split="train").to_pandas()

- sovai/corp_risks

      from datasets import load_dataset
      df_corp_risks = load_dataset("sovai/corp_risks", split="train").to_pandas()

- sovai/risks

      from datasets import load_dataset
      df_risks = load_dataset("sovai/risks", split="train").to_pandas()

- sovai/cfpb_complaints

      from datasets import load_dataset
      df_cfpb_complaints = load_dataset("sovai/cfpb_complaints", split="train").to_pandas()

- sovai/risk_indicators

      from datasets import load_dataset
      df_risk_indicators = load_dataset("sovai/risk_indicators", split="train").to_pandas()

- sovai/traffic_agencies

      from datasets import load_dataset
      df_traffic_agencies = load_dataset("sovai/traffic_agencies", split="train").to_pandas()

- sovai/earnings_surprise

      from datasets import load_dataset
      df_earnings_surprise = load_dataset("sovai/earnings_surprise", split="train").to_pandas()

- sovai/bankruptcy

      from datasets import load_dataset
      df_bankruptcy = load_dataset("sovai/bankruptcy", split="train").to_pandas()
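Because every snippet in this section follows the same pattern, the calls can be wrapped in a small helper. This is a minimal sketch and not part of any sov.ai package: `load_sovai`, `load_many`, and the `loader` parameter are hypothetical names, and the only library behavior assumed is the `load_dataset(repo_id, split="train")` call followed by `.to_pandas()`, as used in the snippets.

```python
def load_sovai(name, split="train", loader=None):
    """Load the dataset sovai/<name> and return it as a pandas DataFrame.

    `loader` is a hypothetical hook for swapping in a replacement for
    datasets.load_dataset (for example in offline tests). By default the
    real function is imported lazily on first use.
    """
    if loader is None:
        # Imported here so the helper can be defined without the library installed.
        from datasets import load_dataset as loader
    return loader(f"sovai/{name}", split=split).to_pandas()


def load_many(names, **kwargs):
    """Load several datasets into a dict keyed by the short dataset name."""
    return {name: load_sovai(name, **kwargs) for name in names}
```

For example, `frames = load_many(["news_sentiment", "short_selling"])` would download both datasets and make them available as `frames["news_sentiment"]` and `frames["short_selling"]`. Keeping the short name as the key mirrors the `df_<name>` variables used in the individual snippets.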