Datadog’s Toto model was trained on roughly one trillion time series data points—75% from curated observability metrics and 25% from the LOTSA dataset. Through padding, masking, and data augmentation (including random offsets and Gaussian sampling), Datadog ensured data diversity and quality. Synthetic data (about 5%) simulated additional real-world variability via ARMA processes, seasonal trends, and noise. Together, these methods improved Toto’s robustness and ability to generalize across domains.

How Datadog Turned Noisy Observability Metrics Into AI Gold

  1. Background
  2. Problem statement
  3. Model architecture
  4. Training data
  5. Results
  6. Conclusions
  7. Impact statement
  8. Future directions
  9. Contributions
  10. Acknowledgements and References

Appendix

4 Training data

We pretrained Toto with a dataset of approximately one trillion time series points. Of these, roughly three-quarters are anonymous observability metrics from the Datadog platform. The remaining points come from the LOTSA dataset [15], a compilation of publicly available time series datasets across many different domains.

4.1 Datadog dataset

The Datadog platform ingests more than a hundred trillion events per day. However, much of this data is sparse, noisy, or too granular or high in cardinality to be useful in its raw form. To curate a high-quality dataset for efficient model training, we sample queries based on quality and relevance signals from dashboards, monitor alerts, and notebooks. This provides a strong signal that the data resulting from these queries is of critical importance and sufficient quality for observability of real-world applications.

Datadog metrics are accessed using a specialized query language supporting filters, group-bys, time aggregation, and various transformations and postprocessing functions [43]. We consider groups returned from the same query to be related variates in a multivariate time series (Fig. 4). After we retrieve the query results, we discard the query strings and group identifiers, keeping only the raw numeric data.
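
To make this mapping concrete, the sketch below shows how grouped query results might become a single anonymous multivariate array; the query string, tag values, and numbers are hypothetical, and only the numeric data survives this step.

```python
import numpy as np

# Hypothetical grouped query result: one group per "host" tag value.
# The query string and group identifiers are discarded downstream;
# only the raw numeric values are kept.
query_result = {
    "query": "avg:system.cpu.user{service:web} by {host}",  # illustrative only
    "groups": {
        "host:web-1": [0.41, 0.43, 0.40, 0.45],
        "host:web-2": [0.38, 0.39, 0.44, 0.42],
        "host:web-3": [0.52, 0.50, 0.49, 0.51],
    },
}

# Related groups become the variates of one multivariate series: (variates, time).
multivariate_series = np.array(list(query_result["groups"].values()))
print(multivariate_series.shape)  # (3, 4)
```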

Handling this vast amount of data requires several preprocessing steps to ensure consistency and quality. Initially, we apply padding and masking techniques to align the series lengths, making them divisible by the patch stride. This involves adding necessary left-padding to both the time series data and the ID mask, ensuring compatibility with the model's requirements.
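
As a rough illustration of this step, the following sketch left-pads a (variates, time) array and its ID mask so that the time dimension is a multiple of the patch stride. The function name, array layout, and zero-valued padding convention are assumptions made for the example, not Toto's actual implementation.

```python
import numpy as np

def left_pad_to_stride(series: np.ndarray, id_mask: np.ndarray, patch_stride: int):
    """Left-pad a (variates, time) series and its ID mask so the time
    dimension is divisible by the patch stride. Padded positions are
    zero-filled in the series and marked as padding (0) in the mask."""
    n_variates, n_steps = series.shape
    remainder = n_steps % patch_stride
    if remainder == 0:
        return series, id_mask
    pad = patch_stride - remainder
    padded_series = np.concatenate(
        [np.zeros((n_variates, pad), dtype=series.dtype), series], axis=1)
    padded_mask = np.concatenate(
        [np.zeros((n_variates, pad), dtype=id_mask.dtype), id_mask], axis=1)
    return padded_series, padded_mask

# Example: a 3-variate series of length 250 is padded to 256 for a stride of 64.
x = np.random.randn(3, 250)
m = np.ones((3, 250), dtype=np.int64)
xp, mp = left_pad_to_stride(x, m, patch_stride=64)
assert xp.shape[1] % 64 == 0
```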

Various data augmentations are employed to enhance the dataset's robustness. We introduce random time offsets to prevent memorization caused by having series always align the same way with the patch grid. After concatenating the Datadog and LOTSA datasets for training, we also implement a variate shuffling strategy to maintain diversity and representation. Specifically, 10% of the time, we combine variates that are not necessarily related, thus creating new, diverse combinations of data points. To sample the indices, we employ a normal distribution with a standard deviation of 1000, favoring data points that were closer together in the original datasets. This Gaussian sampling ensures that, while there is a preference for adjacent data points, significant randomness is introduced to enhance the diversity of the training data. This approach improves the model's ability to generalize across different types of data effectively.
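
The sketch below illustrates these two augmentations under assumed conventions: a random starting offset so series do not always align with the patch grid, and occasional variate mixing that draws indices from a normal distribution (standard deviation 1000) centered on an anchor index, so that series stored close together, and therefore more likely related, are favored while distant ones can still be mixed in. Function names, shapes, and the anchor-based sampling detail are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_time_offset(series: np.ndarray, max_offset: int) -> np.ndarray:
    """Drop a random number of leading steps so a series does not always
    align the same way with the patch grid."""
    offset = int(rng.integers(0, max_offset + 1))
    return series[:, offset:]

def gaussian_variate_mix(pool: list, anchor: int, n_variates: int,
                         std: float = 1000.0) -> np.ndarray:
    """Applied a fraction of the time (10% in the paper): sample variate
    indices from a normal distribution around an anchor index, favoring
    series that were close together in the original dataset while still
    occasionally mixing in unrelated ones."""
    idx = np.clip(np.round(rng.normal(anchor, std, size=n_variates)),
                  0, len(pool) - 1).astype(int)
    return np.stack([pool[i] for i in idx])

# Toy example: a pool of 10,000 univariate series of length 512.
pool = [np.sin(np.linspace(0.0, (k % 7) + 1.0, 512)) for k in range(10_000)]
sample = gaussian_variate_mix(pool, anchor=4_200, n_variates=8)
sample = random_time_offset(sample, max_offset=32)
print(sample.shape)  # e.g. (8, 498), depending on the sampled offset
```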

By implementing these rigorous preprocessing steps and sophisticated data handling mechanisms, we ensure that the training data for Toto is of the highest quality, ultimately contributing to the model's superior performance and robustness.

4.2 Synthetic data

We use a synthetic data generation process similar to TimesFM [19] to supplement our training datasets, improving the diversity of the data and helping to teach the model basic structure. We simulate time series data through the composition of components such as piecewise linear trends, ARMA processes, sinusoidal seasonal patterns, and various residual distributions. We randomly combine five of these processes per variate, introducing patterns not always present in our real-world datasets. The generation process involves creating base series with random transformations, clipping extreme values, and rescaling to a specified range. By making synthetic data approximately 5% of our training dataset, we ensure a wide range of time series behaviors are captured. This diversity exposes our models to various scenarios during training, improving their ability to generalize and effectively handle real-world data.
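
A minimal sketch of such a generator, assuming simple stand-ins for the components named above (an ARMA(1,1) process, a piecewise linear trend, sinusoidal seasonality, and heavy-tailed residuals): several randomly chosen components are summed per variate, extreme values are clipped, and the result is rescaled. The specific parameters, clipping quantiles, and output range are illustrative choices, not those used for Toto.

```python
import numpy as np

rng = np.random.default_rng(42)

def arma_process(n: int, phi: float = 0.6, theta: float = 0.3) -> np.ndarray:
    """ARMA(1,1) component driven by Gaussian noise."""
    eps = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]
    return x

def piecewise_linear_trend(n: int, n_segments: int = 3) -> np.ndarray:
    """Piecewise linear trend with random slopes and breakpoints."""
    breaks = list(np.sort(rng.integers(1, n - 1, size=n_segments - 1))) + [n]
    slopes = rng.normal(scale=0.05, size=n_segments)
    trend, start, level = np.zeros(n), 0, 0.0
    for end, slope in zip(breaks, slopes):
        trend[start:end] = level + slope * np.arange(end - start)
        level, start = trend[end - 1], end
    return trend

def sinusoidal_seasonality(n: int) -> np.ndarray:
    """Sinusoidal seasonal pattern with a random period and phase."""
    period = rng.integers(8, 96)
    return np.sin(2 * np.pi * np.arange(n) / period + rng.uniform(0, 2 * np.pi))

def synthetic_variate(n: int = 1024, n_components: int = 5) -> np.ndarray:
    """Sum several randomly chosen components (five by default), clip
    extreme values at illustrative quantiles, and rescale to [0, 1]."""
    generators = [arma_process, piecewise_linear_trend, sinusoidal_seasonality,
                  lambda m: 0.1 * rng.standard_t(df=3, size=m)]
    series = sum(generators[rng.integers(len(generators))](n)
                 for _ in range(n_components))
    series = np.clip(series, *np.quantile(series, [0.001, 0.999]))
    return (series - series.min()) / (series.max() - series.min() + 1e-8)
```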


:::info Authors:

(1) Ben Cohen (ben.cohen@datadoghq.com);

(2) Emaad Khwaja (emaad@datadoghq.com);

(3) Kan Wang (kan.wang@datadoghq.com);

(4) Charles Masson (charles.masson@datadoghq.com);

(5) Elise Rame (elise.rame@datadoghq.com);

(6) Youssef Doubli (youssef.doubli@datadoghq.com);

(7) Othmane Abou-Amal (othmane@datadoghq.com).

:::


:::info This paper is available on arxiv under CC BY 4.0 license.

:::

