Time to Embed: Unlocking Foundation Models for Time Series with Channel Descriptions
JEPA is coming to time series!
Authors: Utsav Dutta, Sina Khoshfetrat Pakazad, Henrik Ohlsson
Paper: https://arxiv.org/abs/2505.14543
Code: Not publicly available
Model: Not publicly available
TL;DR
WHAT was done? The paper introduces CHARM (CHannel-Aware Representation Model), a 7M-parameter foundation embedding model for multivariate time series. Its key innovation is the architectural integration of textual channel descriptions to create domain-aware representations. This is achieved through a novel Contextual Temporal Convolutional Network (TCN) and custom Contextual Attention Layers that use the text to modulate inter-channel interactions and temporal dependencies. The model is trained using a Joint Embedding Predictive Architecture (JEPA), a self-supervised method that predicts latent representations of masked data segments rather than reconstructing noisy raw signals, making the learned embeddings more robust (a minimal sketch of this training objective follows the TL;DR).
WHY it matters? CHARM sets a new state-of-the-art across diverse downstream tasks—forecasting, classification, and anomaly detection—often outperforming specialized models. It marks a significant shift from treating time series as undifferentiated numerical streams to understanding them through semantic context, similar to how human experts operate. This approach not only improves performance but also enhances model interpretability by revealing learned cross-channel dynamics. By successfully demonstrating a robust, transferable, and semantically grounded foundation model, CHARM paves the way for more intelligent, general-purpose AI for time series analysis.
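To make the training recipe concrete, here is a minimal, self-contained PyTorch sketch of a JEPA-style objective for time series. It is not the authors' code: the toy encoder merely stands in for CHARM's contextual TCN and attention stack, and every name and hyperparameter here (SimpleEncoder, jepa_step, d_model=64, the 0.996 EMA momentum, the ~40% mask ratio) is an illustrative assumption. What it does capture is the idea described above: predict the latent representation of masked patches from the visible context, with targets produced by an EMA copy of the encoder, instead of reconstructing raw values.

```python
# Minimal sketch of a JEPA-style objective for time series (not the authors' code).
# A context encoder sees the unmasked patches, a predictor guesses the latent
# representation of masked patches, and the target comes from an EMA ("teacher")
# copy of the encoder applied to the full series. All module names and sizes are
# illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleEncoder(nn.Module):
    """Toy per-patch encoder standing in for CHARM's contextual TCN + attention."""
    def __init__(self, patch_len: int = 16, d_model: int = 64):
        super().__init__()
        self.proj = nn.Linear(patch_len, d_model)
        self.mixer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_len) -> (batch, num_patches, d_model)
        return self.mixer(self.proj(patches))

def jepa_step(encoder, target_encoder, predictor, patches, mask, ema=0.996):
    """One self-supervised step: predict latents of masked patches, not raw values."""
    with torch.no_grad():  # targets come from the EMA teacher, no gradients
        target = target_encoder(patches)
    # Context encoder only sees visible patches (masked ones zeroed for simplicity).
    context = encoder(patches * (~mask).unsqueeze(-1))
    pred = predictor(context)
    # Regress predicted latents onto target latents at the masked positions only.
    loss = F.smooth_l1_loss(pred[mask], target[mask])
    loss.backward()
    # EMA update of the target encoder.
    with torch.no_grad():
        for p, tp in zip(encoder.parameters(), target_encoder.parameters()):
            tp.mul_(ema).add_(p, alpha=1 - ema)
    return loss

# Tiny usage example with random data.
enc = SimpleEncoder()
tgt = copy.deepcopy(enc)
head = nn.Linear(64, 64)
x = torch.randn(8, 30, 16)      # 8 series, 30 patches of length 16
m = torch.rand(8, 30) < 0.4     # mask ~40% of the patches
print(jepa_step(enc, tgt, head, x, m).item())
```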
Details
The Challenge: Time Series Models Lack Context
For years, time series analysis has been dominated by task-specific models that, while effective in narrow domains, struggle to generalize. The rise of foundation models in NLP and vision has inspired similar efforts in the time series community, yet most approaches still treat multivariate time series as a collection of undifferentiated numerical streams. They often overlook a crucial piece of information that human experts rely on: the context of what each channel or sensor actually measures. This lack of semantic awareness limits their ability to adapt to new sensor configurations and can lead to models that are brittle and difficult to interpret.
The paper "Time to Embed" introduces a novel model, CHARM (CHannel-Aware Representation Model), that directly tackles this challenge. It presents a new paradigm for time series representation learning by building a foundation model that is not just aware of temporal patterns, but also of the semantic meaning behind each channel.
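To give a feel for what "channel-aware" can mean mechanically, here is a hypothetical sketch of cross-channel attention whose mixing weights are biased by the similarity of the channels' textual descriptions. This is not CHARM's actual Contextual Attention Layer (the code is not public); the hashed bag-of-words text encoder, the ContextualChannelAttention module, and all shapes are illustrative assumptions chosen to keep the example dependency-free.

```python
# Hypothetical sketch of "channel descriptions modulating inter-channel attention".
# Each channel's text description is embedded (a toy hashed bag-of-words here) and
# its pairwise similarity is added as a bias to the cross-channel attention scores.
import torch
import torch.nn as nn
import torch.nn.functional as F

def embed_descriptions(descriptions: list[str], dim: int = 32) -> torch.Tensor:
    """Toy stand-in text encoder: hashed bag-of-words per channel description."""
    out = torch.zeros(len(descriptions), dim)
    for i, text in enumerate(descriptions):
        for tok in text.lower().split():
            out[i, hash(tok) % dim] += 1.0
    return F.normalize(out, dim=-1)

class ContextualChannelAttention(nn.Module):
    """Cross-channel attention whose scores are biased by channel-text similarity."""
    def __init__(self, d_model: int = 64, d_text: int = 32):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.text_proj = nn.Linear(d_text, d_model)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, d_model) per-channel features for one patch
        # text_emb: (channels, d_text) embeddings of the channel descriptions
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-1, -2) / x.shape[-1] ** 0.5   # (B, C, C)
        t = self.text_proj(text_emb)                            # (C, d_model)
        text_bias = t @ t.T / t.shape[-1] ** 0.5                # (C, C) semantic affinity
        attn = torch.softmax(scores + text_bias, dim=-1)        # text modulates channel mixing
        return attn @ v

# Usage: three channels from a hypothetical machine-monitoring setup.
descs = ["motor bearing temperature", "spindle vibration rms", "coolant flow rate"]
layer = ContextualChannelAttention()
feats = torch.randn(4, 3, 64)                          # batch of 4, 3 channels
print(layer(feats, embed_descriptions(descs)).shape)   # torch.Size([4, 3, 64])
```

The design intuition is that two channels whose descriptions are semantically related (say, two temperature sensors on the same machine) get a standing bias toward attending to each other, while the numeric features still determine the data-dependent part of the interaction.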