Quant research: QMT code for synthesizing multi-period stock data

*******0079 · Posted 2025-5-21 15:55:44

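Trading strategies often need data at several periods at once, but the data interface does not provide every period, so larger-period bars have to be synthesized from smaller-period data. Examples are converting ticks to 1-minute bars (QMT's own bars are also synthesized from ticks) or converting 1-minute bars to 60-, 65-, or 90-minute bars. For this we use the resampling functions in pandas.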

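Pandas resample tutorial

resample is a powerful pandas tool for resampling time series: it converts a series from one frequency to another (for example, daily data to monthly data). This tutorial walks through how to use it.

Basic concepts

Resampling comes in two types:

Downsampling: converting from a higher frequency to a lower one (e.g. day → month)
Upsampling: converting from a lower frequency to a higher one (e.g. month → day)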

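Basic syntax

df.resample(rule, closed=None, label=None, convention='start',
on=None, level=None, origin='start_day', offset=None)

Main parameters (the older loffset and base arguments were removed in pandas 2.0; origin and offset take over the role of base):

rule: resampling frequency string (e.g. 'D' for day, 'M' for month)
closed: which side of each bin interval is closed ('right' or 'left')
label: which bin edge is used to label the aggregated result ('right' or 'left')
convention: convention used when resampling a PeriodIndex ('start' or 'end')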
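The effect of closed and label is easiest to see on a tiny series. The sketch below is an illustration only: the six values and timestamps are made up, and it assumes (not confirmed by the post) that the 1-minute bars are stamped with the end of the minute, so that the 09:31 bar covers 09:30 to 09:31. Under that assumption, closed='right' together with label='right' keeps each bar inside its own interval.

```python
import pandas as pd

# Six hypothetical 1-minute values stamped 09:31 ... 09:36
idx = pd.date_range("2023-01-03 09:31", periods=6, freq="min")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Default for minute frequencies: bins are [start, end), labelled by their start
left = s.resample("3min").sum()

# Right-closed, right-labelled bins: (start, end], labelled by their end
right = s.resample("3min", closed="right", label="right").sum()

print(left)
print(right)
```

With the defaults, the 09:33 value opens a new bin; with the right-closed variant it completes the bin labelled 09:33.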

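Common frequency strings

Alias	Description
B	business day
D	calendar day
W	week
M (ME in pandas 2.2+)	month end
Q (QE in pandas 2.2+)	quarter end
A/Y (YE in pandas 2.2+)	year end
H (h in pandas 2.2+)	hour
T/min (min in pandas 2.2+)	minute
S (s in pandas 2.2+)	second
L/ms (ms in pandas 2.2+)	millisecond
U (us in pandas 2.2+)	microsecond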
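Any alias can carry a multiplier, which is how periods such as 60, 65, or 90 minutes are written. As a rough sketch, the hypothetical helper bars_per_session below counts how many bars one A-share session (240 trading minutes) would yield if the minutes were grouped consecutively; the helper name and the SESSION_MINUTES constant are illustrative and not part of pandas or QMT.

```python
import math
import pandas as pd

SESSION_MINUTES = 240  # 09:30-11:30 plus 13:00-15:00

def bars_per_session(rule):
    """Bars per session when 1-minute bars are grouped consecutively (the last bar may be short)."""
    minutes = pd.Timedelta(rule) / pd.Timedelta(minutes=1)
    return math.ceil(SESSION_MINUTES / minutes)

for rule in ["60min", "65min", "90min", "100min"]:
    print(rule, bars_per_session(rule))
```

Only 60 minutes divides the session evenly; the other periods leave a shorter final bar each day.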

Example code

1. Create sample data

import pandas as pd
import numpy as np


# Create the date range
date_rng = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')


# Create the DataFrame and use the dates as its index
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))
df.set_index('date', inplace=True)


print(df.head())

2. Downsampling examples

# Monthly mean
monthly_mean = df.resample('M').mean()
print(monthly_mean)


# Weekly sum
weekly_sum = df.resample('W').sum()
print(weekly_sum.head())


# Several aggregation functions at once
monthly_stats = df.resample('M').agg(['mean', 'sum', 'min', 'max'])
print(monthly_stats)

3. Upsampling examples

# Create low-frequency data
monthly_df = df.resample('M').mean()


# Upsample to daily frequency and fill the gaps
daily_ffill = monthly_df.resample('D').ffill()  # forward fill
daily_bfill = monthly_df.resample('D').bfill()  # backward fill
daily_interpolate = monthly_df.resample('D').interpolate()  # linear interpolation


print(daily_ffill.head())

4. groupby-style operations

# Resample by quarter and take the first value of each quarter
quarterly_first = df.resample('Q').first()


# Mean per business day
business_daily_mean = df.resample('B').mean()

5. Custom resampling

# Custom resampling function: the range (max minus min) within each bin
def custom_resampler(array_like):
    return np.max(array_like) - np.min(array_like)


custom_resample = df.resample('M').apply(custom_resampler)
print(custom_resample)

6. Different aggregations for different columns

# Add a second column so that df has two data columns
df['data2'] = np.random.rand(len(df))


agg_dict = {
    'data': 'sum',
    'data2': 'mean'
}
multi_agg = df.resample('M').agg(agg_dict)
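The tick to 1-minute case mentioned at the start follows the same pattern. The sketch below is a minimal illustration, not QMT's own bar-building logic: the tick values are made up, and it assumes the tick frame has a lastPrice column and a volume column that is cumulative over the day. If volume is already per tick, the diff() step should be skipped.

```python
import pandas as pd

# Hypothetical ticks: last traded price and cumulative daily volume (assumed layout)
ticks = pd.DataFrame(
    {
        "lastPrice": [10.0, 10.2, 10.1, 10.3, 10.4],
        "volume": [100, 150, 180, 260, 300],
    },
    index=pd.to_datetime([
        "2025-01-02 09:30:03", "2025-01-02 09:30:27", "2025-01-02 09:30:57",
        "2025-01-02 09:31:12", "2025-01-02 09:31:45",
    ]),
)

# open/high/low/close of the last price within each minute
bars = ticks["lastPrice"].resample("1min").ohlc()

# Turn cumulative volume into per-tick volume, then sum it per minute
traded = ticks["volume"].diff().fillna(ticks["volume"])
bars["volume"] = traded.resample("1min").sum()

print(bars)
```

Here ohlc() replaces the first/max/min/last dictionary because there is only one price column to aggregate.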

Next, we use QMT data to synthesize larger-period bars.

Synthesizing 100-minute bars from 1-minute data

Test results (screenshot)


Source code for multi-period synthesis

'''
Multi-period stock data conversion
Author: 西蒙斯量化
WeChat: xg_quant
'''
from xtquant import xtdata
import pandas as pd
stock='600031.SH'
period='1m'
start_time='20250101'
end_time='20500101'
count=-1
callback=None
# Download the 1-minute history to the local cache
xtdata.download_history_data(stock_code=stock,period=period,start_time=start_time,end_time=end_time)
# Subscribe at the same period; callback=None means no push callback is registered
xtdata.subscribe_quote(stock_code=stock,period=period,start_time=start_time,end_time=end_time,count=count,callback=callback)
df=xtdata.get_market_data_ex(stock_list=[stock],start_time=start_time,end_time=end_time,period=period,count=count)
# The result is keyed by stock code; take this stock's DataFrame
df=df[stock]
print(df)
def resample_stock_data_func(df, resample_rule='100min'):
    """
    Resample 1-minute K-line data into larger-period K-line data (100 minutes by default).
    When synthesizing from tick data, close=lastPrice has to be filled in first.
    Parameters:
        df: DataFrame with 'open', 'high', 'low', 'close', 'volume' columns
            and a DatetimeIndex
        resample_rule: pandas frequency string for the target period
        
    Returns:
        DataFrame: the resampled K-line data
    """
    # Make sure the index is a DatetimeIndex
    if not isinstance(df.index, pd.DatetimeIndex):
        df.index = pd.to_datetime(df.index)
    
    # Run the resampling
    resampled = df.resample(resample_rule).agg({
        'open': 'first',
        'high': 'max',
        'low': 'min',
        'close': 'last',
        'volume': 'sum'
    })
    
    # Drop empty bins (periods without any bars, such as nights and the lunch break)
    resampled = resampled.dropna()
    
    # Optional: add other computed columns
    resampled['timestamp'] = resampled.index
    resampled['date'] = resampled.index.date
    resampled['time'] = resampled.index.time
    
    return resampled
# Usage example
df_100min = resample_stock_data_func(df)
print(df_100min)
# Complete standalone example
import pandas as pd
import numpy as np
# Create sample data: one day of random 1-minute bars
rng = pd.date_range('2023-01-01', periods=1440, freq='min')
df = pd.DataFrame({
    'open': np.random.rand(1440) * 100 + 100,
    'high': np.random.rand(1440) * 5 + 100,
    'low': np.random.rand(1440) * 5 + 95,
    'close': np.random.rand(1440) * 3 + 98,
    'volume': np.random.randint(100, 1000, 1440)
}, index=rng)
# 1-minute to 100-minute bars
df_100min = df.resample('100min').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
}).dropna()
print(df_100min.head())
'''
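Detailed guide to the pandas resample function
resample is a powerful pandas function for resampling time series and is especially well suited to financial data analysis (stocks, futures, and so on). The following explains how to use resample for frequency conversion.
Common parameters
Parameter  Description
rule    resampling frequency string (e.g. '5min', '1H', 'D')
closed  closed side of the interval: 'left' (default for most frequencies, including minutes and hours) or 'right' (default for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W')
label   bin label: 'left' or 'right', with the same defaults as closed
convention  resampling convention for a PeriodIndex: 'start' (default) or 'end'
offset  timedelta added to the bin origin to shift the bin edges (replaces base, which was removed in pandas 2.0)
on  name of the datetime column to use when the index is not time-based
level   name or number of the datetime level when the index is a MultiIndex
Common frequency strings
String Description
'S' second
'T' or 'min' minute
'H' hour
'D' day
'W' week
'M' month
'Q' quarter
'A' or 'Y'   year
'B' business day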
Stock data resampling examples
1. 1-minute to 5-minute data
df_5min = df.resample('5min').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
})
2. 1-minute to 100-minute data (1 hour 40 minutes)
df_100min = df.resample('100min').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
})
3. Daily to weekly bars (weeks ending on Friday)
df_weekly = df.resample('W-FRI').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
})
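Advanced usage
1. Shifting the bin start time
# Every 100 minutes, with the bin edges shifted by 15 minutes
df.resample('100min', offset='15min').agg(...)
2. Handling empty periods
# Keep empty bins (the default; their price columns are NaN)
df.resample('100min').agg(...)
# Drop empty bins
df.resample('100min').agg(...).dropna()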
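3. Custom aggregation
# VWAP needs two columns at once, so it cannot be an entry in the per-column agg dict;
# resample price * volume separately and divide by the resampled volume
bars = df.resample('100min').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
})
turnover = (df['close'] * df['volume']).resample('100min').sum()
bars['vwap'] = turnover / bars['volume']  # custom VWAP calculation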
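Notes
Time index: make sure the DataFrame index is a DatetimeIndex
Data alignment: resampling can shift bar boundaries, so check the timestamps of the result
Performance: for large datasets, groupby() on a precomputed bar key is an alternative; asfreq() only selects values at the new frequency and does not aggregate
Missing values: resampling can produce NA values, which need appropriate handling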
'''
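One caveat with a plain resample('100min') on intraday stock data: by default the bin edges are counted from midnight (00:00, 01:40, ..., 10:00, 11:40, 13:20, 15:00), so they do not line up with the 09:30 open and a bin can straddle the lunch break. A possible alternative, sketched below under stated assumptions, is to group each trading day's bars by count instead of by clock time. The helper resample_by_session is hypothetical and assumes a complete, sorted 1-minute index stamped at the end of each minute (09:31 to 11:30 and 13:01 to 15:00); missing minutes would shift the bar boundaries.

```python
import numpy as np
import pandas as pd

def resample_by_session(df, n=100):
    """Group each trading day's 1-minute bars into consecutive blocks of n bars."""
    work = df.assign(bar_end=df.index)       # keep each bar's own timestamp
    day = work.index.normalize()             # trading day of every row
    bar_id = work.groupby(day).cumcount().to_numpy() // n
    out = work.groupby([day, bar_id]).agg(
        bar_end=("bar_end", "last"),         # label each block by its last minute
        open=("open", "first"),
        high=("high", "max"),
        low=("low", "min"),
        close=("close", "last"),
        volume=("volume", "sum"),
    )
    return out.set_index("bar_end")

def session_index(day):
    """End-stamped 1-minute timestamps for one session (synthetic test data)."""
    am = pd.date_range(f"{day} 09:31", f"{day} 11:30", freq="min")
    pm = pd.date_range(f"{day} 13:01", f"{day} 15:00", freq="min")
    return am.append(pm)

idx = session_index("2025-01-02").append(session_index("2025-01-03"))
close = np.arange(len(idx), dtype=float)
df = pd.DataFrame(
    {"open": close, "high": close + 1, "low": close - 1, "close": close, "volume": 1},
    index=idx,
)

bars = resample_by_session(df, n=100)
print(bars)
```

Each day yields two full 100-minute bars and one 40-minute remainder, and no bar mixes data from two trading days. Whether the short last bar should be kept or dropped depends on the strategy.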