I have read related questions like this one and this blog post.
Unfortunately I am unable to modify the solutions to my needs.
Consider a Series with a DatetimeIndex which may look like this:
Code to instantiate an example:
    import pandas as pd

    s = pd.Series([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0], index=pd.date_range(start=0, freq='1d', periods=12), name='A')
Ultimately, I want to get the result
(t4 - t1)
+ (t8 - t5)
+ (t10 - t8)
This means I need to identify streaks of 1 padded with 0 on each side. I can do everything after that myself, i.e. grouping by streak (possibly with cumcount) and diffing the first and last timestamp in each group.
There are some special cases when the Series starts or ends with a 1. In that case I want to treat it as if it were preceded/followed by a 0 at the same timestamp, e.g.
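To pin down the intended result, special cases included, here is a brute-force reference in plain Python. It is only a sketch of the specification, not the vectorized pandas solution I am after; `padded_streak_duration` is a hypothetical helper name, and the example uses standard-library `datetime` objects in place of the DatetimeIndex.

```python
from datetime import datetime, timedelta


def padded_streak_duration(timestamps, values):
    """Sum (first 0 after - last 0 before) over every streak of 1s.

    Hypothetical helper, not a pandas API. A streak that touches either end
    of the data is treated as if a 0 sat at that same timestamp.
    """
    n = len(values)
    total = timedelta(0)
    i = 0
    while i < n:
        if values[i] != 1:
            i += 1
            continue
        # Extend j to the last position of the streak starting at i.
        j = i
        while j + 1 < n and values[j + 1] == 1:
            j += 1
        # Last 0 before the streak, or the streak's own first timestamp
        # when the data starts with a 1.
        start = timestamps[i - 1] if i > 0 else timestamps[i]
        # First 0 after the streak, or the streak's own last timestamp
        # when the data ends with a 1.
        end = timestamps[j + 1] if j + 1 < n else timestamps[j]
        total += end - start
        i = j + 1
    return total


days = [datetime(1970, 1, 1) + timedelta(days=k) for k in range(12)]
example = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0]
print(padded_streak_duration(days, example))  # 8 days, 0:00:00
```

For the example data this gives (t4 - t1) + (t8 - t5) + (t10 - t8) = 3 + 3 + 2 = 8 days. For a series starting with a 1, such as values `[1, 1, 0, 1]` on the first four days, the same function gives 2 + 1 = 3 days. Passing `s.index` and `s.to_numpy()` should work for the pandas Series as well, though that is an assumption I have not relied on here.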
Attempt so far:
I'm going to concat some sub-solutions for easier visualization.
Pad the series with a zero on each end, to avoid special cases.

    s = pd.Series([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0], index=pd.date_range(start=0, freq='1d', periods=12), name='A')
    s = pd.concat([pd.Series([0], index=s.index[:1]), s, pd.Series([0], index=s.index[-1:])])

Get the last `0` before and the first `0` after a streak of ones.

    >>> tmp = pd.concat([s, s.diff(-1).eq(-1).astype(int).rename('starter'), s.diff(1).eq(-1).astype(int).rename('ender')], axis=1)
    >>> tmp
                A  starter  ender
    1970-01-01  0        0      0
    1970-01-02  0        1      0
    1970-01-03  1        0      0
    1970-01-04  1        0      0
    1970-01-05  0        0      1
    1970-01-06  0        1      0
    1970-01-07  1        0      0
    1970-01-08  1        0      0
    1970-01-09  0        1      1
    1970-01-10  1        0      0
    1970-01-11  0        0      1
    1970-01-12  0        0      0

Fill single zero gaps in the `'A'` column with `1` because they don't change the desired result. (This step might not be necessary but helps the visualization.)

    >>> tmp.loc[(both := tmp['starter'].eq(1) & tmp['ender'].eq(1)), 'A'] = 1
    >>> tmp
                A  starter  ender
    1970-01-01  0        0      0
    1970-01-02  0        1      0
    1970-01-03  1        0      0
    1970-01-04  1        0      0
    1970-01-05  0        0      1
    1970-01-06  0        1      0
    1970-01-07  1        0      0
    1970-01-08  1        0      0
    1970-01-09  1        1      1
    1970-01-10  1        0      0
    1970-01-11  0        0      1
    1970-01-12  0        0      0

Adjust the `'starter'` and `'ender'` columns.

    >>> tmp.loc[both, ['starter', 'ender']] = 0
    >>> tmp
                A  starter  ender
    1970-01-01  0        0      0
    1970-01-02  0        1      0
    1970-01-03  1        0      0
    1970-01-04  1        0      0
    1970-01-05  0        0      1
    1970-01-06  0        1      0
    1970-01-07  1        0      0
    1970-01-08  1        0      0
    1970-01-09  1        0      0
    1970-01-10  1        0      0
    1970-01-11  0        0      1
    1970-01-12  0        0      0
And this is where I'm stuck.

