Sleep Analysis Using Jupyter Notebooks (Part One)

Load packages and files.

In [1]:
%load_ext autoreload
%matplotlib inline
import pandas as pd
import numpy as np
import os
files = os.listdir(os.curdir)

Check out the files we've pulled from my phone (look at all that data)! There are two sleep files, so we'll load them both.

In [2]:
files
Out[2]:
['.ipynb_checkpoints',
 'com.samsung.health.caffeine_intake.201908141922.csv',
 'com.samsung.health.device_profile.201908141922.csv',
 'com.samsung.health.exercise.201908141922.csv',
 'com.samsung.health.floors_climbed.201908141922.csv',
 'com.samsung.health.food_info.201908141922.csv',
 'com.samsung.health.food_intake.201908141922.csv',
 'com.samsung.health.heart_rate.201908141922.csv',
 'com.samsung.health.height.201908141922.csv',
 'com.samsung.health.oxygen_saturation.201908141922.csv',
 'com.samsung.health.user_profile.201908141922.csv',
 'com.samsung.health.weight.201908141922.csv',
 'com.samsung.shealth.activity.day_summary.201908141922.csv',
 'com.samsung.shealth.activity_level.201908141922.csv',
 'com.samsung.shealth.calories_burned.201908141922.csv',
 'com.samsung.shealth.sleep.201908141922.csv',
 'com.samsung.shealth.sleep_data.201908141922.csv',
 'com.samsung.shealth.step_daily_trend.201908141922.csv',
 'com.samsung.shealth.stress.201908141922.csv',
 'personal-data-analysis',
 'Sleep Analysis.ipynb',
 'Sleep20DayRollingAverage.html']
In [3]:
sleep=pd.read_csv('com.samsung.shealth.sleep.201908141922.csv',header=1)
In [4]:
sleep2=pd.read_csv('com.samsung.shealth.sleep_data.201908141922.csv',header=1)

Check out the first file. It looks like it has sleep start and end times, so we'll go with this one.

In [5]:
sleep.head(5)
Out[5]:
com.samsung.health.sleep.datauuid efficiency original_efficiency original_bed_time has_sleep_data com.samsung.health.sleep.pkg_name com.samsung.health.sleep.create_time com.samsung.health.sleep.time_offset com.samsung.health.sleep.end_time com.samsung.health.sleep.custom original_wake_up_time quality com.samsung.health.sleep.deviceuuid extra_data com.samsung.health.sleep.start_time com.samsung.health.sleep.update_time com.samsung.health.sleep.comment
0 2fa55158-d78c-fafc-443a-a2b352eb0802 94.004800 NaN NaN 1.0 com.sec.android.app.shealth 1514294139350 UTC-0600 1514293260000 NaN NaN NaN 3CaTqKloqY NaN 1514266740000 1514294139350 NaN
1 34833e70-6b4b-6fbb-e3c4-b3a0cd57fbad 95.879120 NaN NaN 1.0 com.sec.android.app.shealth 1514380544325 UTC-0600 1514379660000 NaN NaN NaN 3CaTqKloqY NaN 1514356620000 1514380544325 NaN
2 87de75d2-8b35-e628-a347-97e7c15207aa 90.214066 NaN NaN 1.0 com.sec.android.app.shealth 1514467738301 UTC-0600 1514466840000 NaN NaN NaN 3CaTqKloqY NaN 1514445960000 1514467738301 NaN
3 9444943f-d3b1-9a9e-c297-c1d063b9bcf0 94.750656 NaN NaN 1.0 com.sec.android.app.shealth 1514550033623 UTC-0600 1514549100000 NaN NaN NaN 3CaTqKloqY NaN 1514524740000 1514550033623 NaN
4 13dd3307-da7e-96a1-c50a-350407ceb724 92.045456 NaN NaN 1.0 com.sec.android.app.shealth 1514557115339 UTC-0600 1514556240000 NaN NaN NaN 3CaTqKloqY NaN 1514550660000 1514557115339 NaN

The start and end times are stored as Unix timestamps in milliseconds, so we have to convert them to datetime objects. We also create a 'time slept' column by subtracting the start time from the end time.

In [6]:
import datetime

# The timestamps are in milliseconds; fromtimestamp expects seconds, hence / 1e3.
# Note that fromtimestamp converts to the local timezone of the machine running this.
sleep['start_time'] = sleep['com.samsung.health.sleep.start_time'].apply(lambda ms: datetime.datetime.fromtimestamp(ms / 1e3))
sleep['end_time'] = sleep['com.samsung.health.sleep.end_time'].apply(lambda ms: datetime.datetime.fromtimestamp(ms / 1e3))
In [7]:
sleep['time_slept']=sleep['end_time']-sleep['start_time']
In [8]:
sleep['end_date'] = sleep['end_time'].dt.date  # calendar date on which each session ended
In [9]:
x=sleep['end_time'][0].date()
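
As an aside, the timestamp conversion can also be done without a row-by-row apply. This is only a sketch, not what the notebook runs: it uses two start/end values copied from the preview above, and the `demo`, `start_ms`, and `end_ms` names are made up for illustration. One difference to keep in mind is that `pd.to_datetime` with `unit="ms"` returns UTC-based (timezone-naive) times, while `datetime.fromtimestamp` uses the local timezone.

```python
import pandas as pd

# Two epoch-millisecond pairs copied from the first two rows of the preview.
demo = pd.DataFrame({
    "start_ms": [1514266740000, 1514356620000],
    "end_ms": [1514293260000, 1514379660000],
})

# unit="ms" tells pandas the integers are milliseconds since the Unix epoch.
# The resulting timestamps are timezone-naive and expressed in UTC.
demo["start_time"] = pd.to_datetime(demo["start_ms"], unit="ms")
demo["end_time"] = pd.to_datetime(demo["end_ms"], unit="ms")

# Subtracting two datetime columns yields a Timedelta column.
demo["time_slept"] = demo["end_time"] - demo["start_time"]

# Convert the Timedelta to fractional hours for easier reading.
demo["hours_slept"] = demo["time_slept"].dt.total_seconds() / 3600
print(demo[["start_time", "end_time", "hours_slept"]])
```

The duration is the same whichever timezone is used, but the calendar date of a session can shift, which matters once we group by date.
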

Some days have multiple sleep sessions (I often wake up at night to go to the bathroom, etc.), so we need to group by date and add the sessions together. Looking at the data, I also saw some days where it said I slept for 22+ hours or only an hour or two. That's not my style, so it's much more likely an error with the tracking. We'll remove those outliers as well.

In [10]:
by_date=sleep[['end_date','time_slept']].groupby('end_date').sum().reset_index()
In [11]:
# Convert the summed Timedelta to fractional hours.
by_date['hours_slept'] = by_date['time_slept'].dt.total_seconds() / 3600
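
To make the grouping step concrete, here is a tiny sketch with made-up sessions (the `sessions` and `per_night` names and all the values are hypothetical). Two sessions that end on the same date get added into one nightly total.

```python
import pandas as pd

# Hypothetical data: two sessions end on 2019-01-01, one on 2019-01-02.
sessions = pd.DataFrame({
    "end_date": ["2019-01-01", "2019-01-01", "2019-01-02"],
    "time_slept": pd.to_timedelta([180, 270, 420], unit="min"),
})

# Sum the Timedelta values per date, keeping end_date as a regular column.
per_night = sessions.groupby("end_date", as_index=False)["time_slept"].sum()

# Express the nightly total in hours.
per_night["hours_slept"] = per_night["time_slept"].dt.total_seconds() / 3600
print(per_night)
```
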
In [12]:
def remove_outlier(df_in, col_name):
    q1 = df_in[col_name].quantile(0.25)
    q3 = df_in[col_name].quantile(0.75)
    iqr = q3-q1 #Interquartile range
    fence_low  = q1-1.5*iqr
    fence_high = q3+1.5*iqr
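    # Keep rows strictly inside the fences; copy so later column assignments act on an independent frame
    df_out = df_in.loc[(df_in[col_name] > fence_low) & (df_in[col_name] < fence_high)].copy()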
    return df_out

by_date=remove_outlier(by_date,'time_slept')
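
Here is a small sketch of how the 1.5 x IQR fences behave, using made-up nightly totals in hours rather than my real data. The `iqr_fences` helper is hypothetical and just repeats the arithmetic from remove_outlier so the fences can be inspected.

```python
import pandas as pd

def iqr_fences(values):
    """Return the (low, high) fences of the 1.5 * IQR rule for a Series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1  # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical nightly totals, including two implausible readings.
hours = pd.Series([7.0, 7.5, 6.5, 8.0, 7.2, 6.8, 22.5, 1.0])

low, high = iqr_fences(hours)

# Keep only values strictly between the fences, as remove_outlier does.
kept = hours[(hours > low) & (hours < high)]
print(low, high)
print(kept.tolist())
```

With these numbers, the 22.5-hour and 1-hour nights fall outside the fences and the six ordinary nights are kept.
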

Visualizing Data

Before plotting, I want to add a 7-day rolling average and a 28-day rolling average to smooth out the night-to-night noise. We'll be using Plotly.

In [13]:
by_date['weekly_mean'] = by_date['hours_slept'].rolling(window=7).mean()
by_date['four_week_mean'] = by_date['hours_slept'].rolling(window=28).mean()
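
One thing worth knowing: rolling(window=7) counts rows, not calendar days, so if a date is missing (for example, removed as an outlier) the window stretches over more than a week. The sketch below uses made-up data with one missing date and a 3-row window to show the difference from a time-based window; the `nights` frame and its values are hypothetical, and this is an alternative to consider, not what the notebook does.

```python
import pandas as pd

# Hypothetical nightly totals; 2019-01-04 is missing on purpose.
nights = pd.DataFrame({
    "end_date": pd.to_datetime(
        ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-05", "2019-01-06"]
    ),
    "hours_slept": [7.0, 8.0, 6.0, 7.0, 9.0],
})

# Row-based window: the mean of the last 3 rows, whatever dates they cover.
# The first two rows are NaN because fewer than 3 rows are available.
nights["rows_3"] = nights["hours_slept"].rolling(window=3).mean()

# Time-based window: the mean of rows whose date falls within the last
# 3 calendar days. This needs a DatetimeIndex, hence set_index.
by_day = nights.set_index("end_date")["hours_slept"]
nights["days_3"] = by_day.rolling("3D").mean().to_numpy()

print(nights)
```
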
In [14]:
import plotly.graph_objects as go

# One line per series: nightly totals plus the two rolling averages.
fig = go.Figure()
fig.add_trace(go.Scatter(x=by_date['end_date'], y=by_date['hours_slept'], mode='lines', name='Hours Slept'))
fig.add_trace(go.Scatter(x=by_date['end_date'], y=by_date['weekly_mean'], mode='lines', name='7 Day Rolling Average'))
fig.add_trace(go.Scatter(x=by_date['end_date'], y=by_date['four_week_mean'], mode='lines', name='28 Day Rolling Average'))
fig.update_layout(title='Hours Slept Nightly',
                  xaxis_title='Day',
                  yaxis_title='Time Slept (hours)')

fig.show()