Python Time Series Data

Author: Mansoor Ahmed
Introduction

Time series data is an important form of structured data. It arises in many different fields, for example economics, finance, biology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series. Many time series are fixed frequency, which is to say that data points occur at regular intervals according to some rule, for example every 15 seconds, every 5 minutes, or once per month. We might also have second- or minute-level time series, such as the number of clicks and user visits every minute. Time series can also be irregular, without a fixed unit of time or offset between units. How we mark and refer to time series data depends on the application, and we may have one of the following:

  • Timestamps, specific instants in time
  • Fixed periods, for example the month January 2020 or the full year 2021
  • Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals.
  • Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time. For instance, the diameter of a cookie measured each second since it was placed in the oven
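
The four representations listed above can be sketched with the standard library alone. This is only a minimal illustration, not a prescribed approach; the dates and the oven readings are hypothetical values chosen for the example.

```python
import calendar
from datetime import datetime, timedelta

# 1. Timestamp: a specific instant in time.
timestamp = datetime(2020, 1, 15, 9, 30)

# 2. Fixed period: the month January 2020, described by its first and last day.
days_in_month = calendar.monthrange(2020, 1)[1]
period_start = datetime(2020, 1, 1)
period_end = datetime(2020, 1, days_in_month)

# 3. Interval: an arbitrary start and end timestamp.
interval = (datetime(2020, 1, 10, 8, 0), datetime(2020, 1, 12, 17, 45))
interval_length = interval[1] - interval[0]

# 4. Elapsed time: offsets relative to a start time (hypothetical oven readings,
#    one per second).
bake_start = datetime(2020, 1, 15, 9, 30)
readings = [bake_start + timedelta(seconds=s) for s in range(3)]
elapsed = [r - bake_start for r in readings]

print(timestamp)
print(period_start.date(), "to", period_end.date())
print(interval_length)
print([e.total_seconds() for e in elapsed])
```
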
Description

We study a time series because analysis is the preparatory step before we develop a forecast of the series. Moreover, time series forecasting has huge commercial significance because things that are essential to a business, such as demand and sales, the number of visitors to a website, and stock prices, are in essence time series data. So what does analyzing a time series involve?

Time series analysis involves understanding various aspects of the inherent nature of the series so that we are better informed to create meaningful and accurate forecasts.

pandas provides a standard set of time series tools and data algorithms. With these, we can efficiently work with very large time series and easily slice and dice, aggregate, and resample irregular and fixed-frequency time series. As you might guess, many of these tools are especially useful for financial and economics applications, but we could certainly use them to analyze server log data as well.
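
As a rough sketch of what slicing, aggregating, and resampling can look like in practice, the following example builds a small daily series and applies each operation once. The series name visits and its values are hypothetical; only the pandas calls themselves come from the library.

```python
import pandas as pd

# Hypothetical data: ten daily observations starting on 2021-01-01.
idx = pd.date_range("2021-01-01", periods=10, freq="D")
visits = pd.Series(range(10), index=idx)

# Slice by date strings; with a DatetimeIndex both endpoints are included.
first_week = visits.loc["2021-01-01":"2021-01-07"]

# Aggregate the selected range.
weekly_total = first_week.sum()

# Resample into 5-day bins and take the mean of each bin.
five_day_mean = visits.resample("5D").mean()

print(first_week)
print(weekly_total)
print(five_day_mean)
```

The same pattern should carry over to other data indexed by time, such as parsed server log entries, as long as the index is a DatetimeIndex.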

Data Types and Tools for Date and Time

The Python standard library includes data types for date and time data, as well as calendar-related functionality. The datetime, time, and calendar modules are the main places to start. The datetime.datetime type, or simply datetime, is widely used:

In [317]: from datetime import datetime

In [318]: now = datetime.now()

In [319]: now
Out[319]: datetime.datetime(2021, 8, 4, 17, 9, 21, 832092)

In [320]: now.year, now.month, now.day
Out[320]: (2021, 8, 4)
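
For a quick feel of how the three standard-library modules divide the work, here is a small sketch. The fixed moment is an arbitrary value chosen so the results are reproducible; it is an assumption for illustration, not part of the session above.

```python
import calendar
import time
from datetime import datetime

# A fixed, arbitrary moment so the results do not depend on the current clock.
moment = datetime(2021, 8, 4, 17, 9, 21)

# datetime: format a value as text.
iso_text = moment.isoformat()
short_text = moment.strftime("%Y-%m-%d %H:%M")

# calendar: calendar-related facts about the same date.
is_leap = calendar.isleap(moment.year)
days_in_month = calendar.monthrange(moment.year, moment.month)[1]
weekday_number = moment.weekday()  # Monday is 0, Sunday is 6

# time: seconds since the Unix epoch, as a float.
epoch_seconds = time.time()

print(iso_text, short_text)
print(is_leap, days_in_month, weekday_number)
print(epoch_seconds)
```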

datetime stores both the date and time down to the microsecond. datetime.timedelta represents the temporal difference between two datetime objects:

In [321]: delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

In [322]: delta
Out[322]: datetime.timedelta(926, 56700)

In [323]: delta.days
Out[323]: 926

In [324]: delta.seconds
Out[324]: 56700
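
One detail that is easy to miss in the session above: days and seconds are separate components of the difference, so seconds alone is not the whole duration. A minimal sketch of how the full span might be obtained, using the same two dates:

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# .days and .seconds are components: the seconds part only covers
# what is left over after the whole days have been counted.
components = (delta.days, delta.seconds)

# total_seconds() expresses the entire difference in seconds.
total = delta.total_seconds()

# Dividing by another timedelta converts the difference to other units.
hours = delta / timedelta(hours=1)

print(components)
print(total)
print(hours)
```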

We can add (or subtract) a timedelta, or a multiple of one, to a datetime object to yield a new shifted object:

In [325]: from datetime import timedelta

In [326]: start = datetime(2011, 1, 7)

In [327]: start + timedelta(12)
Out[327]: datetime.datetime(2011, 1, 19, 0, 0)

In [328]: start - 2 * timedelta(12)
Out[328]: datetime.datetime(2010, 12, 14, 0, 0)

Fixed-frequency dates and time spans

The pandas examples from here on assume that pandas has been imported as pd (import pandas as pd).

In [4]: dti = pd.date_range("2018-01-01", periods=3, freq="H")

In [5]: dti
Out[5]:
DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
               '2018-01-01 02:00:00'],
              dtype='datetime64[ns]', freq='H')

Converting date times with time zone information

In [6]: dti = dti.tz_localize("UTC")

In [7]: dti
Out[7]:
DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 01:00:00+00:00',
               '2018-01-01 02:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='H')

In [8]: dti.tz_convert("US/Pacific")
Out[8]:
DatetimeIndex(['2017-12-31 16:00:00-08:00', '2017-12-31 17:00:00-08:00',
               '2017-12-31 18:00:00-08:00'],
              dtype='datetime64[ns, US/Pacific]', freq='H')

How to resample or convert a time series to a specific frequency

In [9]: idx = pd.date_range("2018-01-01", periods=5, freq="H")

In [10]: ts = pd.Series(range(len(idx)), index=idx)

In [11]: ts
Out[11]:
2018-01-01 00:00:00    0
2018-01-01 01:00:00    1
2018-01-01 02:00:00    2
2018-01-01 03:00:00    3
2018-01-01 04:00:00    4
Freq: H, dtype: int64

In [12]: ts.resample("2H").mean()
Out[12]:
2018-01-01 00:00:00    0.5
2018-01-01 02:00:00    2.5
2018-01-01 04:00:00    4.0
Freq: 2H, dtype: float64

How to perform date and time arithmetic with absolute or relative time increments

In [13]: friday = pd.Timestamp("2018-01-05")

In [14]: friday.day_name()
Out[14]: 'Friday'

# Add 1 day
In [15]: saturday = friday + pd.Timedelta("1 day")

In [16]: saturday.day_name()
Out[16]: 'Saturday'

# Add 1 business day (Friday --> Monday)
In [17]: monday = friday + pd.offsets.BDay()

In [18]: monday.day_name()
Out[18]: 'Monday'

Timestamps versus time spans

Timestamped data is the most basic type of time series data; it associates values with points in time. In pandas, a point in time is represented by a Timestamp.
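
To make the contrast in this section's title a little more concrete, the sketch below places a single point in time next to a time span. It assumes pandas' Period type as the representation of a span; the particular month is an arbitrary choice for illustration.

```python
import pandas as pd

# A point in time: midnight at the start of 1 May 2012.
point = pd.Timestamp("2012-05-01")

# A time span: the whole month of May 2012.
span = pd.Period("2012-05", freq="M")

# A span has a beginning and an end; a timestamp does not.
span_bounds = (span.start_time, span.end_time)

# The point falls inside the span.
inside = span.start_time <= point <= span.end_time

# A timestamp can be mapped to the monthly span that contains it.
same_span = point.to_period("M") == span

print(span_bounds)
print(inside, same_span)
```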

# In [28] uses the datetime module itself (import datetime), not the
# "from datetime import datetime" form used earlier.
In [28]: pd.Timestamp(datetime.datetime(2012, 5, 1))
Out[28]: Timestamp('2012-05-01 00:00:00')

In [29]: pd.Timestamp("2012-05-01")
Out[29]: Timestamp('2012-05-01 00:00:00')

In [30]: pd.Timestamp(2012, 5, 1)
Out[30]: Timestamp('2012-05-01 00:00:00')

How to convert to timestamps

We can use the to_datetime function to convert a Series or list-like object of date-like objects, for example strings, epochs, or a mixture. When passed a Series, it returns a Series with the same index, while a list-like is converted to a DatetimeIndex:

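# Note: these outputs come from an older pandas release. pandas 2.0 and later
# infer one format from the first element, so mixed-format inputs like these
# may need format="mixed".
In [43]: pd.to_datetime(pd.Series(["Jul 31, 2009", "2010-01-10", None]))
Out[43]:
0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]

In [44]: pd.to_datetime(["2005/11/23", "2010.12.31"])
Out[44]: DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None)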
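
The text mentions epochs as one of the accepted inputs. A minimal sketch of that case follows, assuming the values are whole seconds since 1970-01-01 UTC; the list of epoch values is hypothetical.

```python
import pandas as pd

# Hypothetical epoch values, in seconds since 1970-01-01 00:00:00 UTC.
epochs = [0, 86400, 1349720105]

# The unit argument says how to interpret the integers.
stamps = pd.to_datetime(epochs, unit="s")

# Going back: subtract the epoch and floor-divide by the unit.
back_to_epochs = (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

print(stamps)
print(back_to_epochs)
```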

If the dates put the day first (European style, for instance), we can pass the dayfirst flag:

In [45]: pd.to_datetime(["04-01-2012 10:00"], dayfirst=True)
Out[45]: DatetimeIndex(['2012-01-04 10:00:00'], dtype='datetime64[ns]', freq=None)

# dayfirst is not strict: "01-14-2012" cannot have month 14, so it is
# parsed month first
In [46]: pd.to_datetime(["14-01-2012", "01-14-2012"], dayfirst=True)
Out[46]: DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None)

How to provide format argument

In addition to the required datetime string, a format argument can be passed to ensure specific parsing. This could also potentially speed up the conversion considerably.

In [51]: pd.to_datetime("2010/11/12", format="%Y/%m/%d")
Out[51]: Timestamp('2010-11-12 00:00:00')

In [52]: pd.to_datetime("12-11-2010 00:00", format="%d-%m-%Y %H:%M")
Out[52]: Timestamp('2010-11-12 00:00:00')
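
An explicit format also makes it easier to decide what should happen to values that do not match. The sketch below is one possible way to handle that, assuming day-first input strings and using errors="coerce" so that non-matching entries become NaT instead of raising. The sample strings are hypothetical.

```python
import pandas as pd

# Hypothetical raw input: two day-first dates and one invalid entry.
raw = pd.Series(["12-11-2010", "25-12-2010", "not a date"])

# Parse with an explicit format; values that do not match become NaT.
parsed = pd.to_datetime(raw, format="%d-%m-%Y", errors="coerce")

# Count how many entries failed to parse.
n_failed = int(parsed.isna().sum())

print(parsed)
print(n_failed)
```

Whether to coerce, raise, or clean the input beforehand depends on the data; coercing is shown here only because it keeps the example short.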