Dates and Times in Python

Master dates and times in Python: date and datetime, strftime and strptime, timedelta, timezones and the DST traps, plus pandas parsing, the dt accessor, and resampling.

Working with dates feels deceptively simple until you hit the details: counting the days between two events, parsing a string into a real date, comparing times across time zones, or computing a duration that crosses a daylight saving boundary. Python handles all of this through a small family of objects, and pandas extends them to whole columns of data. This guide walks through the building blocks (date, datetime, and timedelta), then time zones and their daylight saving traps, and finally the pandas tooling that makes time-series analysis practical, ending with the mental model that ties it together.

The date object

date represents a calendar day with no time attached, just year, month, and day.

from datetime import date
d = date(1992, 8, 24)
d.year # 1992
d.weekday() # 0 for Monday through 6 for Sunday

Notice that year, month, and day are attributes accessed without parentheses because they are stored values, while weekday() is a method with parentheses because it computes something. Dates have a natural chronological order, so you can sort a list of them or take the minimum and maximum directly, and subtracting one date from another gives you a duration.

gap = date(2007, 12, 13) - date(2007, 5, 9)
gap.days # the number of days between them

That subtraction produces a timedelta, and pulling .days off it gives the integer day count, which is usually exactly what you want for “how many days between X and Y.”
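To see the ordering and subtraction behaviour in one place, here is a small sketch built on the dates above; the list of events is a hypothetical example.

```python
from datetime import date

# Hypothetical event dates, deliberately out of order
events = [date(2007, 12, 13), date(1992, 8, 24), date(2007, 5, 9)]

ordered = sorted(events)                     # dates sort chronologically
earliest, latest = min(events), max(events)  # min and max work directly

gap = date(2007, 12, 13) - date(2007, 5, 9)  # date minus date gives a timedelta
print(type(gap).__name__)  # timedelta
print(gap.days)            # 218

print(date(1992, 8, 24).weekday())  # 0, so 24 August 1992 was a Monday
print(earliest, latest)             # 1992-08-24 2007-12-13
```

Adding the timedelta back reverses the subtraction: date(2007, 5, 9) + gap is date(2007, 12, 13).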

The datetime object

datetime is a date plus a time of day, constructed the same way with extra arguments for hour, minute, and second.

from datetime import datetime
dt = datetime(2017, 10, 1, 15, 26, 26)

All the date attributes work, plus .hour, .minute, and .second. Beyond direct construction, there are two common ways to make one. fromtimestamp converts a Unix timestamp, the number of seconds since midnight UTC on 1 January 1970 that machines use internally, into a readable datetime in the computer's local time zone, and strptime parses a string into a datetime using format codes you supply.

datetime.fromtimestamp(1514665153)
datetime.strptime("2017-02-03 00:00:01", "%Y-%m-%d %H:%M:%S")

Reach for datetime whenever the time of day matters, and plain date when only the calendar day does.
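A runnable sketch of both constructors follows. One addition not in the snippet above: it passes tz=timezone.utc to fromtimestamp, because without a tz argument the result is a naive datetime in the machine's local time zone, so the printed clock reading would vary from computer to computer.

```python
from datetime import datetime, timezone

# Naive local time: the clock reading depends on the machine's time zone setting
local = datetime.fromtimestamp(1514665153)
print(local.tzinfo)  # None

# Passing tz gives an aware, reproducible result
stamp = datetime.fromtimestamp(1514665153, tz=timezone.utc)
print(stamp)  # 2017-12-30 20:19:13+00:00

# strptime: the format codes mirror the layout of the string
parsed = datetime.strptime("2017-02-03 00:00:01", "%Y-%m-%d %H:%M:%S")
print(parsed)  # 2017-02-03 00:00:01

# And back again: an aware datetime converts to a Unix timestamp
print(stamp.timestamp())  # 1514665153.0
```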

Format codes and converting to and from strings

Format codes are the small language for translating between datetime objects and human-readable text. Each percent code is one piece of the date or time: %Y for the four-digit year, %m for the two-digit month, %d for the two-digit day, and %H, %M, and %S for hours, minutes, and seconds. Two methods use them in opposite directions.

dt = datetime(2017, 8, 24, 15, 26, 26)
dt.strftime("%Y-%m-%d") # datetime to string: "2017-08-24"
datetime.strptime("2017-08-24", "%Y-%m-%d") # string to datetime: midnight on 24 August 2017

The mnemonic is “f for format, p for parse.” strftime formats a datetime into a string however you like, while strptime parses a string into a datetime, and crucially the format you give it must match the string exactly, including every separator. There is also isoformat(), a shortcut that produces an ISO 8601 string such as “2017-08-24T15:26:26”. That is the safe choice for storing dates: as long as every value uses the same format and offset, the strings sort chronologically as plain text, and they parse cleanly in other languages and back in Python with datetime.fromisoformat.
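The sketch below, using an illustrative day/month/year layout, shows a strftime and strptime round trip, what happens when the format does not match the string, and isoformat output read back with datetime.fromisoformat (Python 3.7 and later).

```python
from datetime import datetime

dt = datetime(2017, 8, 24, 15, 26, 26)

# strftime: datetime -> string, in whatever layout the codes describe
text = dt.strftime("%d/%m/%Y %H:%M")
print(text)  # 24/08/2017 15:26

# strptime: string -> datetime; the format must mirror the string exactly
back = datetime.strptime(text, "%d/%m/%Y %H:%M")
print(back)  # 2017-08-24 15:26:00

try:
    datetime.strptime("2017-08-24", "%Y/%m/%d")  # wrong separators
except ValueError as err:
    print("format mismatch:", err)

# ISO 8601 round-trips without loss
iso = dt.isoformat()
print(iso)  # 2017-08-24T15:26:26
print(datetime.fromisoformat(iso) == dt)  # True
```

The seconds disappear in the day/month/year round trip only because that format never wrote them out; strptime fills any missing field with its default, here zero.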

Durations with timedelta

timedelta represents a span of time rather than a point in time. It appears naturally when you subtract two datetimes, and you can also build one directly with keyword arguments.

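from datetime import date, datetime, timedelta
start_dt = datetime(2017, 10, 1, 9, 0, 0)
end_dt = datetime(2017, 10, 1, 15, 26, 26)
duration = end_dt - start_dt
duration.total_seconds() # the whole span as a float: 23186.0
today = date.today()
tomorrow = today + timedelta(days=1)
later = end_dt + timedelta(hours=6)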

The most useful method is total_seconds(), which collapses the entire duration to a single number of seconds, ideal for maths and comparisons. Be aware that .days gives only the whole-day component and discards anything finer, so prefer total_seconds() unless you specifically want days. Adding a timedelta to a datetime moves it forward in time and subtracting moves it back, which is how you compute “thirty days from now” or “six hours before this event.”
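Here is the .days trap and the forward and backward arithmetic in a minimal sketch; the start and end times are made-up values chosen so the span is just over a day.

```python
from datetime import datetime, timedelta

start = datetime(2017, 10, 1, 22, 0)
end = datetime(2017, 10, 3, 4, 30)

gap = end - start
print(gap)                         # 1 day, 6:30:00
print(gap.days)                    # 1, whole days only
print(gap.seconds)                 # 23400, just the leftover 6.5 hours
print(gap.total_seconds())         # 109800.0, the full span
print(gap.total_seconds() / 3600)  # 30.5 hours

# Moving a point in time forwards and backwards
deadline = start + timedelta(days=30)  # thirty days later
reminder = start - timedelta(hours=6)  # six hours earlier
print(deadline)  # 2017-10-31 22:00:00
print(reminder)  # 2017-10-01 16:00:00
```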

Time zones

A datetime with no zone attached is called naive, because it does not know where in the world it happened, and attaching a zone makes it aware. Two operations are easy to confuse and must be kept straight. replace(tzinfo=...) attaches a zone label without changing the clock numbers, which is what you want when you know the local time but it is missing its zone tag. astimezone(...) converts to a different zone, shifting the clock numbers so they still represent the same instant. So labelling 3pm as Eastern keeps it at 3pm, but converting that 3pm Eastern to UTC moves it to 7pm during daylight saving time (8pm in winter, when Eastern runs five hours behind UTC), because that is the same moment expressed elsewhere.
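A minimal sketch of the difference between replace and astimezone. To keep it independent of any time zone database, it uses a fixed UTC-4 offset labelled "EDT" as a stand-in for Eastern daylight time, which is exactly the kind of shortcut to avoid in real work.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-4 offset standing in for US Eastern daylight time (illustration only)
edt = timezone(timedelta(hours=-4), "EDT")

naive = datetime(2017, 10, 1, 15, 0)           # 3pm, no zone
labelled = naive.replace(tzinfo=edt)           # still 3pm, now aware
converted = labelled.astimezone(timezone.utc)  # same instant, clock moves to 7pm

print(labelled)               # 2017-10-01 15:00:00-04:00
print(converted)              # 2017-10-01 19:00:00+00:00
print(labelled == converted)  # True: aware datetimes compare as instants

# Ordering a naive value against an aware one is an error
try:
    _ = naive < labelled
except TypeError as err:
    print("TypeError:", err)
```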

You can use fixed offsets, but they are blind to daylight saving, so the right tool for real work is an IANA time zone name from the time zone database.

from dateutil import tz
eastern = tz.gettz('America/New_York')
uk = tz.gettz('Europe/London')
dt = dt.replace(tzinfo=eastern) # attach the original zone
dt = dt.astimezone(uk) # convert to another

Names like America/New_York reference a database of historical and current rules, so a date in 1985 uses 1985’s daylight saving rules and a future date uses the latest ones, something a fixed offset can never do. For any code that crosses zones or spans a daylight saving change, IANA names are the only safe choice.
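To see the rule history at work, the sketch below compares the same noon on 10 April in 1985 and 2017. It uses the standard library's zoneinfo module (Python 3.9+) rather than dateutil; tz.gettz('America/New_York') should give the same offsets. In 1985 US daylight saving began on the last Sunday in April, and in 2017 on the second Sunday in March, so one date falls in standard time and the other in daylight time. zoneinfo needs the IANA database, which most Linux and macOS systems provide and the tzdata package supplies elsewhere.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

eastern = ZoneInfo("America/New_York")
fixed = timezone(timedelta(hours=-5))  # a fixed "Eastern standard" offset

# Same wall-clock time on the same calendar day, different years
old = datetime(1985, 4, 10, 12, 0, tzinfo=eastern)
new = datetime(2017, 4, 10, 12, 0, tzinfo=eastern)

print(old.tzname(), old.utcoffset())  # EST -1 day, 19:00:00 (UTC-5)
print(new.tzname(), new.utcoffset())  # EDT -1 day, 20:00:00 (UTC-4)

# The fixed offset answers UTC-5 for both years, which is wrong for 2017
print(datetime(2017, 4, 10, 12, 0, tzinfo=fixed).utcoffset())  # -1 day, 19:00:00
```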

Daylight saving itself creates two genuine traps. The spring-forward hour does not exist: clocks jump straight from 2am to 3am, so a time like 2:30am simply is not valid that day. The fall-back hour happens twice: clocks go from 2am back to 1am, so 1:30am occurs once before and once after the shift. Python distinguishes the two occurrences using the fold attribute, where zero is the first occurrence and one is the second; in dateutil, tz.datetime_ambiguous tells you when you are in that danger zone and tz.enfold lets you mark which one you mean. The larger lesson outweighs the details: whenever you do arithmetic across a daylight saving boundary, convert both datetimes to UTC first. Subtracting two datetimes that share the same tzinfo compares their clock readings and ignores the change in offset, whereas UTC has no daylight saving, so the maths becomes simple and correct.

start_utc = start.astimezone(tz.UTC)
end_utc = end.astimezone(tz.UTC)
duration = (end_utc - start_utc).total_seconds()
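The sketch below makes both traps, and the UTC fix, concrete using New York's 2017 transitions. It again swaps in zoneinfo for dateutil (dateutil offers tz.datetime_exists, tz.datetime_ambiguous, and tz.enfold for the same checks), and the exists helper is a hypothetical illustration that tests whether a wall-clock time survives a round trip through UTC.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

eastern = ZoneInfo("America/New_York")

# New York fell back on 5 November 2017: 2am EDT became 1am EST
start = datetime(2017, 11, 5, 0, 0, tzinfo=eastern)  # midnight, still EDT (UTC-4)
end = datetime(2017, 11, 5, 3, 0, tzinfo=eastern)    # 3am, now EST (UTC-5)

# Same tzinfo on both sides, so Python subtracts the clock readings
wall_clock = end - start
# Converting to UTC first measures the real elapsed time
elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
print(wall_clock)  # 3:00:00
print(elapsed)     # 4:00:00
print(elapsed - wall_clock == timedelta(hours=1))  # True

# The repeated 1:30am: fold=0 is the first (EDT) pass, fold=1 the second (EST)
first = datetime(2017, 11, 5, 1, 30, tzinfo=eastern)
second = datetime(2017, 11, 5, 1, 30, fold=1, tzinfo=eastern)
print(first.utcoffset() == timedelta(hours=-4))   # True
print(second.utcoffset() == timedelta(hours=-5))  # True


def exists(local: datetime) -> bool:
    """Hypothetical helper: True if this wall-clock time really occurs in its zone."""
    round_trip = local.astimezone(timezone.utc).astimezone(local.tzinfo)
    return round_trip.replace(tzinfo=None) == local.replace(tzinfo=None)


# New York sprang forward on 12 March 2017, skipping from 2am to 3am
print(exists(datetime(2017, 3, 12, 2, 30, tzinfo=eastern)))  # False
print(exists(datetime(2017, 3, 12, 3, 30, tzinfo=eastern)))  # True
```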

The golden rule is to store and compute in UTC, and only translate to local time when displaying to people.

Dates and times in pandas

Pandas brings all of this to entire columns. When loading a CSV, dates arrive as plain strings unless you tell pandas to parse them, which you do with parse_dates.

import pandas as pd
sessions = pd.read_csv("sessions.csv", parse_dates=["start", "end"])

Parsing at load time keeps the conversion in one place, and pd.to_datetime does the same job for a column that is already loaded; either way, the column must hold real datetimes before you can use the datetime tooling. The gateway to that tooling is the .dt accessor, which exposes datetime operations across a whole column in one vectorised step.

sessions["start"].dt.year
sessions["start"].dt.weekday # 0 for Monday
sessions["start"].dt.day_name() # "Monday", a method, so parentheses
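The sketch below runs the whole load-and-inspect step on a tiny in-memory stand-in for sessions.csv; the three rows and the plan column are hypothetical. It checks the column type with is_datetime64_any_dtype rather than printing the dtype, since the exact dtype name can vary across pandas versions.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical stand-in for sessions.csv, held in memory so the sketch runs anywhere
csv_text = """start,end,plan
2017-10-02 09:00,2017-10-02 09:45,free
2017-10-02 13:10,2017-10-02 14:00,pro
2017-10-03 08:30,2017-10-03 08:50,free
"""

# Without parse_dates the columns are just text
raw = pd.read_csv(io.StringIO(csv_text))
print(is_datetime64_any_dtype(raw["start"]))  # False

# With parse_dates they become datetime columns at load time
sessions = pd.read_csv(io.StringIO(csv_text), parse_dates=["start", "end"])
print(is_datetime64_any_dtype(sessions["start"]))  # True

# pd.to_datetime does the same conversion for a column that is already loaded
raw["start"] = pd.to_datetime(raw["start"])

print(sessions["start"].dt.year.tolist())        # [2017, 2017, 2017]
print(sessions["start"].dt.weekday.tolist())     # [0, 0, 1]
print(sessions["start"].dt.day_name().tolist())  # ['Monday', 'Monday', 'Tuesday']
```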

From there, the everyday recipes follow. Subtracting two datetime columns gives a duration column, which you turn into a numeric one with total_seconds().

sessions["duration"] = (sessions["end"] - sessions["start"]).dt.total_seconds()

To find the gap between consecutive events, shift(1) moves a column down by one row so each row can see the previous row’s value, and the first row becomes NaT, the datetime null, because there is nothing before it.

sessions["time_since"] = sessions["start"] - sessions["end"].shift(1)
sessions["time_since"] = sessions["time_since"].dt.total_seconds()
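A minimal sketch of both recipes on made-up data. One step the snippets above assume but do not show: shift(1) looks at the previous row, not the previous event in time, so the rows are sorted by start first.

```python
import pandas as pd

# Hypothetical sessions, already parsed but not yet in time order
sessions = pd.DataFrame({
    "start": pd.to_datetime(["2017-10-02 13:10", "2017-10-02 09:00", "2017-10-03 08:30"]),
    "end": pd.to_datetime(["2017-10-02 14:00", "2017-10-02 09:45", "2017-10-03 08:50"]),
})

# shift(1) reads the previous *row*, so put rows in chronological order first
sessions = sessions.sort_values("start").reset_index(drop=True)

sessions["duration"] = (sessions["end"] - sessions["start"]).dt.total_seconds()
sessions["time_since"] = (sessions["start"] - sessions["end"].shift(1)).dt.total_seconds()

print(sessions[["duration", "time_since"]])
# duration:   2700.0, 3000.0, 1200.0
# time_since: NaN, 12300.0, 66600.0 (the leading NaT becomes NaN once in seconds)
```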

Time zones in pandas mirror the Python distinction with column-level methods. tz_localize attaches a zone to a naive column without shifting the clock, and tz_convert shifts an already-aware column to a different zone, so the two-step pattern is to localise once, then convert as needed. The ambiguous="NaT" option handles the fall-back hour by marking ambiguous times as missing rather than raising an error, and nonexistent="NaT" does the same for spring-forward times that never happened.

sessions["start"] = sessions["start"].dt.tz_localize("America/New_York", ambiguous="NaT", nonexistent="NaT")
sessions["start"] = sessions["start"].dt.tz_convert("Europe/London")
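The sketch below shows the missing-value behaviour on three hypothetical naive timestamps around New York's 2017 transitions: one in the skipped spring hour, one in the repeated autumn hour, and one ordinary time. It sets both ambiguous="NaT" and nonexistent="NaT", since tz_localize raises by default for either case.

```python
import pandas as pd

# Hypothetical naive wall-clock times
times = pd.Series(pd.to_datetime([
    "2017-03-12 02:30",  # spring forward: never happened in New York
    "2017-11-05 01:30",  # fall back: happened twice
    "2017-11-05 12:00",  # an ordinary time
]))

# Localise once, turning impossible or ambiguous readings into NaT
local = times.dt.tz_localize("America/New_York", ambiguous="NaT", nonexistent="NaT")
print(local.isna().tolist())  # [True, True, False]

# Then convert as needed: 12:00 EST is 17:00 UTC, and London is on GMT by then
london = local.dt.tz_convert("Europe/London")
print(london.iloc[2])
```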

Resampling time series

Resampling is groupby for time. Instead of grouping by a category, you group rows into time buckets like every day or every month, then aggregate.

sessions.resample("D", on="start").size() # daily count
sessions.resample("M", on="start")["duration"].median() # monthly median

The frequency strings are short codes: "D" for day, "W" for week, "M" for month-end, "H" for hour, and "Q" for quarter, with multiples such as "15min". In pandas 2.2 and later, "M", "H", and "Q" are deprecated in favour of "ME", "h", and "QE", so the older spellings produce a deprecation warning. Any aggregation that works after a groupby works after a resample. Combining the two is where it gets powerful: group by a category first, then resample within each group, to see how a metric trends over time per segment.

sessions.groupby("plan").resample("M", on="start")["duration"].median()

That produces a result indexed by both plan and month, the exact answer to “how does this metric trend over time, broken down by segment,” which is the backbone of most monthly dashboards.
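The sketch below runs both resampling patterns on a small made-up table spanning two months. It uses "MS" (month start) rather than "M", an assumption made so the code runs without deprecation warnings on both older and newer pandas; the only difference is that buckets are labelled by the first day of each month instead of the last.

```python
import pandas as pd

# Hypothetical sessions spanning two months; durations in seconds
sessions = pd.DataFrame({
    "start": pd.to_datetime([
        "2017-10-02 09:00", "2017-10-02 13:10", "2017-10-03 08:30",
        "2017-11-06 10:00", "2017-11-07 16:15",
    ]),
    "plan": ["free", "pro", "free", "pro", "free"],
    "duration": [2700.0, 3000.0, 1200.0, 600.0, 1800.0],
})

# Daily counts: every day from the first to the last start gets a bucket, empty days count 0
daily = sessions.resample("D", on="start").size()

# Monthly median per plan; the result is indexed by (plan, month start)
monthly = sessions.groupby("plan").resample("MS", on="start")["duration"].median()
print(monthly)
# expected values: free/2017-10-01 -> 1950.0 (median of 2700 and 1200),
# free/2017-11-01 -> 1800.0, pro/2017-10-01 -> 3000.0, pro/2017-11-01 -> 600.0
```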

Conclusion

Hold onto a few relationships and the whole topic stays clear. A date is year, month, and day; a datetime adds the time of day; a timedelta is the duration between two datetimes; and a time zone says where in the world a datetime happened. Subtracting two datetimes yields a timedelta, and adding a timedelta to a datetime yields a new datetime. Two rules keep you out of trouble: always convert to UTC before doing arithmetic across a daylight saving boundary, and always use IANA time zone names rather than fixed offsets. Get those right, lean on the .dt accessor and resampling in pandas, and date handling stops being a source of subtle bugs and becomes routine.

See you soon.

Discover more from Discuss Data Science, Machine Learning and Analytics
