RFS Python Primer#
Overview#
Python is a versatile language with a rich package ecosystem that makes it easy to build powerful scripts and programs quickly. It is also a very idiomatic language, and code written without an understanding of its fundamental concepts and features can be inefficient and hard to maintain. This document focuses on those fundamentals, which will inform how you interact with any library you may need to use, but will also touch on standard library packages frequently used in RFS systems. By the end you should have the context and foundational knowledge to drive further research and make educated inferences about new code you encounter. It is expected that as you read you will research unfamiliar syntax and concepts introduced in examples.
Python mental model and basic syntax#
Indentation as syntax#
One of Python's more unique features is its use of whitespace. Unlike most languages, Python enforces strict rules on the indentation of lines as part of its syntax, which has the benefits of lending a uniform general style to Python code and reducing cluttered code block delimiters.
In general, the indentation level of a block of code indicates its scope (the region of code in which an object can be accessed and modified). That is to say, a variable defined in a less indented line of code will be available in a more indented line of code, but not the reverse.
Example: indentation + a simple decision
pressure_kpa = 512.3
if pressure_kpa > 500:
    print("High pressure")
else:
    print("Normal pressure")
Example: implicit line joining for readable long expressions
Python lets you split long expressions across lines inside (...), [...], {...}. You can also split lines using backslashes at the end of lines, but we don't use this style.
is_in_spec = (
    450.0 <= pressure_kpa <= 550.0
    and temperature_c is not None
    and -20.0 <= temperature_c <= 80.0
)
Example: indentation dictates scope
The code below will raise a NameError because the variable last is defined in a narrower scope (inside a function) than the one it is being accessed in when print() is called. Note that loops and if/else blocks do not create new scopes despite being indented; in Python only functions, classes, and modules do.
def record_last(items: list[int]) -> None:
    last = items[-1]  # last exists only inside this function

record_last([1, 2, 3])
print(last)  # NameError: name 'last' is not defined
“Bytecode”, “CPython”, and what “interpreted” means#
When you run Python with CPython (the standard implementation you'll be using), your .py file is compiled into bytecode (an internal representation) executed by a virtual machine; .pyc files cache bytecode so reruns can skip recompilation.
This matters because execution speed depends heavily on how much work happens in Python-level bytecode versus optimized loops implemented in C under the hood (we'll revisit this in the performance section).
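To make "bytecode" concrete, the standard library's dis module can disassemble any function into the instructions the virtual machine actually runs (a quick illustration; the function here is just a throwaway example):

```python
import dis

def double(x: int) -> int:
    return x * 2

# Prints the bytecode instruction stream CPython executes for this function:
# loads, a multiply, and a return
dis.dis(double)
```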
Type annotations#
Type annotations, which we'll be using throughout the examples, have no effect on the actual running of Python code; they are ignored when the code is interpreted. They shine, however, when using a modern text editor with support for type checkers. If you are careful about annotating your code thoroughly and being specific in those annotations, you can catch many errors long before they happen, and make the usage of your functions and classes much clearer to readers.
Basic example
def parse_limit(raw: str) -> float | None:  # This function accepts a string and returns either a float or None
    raw = raw.strip()
    if raw == "":
        return None
    return float(raw)
Some quick definitions#
You’ll see these terms pop up as you study Python:
An object is immutable if it has a fixed value and cannot be altered in-place; if you “change” it, you really create a new object. Numbers, strings, and tuples are common immutable types.
An object is hashable if it has a hash value that never changes during its lifetime and supports equality comparison; hashability is what allows an object to be used as a dict key or set member.
You don’t need to memorize these definitions, but you do need to recognize them when they become relevant. (e.g., “why can’t I use a list as a dict key?” → lists are mutable and not hashable).
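A quick sketch tying the two definitions together (the variable names and values are arbitrary):

```python
s = "pt-101"
s_upper = s.upper()  # strings are immutable: upper() returns a NEW string
print(s)             # still "pt-101"; the original was not altered in-place
print(s_upper)       # "PT-101"

readings = {("PT", 101): 512.3}  # a tuple of hashables is hashable, so it works as a key
try:
    readings[["PT", 101]]        # lists are mutable, so they are not hashable
except TypeError as exc:
    print(exc)                   # unhashable type: 'list'
```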
Core data types and structures#
Python’s standard library docs describe the “principal built-in types” as numerics, sequences, mappings (like dict), and sets; these are the workhorses of most production Python.
Strings, lists, sets, tuples: quick practical orientation#
A str is Unicode text (what you want for names, IDs, logs). A list is an ordered, mutable sequence (good for collections you’ll append/sort). A tuple is an ordered, immutable sequence (good for fixed-shape records like (x, y)), and a set is an unordered bag of unique hashable items (great for membership tests and set algebra).
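A minimal side-by-side of the four types (the values here are made up for illustration):

```python
tag: str = "PT-101"                      # Unicode text
values: list[float] = [512.3, 511.9]     # ordered and mutable: append/sort freely
point: tuple[float, float] = (1.5, 2.0)  # fixed-shape, immutable record
seen: set[str] = {"PT-101", "FT-202"}    # unique members, fast membership tests

values.append(513.0)      # lists grow in place
print("PT-101" in seen)   # True: set membership check
print(point[0])           # 1.5: tuples support indexing but not assignment
```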
Dictionaries: your primary “record” and “lookup table”#
A dict is a mapping from keys to values. Use it for configurations, JSON-like records, caches, routing tables, “by id” indexes, and basically everything that feels like a key/value table.
Basic example: dict as configuration
cfg = {"host": "db01", "port": 5432}
host = cfg["host"]                        # raises KeyError if missing
timeout_s = cfg.get("timeout_s", 5)       # safe default if missing
cfg["timeout_s"] = 10                     # inserts the new key-value pair
                                          # {"timeout_s": 10} into the existing dictionary
cfg.update({"retries": 3, "port": 5599})  # combines the two dictionaries, overwriting
                                          # matching keys with the new value
Advanced example: dict merging with | (handy for defaults + overrides)
If you don't want to mutate the original dictionary, the | operator will create a new dictionary with the merged values.
defaults = {"timeout_s": 5, "retries": 3}
overrides = {"timeout_s": 10}
effective = defaults | overrides # right side wins on conflicts
Hashable keys and “why some things can’t be keys”#
Dict keys must be hashable. That’s why ("PT", 101) is a fine key (immutable tuple of hashables), but ["PT", 101] is not (lists are mutable and not hashable).
Basic example: valid keys
ok = {
    "PT-101": 512.3,
    ("PT", 101): 512.3,
}
Advanced example: dataclass-like “key objects”
You can create your own classes that can be used as dict keys as long as they implement the right dunder methods (more on these later).
class TagId:
    __slots__ = ("prefix", "number")

    def __init__(self, prefix: str, number: int):
        self.prefix = prefix
        self.number = number

    def __hash__(self) -> int:
        return hash((self.prefix, self.number))

    def __eq__(self, other) -> bool:
        return isinstance(other, TagId) and (self.prefix, self.number) == (other.prefix, other.number)

d = {TagId("PT", 101): 512.3}
Practical example: recursive traversing of nested data
Dictionaries can contain any object as a value despite the restrictions on keys, including another dictionary. This function, used in our codebase to combine project-level settings with office-level overrides, traverses such a nested dictionary structure, merging any dictionaries it finds under a given key and replacing non-dictionary values.
from typing import Any

def deep_merge(base: Any, override: Any) -> Any:
    """
    Recursively merge override into base.
    - Dicts are merged deeply.
    - Other types are replaced.
    """
    if isinstance(base, dict) and isinstance(override, dict):
        out = dict(base)  # create a copy of the base dictionary so we don't mutate it
        for key, val in override.items():
            if key in out:
                out[key] = deep_merge(out[key], val)  # merge again on matching keys
            else:
                out[key] = val  # add new keys
        return out
    else:
        return override  # overwrite the value for non-dicts
List/dict/string methods#
There are a variety of methods built into these data types that make processing them simpler. These built-ins are also much faster than something built manually, so check the Python docs for a method before you start implementing your own.
Common methods cheat sheet#
| Task | Idiomatic tool | Why |
|---|---|---|
| Add 1 item to a list | lst.append(x) | Add an item to a list in-place |
| Add many items | lst.extend(iterable) | Avoids nested lists |
| Remove last item | lst.pop() | Returns and removes last item |
| Safe lookup in dict | d.get(k, default) | Avoids KeyError |
| Iterate key/value pairs | for k, v in d.items(): ... | Unpacks k-v pairs for easy access |
| Clean input text | s.strip() / s.split() | Standard parsing |
| Build string from parts | ",".join(parts) | Faster and cleaner than repeated + |
Example: string cleanup + parsing
raw = " tag=PT-101, value=512.3 "
raw = raw.strip() # "tag=PT-101, value=512.3"
fields = raw.split(",") # ["tag=PT-101", "value=512.3"]
tag = fields[0].split("=", 1)[1].strip() # "PT-101"
value = float(fields[1].split("=", 1)[1].strip()) # 512.3
Advanced example: mapping-based formatting with format_map
record = {"tag": "PT-101", "value": 512.3}
msg = "Reading {tag}: {value:.1f} kPa".format_map(record)
# Take special note of {value:.1f}, look up what that .1f is doing there.
Text processing with regex (re)#
Regular expressions (regex) are a precision tool: use them when string methods get awkward or error-prone. Python’s re docs strongly encourage raw strings (r"...") because backslashes behave differently in Python string literals than in regex syntax. There are many regex references online which won't be reproduced here, as well as online tools to help create and test regex patterns. These examples are meant to demonstrate what is possible.
Basic example: validate a tag format
import re

TAG = re.compile(r"^[A-Z]{2}-\d{3}$")

def is_valid_tag(s: str) -> bool:
    return TAG.fullmatch(s) is not None
Advanced example: named groups + structured extraction
import re

LINE = re.compile(
    r"^(?P<ts>\S+)\s+tag=(?P<tag>[A-Z]{2}-\d{3})\s+value=(?P<value>-?\d+(?:\.\d+)?)\s+unit=(?P<unit>\w+)$"
)
# The ?P<name> syntax originated in Python and allows you to name capture groups

line = "2026-03-31T12:05:18Z tag=PT-101 value=512.3 unit=kPa"
m = LINE.match(line)
if m:
    rec = m.groupdict()
    rec["value"] = float(rec["value"])
    print(rec)  # {'ts': '2026-03-31T12:05:18Z', 'tag': 'PT-101', 'value': 512.3, 'unit': 'kPa'}
match vs search vs fullmatch#
match() checks from the beginning of the string; if you want to find a match anywhere in the string, use search(). fullmatch() requires the entire string to match.
Example: match vs search
import re
print(re.match("c", "abcdef")) # None
print(re.search("c", "abcdef")) # match at index 2
Advanced example: compile once, reuse in a loop
import re

pattern = re.compile(r"\b(\w+)\s+(\w+)\b")  # two "words" separated by whitespace
for line in ["Isaac Newton", "Marie Curie"]:
    m = pattern.fullmatch(line)
    if m:
        first, last = m.group(1), m.group(2)
        print(first, last)
Loop types, comprehensions, and generator expressions#
Loops: for, while, and loop else#
Python implements two kinds of loops: for and while. A for loop accepts an iterable (lists being the most common) and runs its body once for each element. If the iterable is ordered, the items are visited in that order. A while loop defines a condition at the top of the loop which is checked at the beginning of each iteration, ending the loop when the condition becomes False.
Example: for over items
values = [10.2, 10.1, 10.4]
for v in values:
    print(v)
Example: while timeout logic
import time

start_time = time.monotonic()  # precise method for calculating time differences
while time.monotonic() - start_time < 300:  # remains true for 300 seconds
    print("5 minutes have not elapsed.")
Practical example: control flow with break and continue
The break and continue keywords are important for controlling the execution of complex loops. continue ends the current iteration and starts the next one (from the next object in the iterator in the case of a for loop). break immediately terminates the whole loop, and execution continues with the code after it.
for stage in wrk.stages:
    if skip_stages is not None and stage in skip_stages:
        continue  # if the stage is flagged to be skipped, restart the loop before it is run
    result = self._run_stage(
        run_id=run_id,
        resume_id=resume_id,
        run_time=run_time,
        stage=stage,
        ctx=ctx,
        op=op,
        tables=tables,
        source_ref=wrk,
    )
    out.append(result)
    if result.result == "fail":
        break  # if the stage failed to run properly, do not run any more stages
return out
Advanced example: loop else
An else block after a loop executes only if the loop exited naturally (the for loop reached the end of its iterator or the while loop's condition became False). It is skipped when the loop is terminated with break.
needle = "PT-101"
haystack = ["FT-202", "TT-303", "PT-404"]
for tag in haystack:
    if tag == needle:
        print("Found it")
        break
else:
    print("Not found (no break)")
enumerate() to access indices#
enumerate(iterable) yields (count, value) pairs, useful for accessing indices while looping.
Basic example
for idx, v in enumerate([10.2, 10.1, 10.4]):
    print(f"Test {idx}: {v}")
Advanced example: stable “rank” labeling
scores = [("A", 91), ("B", 85), ("C", 91)]
for rank, (name, score) in enumerate(sorted(scores, key=lambda x: x[1], reverse=True), start=1):
    print(rank, name, score)
Comprehensions and generator expressions#
Comprehensions let you build containers concisely and are typically faster than building one with a for loop. Unlike regular loops, a comprehension runs in its own implicit scope, so its loop variable does not leak into the surrounding code.
Example: list comprehension
squares = [x * x for x in range(10)]
Example: dict comprehension for indexing records by ID
records = [
{"tag": "PT-101", "value": 512.3},
{"tag": "FT-202", "value": 18.1},
]
by_tag = {r["tag"]: r for r in records}
Example: set comprehension for uniqueness
tags = ["PT-101", "PT-101", "FT-202"]
unique = {t for t in tags} # {"PT-101", "FT-202"}
Advanced example: generator expression to avoid intermediate lists
Generator expressions are best when you just need to iterate or aggregate without storing everything.
total = sum(v * v for v in range(1_000_000))
Practical example: validating and cleansing lists
A common pattern in our codebase: given a list which may contain empty, null, or unwanted data, ensure that only data we can process exists in it before processing. In this case, given a list of pipeline stage results, we compile a list of those that did not succeed.
failed_stages = [res for res in res_list if res.result != "success"]
Built-in function cheat sheet#
There is a long list of built-in functions that can be applied to any object implementing the right methods; some common ones are listed here.
| Built-in | Typical engineering use |
|---|---|
| len(x) | sizing, validation, empty checks |
| min(...), max(...) | limits, bounds |
| sum(...) | totals, aggregation |
| sorted(iterable, key=...) | deterministic ordering, reporting |
| enumerate(...) | index + value iteration |
| zip(...) | pair or transpose iterables; data alignment checks |
| map(...) | lazy function application; pipeline steps |
zip(strict=True) to catch silent data bugs#
zip() is lazy and, by default, stops at the shortest iterable (which can hide mismatched data). Python’s docs recommend strict=True when equal lengths are expected; it raises ValueError if lengths differ.
Basic example: normal zip
zipped = zip([1, 2, 3], ["one", "two", "three"])
for paired in zipped:
    print(paired)  # (1, 'one') then (2, 'two') then (3, 'three')
Advanced example: strict zip for alignment safety
tags = ["PT-101", "FT-202", "TT-303"]
limits = [100.0, 250.0]  # BUG: shorter
for tag, lim in zip(tags, limits, strict=True):
    print(tag, lim)  # raises ValueError instead of silently dropping TT-303
Practical example: mapping scraped values based on row index
In this example we are working with data scraped from a website. The ID values we get are from a different element than the statuses, but they always have the same row index. So by creating two ordered lists, we can zip them into a dict mapping of id to status.
rows = self.detect_all(*self._locator.FLAG_ID)
current_ids = [row.text for row in rows]
rows_data = self.detect_all(*self._locator.FLAG_DATA)
flag_statuses = []
for row in rows_data:
    cols = row.find_elements(*self._locator.FLAG_CELL)
    flag_statuses.append(cols[7].text)
status_map = dict(zip(current_ids, flag_statuses))
map(...)#
map is a versatile function that applies any function (including user-defined or anonymous functions) to every element of an iterable. It returns a lazy iterator rather than a list.
Basic example: map over one iterable
upper = list(map(str.upper, ["pt101", "ft202"])) # ["PT101", "FT202"]
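map also accepts multiple iterables, passing one element from each per call, much like the zip pairing shown earlier (a small sketch with made-up tag data):

```python
tags = ["PT-101", "FT-202"]
values = [512.3, 18.1]

# With two iterables, the function is called with one element from each
labels = list(map(lambda t, v: f"{t}={v}", tags, values))
print(labels)  # ['PT-101=512.3', 'FT-202=18.1']
```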
Functions plus module/package structure and import behavior#
Functions: signatures, defaults, and calling conventions#
Functions, defined with the def keyword, let you package reusable blocks of code that you can run over and over in a compact way. The signature of a function looks like this:
def function_name(positional_arg, pos_arg2, keyword_arg="default", *, forced_keyword):
    print(positional_arg, pos_arg2, keyword_arg, forced_keyword, sep=" ")

# Use it like this:
first_arg = "this is positional_arg"
function_name(first_arg, 2, "keyword", forced_keyword="forced")
# -> this is positional_arg 2 keyword forced
function_name("new arg", "also new", forced_keyword="out of order", keyword_arg="still works")
# -> new arg also new still works out of order
# Note that the variables are still printed as prescribed in the function

# function_name(keyword_arg="keyword", "before", "positional", forced_keyword="breaks")
# -> Raises an error

function_name(forced_keyword="positional", pos_arg2="args", positional_arg="as keywords")
# -> as keywords args default positional
# Note that we didn't pass a value for keyword_arg, which we provided with a default
Default argument pitfall#
Python evaluates default parameter expressions once, when the function is defined, not each time it’s called. This is especially important when the default is a mutable object like a list or dict; all function calls will share and reuse the same container.
Basic example: the bug
def add_event(ev: str, events: list[str] = []):  # DON'T DO THIS
    events.append(ev)
    return events

print(add_event("ALARM"))  # ["ALARM"]
print(add_event("CLEAR"))  # ["ALARM", "CLEAR"]
Advanced example: the correct pattern
from typing import Optional

def add_event(ev: str, events: Optional[list[str]] = None) -> list[str]:
    if events is None:
        events = []
    events.append(ev)
    return events
Positional-only and keyword-only parameters#
Python supports explicit calling conventions in function signatures. Parameters before / are positional-only, parameters after * are keyword-only, and this can improve readability and API stability.
Basic example: keyword-only makes call sites clearer
def connect(host: str, *, port: int = 5432, timeout_s: float = 5.0) -> str:
    return f"{host}:{port}?timeout={timeout_s}"

connect("db01", port=5432)
Advanced example: positional-only protects APIs from name changes
def scale(value: float, /, *, factor: float) -> float:
    return value * factor

scale(10.0, factor=2.5)  # OK
# scale(value=10.0, factor=2.5)  # TypeError: positional-only
Practical example: iterating through a dict of functions
In our codebase, we often use dictionaries to map complex relationships. In this case, we start by building a dictionary mapping export types to previously obtained tables of data. Each BlocksExportType value is then mapped by cleanse_map to a function designed to cleanse that type of export.
export_jobs: set[BlocksExportType] = set()
df_dict: dict[BlocksExportType, pd.DataFrame] = {}

# Build list of export jobs based on the tables we need to produce
for table in wrk.crm.tables:
    export_jobs |= set(BLOCKS_EXPORT_REG[table])

# Builds a mapping from export type to data table, discarding empty keys
for export in export_jobs:
    raw_export = resume.loc[resume["name"] == export]
    if not raw_export.empty:
        df = expand_payload(raw_export)
        df = df.loc[df["office_id"] == wrk.office.id]
        df_dict[export] = df

# Note here that the values in this dictionary are functions; everything in Python is an object,
# and there are no restrictions on the values of a dictionary. It is also important to note
# that this only works because these functions have identical interfaces (signatures)
cleanse_map = {
    BlocksExportType.SHIFT: cleanse_shift_report,
    BlocksExportType.SHIFT_AUX: cleanse_shift_export,
    BlocksExportType.CANVASSER: cleanse_canvasser,
    BlocksExportType.LOCATION: cleanse_location,
    BlocksExportType.PACKET: cleanse_packet,
    BlocksExportType.QC: cleanse_qc,
}
rows = 0

# Iterate through the data, applying a cleansing function from the map above
# based on the type of data
for export_type, df in df_dict.items():
    ctx.raise_exit()
    cleansed_df = cleanse_map[export_type](wrk, ctx, df)
Modules and packages: how to structure real code#
A module is a .py file; the module's name is the filename (without the extension) and is available inside the module as the __name__ variable. When you run a module directly, it is given the special name "__main__".
A package is a directory structure, commonly with __init__.py files to manage access to internal code.
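The "__main__" behavior mentioned above is what powers the common entry-point guard, which lets a file act as both an importable module and a runnable script (a minimal sketch):

```python
def main() -> None:
    print("Running as a script")

# True only when this file is executed directly (python myfile.py),
# not when it is imported by another module
if __name__ == "__main__":
    main()
```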
RFS repository layout#
We use one centralized repo for multiple packages in our codebase. The src folder contains each package in its own folder. If a package utilizes code from another one of our packages, we simply import it.
rfs-services/
    pyproject.toml
    src/
        rfs_bigquery/
            __init__.py
            auth.py
            executor.py
        rfs_etl/
            cleanse/
            extract/
            interfaces/
                bigquery.py  # We import and elaborate on code from rfs_bigquery here
                ...
            load/
            transform/
        rfs_paycom/
    tests/
How imports work#
The import system reference is very direct: the import statement combines two operations—(1) search for the module, then (2) bind the result in the local scope.
When a module is first imported, Python searches and, if found, creates a module object and initializes it; if it can’t be found you get ModuleNotFoundError.
Also important: Python has only one “module object” type regardless of whether the module is implemented in Python, C, or something else. This is one of the reasons C-optimized standard library modules plug in seamlessly.
Visual: high-level import flow#
flowchart TD
    A[import package.module] --> B{Is 'package.module' in sys.modules cache?}
    B -->|Yes| C[Return cached module object]
    B -->|No| D[Finders search path: sys.meta_path / sys.path]
    D --> E{Found module?}
    E -->|No| F[ModuleNotFoundError]
    E -->|Yes| G[Create module object; insert into sys.modules early]
    G --> H[Loader executes module code and initializes globals]
    H --> I[Bind name(s) in caller scope]
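The sys.modules cache in the diagram above is easy to observe directly: after the first import, a module lives in the cache, and repeat imports return the exact same object rather than re-executing the module.

```python
import sys
import json

print("json" in sys.modules)  # True: cached after the first import

import json as json_again     # this import hits the cache, no re-execution
print(json_again is json)     # True: the very same module object
```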
Enums, OOP basics, classes/methods, dataclasses, and type annotations#
This is where you move from “scripts” to “maintainable systems.”
Enums: why they’re useful#
An Enum is a set of symbolic names bound to unique values. This allows you to use the construct EnumName.ENUMVALUE anywhere that enum's value is relevant, reducing the clutter of having string tokens scattered around the code and improving readability. They’re most useful when a variable can take one of a limited set of values and can be used with type annotations to limit human error further.
Basic example: model state cleanly
from enum import Enum

class ValveState(Enum):
    OPEN = "open"
    CLOSED = "closed"

state = ValveState.OPEN
if state is ValveState.OPEN:
    print("Flow allowed")
Advanced example: add behavior to an Enum
from enum import Enum

class Severity(Enum):
    INFO = 1
    WARN = 2
    ALARM = 3

    def is_actionable(self) -> bool:
        return self in {Severity.WARN, Severity.ALARM}

non_actionable = Severity.INFO
if non_actionable.is_actionable():
    print("uh-oh")  # Will not print for Severity.INFO
We use enums extensively in our codebase. As a general rule, if a string argument can only be a handful of values, make an enum for it.
OOP basics: what Python classes actually do#
Classes are a way to bundle data and functionality. Instances of classes are called objects and can have attributes (state) and methods (behavior). Think of a class as a stamp and the object as the image that stamp presses into the paper. Once an object is created it is independent of other objects of the same class and can be handled individually, but it will share functionality with those other objects; a powerful tool.
Basic example: a simple class with one responsibility
class Pump:
    def __init__(self, tag: str):
        self.tag = tag
        self.running = False

    def start(self) -> None:
        self.running = True

pump_obj = Pump(tag="my_first_pump")
print(pump_obj.running)  # False
next_pump = Pump(tag="another_pump")
pump_obj.start()
print(pump_obj.running)  # True
print(next_pump.running)  # False
This sharing of functionality can be further leveraged using inheritance. Inheritance is when a class reuses the functionality of another class (this is called subclassing), and the resulting objects can use methods from both classes.
Advanced example: inheritance and overriding
class Device:
    def __init__(self, tag: str):
        self.tag = tag
        self.running = False

    def start(self) -> None:
        self.running = True

    def status(self) -> str:
        return "unknown"

class Pump(Device):
    def status(self) -> str:
        return "running" if self.running else "stopped"

main_pump = Pump("main")
main_pump.start()  # Can use methods of the Device class
print(main_pump.status())  # Prints "running", using the child class's implementation
Dataclasses#
Dataclasses let you quickly build full-featured classes for storing structured data. They allow you to impart meaning to your data and improve readability compared to arbitrary data structures like a dict or list.
Basic example: a clean measurement record
from dataclasses import dataclass

@dataclass  # This is called a decorator, more on these later. This one defines a dataclass
class DeviceReading:
    tag: str
    value: float
    unit: str

# usage
new_reading = DeviceReading("aux", 121, "kPa")
print(new_reading.value)  # 121
Advanced example: default_factory, immutability, and slots
The dataclasses decorator supports options including frozen and slots. Use the field function from dataclasses to provide default values for mutable types.
from dataclasses import dataclass, field

@dataclass(frozen=True, slots=True)
class Batch:
    batch_id: str
    readings: list[DeviceReading] = field(default_factory=list)
Use frozen=True when your class is intended to store values that shouldn't change after creation. Use slots=True when you have many instances (in the high thousands) and want lower memory overhead and to avoid accidental new attributes.
Time and datetime in the real world#
Time is where good engineers get humbled. Python gives you strong tools, but you must choose the right ones.
Naive vs aware datetimes (and why you should care)#
Python’s datetime categorizes date/time objects as aware or naive depending on whether they include timezone info. An aware object can represent a specific moment in time not open to interpretation, while naive ones don’t include timezone context.
Naive datetime objects are treated as local times by many methods, and you should use aware datetimes wherever possible, but especially when constructing UTC timestamps.
Practical DST-mitigation best practice#
Store timestamps in UTC (aware), convert to/from local zones only at system boundaries (input/output/reporting).
Basic example: get “now” in UTC (aware)
import datetime as dt
now_utc = dt.datetime.now(dt.timezone.utc)
Advanced example: convert to a real timezone via zoneinfo
The zoneinfo library supports the IANA timezone database and uses system tzdata or the first-party tzdata package if system data is unavailable.
import datetime as dt
from zoneinfo import ZoneInfo
utc = dt.datetime(2026, 3, 31, 12, 0, tzinfo=dt.timezone.utc)
phoenix = utc.astimezone(ZoneInfo("America/Phoenix"))
print(phoenix)
DST ambiguity and fold (PEP 495)#
When clocks “fall back,” a local wall-clock time can occur twice. PEP 495 adds a fold attribute to disambiguate repeated local times. During ambiguous transitions, fold=0 uses the offset before the transition and fold=1 uses the offset after.
Basic example: show the repeated hour
import datetime as dt
from zoneinfo import ZoneInfo
la = ZoneInfo("America/Los_Angeles")
t = dt.datetime(2020, 11, 1, 1, 0, tzinfo=la) # fold=0 default
t2 = t.replace(fold=1) # the "second" 1:00 AM
print(t) # 2020-11-01 01:00:00-07:00
print(t2) # 2020-11-01 01:00:00-08:00
Measuring durations#
For timeouts and elapsed-time measurements, you want clocks that don’t jump if system time is adjusted (NTP, manual changes). time.monotonic() avoids this issue and should be used any time you are doing these kinds of measurements.
Basic example: timeout loop using time.monotonic()
import time

deadline = time.monotonic() + 2.0  # 2 second timeout
while time.monotonic() < deadline:
    # poll / retry / wait
    pass
Advanced example: latency measurement with perf_counter()
Higher precision for benchmarking performance.
import time
start = time.perf_counter()
# ... do work ...
elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.6f}s")
Performance, context managers, and concurrency#
Performance: CPython, C-backed operations, and measurement tools#
A module you import can be implemented in Python or in C, and a large part of Python's core functionality is implemented in C. This is why you want to use built-in functions and methods wherever possible instead of reimplementing them in Python.
Measurement: timeit for microbenchmarks, profilers for whole programs#
The timeit module avoids a number of common traps in timing small snippets. For profiling whole programs, profile and cProfile (the latter implemented in C) can be used to track where time is spent during a run.
Basic example: timeit compares alternative shapes
import timeit
print(timeit.timeit("sum(range(1000))", number=50_000))
print(timeit.timeit("total=0\nfor i in range(1000): total+=i", number=50_000))
Advanced example: profiling
import cProfile
import pstats

def main():
    # ... run your workload ...
    pass

cProfile.run("main()", "out.prof")
p = pstats.Stats("out.prof").sort_stats("cumulative")
p.print_stats(20)
Context managers (with)#
The language reference explains that with wraps a block using methods defined by a context manager, encapsulating common try/finally patterns.
Basic example: open a file safely
with open("data.txt", "r", encoding="utf-8") as f:
    text = f.read()
Advanced example: a custom context manager with contextlib
from contextlib import contextmanager

@contextmanager
def opened(path: str):
    f = open(path, "r", encoding="utf-8")
    try:
        yield f
    finally:
        f.close()

with opened("data.txt") as f:
    print(f.readline())
Concurrency: futures, threads, executors, and common pitfalls#
The GIL (Global Interpreter Lock) in plain language#
In CPython, the GIL is the mechanism that ensures only one thread executes Python bytecode at a time, simplifying thread-safety for core objects like dict but limiting CPU-bound parallelism.
That means:
- Threads are often excellent when you’re I/O-bound (waiting on network/disk).
- Threads often disappoint for CPU-heavy pure-Python workloads.
Because most of our work revolves around sending and receiving data over the network, threaded execution works fine.
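A minimal sketch of the I/O-bound case: time.sleep stands in for a network round trip and releases the GIL while waiting, so four simulated requests overlap instead of running back to back.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(i: int) -> int:
    time.sleep(0.2)  # stands in for network latency; sleeping threads release the GIL
    return i

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(fake_request, range(4)))
elapsed = time.monotonic() - start

print(results)        # [0, 1, 2, 3]
print(elapsed < 0.8)  # True: roughly one 0.2s wait, not four in sequence
```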
Free-threaded Python (GIL optional) context#
Starting with Python 3.13, CPython supports an experimental “free-threading” build where the GIL is disabled, enabling true parallel thread execution, but it is not the default configuration.
concurrent.futures: high-level async execution#
The threading module provides granular control of threads, but for more hands-off, managed usage, concurrent.futures provides a ThreadPoolExecutor class which can manage multiple threads and limit the number active at once.
Basic example: ThreadPoolExecutor with futures
from concurrent.futures import ThreadPoolExecutor

def work(x: int) -> int:
    return x * x

with ThreadPoolExecutor(max_workers=4) as ex:
    fut = ex.submit(work, 12)
    print(fut.result())
When used as a context manager like this, the executor will also automatically wait for all threads to shut down before it exits.
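When you have many tasks, a common pattern (sketched here with the same toy work function) is to submit them all and then handle each one as it finishes using as_completed:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(x: int) -> int:
    return x * x

with ThreadPoolExecutor(max_workers=4) as ex:
    futures = [ex.submit(work, n) for n in range(5)]
    # as_completed yields each future as soon as its result is ready,
    # so results may arrive out of submission order
    for fut in as_completed(futures):
        print(fut.result())
```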
Regex performance notes in concurrent or high-throughput code#
If you apply regex repeatedly in hot paths, compile patterns once (for clarity and sometimes speed). But also remember Python caches recently compiled patterns, so “compile everywhere” isn’t always necessary if you only use a few patterns.
Basic example: reuse compiled regex
import re

PAT = re.compile(r"\bALARM\b")

def has_alarm(msg: str) -> bool:
    return PAT.search(msg) is not None
Advanced example: avoid regex if string methods suffice
def has_alarm_fast(msg: str) -> bool:
    return "ALARM" in msg