Data Sources¶
HolySheet auto-detects and converts data from multiple Python data formats. You never need to transform data manually; just pass it directly to any block that accepts data.
Supported Formats¶
List of Dicts¶
The most common format. Each dict is a row, keys are column names.
data = [
{"month": "Jan", "revenue": 124_500, "costs": 78_200},
{"month": "Feb", "revenue": 138_200, "costs": 82_100},
{"month": "Mar", "revenue": 152_800, "costs": 85_600},
]
LineChart(title="Revenue Trend", data=data, x="month", y=["revenue", "costs"])
This is the internal format
Internally, HolySheet converts all data to list[dict[str, Any]]. If your data is already in this format, it passes through with minimal overhead (just value sanitization).
Dict of Lists¶
Column-oriented format where keys are column names and values are lists.
data = {
"month": ["Jan", "Feb", "Mar", "Apr"],
"revenue": [124_500, 138_200, 152_800, 149_300],
"costs": [78_200, 82_100, 85_600, 83_900],
}
LineChart(title="Revenue Trend", data=data, x="month", y="revenue")
Uniform Length Required
All lists must have the same length. If column lengths differ, a DataConversionError is raised.
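As a rough illustration of the uniform-length rule, the sketch below transposes a dict of lists into records and rejects ragged columns. It is not HolySheet's implementation: `transpose_columns` is a hypothetical helper, and it raises a plain `ValueError` as a stand-in for `DataConversionError`.

```python
from typing import Any


def transpose_columns(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Turn {"col": [v1, v2, ...]} into [{"col": v1}, {"col": v2}, ...]."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Stand-in for HolySheet's DataConversionError (assumption).
        raise ValueError(f"Column lengths differ: {lengths}")
    names = list(columns)
    # zip(*...) walks the columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


good = transpose_columns({"month": ["Jan", "Feb"], "revenue": [124_500, 138_200]})

try:
    transpose_columns({"month": ["Jan", "Feb", "Mar"], "revenue": [124_500]})
    ragged_error = None
except ValueError as exc:
    ragged_error = str(exc)
```

Checking lengths before transposing matters because `zip` would otherwise silently truncate every column to the shortest one.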
Pandas DataFrames¶
import pandas as pd
df = pd.DataFrame({
"month": ["Jan", "Feb", "Mar", "Apr"],
"revenue": [124_500, 138_200, 152_800, 149_300],
"costs": [78_200, 82_100, 85_600, 83_900],
})
LineChart(title="Revenue Trend", data=df, x="month", y=["revenue", "costs"])
DataTable(title="Raw Data", data=df)
How it works: Internally calls df.to_dict(orient='records') and then sanitizes each value.
Polars DataFrames¶
import polars as pl
df = pl.DataFrame({
"month": ["Jan", "Feb", "Mar", "Apr"],
"revenue": [124_500, 138_200, 152_800, 149_300],
"costs": [78_200, 82_100, 85_600, 83_900],
})
LineChart(title="Revenue Trend", data=df, x="month", y=["revenue", "costs"])
DataTable(title="Raw Data", data=df)
How it works: Internally calls df.to_dicts() and then sanitizes each value.
Value Sanitization¶
HolySheet automatically cleans values for safe JSON serialization. This happens transparently on all data formats.
| Input Type | Output | Example |
|---|---|---|
| None | None | - |
| float NaN | None | float('nan') → None |
| float Inf | None | float('inf') → None |
| Decimal | float | Decimal("3.14") → 3.14 |
| datetime | ISO string | datetime(2024, 1, 15) → "2024-01-15T00:00:00" |
| date | ISO string | date(2024, 1, 15) → "2024-01-15" |
| bytes | UTF-8 string | b"hello" → "hello" |
| NumPy scalar | Native Python | np.float64(3.14) → 3.14 |
| NumPy NaN | None | np.nan → None |
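The rules in the table can be approximated with a small standard-library function. This is a sketch only: `sanitize_value` is a hypothetical name, and detecting NumPy scalars through their `.item()` method is an assumption made to keep the example free of a NumPy import. HolySheet's own sanitizer may differ.

```python
import math
from datetime import date, datetime
from decimal import Decimal
from typing import Any


def sanitize_value(value: Any) -> Any:
    """Map one cell to a JSON-safe value, following the table above."""
    # NumPy scalars expose .item(), which returns the native Python value.
    if hasattr(value, "item"):
        value = value.item()
    if isinstance(value, Decimal):
        value = float(value)
    if isinstance(value, float):
        # NaN and +/-Inf have no JSON representation.
        return value if math.isfinite(value) else None
    # datetime is a subclass of date, so one check covers both.
    if isinstance(value, date):
        return value.isoformat()
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


row = {"score": float("nan"), "price": Decimal("3.14"), "day": date(2024, 1, 15)}
clean_row = {key: sanitize_value(val) for key, val in row.items()}
```

Converting `Decimal` before the float check means a `Decimal("NaN")` also ends up as `None` rather than leaking a non-finite float into the JSON.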
Automatic NaN Handling
import pandas as pd
import numpy as np
df = pd.DataFrame({
"name": ["Alice", "Bob", "Carol"],
"score": [95, np.nan, 87], # NaN is auto-converted to None
"date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
})
# Just pass it: HolySheet handles the NaN and datetime conversion
DataTable(title="Scores", data=df)
The to_records() Function¶
Under the hood, all data conversion goes through holysheet.data.to_records():
from holysheet.data import to_records
# Convert any supported format to list[dict]
records = to_records(my_dataframe)
records = to_records(my_dict_of_lists)
records = to_records(my_list_of_dicts)
# Returns: [{"col1": val1, "col2": val2}, ...]
Conversion Flow¶
Input Data (any format)
│
├── list[dict] ──▶ Clean values ──▶ list[dict]
├── dict[str, list] ──▶ Transpose ──▶ Clean values ──▶ list[dict]
├── pd.DataFrame ──▶ .to_dict("records") ──▶ Clean values ──▶ list[dict]
└── pl.DataFrame ──▶ .to_dicts() ──▶ Clean values ──▶ list[dict]
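The branching in this flow could be sketched as a single dispatch function. The version below is an assumption-laden illustration, not the real `to_records()`: `to_records_sketch` is a hypothetical name, DataFrames are recognized by duck-typing on their `to_dicts` / `to_dict` methods, value cleaning and length validation are left out, and `TypeError` stands in for `DataConversionError`.

```python
from typing import Any


def to_records_sketch(data: Any) -> list[dict[str, Any]]:
    """Dispatch on input shape; value cleaning is omitted for brevity."""
    if isinstance(data, list) and all(isinstance(row, dict) for row in data):
        # Already row-oriented: copy each row so the input is not aliased.
        return [dict(row) for row in data]
    if isinstance(data, dict) and all(isinstance(v, list) for v in data.values()):
        # Column-oriented: transpose (length validation omitted here).
        names = list(data)
        return [dict(zip(names, row)) for row in zip(*data.values())]
    if hasattr(data, "to_dicts"):
        # Polars-style frame.
        return data.to_dicts()
    if hasattr(data, "to_dict"):
        # pandas-style frame.
        return data.to_dict(orient="records")
    raise TypeError(f"Unsupported data type: {type(data).__name__}")


class FakeFrame:
    """Tiny stand-in for a Polars DataFrame, used only for this demo."""

    def to_dicts(self) -> list[dict[str, Any]]:
        return [{"month": "Jan", "revenue": 124_500}]


from_rows = to_records_sketch([{"a": 1}])
from_columns = to_records_sketch({"a": [1, 2], "b": [3, 4]})
from_frame = to_records_sketch(FakeFrame())
```

Duck-typing keeps pandas and Polars optional in this sketch, since neither library has to be imported just to recognize its DataFrame.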
Error Handling¶
If data cannot be converted, a DataConversionError is raised:
from holysheet.data import to_records
from holysheet.exceptions import DataConversionError
try:
records = to_records("not valid data")
except DataConversionError as e:
print(e.message) # "Unsupported data type: str"
print(e.source_type) # "str"
Tips¶
Mixing Data Sources¶
You can use different data formats across blocks in the same report:
import pandas as pd
# Some data as dicts
kpi_data = {"revenue": 2_260_000, "users": 42_000}
# Chart data from pandas
chart_df = pd.read_csv("monthly_revenue.csv")
# Table data as list of dicts
customers = [
{"name": "Acme Corp", "mrr": "$12,400"},
{"name": "GlobalTech", "mrr": "$9,800"},
]
report = Report(title="Mixed Sources", theme="dark")
report.add(KPI(label="Revenue", value=f"${kpi_data['revenue']:,.0f}"))
report.add(LineChart(title="Trend", data=chart_df, x="month", y="revenue"))
report.add(DataTable(title="Top Customers", data=customers))
Large DataFrames¶
Performance Note
All data is embedded in the HTML file as JSON. Very large datasets (100K+ rows) will increase file size and may impact browser performance. Consider:
- Aggregating data before passing to charts
- Limiting DataTable rows to a reasonable size
- Using paginated=True (the default) for large tables
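One way to act on the first two suggestions is to shrink the data in plain Python before it reaches a block. The helper `aggregate_sum` below is hypothetical and not part of HolySheet, and the length of the `json.dumps` output is only a rough proxy for how much the embedded payload adds to the HTML file.

```python
import json
from collections import defaultdict
from typing import Any


def aggregate_sum(rows: list[dict[str, Any]], key: str, value: str) -> list[dict[str, Any]]:
    """Collapse rows that share `key`, summing `value` for each group."""
    totals: dict[Any, Any] = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    # Dicts keep insertion order, so groups appear in first-seen order.
    return [{key: group, value: total} for group, total in totals.items()]


raw = [
    {"month": "Jan", "revenue": 100},
    {"month": "Jan", "revenue": 50},
    {"month": "Feb", "revenue": 70},
]

monthly = aggregate_sum(raw, key="month", value="revenue")  # feed this to a chart
preview = raw[:2]                                           # cap rows for a table

raw_bytes = len(json.dumps(raw))
monthly_bytes = len(json.dumps(monthly))
```

Comparing `raw_bytes` with `monthly_bytes` before exporting gives a quick sense of whether aggregation is worth it for a given dataset.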
Data is Serialized at Export Time¶
Data conversion happens when you call export_html(), export_json(), or export_folder(), not when you create the block. Because a block keeps a reference to the object you passed in, any change you make to a DataFrame after adding it to a block is picked up at export:
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
chart = LineChart(title="Chart", data=df, x="x", y="y")
report.add(chart)
# Warning: this modification WILL be reflected in the export
# because 'data' holds a reference to df
df["y"] = [100, 200, 300]
report.export_html("report.html") # Uses the modified data
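The effect of deferred conversion, and one way to opt out of it, can be shown without HolySheet at all. In the sketch below, `LazyBlock` is a hypothetical stand-in for a block that stores a reference and converts only at export. Passing a copy at creation time freezes the data. With pandas, `df.copy()` would play the same role as `copy.deepcopy` here.

```python
import copy
from typing import Any


class LazyBlock:
    """Hypothetical stand-in for a block that converts data only at export."""

    def __init__(self, data: list[dict[str, Any]]) -> None:
        self.data = data  # stores a reference, not a copy

    def export(self) -> list[dict[str, Any]]:
        # Conversion happens here, so it sees the data's current state.
        return [dict(row) for row in self.data]


rows = [{"x": 1, "y": 10}, {"x": 2, "y": 20}]
live = LazyBlock(rows)
frozen = LazyBlock(copy.deepcopy(rows))  # snapshot taken at creation time

rows[0]["y"] = 100  # mutate after both blocks were created

live_export = live.export()      # reflects the mutation
frozen_export = frozen.export()  # still holds the original values
```

Taking a snapshot costs extra memory, so it is mainly worth doing when the same DataFrame keeps being transformed for later blocks.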