When I want to save the current state of a pandas DataFrame for “manual consumption”, I often write df.to_excel('foo.xlsx')
within my IPython session or Jupyter Notebook. However, the default style does not look pretty and often needs manual adjustments (e.g. column widths) to be usable.
If you want to create custom reports from pandas, you therefore might want to to this styling programmatically. Package openpyxl does this job nicely. Here is a motivational example that shows you the basic ways of its API. First, have some random data:
import pandas as pd from io import StringIO df = pd.read_csv(StringIO("""\ alpha beta gamma 2000-01-01 -0.173215 0.119209 -1.044236 2000-01-02 -0.861849 -2.104569 -0.494929 2000-01-03 1.071804 0.721555 -0.706771 2000-01-04 -1.039575 0.271860 -0.424972 2000-01-05 0.567020 0.276232 -1.087401 2000-01-06 -0.673690 0.113648 -1.478427 2000-01-07 0.524988 0.404705 0.577046 2000-01-08 -1.715002 -1.039268 -0.370647 """), sep="\s+", parse_dates=True) output_filename = 'pandas-to-excel.xlsx'
No customization
Just for comparison, a plain export of the dataframe to the first worksheet:
sheet_name = 'No customization' df.to_excel(output_filename, sheet_name)
Little customisation
As the first column is not readable with the default column width, we increase it slightly. The unit roughly corresponds to Excel’s column width unit. On my machine, the resulting worksheets has a column width of 20.29…
with pd.ExcelWriter( output_filename, mode='a', # append; default='w' (overwrite) engine='openpyxl') as xlsx: sheet_name = 'Little customization' df.to_excel(xlsx, sheet_name) # set index column width ws = xlsx.sheets[sheet_name] ws.column_dimensions['A'].width = 21
Full customisation
Motivational example of what you can do with some more specifications, namely:
- Conditional formatting, here with a custom color scale using percentiles for lower and upper bound (10%-90%), while the midpoint is defined by value (0 is white). Symbols and color bars are possible as well, of course.
- Number format, using the same syntax as in Excel’s cell properties dialogue. Here, the index column’s date includes the shortened weekday name (Mo-Su).
- Applying a builtin named style by name (Headline 2) to the title row.
- Creating and applying a custom named style (Index Style) including number format, font and alignment.
First, the imports and rule/style definitions:
from openpyxl.formatting.rule import ColorScaleRule from openpyxl.styles import Alignment, Font, NamedStyle from openpyxl.utils import get_column_letter percentile_rule = ColorScaleRule( start_type='percentile', start_value=10, start_color='ffaaaa', # red-ish mid_type='num', mid_value=0, mid_color='ffffff', # white end_type='percentile', end_value=90, end_color='aaffaa') # green-ish # custom named style for the index index_style = NamedStyle( name="Index Style", number_format='YYYY-MM-DD, DDD', font=Font(color='999999', italic=True), alignment=Alignment(horizontal='left')) # pass keyword args as dictionary writer_args = { 'path': output_filename, 'mode': 'a', 'engine': 'openpyxl'}
With that, the actual writing looks like this:
with pd.ExcelWriter(**writer_args) as xlsx: sheet_name = 'Full customization' df.to_excel(xlsx, sheet_name) ws = xlsx.sheets[sheet_name] # cell ranges index_column = 'A' value_cells = 'B2:{col}{row}'.format( col=get_column_letter(ws.max_column), row=ws.max_row) title_row = '1' # index column width ws.column_dimensions[index_column].width = 21 # color all value cells ws.conditional_formatting.add(value_cells, percentile_rule) # for general styling, one has to iterate over # all cells individually for row in ws[value_cells]: for cell in row: cell.number_format = '0.00' # builtin or named styles can be applied by using # the style object or their name (shown below) for cell in ws[index_column]: cell.style = index_style # style title row last, so that headline style # wins over index style in top-left cell A1 for cell in ws[title_row]: cell.style = 'Headline 2'
If you want the above in one script, try my gist pandas-to-excel.py. It creates a single Excel file with the three appropriately named spreadsheets.