d6t.stack package

Submodules

d6t.stack.combine_csv module

class d6t.stack.combine_csv.CombinerCSV(fname_list, sep=', ', all_strings=False, header_row=0, skiprows=0, nrows_preview=3, logger=None)[source]

Bases: object

Core combiner class. Checks columns, generates preview, combines.

Parameters:
  • fname_list (list) – file names, e.g. ['a.csv', 'b.csv']
  • sep (string) – CSV delimiter, see pandas.read_csv()
  • all_strings (boolean) – read all values as strings (faster)
  • header_row (int) – header row, see pandas.read_csv()
  • skiprows (int) – rows to skip at top of file, see pandas.read_csv()
  • nrows_preview (int) – number of rows in preview
  • logger (object) – logger object with send_log()
combine(is_col_common=False, is_preview=False)[source]

Combines all files

Note

Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.

Parameters:
  • is_col_common (bool) – keep only columns common to all files? If False, returns all columns, with values missing from a file filled with NaN
  • is_preview (bool) – read only the top self.nrows_preview rows of each file
Returns:

pandas dataframe with combined data from all files

Return type:

df_all (dataframe)
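
The effect of is_col_common can be illustrated without the library. The following is a minimal standard-library sketch, not the CombinerCSV implementation: the helper name combine_csv_text, the in-memory file contents, and the use of None in place of NaN are all assumptions made for illustration.

```python
import csv
import io

def combine_csv_text(files, is_col_common=False):
    """Stack CSV contents. `files` maps a file name to its CSV text (hypothetical input)."""
    parsed = []
    for name, text in files.items():
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)
        parsed.append((name, reader.fieldnames or [], rows))

    # Union of columns in first-seen order, and the subset present in every file.
    columns_all = []
    for _, cols, _ in parsed:
        for col in cols:
            if col not in columns_all:
                columns_all.append(col)
    columns_common = [c for c in columns_all
                      if all(c in cols for _, cols, _ in parsed)]

    keep = columns_common if is_col_common else columns_all
    combined = []
    for _, _, rows in parsed:
        for row in rows:
            # Columns absent from a file become None, standing in for NaN.
            combined.append({col: row.get(col) for col in keep})
    return keep, combined

files = {
    "a.csv": "date,sales\n2018-01-01,100\n",
    "b.csv": "date,sales,profit\n2018-02-01,200,20\n",
}
cols_all, rows_all = combine_csv_text(files)
cols_common, rows_common = combine_csv_text(files, is_col_common=True)
```

With is_col_common=False the profit column is kept and left empty for rows from a.csv; with is_col_common=True it is dropped.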

combine_preview(is_col_common=False)[source]

Previews the combination of all files

Note

Unlike CombinerCSVAdvanced.combine_preview(), this function supports only simple combine operations.

Parameters: is_col_common (bool) – keep only columns common to all files? If False, returns all columns, with values missing from a file filled with NaN
Returns: pandas dataframe with combined data from all files, only the top self.nrows_preview rows
Return type: df_all (dataframe)
preview_columns()[source]

Checks column consistency in list of files. It checks both presence and order of columns in all files

Returns:
results dictionary with
  • files_columns (dict) – keys = file name, value = list of columns in that file
  • columns_all (list) – all columns found across the files
  • columns_common (list) – only columns present in every file
  • is_all_equal (boolean) – are the columns equal in all files?
  • df_columns_present (dataframe) – which columns are present in which file?
  • df_columns_order (dataframe) – where in each file is the column?
Return type: col_preview (dict)
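
The presence and order checks can be sketched in plain Python. This is an illustration under assumptions, not the library's code: check_columns is a hypothetical helper, it takes the column lists directly instead of reading files, and it returns plain dicts where preview_columns() returns dataframes.

```python
def check_columns(files_columns):
    """files_columns maps a file name to its list of column names, in file order."""
    col_lists = list(files_columns.values())

    # All columns in first-seen order (the ordering is an assumption of this sketch).
    columns_all = []
    for cols in col_lists:
        for col in cols:
            if col not in columns_all:
                columns_all.append(col)

    columns_common = [c for c in columns_all
                      if all(c in cols for cols in col_lists)]

    # Equal means same columns in the same order in every file.
    is_all_equal = all(cols == col_lists[0] for cols in col_lists)

    # Position of each column in each file, or None when the column is absent.
    columns_order = {
        fname: {c: (cols.index(c) if c in cols else None) for c in columns_all}
        for fname, cols in files_columns.items()
    }
    return {
        "files_columns": files_columns,
        "columns_all": columns_all,
        "columns_common": columns_common,
        "is_all_equal": is_all_equal,
        "columns_order": columns_order,
    }

result = check_columns({
    "a.csv": ["date", "sales"],
    "b.csv": ["sales", "date", "profit"],
})
```

Here the two files share date and sales but in a different order, so is_all_equal is False even before the extra profit column is considered.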
read_csv(fname, is_preview=False, chunksize=None)[source]
read_csv_all(msg=None, is_preview=False, chunksize=None, cfg_col_sel=None, cfg_col_rename={})[source]
class d6t.stack.combine_csv.CombinerCSVAdvanced(combiner, cfg_col_sel, cfg_col_rename={})[source]

Bases: object

combine()[source]
combine_preview()[source]
combine_preview_save(fname_out)[source]
combine_save(fname_out)[source]
d6t.stack.combine_csv.sniff_settings_csv(fname_list)[source]

d6t.stack.combine_files module

d6t.stack.combine_xls module

class d6t.stack.combine_xls.XLStoCSVMultiFile(fname_list, cfg_xls_sheets_sel_mode, cfg_xls_sheets_sel, logger=None)[source]

Bases: object

Converts xls|xlsx files to csv files. Selects a SINGLE SHEET from each file. To extract MULTIPLE SHEETS from a file, use XLStoCSVMultiSheet.

Parameters:
  • fname_list (list) – file paths, e.g. ['dir/a.xls', 'dir/b.xls']
  • cfg_xls_sheets_sel_mode (string) – mode to select tabs:

    • name: select by name, provide name for each file, can customize by file
    • name_global: select by name, one name for all files
    • idx: select by index, provide index for each file, can customize by file
    • idx_global: select by index, one index for all files
  • cfg_xls_sheets_sel (list) – values to select tabs; must be in the same order as `fname_list`
  • logger (object) – logger object with send_log(), optional
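
How the four selection modes relate to fname_list can be sketched as a small mapping function. This is a hypothetical helper, not part of the package; in particular, it assumes that the *_global modes take a single name or index rather than a list, which the description above does not confirm.

```python
def resolve_sheet_selection(fname_list, sel_mode, sel):
    """Return a dict mapping each file to the sheet name or index to extract."""
    if sel_mode in ("name_global", "idx_global"):
        # Assumption: one value applies to every file.
        return {fname: sel for fname in fname_list}
    if sel_mode in ("name", "idx"):
        # Per-file values are matched to files by position.
        if len(sel) != len(fname_list):
            raise ValueError("need exactly one selection value per file")
        return dict(zip(fname_list, sel))
    raise ValueError("unknown selection mode: %r" % (sel_mode,))

fnames = ["dir/a.xls", "dir/b.xls"]
per_file = resolve_sheet_selection(fnames, "name", ["Sheet1", "Data"])
same_for_all = resolve_sheet_selection(fnames, "idx_global", 0)
```

Because per-file values are matched by position, reordering fname_list without reordering the selection values would silently pick the wrong sheets.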
convert_all()[source]

Executes conversion. Writes each output to the same path as its input file, appending .csv to the file name.

Returns: output file names
Return type: list
set_files(fname_list)[source]

Update input files. You will also need to update sheet selection with .set_select_mode().

Parameters: fname_list (list) – see class description for details
set_select_mode(cfg_xls_sheets_sel_mode, cfg_xls_sheets_sel)[source]

Update sheet selection values

Parameters:
  • cfg_xls_sheets_sel_mode (string) – see class description for details
  • cfg_xls_sheets_sel (list) – see class description for details
class d6t.stack.combine_xls.XLStoCSVMultiSheet(fname, logger=None)[source]

Bases: object

Converts ALL SHEETS from a SINGLE xls|xlsx file to separate csv files

Parameters:
  • fname (string) – file path
  • logger (object) – logger object with send_log()
convert_all()[source]
set_files(fname)[source]

d6t.stack.helpers module

Module with several helper functions

class d6t.stack.helpers.PrintLogger[source]

Bases: object

send(data)[source]
send_log(msg, status)[source]
d6t.stack.helpers.check_valid_xls(fname_list)[source]
d6t.stack.helpers.cols_filename_tofront(_list)[source]
d6t.stack.helpers.columns_all_equal(col_list)[source]

Checks that all lists in col_list are equal.

Parameters: col_list (list) – columns, e.g. [['a', 'b'], ['a', 'b', 'c']]
Returns: are all lists in col_list equal?
Return type: bool
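
A minimal sketch of the check described above, assuming that "equal" means the same columns in the same order. The function name is hypothetical and this is not the library's code.

```python
def columns_all_equal_sketch(col_list):
    """True when every list in col_list equals the first one (order-sensitive)."""
    return all(cols == col_list[0] for cols in col_list)

mismatch = columns_all_equal_sketch([["a", "b"], ["a", "b", "c"]])
match = columns_all_equal_sketch([["a", "b"], ["a", "b"]])
reordered = columns_all_equal_sketch([["a", "b"], ["b", "a"]])
```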
d6t.stack.helpers.df_filename_tofront(dfg)[source]
d6t.stack.helpers.file_extensions_all_equal(ext_list)[source]

Checks that all file extensions are equal.

Parameters: ext_list (list) – file extensions, e.g. ['.csv', '.csv']
Returns: are all extensions equal to the first extension in the list?
Return type: bool
d6t.stack.helpers.file_extensions_contains_csv(ext_list)[source]
d6t.stack.helpers.file_extensions_contains_xls(ext_list)[source]
d6t.stack.helpers.file_extensions_contains_xlsx(ext_list)[source]
d6t.stack.helpers.file_extensions_get(fname_list)[source]

Returns file extensions in list

Parameters: fname_list (list) – file names, e.g. ['a.csv', 'b.csv']
Returns: file extensions for each file name in input list, e.g. ['.csv', '.csv']
Return type: list
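
The extension helpers can be approximated with os.path.splitext from the standard library. The functions below are hypothetical stand-ins written for illustration; whether the package itself uses os.path.splitext is an assumption.

```python
import os

def file_extensions_get_sketch(fname_list):
    """Return the extension (including the leading dot) of each file name."""
    return [os.path.splitext(fname)[1] for fname in fname_list]

def file_extensions_all_equal_sketch(ext_list):
    """True when every extension equals the first one in the list."""
    return all(ext == ext_list[0] for ext in ext_list)

exts = file_extensions_get_sketch(["a.csv", "dir/b.csv", "c.xlsx"])
mixed = file_extensions_all_equal_sketch(exts)
uniform = file_extensions_all_equal_sketch([".csv", ".csv"])
```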
d6t.stack.helpers.file_extensions_valid(ext_list)[source]

Checks if file list contains only valid files

Notes

Assumes all file extensions are equal! Only the first extension in the list is checked.

Parameters: ext_list (list) – file extensions, e.g. ['.csv', '.csv']
Returns: is the first element in the list one of ['.csv', '.txt', '.xls', '.xlsx']?
Return type: bool
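
The caveat in the note above is easy to overlook, so here is a small sketch of what "only checks the first file" implies. Both functions are hypothetical illustrations; the stricter variant is not part of the package.

```python
VALID_EXTENSIONS = [".csv", ".txt", ".xls", ".xlsx"]

def file_extensions_valid_sketch(ext_list):
    """Mirrors the documented behaviour: only the first extension is inspected."""
    return ext_list[0] in VALID_EXTENSIONS

def file_extensions_valid_strict(ext_list):
    """Hypothetical stricter variant that inspects every extension."""
    return all(ext in VALID_EXTENSIONS for ext in ext_list)

first_only = file_extensions_valid_sketch([".csv", ".pdf"])
strict = file_extensions_valid_strict([".csv", ".pdf"])
```

A mixed list passes the first-only check, which is why the extensions should be confirmed equal (for example with file_extensions_all_equal) before relying on it.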
d6t.stack.helpers.list_common(_list, sort=True)[source]
d6t.stack.helpers.list_tofront(_list, val)[source]
d6t.stack.helpers.list_unique(_list, sort=True)[source]

d6t.stack.helpers_ui module

d6t.stack.helpers_ui.column_mismatch_dict(fname_col)[source]
d6t.stack.helpers_ui.combined_preview(df_all, df_all_preview, fname_out, cfg_settings, cfg_is_xls, cfg_return_df)[source]
d6t.stack.helpers_ui.preview_dict(df)[source]

d6t.stack.sniffer module

Finds CSV settings and Excel sheets in multiple files. Often needed as input for stacking

class d6t.stack.sniffer.CSVSniffer(fname, nlines=10, delims=',;\t|')[source]

Bases: object

Automatically detects settings needed to read csv files. SINGLE file only; for MULTIPLE files use CSVSnifferList.

Parameters:
  • fname (string) – file path
  • nlines (int) – number of lines to sample from each file
  • delims (string) – possible delimiters, default ',;\t|' (comma, semicolon, tab, pipe)
check_column_length_consistent()[source]
count_skiprows()[source]
get_delim()[source]
has_header()[source]
has_header_inverse()[source]
read_nlines()[source]
scan_delim()[source]
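
For comparison, the standard library offers a similar kind of detection through csv.Sniffer. The sketch below does not use CSVSniffer and makes no claim about how it works internally; it only shows delimiter and header detection on an in-memory sample, restricted to the same candidate delimiters (comma, semicolon, tab, pipe).

```python
import csv

# A small semicolon-delimited sample with a header row.
sample = "date;sales\n2018-01-01;100\n2018-01-02;200\n"

sniffer = csv.Sniffer()
# Restrict the search to the candidate delimiters.
dialect = sniffer.sniff(sample, delimiters=",;\t|")
delimiter = dialect.delimiter

# Heuristic header check: compares the first row against the rows below it.
has_header = sniffer.has_header(sample)
```

Both results are heuristics based on a sample of lines, so they should be treated as suggestions to confirm, not guarantees.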
class d6t.stack.sniffer.CSVSnifferList(fname_list, nlines=10, delims=',;\t|')[source]

Bases: object

Automatically detects settings needed to read csv files. For MULTIPLE files; for a single file use CSVSniffer.

Parameters:
  • fname_list (list) – file names, e.g. ['a.csv', 'b.csv']
  • nlines (int) – number of lines to sample from each file
  • delims (string) – possible delimiters, default ',;\t|' (comma, semicolon, tab, pipe)
count_skiprows()[source]
get_all(fun_name, msg_error)[source]
get_delim()[source]
has_header()[source]
class d6t.stack.sniffer.XLSSniffer(fname_list, logger=None)[source]

Bases: object

Extracts available sheets from MULTIPLE Excel files and runs diagnostics

Parameters:
  • fname_list (list) – file paths, e.g. ['dir/a.xls', 'dir/b.xls']
  • logger (object) – logger object with send_log(), optional
all_contain_sheetname(sheet_name)[source]

Check if all files contain a certain sheet

Parameters: sheet_name (string) – sheet name to check
Returns: True if all files contain a sheet with that name
Return type: boolean
all_have_idx(sheet_idx)[source]

Check if all files contain a certain index

Parameters: sheet_idx (int) – index to check
Returns: True if all files have a sheet at that index
Return type: boolean
all_same_count()[source]

Check if all files contain the same number of sheets

Returns: True if all files contain the same number of sheets
Return type: boolean
all_same_names()[source]
sniff()[source]

Executes sniffer

Returns: True if everything is OK. Results are accessible in .df_xls_sheets
Return type: boolean
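
The diagnostics above reduce to simple checks once the sheet names of each file are known. The following sketch assumes those names have already been extracted into a dict (file name to list of sheet names) and that sheet indexes are zero-based; the function names mirror the methods for readability but are standalone illustrations, not the XLSSniffer implementation.

```python
def all_contain_sheetname(sheets_by_file, sheet_name):
    """True if every file has a sheet with the given name."""
    return all(sheet_name in names for names in sheets_by_file.values())

def all_have_idx(sheets_by_file, sheet_idx):
    """True if every file has a sheet at the given zero-based index."""
    return all(0 <= sheet_idx < len(names) for names in sheets_by_file.values())

def all_same_count(sheets_by_file):
    """True if every file has the same number of sheets."""
    return len({len(names) for names in sheets_by_file.values()}) <= 1

def all_same_names(sheets_by_file):
    """True if every file has the same sheet names in the same order."""
    name_lists = list(sheets_by_file.values())
    return all(names == name_lists[0] for names in name_lists)

# Hypothetical extraction result for two workbooks.
sheets = {"a.xlsx": ["Sheet1", "Data"], "b.xlsx": ["Sheet1"]}
has_sheet1 = all_contain_sheetname(sheets, "Sheet1")
has_data = all_contain_sheetname(sheets, "Data")
has_second = all_have_idx(sheets, 1)
same_count = all_same_count(sheets)
```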
d6t.stack.sniffer.csv_count_rows(fname)[source]

d6t.stack.stack_csv module

class d6t.stack.stack_csv.CombinerCSV(fname_list, sep=', ', all_strings=False, header_row=0, skiprows=0, nrows_preview=5, logger=None)[source]

Bases: object

Core combiner class. Checks columns, generates preview, combines.

Parameters:
  • fname_list (list) – file names, e.g. ['a.csv', 'b.csv']
  • sep (string) – CSV delimiter, see pandas.read_csv()
  • all_strings (boolean) – read all values as strings (faster)
  • header_row (int) – header row, see pandas.read_csv()
  • skiprows (int) – rows to skip at top of file, see pandas.read_csv()
  • nrows_preview (int) – number of rows in preview
  • logger (object) – logger object with send_log()
combine(is_col_common=False, is_preview=False)[source]

Combines all files

Note

Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.

Parameters:
  • is_col_common (bool) – keep only columns common to all files? If False, returns all columns, with values missing from a file filled with NaN
  • is_preview (bool) – read only the top self.nrows_preview rows of each file
Returns:

pandas dataframe with combined data from all files

Return type:

df_all (dataframe)

combine_preview(is_col_common=False)[source]

Previews the combination of all files

Note

Unlike CombinerCSVAdvanced.combine_preview(), this function supports only simple combine operations.

Parameters: is_col_common (bool) – keep only columns common to all files? If False, returns all columns, with values missing from a file filled with NaN
Returns: pandas dataframe with combined data from all files, only the top self.nrows_preview rows
Return type: df_all (dataframe)
preview_columns()[source]

Checks column consistency in list of files. It checks both presence and order of columns in all files

Returns:
results dictionary with
  • files_columns (dict) – keys = file name, value = list of columns in that file
  • columns_all (list) – all columns found across the files
  • columns_common (list) – only columns present in every file
  • is_all_equal (boolean) – are the columns equal in all files?
  • df_columns_present (dataframe) – which columns are present in which file?
  • df_columns_order (dataframe) – where in each file is the column?
Return type: col_preview (dict)
read_csv(fname, is_preview=False, chunksize=None)[source]
read_csv_all(msg=None, is_preview=False, chunksize=None, cfg_col_sel=None, cfg_col_rename={})[source]
class d6t.stack.stack_csv.CombinerCSVAdvanced(combiner, cfg_col_sel, cfg_col_rename={})[source]

Bases: object

combine()[source]
combine_preview()[source]
combine_preview_save(fname_out)[source]
combine_save(fname_out)[source]
d6t.stack.stack_csv.sniff_settings_csv(fname_list)[source]

Module contents