d6t.stack package¶

Submodules¶

d6t.stack.combine_csv module¶

class d6t.stack.combine_csv.CombinerCSV(fname_list, sep=', ', all_strings=False, header_row=0, skiprows=0, nrows_preview=3, logger=None)[source]¶

Bases: object

Core combiner class. Checks columns, generates preview, combines.

Parameters:

fname_list (list) – file names, eg [‘a.csv’,’b.csv’]
sep (string) – CSV delimiter, see pandas.read_csv()
all_strings (boolean) – read all values as strings (faster)
header_row (int) – header row, see pandas.read_csv()
skiprows (int) – rows to skip at top of file, see pandas.read_csv()
nrows_preview (int) – number of rows in preview
logger (object) – logger object with send_log()

combine(is_col_common=False, is_preview=False)[source]¶

Combines all files

Note

Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.

Parameters:

is_col_common (bool) – keep only common columns? If False, returns all columns, with missing values filled with NaN
is_preview (bool) – read only the top self.nrows_preview rows
Returns:	pandas dataframe with combined data from all files
Return type:	df_all (dataframe)
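
The effect of is_col_common can be illustrated with a minimal standard-library sketch. It does not call CombinerCSV: the helper name combine_rows and the in-memory sample data are hypothetical, and None stands in for the NaN fill a pandas dataframe would use.

```python
import csv
import io


def combine_rows(sources, is_col_common=False):
    """Stack rows from several CSV texts (hypothetical helper, not part of d6t.stack).

    sources: dict mapping file name -> CSV text with a header row.
    """
    parsed = {}
    for name, text in sources.items():
        rows = list(csv.reader(io.StringIO(text)))
        parsed[name] = (rows[0], rows[1:])  # (header, data rows)

    header_lists = [header for header, _ in parsed.values()]
    if is_col_common:
        # keep only columns present in every file, in first-file order
        columns = [c for c in header_lists[0] if all(c in h for h in header_lists)]
    else:
        # union of all columns, in order of first appearance
        columns = []
        for header in header_lists:
            for c in header:
                if c not in columns:
                    columns.append(c)

    combined = []
    for header, rows in parsed.values():
        for row in rows:
            record = dict(zip(header, row))
            # missing columns become None (stand-in for NaN)
            combined.append({c: record.get(c) for c in columns})
    return columns, combined


sources = {
    "a.csv": "date,sales\n2018-01-01,100\n",
    "b.csv": "date,sales,profit\n2018-02-01,200,20\n",
}
cols_all, rows_all = combine_rows(sources)
cols_common, rows_common = combine_rows(sources, is_col_common=True)
```

With is_col_common=False the profit column is kept and left empty for a.csv; with is_col_common=True it is dropped.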

combine_preview(is_col_common=False)[source]¶

Previews the combination of all files

Note

Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.

Parameters:	is_col_common (bool) – keep only common columns? If False, returns all columns, with missing values filled with NaN
Returns:	pandas dataframe with combined data from all files, limited to the top self.nrows_preview rows
Return type:	df_all (dataframe)

preview_columns()[source]¶

Checks column consistency across the list of files, covering both the presence and the order of columns in every file.

Returns:

results dictionary with:
files_columns (dict): keys = file name, values = list of columns in that file
columns_all (list): all columns found in the files
columns_common (list): only columns present in every file
is_all_equal (boolean): are the columns equal in all files?
df_columns_present (dataframe): which columns are present in which file?
df_columns_order (dataframe): where in each file is the column?

Return type: col_preview (dict)
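
A rough sketch of how such a result could be assembled, using only plain Python. This is not the library's implementation: the function name preview_columns_sketch is hypothetical, it starts from column lists that are already known, and nested dicts stand in for the two dataframes.

```python
def preview_columns_sketch(files_columns):
    """files_columns: dict mapping file name -> list of columns (hypothetical input)."""
    col_lists = list(files_columns.values())

    # all columns, in order of first appearance
    columns_all = []
    for cols in col_lists:
        for c in cols:
            if c not in columns_all:
                columns_all.append(c)

    # columns found in every file
    columns_common = [c for c in columns_all if all(c in cols for cols in col_lists)]

    # equal means same columns in the same order
    is_all_equal = all(cols == col_lists[0] for cols in col_lists)

    # file -> column -> present? (stand-in for df_columns_present)
    columns_present = {
        f: {c: c in cols for c in columns_all} for f, cols in files_columns.items()
    }
    # file -> column -> position or None (stand-in for df_columns_order)
    columns_order = {
        f: {c: cols.index(c) if c in cols else None for c in columns_all}
        for f, cols in files_columns.items()
    }
    return {
        "files_columns": files_columns,
        "columns_all": columns_all,
        "columns_common": columns_common,
        "is_all_equal": is_all_equal,
        "columns_present": columns_present,
        "columns_order": columns_order,
    }


col_preview = preview_columns_sketch(
    {"a.csv": ["date", "sales"], "b.csv": ["sales", "date", "profit"]}
)
```

Here the two files share date and sales but in a different order, so is_all_equal is False even before the extra profit column is considered.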

read_csv(fname, is_preview=False, chunksize=None)[source]¶

read_csv_all(msg=None, is_preview=False, chunksize=None, cfg_col_sel=None, cfg_col_rename={})[source]¶

class d6t.stack.combine_csv.CombinerCSVAdvanced(combiner, cfg_col_sel, cfg_col_rename={})[source]¶

Bases: object

combine()[source]¶

combine_preview()[source]¶

combine_preview_save(fname_out)[source]¶

combine_save(fname_out)[source]¶

d6t.stack.combine_csv.sniff_settings_csv(fname_list)[source]¶

d6t.stack.combine_files module¶

d6t.stack.combine_xls module¶

class d6t.stack.combine_xls.XLStoCSVMultiFile(fname_list, cfg_xls_sheets_sel_mode, cfg_xls_sheets_sel, logger=None)[source]¶

Bases: object

Converts xls|xlsx files to csv files. Selects a SINGLE SHEET from each file. To extract MULTIPLE SHEETS from a file use XLStoCSVMultiSheet

Parameters:

fname_list (list) – file paths, eg [‘dir/a.xls’,’dir/b.xls’]
cfg_xls_sheets_sel_mode (string) –
mode to select tabs
- name: select by name, provide name for each file, can customize by file
- name_global: select by name, one name for all files
- idx: select by index, provide index for each file, can customize by file
- idx_global: select by index, one index for all files
cfg_xls_sheets_sel (list) – values to select tabs; MUST BE IN THE SAME ORDER AS `fname_list`
logger (object) – logger object with send_log(), optional
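
The four selection modes can be pictured as a mapping from each file to one sheet selector. The sketch below is an illustration only: resolve_sheet_selection is a hypothetical helper, not part of d6t.stack, and it assumes the global modes accept a single value, which the parameter description above does not confirm.

```python
def resolve_sheet_selection(fname_list, sel_mode, sel):
    """Pair every file with its sheet selector (hypothetical helper)."""
    if sel_mode in ("name_global", "idx_global"):
        # assumption: one value is applied to every file
        per_file = [sel] * len(fname_list)
    elif sel_mode in ("name", "idx"):
        # one value per file, in the same order as fname_list
        if len(sel) != len(fname_list):
            raise ValueError("need exactly one selector per file")
        per_file = list(sel)
    else:
        raise ValueError("unknown selection mode: %r" % (sel_mode,))
    return list(zip(fname_list, per_file))


files = ["dir/a.xls", "dir/b.xls"]
by_global_name = resolve_sheet_selection(files, "name_global", "Sheet1")
by_index = resolve_sheet_selection(files, "idx", [0, 1])
```

Because the per-file modes pair values with files by position, a reordered fname_list silently selects the wrong sheets, which is why the order requirement matters.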

convert_all()[source]¶

Executes conversion. Writes each output to the same path as its input file, appending .csv to the file name.

Returns:	output file names
Return type:	list

set_files(fname_list)[source]¶

Update input files. You will also need to update sheet selection with .set_select_mode().

Parameters:	fname_list (list) – see class description for details

set_select_mode(cfg_xls_sheets_sel_mode, cfg_xls_sheets_sel)[source]¶

Update sheet selection values

Parameters:

cfg_xls_sheets_sel_mode (string) – see class description for details
cfg_xls_sheets_sel (list) – see class description for details

class d6t.stack.combine_xls.XLStoCSVMultiSheet(fname, logger=None)[source]¶

Bases: object

Converts ALL SHEETS from a SINGLE xls|xlsx file to separate csv files

Parameters:

fname (string) – file path
logger (object) – logger object with send_log()

convert_all()[source]¶

set_files(fname)[source]¶

d6t.stack.helpers module¶

Module with several helper functions

class d6t.stack.helpers.PrintLogger[source]¶

Bases: object

send(data)[source]¶

send_log(msg, status)[source]¶

d6t.stack.helpers.check_valid_xls(fname_list)[source]¶

d6t.stack.helpers.cols_filename_tofront(_list)[source]¶

d6t.stack.helpers.columns_all_equal(col_list)[source]¶

Checks that all lists in col_list are equal.

Parameters:	col_list (list) – columns, eg [[‘a’,’b’],[‘a’,’b’,’c’]]
Returns:	all lists in list are equal?
Return type:	bool
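
A minimal sketch of such a check, assuming that equal means the same columns in the same order (the function name is hypothetical and this is not the library's code):

```python
def columns_all_equal_sketch(col_list):
    """True if every column list equals the first one (order-sensitive)."""
    return all(cols == col_list[0] for cols in col_list)


mismatch = columns_all_equal_sketch([["a", "b"], ["a", "b", "c"]])
match = columns_all_equal_sketch([["a", "b"], ["a", "b"]])
```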

d6t.stack.helpers.df_filename_tofront(dfg)[source]¶

d6t.stack.helpers.file_extensions_all_equal(ext_list)[source]¶

Checks that all file extensions are equal.

Parameters:	ext_list (list) – file extensions, eg [‘.csv’,’.csv’]
Returns:	all extensions are equal to first extension in list?
Return type:	bool

d6t.stack.helpers.file_extensions_contains_csv(ext_list)[source]¶

d6t.stack.helpers.file_extensions_contains_xls(ext_list)[source]¶

d6t.stack.helpers.file_extensions_contains_xlsx(ext_list)[source]¶

d6t.stack.helpers.file_extensions_get(fname_list)[source]¶

Returns file extensions in list

Parameters:	fname_list (list) – file names, eg [‘a.csv’,’b.csv’]
Returns:	file extensions for each file name in input list, eg [‘.csv’,’.csv’]
Return type:	list
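
The same result can be obtained with os.path.splitext from the standard library. The sketch below shows that approach under a hypothetical name; whether the library normalizes case is not stated, so none is applied here.

```python
import os.path


def file_extensions_get_sketch(fname_list):
    """Return the extension (including the dot) of each file name."""
    return [os.path.splitext(fname)[1] for fname in fname_list]


ext_list = file_extensions_get_sketch(["a.csv", "dir/b.xlsx"])
```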

d6t.stack.helpers.file_extensions_valid(ext_list)[source]¶

Checks if file list contains only valid files

Notes

Assumes all file extensions are equal! Only checks first file

Parameters:	ext_list (list) – file extensions, eg [‘.csv’,’.csv’]
Returns:	first element in list is one of [‘.csv’,’.txt’,’.xls’,’.xlsx’]?
Return type:	bool
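
As described, the check looks only at the first extension. A small sketch with a hypothetical function name makes the consequence visible:

```python
VALID_EXTENSIONS = [".csv", ".txt", ".xls", ".xlsx"]


def file_extensions_valid_sketch(ext_list):
    """Check only the first extension, assuming all others are equal to it."""
    return ext_list[0] in VALID_EXTENSIONS


ok = file_extensions_valid_sketch([".csv", ".csv"])
# a mixed list still passes because only the first entry is inspected
mixed = file_extensions_valid_sketch([".csv", ".pdf"])
```

Running an equality check such as file_extensions_all_equal() first avoids the false positive shown for the mixed list.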

d6t.stack.helpers.list_common(_list, sort=True)[source]¶

d6t.stack.helpers.list_tofront(_list, val)[source]¶

d6t.stack.helpers.list_unique(_list, sort=True)[source]¶

d6t.stack.helpers_ui module¶

d6t.stack.helpers_ui.column_mismatch_dict(fname_col)[source]¶

d6t.stack.helpers_ui.combined_preview(df_all, df_all_preview, fname_out, cfg_settings, cfg_is_xls, cfg_return_df)[source]¶

d6t.stack.helpers_ui.preview_dict(df)[source]¶

d6t.stack.sniffer module¶

Finds CSV settings and Excel sheets in multiple files. Often needed as input for stacking

class d6t.stack.sniffer.CSVSniffer(fname, nlines=10, delims=',;\t|')[source]¶

Bases: object

Automatically detects settings needed to read csv files. SINGLE file only; for MULTIPLE files use CSVSnifferList

Parameters:

fname (string) – file path
nlines (int) – number of lines to sample from each file
delims (string) – possible delimiters, default ‘,;\t|’
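
Delimiter detection from a small sample can be illustrated with csv.Sniffer from the Python standard library. This is a stand-in technique, not necessarily how CSVSniffer works internally; the helper name and the sample text are hypothetical.

```python
import csv


def sniff_delimiter(sample_text, delims=",;\t|"):
    """Guess the delimiter of a CSV sample, restricted to the candidate characters."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=delims)
    return dialect.delimiter


sample = "id;name;amount\n1;alpha;10\n2;beta;20\n3;gamma;30\n"
delimiter = sniff_delimiter(sample)
```

Restricting the candidates, as the delims parameter does, keeps characters that merely occur regularly in the data from being mistaken for a delimiter.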

check_column_length_consistent()[source]¶

count_skiprows()[source]¶

get_delim()[source]¶

has_header()[source]¶

has_header_inverse()[source]¶

read_nlines()[source]¶

scan_delim()[source]¶

class d6t.stack.sniffer.CSVSnifferList(fname_list, nlines=10, delims=',;\t|')[source]¶

Bases: object

Automatically detects settings needed to read csv files. For MULTIPLE files; for a SINGLE file use CSVSniffer

Parameters:

fname_list (list) – file names, eg [‘a.csv’,’b.csv’]
nlines (int) – number of lines to sample from each file
delims (string) – possible delimiters, default ‘,;\t|’

count_skiprows()[source]¶

get_all(fun_name, msg_error)[source]¶

get_delim()[source]¶

has_header()[source]¶

class d6t.stack.sniffer.XLSSniffer(fname_list, logger=None)[source]¶

Bases: object

Extracts available sheets from MULTIPLE Excel files and runs diagnostics

Parameters:

fname_list (list) – file paths, eg [‘dir/a.xls’,’dir/b.xls’]
logger (object) – logger object with send_log(), optional

all_contain_sheetname(sheet_name)[source]¶

Check if all files contain a certain sheet

Parameters:	sheet_name (string) – sheetname to check
Returns:	True if every file contains the sheet
Return type:	boolean

all_have_idx(sheet_idx)[source]¶

Check if all files contain a certain index

Parameters:	sheet_idx (string) – index to check
Returns:	True if every file has a sheet at that index
Return type:	boolean

all_same_count()[source]¶

Check if all files contain the same number of sheets

Returns:	True if every file has the same number of sheets
Return type:	boolean
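
The three checks above reduce to simple tests over the sheet names found in each file. The sketch below assumes those names are already available as a dict from file name to a list of sheet names; that structure, the function names, and zero-based non-negative indices are assumptions for illustration, not the class's internals.

```python
def all_contain_sheetname(sheets_by_file, sheet_name):
    """True if the named sheet exists in every file."""
    return all(sheet_name in names for names in sheets_by_file.values())


def all_have_idx(sheets_by_file, sheet_idx):
    """True if every file has a sheet at the zero-based index."""
    return all(0 <= sheet_idx < len(names) for names in sheets_by_file.values())


def all_same_count(sheets_by_file):
    """True if every file has the same number of sheets."""
    return len({len(names) for names in sheets_by_file.values()}) <= 1


sheets_by_file = {
    "dir/a.xls": ["Sheet1", "Summary"],
    "dir/b.xls": ["Sheet1"],
}
has_sheet1 = all_contain_sheetname(sheets_by_file, "Sheet1")
has_second = all_have_idx(sheets_by_file, 1)
same_count = all_same_count(sheets_by_file)
```

Results like these indicate which selection mode is safe: here a global name of Sheet1 works for both files, while a global index of 1 does not.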

all_same_names()[source]¶

sniff()[source]¶

Executes sniffer

Returns:	True if everything ok. Results are accessible in `.df_xls_sheets`
Return type:	boolean

d6t.stack.sniffer.csv_count_rows(fname)[source]¶

d6t.stack.stack_csv module¶

class d6t.stack.stack_csv.CombinerCSV(fname_list, sep=', ', all_strings=False, header_row=0, skiprows=0, nrows_preview=5, logger=None)[source]¶

Bases: object

Core combiner class. Checks columns, generates preview, combines.

Parameters:

fname_list (list) – file names, eg [‘a.csv’,’b.csv’]
sep (string) – CSV delimiter, see pandas.read_csv()
all_strings (boolean) – read all values as strings (faster)
header_row (int) – header row, see pandas.read_csv()
skiprows (int) – rows to skip at top of file, see pandas.read_csv()
nrows_preview (int) – number of rows in preview
logger (object) – logger object with send_log()

combine(is_col_common=False, is_preview=False)[source]¶

Combines all files

Note

Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.

Parameters:

is_col_common (bool) – keep only common columns? If False, returns all columns, with missing values filled with NaN
is_preview (bool) – read only the top self.nrows_preview rows
Returns:	pandas dataframe with combined data from all files
Return type:	df_all (dataframe)

combine_preview(is_col_common=False)[source]¶

Previews the combination of all files

Note

Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.

Parameters:	is_col_common (bool) – keep only common columns? If False, returns all columns, with missing values filled with NaN
Returns:	pandas dataframe with combined data from all files, limited to the top self.nrows_preview rows
Return type:	df_all (dataframe)

preview_columns()[source]¶

Checks column consistency across the list of files, covering both the presence and the order of columns in every file.

Returns:

results dictionary with:
files_columns (dict): keys = file name, values = list of columns in that file
columns_all (list): all columns found in the files
columns_common (list): only columns present in every file
is_all_equal (boolean): are the columns equal in all files?
df_columns_present (dataframe): which columns are present in which file?
df_columns_order (dataframe): where in each file is the column?

Return type: col_preview (dict)

read_csv(fname, is_preview=False, chunksize=None)[source]¶

read_csv_all(msg=None, is_preview=False, chunksize=None, cfg_col_sel=None, cfg_col_rename={})[source]¶

class d6t.stack.stack_csv.CombinerCSVAdvanced(combiner, cfg_col_sel, cfg_col_rename={})[source]¶

Bases: object

combine()[source]¶

combine_preview()[source]¶

combine_preview_save(fname_out)[source]¶

combine_save(fname_out)[source]¶

d6t.stack.stack_csv.sniff_settings_csv(fname_list)[source]¶

Module contents¶