d6t.stack package
Submodules
d6t.stack.combine_csv module
class d6t.stack.combine_csv.CombinerCSV(fname_list, sep=', ', all_strings=False, header_row=0, skiprows=0, nrows_preview=3, logger=None)
Bases: object
Core combiner class. Checks columns, generates preview, combines.
Parameters: - fname_list (list) – file names, e.g. ['a.csv', 'b.csv']
- sep (string) – CSV delimiter, see pandas.read_csv()
- all_strings (boolean) – read all values as strings (faster)
- header_row (int) – header row, see pandas.read_csv()
- skiprows (int) – rows to skip at top of file, see pandas.read_csv()
- nrows_preview (int) – number of rows in preview
- logger (object) – logger object with send_log()
combine(is_col_common=False, is_preview=False)
Combines all files.
Note
Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.
Parameters: - is_col_common (bool) – keep only columns common to all files? If False, returns all columns, with missing values filled with NaN
- is_preview (bool) – read only the top self.nrows_preview rows
Returns: pandas dataframe with combined data from all files
Return type: df_all (dataframe)
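The effect of is_col_common can be illustrated with a minimal standard-library sketch. This is not the library's implementation: combine_rows is a hypothetical helper, rows are plain dicts rather than a dataframe, and missing cells become None where the documented method fills them with NaN.

```python
import csv
import os
import tempfile


def combine_rows(fname_list, sep=",", is_col_common=False):
    """Stack rows from several delimited files into one list of dicts (hypothetical helper)."""
    headers, tables = [], []
    for fname in fname_list:
        with open(fname, newline="") as f:
            reader = csv.DictReader(f, delimiter=sep)
            tables.append(list(reader))
            headers.append(reader.fieldnames or [])
    if not headers:
        return []
    if is_col_common:
        # keep only columns present in every file, in first-file order
        columns = [c for c in headers[0] if all(c in h for h in headers)]
    else:
        # keep every column seen in any file, in first-seen order
        columns = list(dict.fromkeys(c for h in headers for c in h))
    # a column missing from a file yields None (the documented method uses NaN)
    return [{c: row.get(c) for c in columns} for rows in tables for row in rows]


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a.csv", "date,sales\n2018-01-01,100\n"),
                       ("b.csv", "date,sales,profit\n2018-02-01,200,20\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w", newline="") as f:
            f.write(text)
        paths.append(path)
    rows_all = combine_rows(paths)                         # includes 'profit'
    rows_common = combine_rows(paths, is_col_common=True)  # drops 'profit'
```

With is_col_common=False the first file's rows carry an empty profit value; with is_col_common=True the profit column is dropped entirely.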
combine_preview(is_col_common=False)
Previews the combination of all files.
Note
Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.
Parameters: is_col_common (bool) – keep only columns common to all files? If False, returns all columns, with missing values filled with NaN
Returns: pandas dataframe with combined data from all files, only the top self.nrows_preview rows
Return type: df_all (dataframe)
preview_columns()
Checks column consistency across the list of files. It checks both the presence and the order of columns in all files.
Returns: results dictionary with
- files_columns (dict): keys = filename, value = list of columns in file
- columns_all (list): all columns in files
- columns_common (list): only columns present in every file
- is_all_equal (boolean): are the columns equal in all files?
- df_columns_present (dataframe): which columns are present in which file?
- df_columns_order (dataframe): where in the file is the column?
Return type: col_preview (dict)
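As a rough illustration of the list-valued entries in the results dictionary, the sketch below derives files_columns, columns_all, columns_common and is_all_equal from in-memory CSV text. The function name preview_columns_sketch and its dict-of-text input are assumptions made for the example; the documented method reads the files in fname_list itself.

```python
import csv
import io


def preview_columns_sketch(csv_texts, sep=","):
    """csv_texts maps a file name to its CSV text (hypothetical input format)."""
    # header row of each file; an empty file yields an empty column list
    files_columns = {
        fname: next(csv.reader(io.StringIO(text), delimiter=sep), [])
        for fname, text in csv_texts.items()
    }
    col_lists = list(files_columns.values())
    # union of columns in first-seen order
    columns_all = list(dict.fromkeys(c for cols in col_lists for c in cols))
    # columns that appear in every file
    columns_common = [c for c in columns_all if all(c in cols for cols in col_lists)]
    # list equality checks both presence and order
    is_all_equal = all(cols == col_lists[0] for cols in col_lists)
    return {
        "files_columns": files_columns,
        "columns_all": columns_all,
        "columns_common": columns_common,
        "is_all_equal": is_all_equal,
    }


col_preview = preview_columns_sketch({
    "a.csv": "date,sales\n2018-01-01,100\n",
    "b.csv": "sales,date,profit\n200,2018-02-01,20\n",
})
```

Here is_all_equal is False both because b.csv has an extra column and because its shared columns appear in a different order.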
d6t.stack.combine_files module
d6t.stack.combine_xls module
class d6t.stack.combine_xls.XLStoCSVMultiFile(fname_list, cfg_xls_sheets_sel_mode, cfg_xls_sheets_sel, logger=None)
Bases: object
Converts xls|xlsx files to csv files. Selects a SINGLE SHEET from each file. To extract MULTIPLE SHEETS from a file use XLStoCSVMultiSheet
Parameters: - fname_list (list) – file paths, e.g. ['dir/a.xls', 'dir/b.xls']
- cfg_xls_sheets_sel_mode (string) – mode to select tabs
  name: select by name, provide a name for each file, can customize by file
  name_global: select by name, one name for all files
  idx: select by index, provide an index for each file, can customize by file
  idx_global: select by index, one index for all files
- cfg_xls_sheets_sel (list) – values to select tabs NEEDS TO BE IN THE SAME ORDER AS `fname_list`
- logger (object) – logger object with send_log(), optional
convert_all()
Executes conversion. Writes each output to the same path as the input file, with .csv appended to the filename.
Returns: output file names
Return type: list
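The four selection modes and the output naming can be sketched as follows. Both helpers are hypothetical and rest on assumptions not confirmed by this page: that the global modes take a single value while the per-file modes take a list ordered like fname_list, and that "appends .csv" means the suffix is added to the full input name.

```python
def resolve_sheet_selection(fname_list, sel_mode, sel):
    """Map each file to the sheet name or index to extract (hypothetical helper)."""
    if sel_mode in ("name_global", "idx_global"):
        # assumption: one value, applied to every file
        return {fname: sel for fname in fname_list}
    if sel_mode in ("name", "idx"):
        # per-file values must line up with fname_list, position by position
        if len(sel) != len(fname_list):
            raise ValueError("need one selection per file, in the same order as fname_list")
        return dict(zip(fname_list, sel))
    raise ValueError(f"unknown selection mode: {sel_mode!r}")


def csv_output_name(fname):
    """Assumed naming: the input path with '.csv' appended."""
    return fname + ".csv"


files = ["dir/a.xls", "dir/b.xls"]
per_file = resolve_sheet_selection(files, "name", ["Sheet1", "Data"])
same_for_all = resolve_sheet_selection(files, "idx_global", 0)
outputs = [csv_output_name(f) for f in files]
```

Keeping the selection as a mapping keyed by file name makes the ordering requirement on cfg_xls_sheets_sel explicit at the point where the lists are paired.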
d6t.stack.helpers module
Module with several helper functions
d6t.stack.helpers.columns_all_equal(col_list)
Checks that all lists in col_list are equal.
Parameters: col_list (list) – columns, e.g. [['a', 'b'], ['a', 'b', 'c']]
Returns: all lists in list are equal?
Return type: bool
d6t.stack.helpers.file_extensions_all_equal(ext_list)
Checks that all file extensions are equal.
Parameters: ext_list (list) – file extensions, e.g. ['.csv', '.csv']
Returns: all extensions are equal to first extension in list?
Return type: bool
d6t.stack.helpers.file_extensions_get(fname_list)
Returns the file extensions of the files in the list.
Parameters: fname_list (list) – file names, e.g. ['a.csv', 'b.csv']
Returns: file extensions for each file name in input list, e.g. ['.csv', '.csv']
Return type: list
d6t.stack.helpers.file_extensions_valid(ext_list)
Checks if file list contains only valid files.
Notes
Assumes all file extensions are equal! Only checks the first file.
Parameters: ext_list (list) – file extensions, e.g. ['.csv', '.csv']
Returns: first element in list is one of ['.csv', '.txt', '.xls', '.xlsx']?
Return type: bool
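The documented behaviour of these helpers can be approximated in a few lines. The functions below are stand-ins written from the descriptions above, not the module's source; the _sketch suffix marks them as hypothetical.

```python
import os

# valid extensions as listed in the file_extensions_valid description
VALID_EXTENSIONS = [".csv", ".txt", ".xls", ".xlsx"]


def file_extensions_get_sketch(fname_list):
    # os.path.splitext keeps the leading dot, e.g. 'a.csv' -> '.csv'
    return [os.path.splitext(fname)[1] for fname in fname_list]


def file_extensions_all_equal_sketch(ext_list):
    # compare every extension against the first one
    return all(ext == ext_list[0] for ext in ext_list)


def file_extensions_valid_sketch(ext_list):
    # only the first extension is checked, as the note above warns
    return ext_list[0] in VALID_EXTENSIONS


def columns_all_equal_sketch(col_list):
    # list equality compares both contents and order
    return all(cols == col_list[0] for cols in col_list)


exts = file_extensions_get_sketch(["a.csv", "b.csv"])
```

Because the validity check looks only at the first extension, it is meaningful only after the all-equal check has passed: a list such as ['.csv', '.exe'] would otherwise be reported as valid.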
d6t.stack.helpers_ui module
d6t.stack.sniffer module
Finds CSV settings and Excel sheets in multiple files. Often needed as input for stacking
class d6t.stack.sniffer.CSVSniffer(fname, nlines=10, delims=',;\t|')
Bases: object
Automatically detects settings needed to read csv files. SINGLE file only, for MULTI file use CSVSnifferList
Parameters: - fname (string) – file path
- nlines (int) – number of lines to sample from each file
- delims (string) – possible delimiters, default ',;\t|'
class d6t.stack.sniffer.CSVSnifferList(fname_list, nlines=10, delims=',;\t|')
Bases: object
Automatically detects settings needed to read csv files. For MULTIPLE files; for a SINGLE file use CSVSniffer
Parameters: - fname_list (list) – file names, e.g. ['a.csv', 'b.csv']
- nlines (int) – number of lines to sample from each file
- delims (string) – possible delimiters, default ',;\t|'
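Delimiter detection over a sample of lines can be sketched with the standard library's csv.Sniffer. This shows the general technique under the nlines and delims parameters described above; it is not claimed to be how CSVSniffer or CSVSnifferList work internally, and both function names are hypothetical.

```python
import csv
import itertools
import os
import tempfile


def sniff_delimiter(fname, nlines=10, delims=",;\t|"):
    """Guess the delimiter of one file from its first nlines lines."""
    with open(fname, newline="") as f:
        sample = "".join(itertools.islice(f, nlines))
    # csv.Sniffer only considers the characters passed in `delimiters`
    return csv.Sniffer().sniff(sample, delimiters=delims).delimiter


def sniff_delimiters(fname_list, nlines=10, delims=",;\t|"):
    """Guess the delimiter per file and report whether all files agree."""
    found = {fname: sniff_delimiter(fname, nlines, delims) for fname in fname_list}
    return found, len(set(found.values())) <= 1


with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.csv")
    path_b = os.path.join(tmp, "b.csv")
    with open(path_a, "w", newline="") as f:
        f.write("a;b;c\n1;2;3\n4;5;6\n")
    with open(path_b, "w", newline="") as f:
        f.write("x|y\n1|2\n3|4\n")
    single = sniff_delimiter(path_a)
    found, consistent = sniff_delimiters([path_a, path_b])
    detected = [found[path_a], found[path_b]]
```

Checking agreement across files before combining helps catch a stray file exported with a different delimiter.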
class d6t.stack.sniffer.XLSSniffer(fname_list, logger=None)
Bases: object
Extracts available sheets from MULTIPLE Excel files and runs diagnostics
Parameters: - fname_list (list) – file paths, e.g. ['dir/a.xls', 'dir/b.xls']
- logger (object) – logger object with send_log(), optional
all_contain_sheetname(sheet_name)
Checks if all files contain a certain sheet.
Parameters: sheet_name (string) – sheet name to check
Returns: True if all files contain the sheet
Return type: boolean
all_have_idx(sheet_idx)
Checks if all files contain a certain sheet index.
Parameters: sheet_idx (string) – index to check
Returns: True if all files have a sheet at that index
Return type: boolean
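The two diagnostics reduce to simple checks once the sheet names of each workbook are known. The sketch below takes that information as a plain dict, which is an assumption made for the example (the class extracts it from the Excel files), and treats sheet indexes as 0-based integers, which is also an assumption.

```python
def all_contain_sheetname_sketch(sheets_by_file, sheet_name):
    """True if every file lists a sheet with this name."""
    return all(sheet_name in sheets for sheets in sheets_by_file.values())


def all_have_idx_sketch(sheets_by_file, sheet_idx):
    """True if every file has a sheet at this position (assumed 0-based)."""
    return all(0 <= sheet_idx < len(sheets) for sheets in sheets_by_file.values())


# hypothetical result of reading the sheet names from two workbooks
sheets_by_file = {
    "dir/a.xls": ["Sheet1", "Summary"],
    "dir/b.xls": ["Sheet1"],
}
has_sheet1 = all_contain_sheetname_sketch(sheets_by_file, "Sheet1")
has_summary = all_contain_sheetname_sketch(sheets_by_file, "Summary")
has_second = all_have_idx_sketch(sheets_by_file, 1)
```

A check like this indicates whether a single global selection is safe or whether sheets must be chosen per file.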
d6t.stack.stack_csv module
class d6t.stack.stack_csv.CombinerCSV(fname_list, sep=', ', all_strings=False, header_row=0, skiprows=0, nrows_preview=5, logger=None)
Bases: object
Core combiner class. Checks columns, generates preview, combines.
Parameters: - fname_list (list) – file names, e.g. ['a.csv', 'b.csv']
- sep (string) – CSV delimiter, see pandas.read_csv()
- all_strings (boolean) – read all values as strings (faster)
- header_row (int) – header row, see pandas.read_csv()
- skiprows (int) – rows to skip at top of file, see pandas.read_csv()
- nrows_preview (int) – number of rows in preview
- logger (object) – logger object with send_log()
combine(is_col_common=False, is_preview=False)
Combines all files.
Note
Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.
Parameters: - is_col_common (bool) – keep only columns common to all files? If False, returns all columns, with missing values filled with NaN
- is_preview (bool) – read only the top self.nrows_preview rows
Returns: pandas dataframe with combined data from all files
Return type: df_all (dataframe)
combine_preview(is_col_common=False)
Previews the combination of all files.
Note
Unlike CombinerCSVAdvanced.combine(), this function supports only simple combine operations.
Parameters: is_col_common (bool) – keep only columns common to all files? If False, returns all columns, with missing values filled with NaN
Returns: pandas dataframe with combined data from all files, only the top self.nrows_preview rows
Return type: df_all (dataframe)
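A preview only needs the first few data rows, so each source can stop reading early. The sketch below assumes the nrows_preview limit applies to each file separately before stacking, which this page does not state explicitly; combine_preview_sketch and its list-of-text input are hypothetical.

```python
import csv
import io
import itertools


def combine_preview_sketch(csv_texts, nrows_preview=5, sep=","):
    """Stack the first nrows_preview data rows of each CSV text."""
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter=sep)
        # islice stops after nrows_preview data rows; the header is not counted
        rows.extend(itertools.islice(reader, nrows_preview))
    return rows


file_a = "id,value\n1,10\n2,20\n3,30\n4,40\n"
file_b = "id,value\n5,50\n6,60\n7,70\n8,80\n"
preview = combine_preview_sketch([file_a, file_b], nrows_preview=2)
preview_ids = [row["id"] for row in preview]
```

Reading lazily keeps the preview cheap even when the underlying files are large.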
preview_columns()
Checks column consistency across the list of files. It checks both the presence and the order of columns in all files.
Returns: results dictionary with
- files_columns (dict): keys = filename, value = list of columns in file
- columns_all (list): all columns in files
- columns_common (list): only columns present in every file
- is_all_equal (boolean): are the columns equal in all files?
- df_columns_present (dataframe): which columns are present in which file?
- df_columns_order (dataframe): where in the file is the column?
Return type: col_preview (dict)
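The two dataframe entries answer "which columns are present in which file?" and "where in the file is the column?". A minimal sketch of the same information as nested dicts follows; the documented method returns dataframes, and the 0-based positions and the use of None for an absent column are assumptions of this example.

```python
def columns_present_and_order(files_columns):
    """files_columns maps file name -> list of columns, as in the results dictionary."""
    # union of all columns in first-seen order
    columns_all = list(dict.fromkeys(c for cols in files_columns.values() for c in cols))
    # True/False per (file, column): is the column in this file?
    present = {
        fname: {c: c in cols for c in columns_all}
        for fname, cols in files_columns.items()
    }
    # position per (file, column): 0-based index, or None when absent
    order = {
        fname: {c: cols.index(c) if c in cols else None for c in columns_all}
        for fname, cols in files_columns.items()
    }
    return present, order


present, order = columns_present_and_order({
    "a.csv": ["date", "sales"],
    "b.csv": ["sales", "date", "profit"],
})
```

Comparing the position tables across files shows reordered columns that a presence check alone would miss.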