sleepless.data.dataset
Classes
| CSVDataset | Generic multi-subset filelist dataset that yields samples. |
| JSONDataset | Generic multi-protocol/subset filelist dataset that yields samples. |
- class sleepless.data.dataset.JSONDataset(protocols, fieldnames, loader)
Bases: object
Generic multi-protocol/subset filelist dataset that yields samples.
To create a new dataset, you need to provide one or more JSON formatted filelists (one per protocol) with the following contents:
    {
        "subset1": [
            [ "value1", "value2", "value3" ],
            [ "value4", "value5", "value6" ]
        ],
        "subset2": [ ]
    }
Your dataset may contain any number of subsets, but all sample entries must contain the same number of fields.
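As a rough illustration of the layout described above, the sketch below writes a filelist for a single protocol with the standard `json` module and verifies that every entry has the same number of fields. The file name `default.json`, the subset names and the field values are hypothetical, and nothing here calls the `sleepless` API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical protocol contents: subset name -> list of entries.
protocol = {
    "train": [
        ["rec01.edf", "rec01-hyp.edf", "0"],
        ["rec02.edf", "rec02-hyp.edf", "1"],
    ],
    "test": [],
}


def field_counts(filelist):
    """Return the set of entry lengths found across all subsets."""
    return {len(entry) for entries in filelist.values() for entry in entries}


# Round-trip the filelist through a JSON file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "default.json"
    path.write_text(json.dumps(protocol, indent=2))
    reloaded = json.loads(path.read_text())

# A well-formed filelist yields at most one distinct entry length.
counts = field_counts(reloaded)
assert len(counts) <= 1, f"inconsistent entry lengths: {counts}"
```

Running a check like this before handing the file to the dataset class may catch ragged entries early.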
- Parameters:
protocols (Union[Iterable[tuple[str|Path|Traversable,Mapping]],dict[str,tuple[str|Path|Traversable,Mapping]]]) – Paths to one or more JSON formatted files containing the various protocols to be recognized by this dataset, or a dictionary mapping protocol names to paths (or opened file objects) of JSON files. Internally, a dictionary is stored in which, for list input, keys default to the basename of each path.
fieldnames (Iterable[str]) – An iterable over the field names (strings) to assign to each entry in the JSON file. It should have as many items as there are fields in each entry of the JSON file.
loader (Callable) – A function that receives a context dictionary (with at least “protocol” and “subset” keys indicating which protocol and subset are being served) and a dictionary with {fieldname: value} entries, and returns an object with at least 2 attributes:
  - key: which must be a unique string for every sample across subsets in a protocol, and
  - data: which contains the data associated with this sample
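A loader compatible with the description above might look like the following sketch. The `Sample` class, the field names `recording` and `label`, and the key scheme are all hypothetical and chosen only for illustration; the real project may use a different sample type.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Sample:
    """Hypothetical stand-in for the object a loader returns."""

    key: str
    data: Any


def loader(context: dict, sample: dict) -> Sample:
    # `context` carries at least "protocol" and "subset";
    # `sample` maps each field name to the value read from the JSON entry.
    # Prefixing with the subset name keeps keys unique across subsets,
    # assuming recording names are unique within one subset.
    key = f"{context['subset']}/{sample['recording']}"
    data = {"recording": sample["recording"], "label": int(sample["label"])}
    return Sample(key=key, data=data)


s = loader(
    {"protocol": "default", "subset": "train"},
    {"recording": "rec01.edf", "label": "0"},
)
```

The documented return type of `subsets()` is `DelayedSample`, which suggests that data may be loaded lazily; this sketch builds `data` eagerly for brevity.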
- check(limit=0)
For each protocol, check if all data can be correctly accessed.
This function assumes each sample has a data and a key attribute. The key attribute should be a string, or representable as such.
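A check of this kind can be pictured as the loop below. This is a sketch and not the library's implementation: the function name `count_errors`, the use of `SimpleNamespace` as a sample stand-in, and the reading of `limit` as a per-subset cap (with `0` meaning no cap) are all assumptions.

```python
from types import SimpleNamespace


def count_errors(subsets, limit=0):
    """Count samples whose ``key`` or ``data`` cannot be accessed.

    ``subsets`` maps subset names to lists of samples.  ASSUMPTION:
    ``limit`` caps the samples inspected per subset; 0 inspects all.
    """
    errors = 0
    for name, samples in subsets.items():
        to_check = samples if limit == 0 else samples[:limit]
        for sample in to_check:
            try:
                str(sample.key)  # key must be representable as a string
                sample.data  # touching data may trigger the actual load
            except Exception:
                errors += 1
    return errors


good = SimpleNamespace(key="train/rec01", data=[1, 2, 3])
bad = SimpleNamespace(key="train/rec02")  # no ``data`` attribute
result = count_errors({"train": [good, bad]})
```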
- subsets(protocol)
Returns all subsets in a protocol.
This method will load JSON information for a given protocol and return all subsets of the given protocol after converting each entry through the loader function.
- Parameters:
protocol (str) – Name of the protocol data to load
- Return type:
dict[str,list[DelayedSample]]
- Returns:
A dictionary mapping subset names to lists of objects (respecting the key, data interface).
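The conversion described for this method can be sketched with the standard library alone. The helper `load_subsets`, the loader, and the field names `path` and `label` are hypothetical; the sketch only mirrors the documented behaviour (read one protocol's JSON, pass every entry through the loader, group the results by subset).

```python
import json
from types import SimpleNamespace


def load_subsets(json_text, fieldnames, loader, protocol):
    """Turn one protocol's JSON filelist into {subset: [sample, ...]}."""
    fieldnames = list(fieldnames)
    result = {}
    for subset, entries in json.loads(json_text).items():
        context = {"protocol": protocol, "subset": subset}
        result[subset] = [
            loader(context, dict(zip(fieldnames, entry))) for entry in entries
        ]
    return result


def loader(context, sample):
    # Hypothetical loader: the key combines subset and file name.
    return SimpleNamespace(key=f"{context['subset']}/{sample['path']}", data=sample)


text = '{"train": [["a.edf", "0"], ["b.edf", "1"]], "test": []}'
subsets = load_subsets(text, ("path", "label"), loader, protocol="default")
```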
- class sleepless.data.dataset.CSVDataset(subsets, fieldnames, loader)
Bases: object
Generic multi-subset filelist dataset that yields samples.
To create a new dataset, you only need to provide a CSV formatted filelist using any separator (e.g. comma, space, semi-colon) with the following information:
    value1,value2,value3
    value4,value5,value6
    ...
Notice that all rows must have the same number of entries.
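To make the row format concrete, the sketch below parses CSV text with the standard `csv` module and rejects rows whose width does not match the field names. The helper `read_rows`, its explicit `delimiter` argument and the field names are hypothetical; how the dataset class itself detects the separator is not shown in this page.

```python
import csv
import io


def read_rows(text, fieldnames, delimiter=","):
    """Parse CSV text into dictionaries, rejecting rows of the wrong width."""
    fieldnames = list(fieldnames)
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for number, row in enumerate(reader, 1):
        if len(row) != len(fieldnames):
            raise ValueError(
                f"row {number}: expected {len(fieldnames)} fields, got {len(row)}"
            )
        rows.append(dict(zip(fieldnames, row)))
    return rows


names = ("signal", "hypnogram", "label")
comma = read_rows("a.edf,a-hyp.edf,0\nb.edf,b-hyp.edf,1\n", names)
semicolon = read_rows("a.edf;a-hyp.edf;0\n", names, delimiter=";")
```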
- Parameters:
subsets (Union[Iterable[str],dict[str,str]]) – Paths to one or more CSV formatted files containing the various subsets to be recognized by this dataset, or a dictionary mapping subset names to paths (or opened file objects) of CSV files. Internally, a dictionary is stored in which, for list input, keys default to the basename of each path.
fieldnames (Iterable[str]) – An iterable over the field names (strings) to assign to each column in the CSV file. It should have as many items as there are fields in each row of the CSV file(s).
loader (Callable) – A function that receives a context dictionary (with at least a “subset” key indicating which subset is being served) and a dictionary with {key: path} entries, and returns a dictionary with the loaded data.
- check(limit=0)
For each subset, check if all data can be correctly accessed.
This function assumes each sample has a data and a key attribute. The key attribute should be a string, or representable as such.
- subsets()
Returns all available subsets at once.
- Return type:
dict[str,list[DelayedSample]]
- Returns:
A dictionary mapping subset names to lists of objects (respecting the key, data interface).
- samples(subset)
Returns all samples in a subset.
This method will load CSV information for a given subset and return all samples of the given subset after passing each entry through the loading function.
- Parameters:
subset (str) – Name of the subset data to load
- Return type:
- Returns:
A list of objects (respecting the key, data interface).
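The per-subset loading described above can be approximated as follows. This is a sketch under stated assumptions, not the library's code: the free function `samples`, the loader, the file name `train.csv` and the field names are hypothetical, and the loader returns a simple object exposing `key` and `data`.

```python
import csv
import tempfile
from pathlib import Path
from types import SimpleNamespace


def samples(subset_paths, fieldnames, loader, subset):
    """Load one subset's CSV filelist and pass each row through ``loader``."""
    context = {"subset": subset}
    with open(subset_paths[subset], newline="") as f:
        return [loader(context, dict(zip(fieldnames, row))) for row in csv.reader(f)]


def loader(context, sample):
    # Hypothetical loader: the first column doubles as the sample key.
    return SimpleNamespace(key=sample["path"], data=sample)


with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train.csv"
    train.write_text("a.edf,0\nb.edf,1\n")
    loaded = samples({"train": train}, ("path", "label"), loader, "train")
```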