sleepless.data.dataset
Classes
| CSVDataset(subsets, fieldnames, loader) | Generic multi-subset filelist dataset that yields samples. |
| JSONDataset(protocols, fieldnames, loader) | Generic multi-protocol/subset filelist dataset that yields samples. |
- class sleepless.data.dataset.JSONDataset(protocols, fieldnames, loader)
Bases: object
Generic multi-protocol/subset filelist dataset that yields samples.
To create a new dataset, you need to provide one or more JSON formatted filelists (one per protocol) with the following contents:
{
  "subset1": [
    ["value1", "value2", "value3"],
    ["value4", "value5", "value6"]
  ],
  "subset2": []
}
Your dataset may contain any number of subsets, but all sample entries must contain the same number of fields.
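A minimal sketch of writing a protocol filelist in this format with the standard library, and of checking that every entry has the same number of fields. The file name `default.json` and the helper `validate_filelist` are hypothetical; neither is part of `sleepless`.

```python
import json
import tempfile
from pathlib import Path


def validate_filelist(content: dict) -> int:
    """Return the field count shared by all entries, or raise ValueError."""
    lengths = {len(entry) for entries in content.values() for entry in entries}
    if len(lengths) > 1:
        raise ValueError(f"entries have differing field counts: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


# Same structure as the example above: two subsets, one of them empty.
protocol = {
    "subset1": [
        ["value1", "value2", "value3"],
        ["value4", "value5", "value6"],
    ],
    "subset2": [],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "default.json"  # hypothetical protocol file name
    path.write_text(json.dumps(protocol, indent=2))
    loaded = json.loads(path.read_text())

print(validate_filelist(loaded))  # 3
```

Empty subsets are allowed; they simply contribute no entries to the field-count check.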
- Parameters:
protocols (Union[Iterable[tuple[str | Path | Traversable, Mapping]], dict[str, tuple[str | Path | Traversable, Mapping]]]) – Paths to one or more JSON formatted files containing the protocols to be recognized by this dataset, or a dictionary mapping protocol names to paths (or opened file objects) of JSON files. Internally, the protocols are stored in a dictionary; for list input, its keys default to the basename of each path.
fieldnames (Iterable[str]) – The field names (strings) to assign to the values of each entry in the JSON file. It should have as many items as there are fields in each entry of the JSON file.
loader (Callable) – A function that receives as input a context dictionary (with at least “protocol” and “subset” keys indicating which protocol and subset are being served) and a dictionary with {fieldname: value} entries, and returns an object with at least 2 attributes:
  key: a string that must be unique for every sample across subsets in a protocol, and
  data: the data associated with this sample.
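A minimal sketch of a `loader` that satisfies this interface. The `Sample` dataclass is a hypothetical stand-in for the objects the library actually returns (`DelayedSample`), and the field names `path` and `label` are made up for illustration. The key is taken from the `path` field, assuming that field alone is unique across subsets of a protocol.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class Sample:
    """Hypothetical stand-in for the objects the loader must return."""

    key: str
    data: Any


def loader(context: Mapping[str, str], sample: Mapping[str, str]) -> Sample:
    # `context` carries at least "protocol" and "subset"; `sample` maps
    # each fieldname to its value for one JSON entry.
    # Assumption: the "path" field alone is unique across subsets.
    key = sample["path"]
    data = {
        "path": sample["path"],
        "label": int(sample["label"]),
        "subset": context["subset"],
    }
    return Sample(key=key, data=data)


fieldnames = ("path", "label")
entry = ["images/001.png", "1"]
context = {"protocol": "default", "subset": "train"}
s = loader(context, dict(zip(fieldnames, entry)))
print(s.key)   # images/001.png
print(s.data)  # {'path': 'images/001.png', 'label': 1, 'subset': 'train'}
```

Such a function would be passed as the `loader` argument of `JSONDataset`, together with matching `fieldnames`.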
- check(limit=0)
For each protocol, check if all data can be correctly accessed.
This function assumes each sample has a data and a key attribute. The key attribute should be a string, or representable as such.
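The check can be pictured as the loop below. This is a minimal sketch, not the library's implementation: `check_subsets` is hypothetical, and the reading of `limit` (0 meaning "check every sample", a positive value capping the samples checked per subset) is an assumption, since `limit` is not documented here.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Mapping


def check_subsets(subsets: Mapping[str, Iterable[Any]], limit: int = 0) -> int:
    """Access `key` and `data` of each sample; return the number of failures.

    Assumption: limit == 0 checks every sample, limit > 0 checks at most
    `limit` samples per subset.
    """
    errors = 0
    for name, samples in subsets.items():
        for i, sample in enumerate(samples):
            if limit and i >= limit:
                break
            try:
                str(sample.key)  # key must be a string or representable as one
                _ = sample.data  # accessing data is what may trigger loading
            except Exception as exc:  # report the failure and keep going
                print(f"{name}[{i}]: {exc!r}")
                errors += 1
    return errors


class Broken:
    """A sample whose data cannot be loaded."""

    key = "bad"

    @property
    def data(self):
        raise OSError("file not found")


subsets = {
    "train": [SimpleNamespace(key="a", data=1), Broken()],
    "test": [SimpleNamespace(key="b", data=2)],
}
failures = check_subsets(subsets)  # prints: train[1]: OSError('file not found')
print(failures)  # 1
```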
- subsets(protocol)
Returns all subsets in a protocol.
This method will load JSON information for a given protocol and return all subsets of the given protocol after converting each entry through the loader function.
- Parameters:
protocol (str) – Name of the protocol data to load
- Return type:
dict[str, list[DelayedSample]]
- Returns:
A dictionary mapping subset names to lists of objects (respecting the key, data interface).
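A minimal sketch of what `subsets(protocol)` does, per the description above: read the protocol's JSON content, pair each entry with `fieldnames`, and pass it through `loader` together with a context naming the protocol and subset. `load_json_subsets` is a hypothetical helper reading from an in-memory file; the real method returns `DelayedSample` objects, which this sketch does not attempt to reproduce.

```python
import io
import json
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Mapping, TextIO


def load_json_subsets(
    protocol: str,
    fileobj: TextIO,
    fieldnames: Iterable[str],
    loader: Callable[[Mapping[str, str], Mapping[str, Any]], Any],
) -> dict[str, list[Any]]:
    """Hypothetical equivalent of subsets(protocol), without lazy loading."""
    names = tuple(fieldnames)
    result: dict[str, list[Any]] = {}
    for subset, entries in json.load(fileobj).items():
        # The context tells the loader which protocol/subset is being served.
        context = {"protocol": protocol, "subset": subset}
        samples = []
        for entry in entries:
            if len(entry) != len(names):
                raise ValueError(f"{subset}: expected {len(names)} fields, got {len(entry)}")
            samples.append(loader(context, dict(zip(names, entry))))
        result[subset] = samples
    return result


def loader(context, sample):
    # Returns an object exposing the key/data interface.
    return SimpleNamespace(key=sample["path"], data=(context["subset"], int(sample["label"])))


filelist = io.StringIO('{"train": [["a.png", "0"], ["b.png", "1"]], "test": [["c.png", "1"]]}')
subsets = load_json_subsets("default", filelist, ("path", "label"), loader)
print([s.key for s in subsets["train"]])  # ['a.png', 'b.png']
print(subsets["test"][0].data)            # ('test', 1)
```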
- class sleepless.data.dataset.CSVDataset(subsets, fieldnames, loader)
Bases: object
Generic multi-subset filelist dataset that yields samples.
To create a new dataset, you only need to provide a CSV formatted filelist using any separator (e.g. comma, space, semi-colon) with the following information:
value1,value2,value3
value4,value5,value6
...
Notice that all rows must have the same number of entries.
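The text above does not say how the separator is detected. One way to read such a filelist with the standard library is `csv.Sniffer`, sketched below; `read_filelist` is hypothetical and the set of candidate delimiters is an assumption.

```python
import csv
import io


def read_filelist(text: str, delimiters: str = ",; ") -> list[list[str]]:
    """Parse a CSV filelist whose separator is one of `delimiters`."""
    # Sniffer raises csv.Error when it cannot settle on a delimiter.
    dialect = csv.Sniffer().sniff(text, delimiters=delimiters)
    rows = [row for row in csv.reader(io.StringIO(text), dialect) if row]
    # Enforce "all rows must have the same number of entries".
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have differing numbers of entries: {sorted(widths)}")
    return rows


print(read_filelist("value1,value2,value3\nvalue4,value5,value6\n"))
# [['value1', 'value2', 'value3'], ['value4', 'value5', 'value6']]
```

Sniffing is heuristic: it relies on the separator appearing consistently across lines, so single-column files or very inconsistent files make `sniff` raise `csv.Error` before the explicit width check is reached.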
- Parameters:
subsets (Union[Iterable[str], dict[str, str]]) – Paths to one or more CSV formatted files containing the subsets to be recognized by this dataset, or a dictionary mapping subset names to paths (or opened file objects) of CSV files. Internally, the subsets are stored in a dictionary; for list input, its keys default to the basename of each path.
fieldnames (Iterable[str]) – The field names (strings) to assign to each column in the CSV file. It should have as many items as there are fields in each row of the CSV file(s).
loader (Callable) – A function that receives as input a context dictionary (with at least a “subset” key indicating which subset is being served) and a dictionary with {key: path} entries, and returns a dictionary with the loaded data.
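When `subsets` is given as a list, the subset names are derived from the file paths. A minimal sketch of that normalization follows; `normalize_subsets` is hypothetical, and it assumes "basename" means the file name without its extension, which the description does not specify.

```python
from collections.abc import Iterable, Mapping
from pathlib import Path
from typing import Union


def normalize_subsets(subsets: Union[Iterable[str], Mapping[str, str]]) -> dict[str, str]:
    """Map subset names to CSV paths, deriving names from list input."""
    # Check Mapping first: a dict is also an Iterable (over its keys).
    if isinstance(subsets, Mapping):
        return dict(subsets)
    # Assumption: the extension is dropped, so "data/train.csv" -> "train".
    return {Path(p).stem: p for p in subsets}


print(normalize_subsets(["data/train.csv", "data/test.csv"]))
# {'train': 'data/train.csv', 'test': 'data/test.csv'}
print(normalize_subsets({"dev": "lists/dev.csv"}))
# {'dev': 'lists/dev.csv'}
```

With list input, two files sharing a basename in different directories would collide on the same key, so the dictionary form is the safer choice in that case.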
- check(limit=0)
For each subset, check if all data can be correctly accessed.
This function assumes each sample has a data and a key attribute. The key attribute should be a string, or representable as such.
- subsets()
Returns all available subsets at once.
- Return type:
dict[str, list[DelayedSample]]
- Returns:
A dictionary mapping subset names to lists of objects (respecting the key, data interface).
- samples(subset)
Returns all samples in a subset.
This method will load CSV information for a given subset and return all samples of the given subset after passing each entry through the loading function.
- Parameters:
subset (str) – Name of the subset data to load
- Return type:
list[DelayedSample]
- Returns:
A list of objects (respecting the key, data interface).
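A minimal sketch of how `samples(subset)` and `subsets()` relate: each CSV row is paired with `fieldnames` and passed to `loader` with a context holding the subset name, and `subsets()` repeats this for every subset. The helper names, the comma delimiter and the in-memory files are assumptions for illustration; the loader here returns an object with `key` and `data` attributes, following the interface named under Returns, whereas the real methods return `DelayedSample` objects.

```python
import csv
import io
from types import SimpleNamespace
from typing import Any, Callable, Mapping, Sequence, TextIO

Loader = Callable[[Mapping[str, str], Mapping[str, str]], Any]


def load_samples(subset: str, fileobj: TextIO, fieldnames: Sequence[str], loader: Loader) -> list[Any]:
    """Hypothetical equivalent of samples(subset), without lazy loading."""
    context = {"subset": subset}  # tells the loader which subset is served
    return [
        loader(context, dict(zip(fieldnames, row)))
        for row in csv.reader(fileobj)  # assumption: comma-separated rows
        if row  # skip blank lines
    ]


def load_subsets(files: Mapping[str, TextIO], fieldnames: Sequence[str], loader: Loader) -> dict[str, list[Any]]:
    """Hypothetical equivalent of subsets(): samples() for every subset."""
    return {name: load_samples(name, f, fieldnames, loader) for name, f in files.items()}


def loader(context, sample):
    return SimpleNamespace(key=sample["path"], data=(context["subset"], int(sample["label"])))


files = {"train": io.StringIO("a.png,0\nb.png,1\n"), "test": io.StringIO("c.png,1\n")}
all_subsets = load_subsets(files, ("path", "label"), loader)
print([s.key for s in all_subsets["train"]])  # ['a.png', 'b.png']
print(all_subsets["test"][0].data)            # ('test', 1)
```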