Filenames#
The acquisition of large amounts of data often results in a large number of data files that can be difficult to navigate. To simplify this process, it is helpful to use strict but versatile file-naming conventions. In this guide, we will present the file-naming conventions used in our package and provide examples of valid filenames. We will also showcase some common mistakes to avoid.
Our file-naming system was developed to store experimental metadata concisely in a given data file.
This system has evolved over the years and has recently been updated for increased functionality.
We use the FilenameManager
class to easily access the metadata from one or more files simultaneously.
The FilenameManager
provides three methods to parse through multiple filenames, which we will demonstrate in this
guide.
To use this guide, it is helpful to define some terms. “Experimental metadata” refers to the information stored in a data file that describes the experiment’s conditions, equipment, and procedures. “FilenameManager” is a class that provides easy access to filename metadata.
In this guide, we will first present the rules that dictate our filename conventions, which are designed to
simplify data file navigation. We will then provide examples of valid filenames and showcase some of the most common
mistakes to avoid. Additionally, we will show you how to use the FilenameManager
to easily access the metadata from
one or more files simultaneously. For more information on how to use the FilenameManager
and the FilenameInfo
classes, go here.
For a thorough guide on how to write a valid filename, go here.
For a comprehensive list of all the predetermined filename headers, go
here.
Quick and dirty#
To easily access the filename metadata from a file that follows the
filename conventions, use the FilenameManager
class.
Three methods are provided to parse through multiple filenames.
>>> from qdl_zno_analysis.filename_utils.filename_manager import FilenameManager
>>> filenames = ['001_Msc~Example1a_Tmp~1p2.csv', '001_Msc~Example1b_Tmp~1p2.txt', '002_Msc~Example2_Tmp~1p2.csv']
>>> folder = 'some_folder'
>>> # Method 1: Directly provide a list of filenames and the folder path.
>>> # This is useful if you already have a list of filenames that you want to analyze.
>>> fnm = FilenameManager(filenames, folder)
>>> # Method 2: Provide a range of file numbers and a list of file types.
>>> # This is useful if you want to analyze a range of files with specific file types,
>>> # such as all CSV files from file 1 to file 2.
>>> fnm = FilenameManager.from_file_numbers(range(1, 3), None, folder)
>>> # Method 3: Find all files in a folder that match a specific string in their filename.
>>> # This is useful if you want to analyze all files in a folder that have a certain pattern in their name,
>>> # such as all files with "Msc~Example" in the filename.
>>> fnm = FilenameManager.from_matching_string('*Msc~Example*', folder)
All methods will result in the same object. The class that stores all parsed filename information is called
FilenameInfo
. Through this class you can access all the information you are interested in three forms:
a python dataclass,
a dictionary with embedded sub-dictionaries, or
a normalized/flattened dictionary.
See more on FilenameInfo
here. You can easily access all three from the fnm
object we just created:
>>> fnm.filenames
>>> fnm.filename_info_list # list of FilenameInfo objects for each filename
>>> fnm.filename_info_dicts # dictionary of embedded sub-dictionaries
>>> fnm.changing_filename_info_dict # normalized/flattened dictionary
For more information on FilenameManager
, go here.
Comprehensive reference guide to filename conventions#
This section is here to provide an experienced user with a memory refresher of the filename conventions. If you are a new user, you may wat to skip forward to the next section.
Special symbols#
To separate and properly assign information:
Equality/Assignment:
~
Primary header separator:
_
Secondary separators in order:
-
,;
,,
.
Input types#
Input Type |
Format |
Special Symbols |
Example |
---|---|---|---|
String |
Letters, Numbers, no special symbols |
N/A |
“ExampleName1” |
Numeric |
- Sign (optional) |
- “p” for positive (optional) or “n” for negative |
“n0p13n” |
List Entry |
Strings or numerics in a serial format, separated by secondary separators. |
Secondary separators |
“Msc~Potato-Tomato” |
Headers#
Here
[I] stands for input,
[H] stands for header name,
[S(N)] stands for subheader, with N (optional) being the order in which subheader comes in the list of subheaders within a header,
[SS(N)|] stands for subsubheader.
etc.
The following use all special symbols except the primary separator.
Format Type |
Format |
Examples |
---|---|---|
Dictionary |
[H]~[S]~[I]-[S]~[I]… |
“Lsr~Wvl~737p8n_Pwr~100n” |
List |
[H]~[I for S1]-[I for S2]… |
“Lsr~Matisse-737p8-100n-2-Air” |
Nested |
[H]~[S]~[SS]~[I];[SS]~[I]-[S]~[I]… |
“Lsr~Matisse~Wvl~737p8;Pwr~10n;Ord~2-Toptica~Wvl~368p1;Pwr~20n” |
Filename headers#
Example:
001_Smp~ZnO1_Lsr~Matisse-737p8-10n-2_Col~HWP~45deg-PnH~40u-Flt~LP380;BP370_MgF~5_Tmp~120m_Spt~From~n2u;3u-To~4u;8u-StepNo~61;26_MsT~ConfocalScanPL.csv
Header name |
Full name |
|
|
Input types |
Examples |
---|---|---|---|---|---|
“FNo” |
file number |
fno, file_number |
None |
Integer |
“001” |
“Smp” |
sample name |
smp, sample |
None |
String (rec), Numeric, List entry |
“Smp~Zn1” |
“MgF” |
magnet field |
mgf, magnet_field |
None, |
String, Numeric (rec), List entry, Dictionary format |
“MgF~5” |
“Tmp” |
temperature |
tmp, temperature |
None, |
String, Numeric (rec), List entry, Dictionary format |
“Tmp~120m” |
“MsT” |
measurement type |
mst, measurement_type |
None |
String (rec), Numeric, List entry |
“MsT~Lifetime” |
“Msc” |
miscellaneous |
msc, miscellaneous |
None |
String, Numeric, List entry |
“Msc~Any-Information;you,want” |
“Spt” |
spot |
spt, spot |
None, |
List entry, Dictionary format |
“Spt~n1p05u-2p45u” |
“Lsr” |
lasers |
lsr, lasers |
|
Dictionary format, List format |
“Lsr~Matisse-737p8n-10n-2” |
“Exc” |
excitation path optics |
exc, excitation_path_optics |
|
Dictionary format, List format |
“Exc~Plr~V-HWP~45deg” |
“EnC” |
excitation and collection path optics |
enc, excitation_and_collection_path_optics |
|
Dictionary format, List format |
“EnC~QWP~90deg” |
“Col” |
collection path optics |
exc, collection_path_optics |
|
Dictionary format, List format |
“Col~PnH~40u-Flt~LP380;BP370” |
“RFS” |
RF sources |
rfs, rf_sources |
|
Dictionary format, List format |
“RFS~Name~Synth-Frq~8p8G” |
“RFL” |
RF lines |
rfl, rf_lines |
|
Not Implemented |
Not Implemented |
“Misused” |
Misused, other (For internal use only) |
other |
None |
Any elements not matching the conversions |
Do not use! Use “Msc” instead! |
Source headers#
For lasers and RF sources. Example:
Lsr~Matisse-737p8-10n-2
Header name |
Full name |
|
|
Input types |
Examples |
---|---|---|---|---|---|
“Name” |
name |
name |
None |
String (rec), Numeric, List entry |
“Matisse” |
“Wvl” |
wavelength |
wfe, wvl_air, wvl_vac, wavelength_air, wavelength_vacuum |
None, |
String, Numeric (rec), List entry, Dictionary format |
“Wvl~737p8n” |
“Frq” |
frequency |
wfe, frq, frequency |
None, |
String, Numeric (rec), List entry, Dictionary format |
“Frq~8p8G” |
“Eng” |
energy |
wfe, eng, energy |
None, |
String, Numeric (rec), List entry, Dictionary format |
“Eng~334p5m” |
“Pwr” |
power |
pwr, power |
None, |
String, Numeric (rec), List entry, Dictionary format |
“Pwr~100n” |
“Ord” |
Nth order generation |
ord, order |
None |
Integer |
“Ord~2” |
“Mdm” |
Input medium |
mdm, medium |
None |
String: “Air” (default) or “Vacuum”. |
“Mdm~Air” |
“Msc” |
miscellaneous |
msc, miscellaneous |
None |
String, Numeric, List entry |
“Msc~Any;Extra,Information” |
Optical path headers#
For excitation, collection, and excitation/collection path optics. Example:
HWP~n21p2deg-QWP~0p33rad-Plr~V-PnH~40u-Flt~LP380;BP370
Header name |
Full name |
|
|
Input types |
Examples |
---|---|---|---|---|---|
“HWP”, “WP2” |
half waveplate angle |
hwp, wp2, half_waveplate_angle |
None, |
String, Numeric (rec), List entry, Dictionary format |
“HWP~n21p2deg” |
“QWP”, “WP4” |
quarter waveplate angle |
qwp, wp4, quarter_waveplate_angle |
None, |
String, Numeric (rec), List entry, Dictionary format |
“QWP~0p33rad” |
“Plr” |
polarizer |
plr, polarizer |
None, |
String, Numeric (rec), List entry, Dictionary format |
“Plr~V” |
“PnH” |
pinhole |
pnh, pinhole |
None |
String, Numeric (rec), List entry |
“PnH~40u” |
“Flt” |
filter |
flt, filter |
None |
String, Numeric (rec), List entry |
“Flt~LP380;BP370” |
“Msc” |
miscellaneous |
msc, miscellaneous |
None |
String, Numeric, List entry |
“Msc~Any;Extra,Information” |
RF-line headers#
Not implemented, yet.
Scan Headers#
Example:
Spt~From~n2u;3u-To~4u;8u-StepNo~61;26
Header name |
Full name |
|
|
Input types |
Examples |
---|---|---|---|---|---|
“Start”, “From”, “Init”, “Initial” |
start value |
start, initial |
None |
String, Numeric (rec), List entry |
“From~n2u;3u” |
“Stop”, “To”, “Final” |
final value |
stop, final |
None |
String, Numeric (rec), List entry |
“To~4u;8u” |
“Step”, “Res”, “Resolution” |
step size |
step |
None |
String, Numeric (rec), List entry |
“Step~100n;50n” |
“StepΝο”, “Res”, “Resolution” |
step number |
step_no |
None |
String, Numeric (rec), List entry |
“StepNo~61;26” |
“StepΝο”, “Res”, “Resolution” |
step number |
step_no |
None |
String, Numeric (rec), List entry |
“StepNo~61;26” |
“Rate” |
scanning rate |
rate |
None |
String, Numeric (rec), List entry |
“Rate~0p005;0p003” |
“Dur”, “Duration” |
total accumulation time |
duration |
None |
String, Numeric (rec), List entry |
“Dur~100s” |
“Mode” |
scan mode |
mode |
None |
String (e.g. “Continuous”, “Discreet”) |
“Mode~Continuous” |
“Msc” |
miscellaneous |
msc, miscellaneous |
None |
String, Numeric, List entry |
“Msc~Anything” |
Filename conventions#
As per our lab’s tradition, each filename starts with a n-digit numeral, e.g. 001
to indicate that this is the 1st
file in the folder. This convention helps us identify the files in our notes.
Headers#
The file number is the only metadatum that is stored without any prefix or header.
A header is usually a 3-letter sequence of letters that corresponds to a specific metadatum family.
For example, the backend file number header is FNo
. The header is followed by ~
to indicate an assignment or
equality. In our file number example, it would be FNo~001
, however, as we said, this header can be omitted.
As we progress through this section, we will slowly add more metadata to our dummy filename. As of now our filename is:
001.csv
String values#
Since we are performing different types of measurements, and the more often than not, the file extension is not enough
to determine the type of measurement stored in a file, we define the MsT
header. e.g. MsT~PL
. The primary
symbol that is used to differentiate between headers is _
. This is a special symbol and can not be used for anything
else. Hence, our filename becomes:
001_MsT~PL.csv
As a material lab, we work with different samples. Hence, another simple header we define is Smp
for sample,
e.g. Smp~ZnO1
. Our filename is:
001_Smp~ZnO1_MsT~PL.csv
Numerical values and units#
The environment in which an experiment is performed at is also important.
We therefore define two more headers, MgF
and Tmp
for magnetic field and temperature, respectively.
Both of these usually take numeric values, which follow specific conventions:
We start with an optional sign
n
used for negative andp
for positive (can be omitted).Then comes the integer part. If its is
0
, then it is mandatory to state so.The optional floating part should always start with
p
and be followed by at lease one numeral.The optional unit part. Units are handles with
pint.Quantity
objects, and you can find more information about them here. There are three formats that units can be defined in.No units. If the header you are populated has default units (e.g.
K
for temperature). If the unit is omitted, then the program will assume it is in the default units.Only unit prefix. You can use any valid unit prefix defined in
pint.UnitRegistry
(e.g.m
for milli). The program will assume that the full unit is the prefix + default core unit (e.g.mK
). Note: Beware that some units are also identified as prefixes. In that case the program will assume that you are using the prefix, not the unit. For example settingMgF~5T
would result in a teratesla unit! Hence, omit the unit all together when possible.The full unit. You can use any valid combination of prefixes and base units (e.g.
Tmp~120mK
)
With these in mind, we can now update our filename:
001_Smp~ZnO1_MgF~5_Tmp~120m_MsT~PL.csv
Subheaders#
Since our lab works primarily with optical setups, properly storing input laser information is paramount. Hence,
we define a header Lsr
that can store multiple information. This header, alongside some others we will discuss later,
can take subheaders. For laser information, these are:
Name
Wvl
,Frq
, orEng
for wavelength, frequency or energy respectively. Only define one.Pwr
for power,Ord
for frequency conversion order andMdm
for medium (only important if you provide wavelength, to distinguish betweenair
(default) andvacuum
).
We can define such a header in two ways, a dictionary or a list format. In both cases
secondary separators are used. The defined secondary separators in order of increasing depth are -
, ;
, ,
.
Subheader dictionary format.#
A dictionary format is defined by a series of subheader-value items separated by the appropriate secondary
separator (in this case: -
), e.g Lsr~Wvl~737p8n-Pwr~10n
. Subheaders, same as headers do not need to be provided in
a specific order, hence the following is also valid: Lsr~Pwr~100n-Wvl~737p8n
. This is best used when you do not need
to define all the subheadings.
Subheader list format.#
A list format is defined by a series of values separated by the appropriate separator (in this case, again, -
), e.g.
Lsr~Matisse-737p8-10n-2-Air
. This definition assumes that you provide the values is a specific predefined order. This
string tells us that the Matisse laser was used, at 737.8 nm measured in air, then was converted via SHB to 368.9 nm
and the reference power was 10 nW.
Going back to the filename we are building, we can see its starting to flesh out:
001_Smp~ZnO1_Lsr~Matisse-737p8-10n-2_MgF~5_Tmp~120m_MsT~PL.csv
List entry#
In an optics lab, we often use various optical elements in the beam path. Such elements can be listed in three headers:
Exc
for excitation pathCol
for collection pathEnC
for the common path between excitation and collection
Each of these headers can take various subheaders, such as:
HWP
orWP2
for half waveplate angleQWP
orWP4
for quarter waveplate anglePlr
for polarizerPnH
for pinholeFlt
for filters
An example of this subheader would look like: Col~HWP~45deg-PnH~40u-Flt~LP380;BP370
. You probably noticed the use of
the second secondary separator, ;
. Here we are using it so that we can assign a list of filters to the Flt
subheader. With this, our filename looks like:
001_Smp~ZnO1_Lsr~Matisse-737p8-10n-2_Col~HWP~45deg-PnH~40u-Flt~LP380;BP370_MgF~5_Tmp~120m_MsT~PL.csv
Scan Entry#
The final piece of the puzzle, scans! Often times you spend a lot of time performing measurements manually, and that can certainly be tedious. Then, sometimes, you are lucky (or hardworking) and you find (or compile) a program that automates this process. In order to be able to tell the scan parameters from a plethora of scan files, I created a special entry that can be implemented on almost all headers.
Let’s say that we want to perform a confocal scan, which is scanning e.g. a mirror on a piezo-stage. A scan can take multiple subheaders:
Start
,From
,Initial
orInit
for the starting value of the scan.Stop
,To
orFinal
for the final value of the scan.Step
,Resolution
orRes
for the step size (calculated by the start, stop and step number values, if left empty)StepNo
for the step number (calculated by the start, stop and step size values, if left empty)Rate
for the scanning rate.Duration
orDur
for the total accumulation time.Mode
for the scan mode (e.g. continuous vs discrete)
Let’s use the header Spt
that stands for spot for this example. A simple spot example can be Spt~1
(meaning that you
are on your first identified spot), or Spt~1p1u;n2p67u
meaning you are at the coordinates (x = 1.1 μm, y = -2.67 μm).
A scan Spot will look like: Spt~From~n2u;3u-To~4u;8u-StepNo~61;26
which means that we scan from (x = -2 μm, y = 3 μm)
to (x = 4 μm, y = 8 μm) with a total step number of (xsn = 61, ysn = 26). This would automatically calculate a step size
of (xs = 0.1 μm, ys = 0.2 μm). This will get us a quite hefty but informative filename:
001_Smp~ZnO1_Lsr~Matisse-737p8-10n-2_Col~HWP~45deg-PnH~40u-Flt~LP380;BP370_MgF~5_Tmp~120m_Spt~From~n2u;3u-To~4u;8u-StepNo~61;26_MsT~ConfocalScanPL.csv
Note: a scan can not be set with the list format, but only the dictionary format!!
Nested sources#
Only for sources, we are able to input more than one element via the dict format. A simple example with two laser sources would be:
Lsr~Matisse~Wvl~737p8;Pwr~10n;Ord~2-Toptica~Wvl~368p1;Pwr~20n
Where the name of each source is used as the dictionary header (notice the ~
after their name)
We can even have a scanning source while the other source is stationary.
Lsr~Matisse~Wvl~From~737p7,To~737p9,Step~50p;Pwr~10n;Ord~2-Toptica~Wvl~368p1;Pwr~20n
Miscellaneous information#
Of course, we allow the user to input more information for both headers and subheaders. The header miscellaneous is
called Msc
. Only specific headers can hold more information in the Msc
subheader, and these are the Sources,
the Optical paths, RF-lines and the Scans.
Wrongful usage of conventions#
If a filename includes wrongful notation, performance is not guaranteed.
For the most common mistake, using _
between any value, we make sure to store all that misused information as a list.
Accessing filename metadata with FilenameInfo and FilenameManager#
Now that we know how to name our files, we can use the FilenameManager
for sort to easily access all of this
stored information. For each file, the FilenameManager creates a FilenameInfo
dataclass. Each filename header will be
converted to a class attribute. The user will not need to interface with the FilenameInfo
object, unless they need
some additional functionality.
As we said earlier, unit-based values will be stored as pint.Quantity
objects. If that fails, the values fall back
to strings. Lists are stored as lists, and subheaders within embedded Info
objects.
FilenameInfo#
Nevertheless, lets see how our example filename would like as a FilenameInfo
object. The following snippet shows how
to initialize a FilenameInfo
object from a filename.
>>> from qdl_zno_analysis.filename_utils.filename_info import FilenameInfo
>>> filename = '001_Smp~ZnO1_Lsr~Matisse-737p8-10n-2_Col~HWP~45deg-PnH~40u-Flt~LP380;BP370_MgF~5_Tmp~120m_Spt~From~n2u;3u-To~4u;8u-StepNo~61;26_MsT~ConfocalScanPL.csv'
>>> fni = FilenameInfo.from_filename(filename)
>>> fni
FilenameInfo(file_number=1, sample_name='ZnO1', lasers=SourceInfo(name='Matisse', wfe=<Quantity(737.8, 'nanometer')>, wavelength_vacuum=<Quantity(369.001625, 'nanometer')>, wavelength_air=<Quantity(368.9, 'nanometer')>, frequency=<Quantity(812442.109, 'gigahertz')>, energy=<Quantity(3.35999058, 'electron_volt')>, power=<Quantity(10.0, 'nanowatt')>, order=2, medium='Air'), magnetic_field=<Quantity(5.0, 'tesla')>, temperature=<Quantity(120.0, 'millikelvin')>, collection_path_optics=OpticsInfo(half_waveplate_angle=<Quantity(45.0, 'degree')>, pinhole=<Quantity(40.0, 'micrometer')>, filters=['LP380', 'BP370']), measurement_type='ConfocalScanPL', spot=ScanInfo(start=<Quantity([-2. 3.], 'micrometer')>, stop=<Quantity([4. 8.], 'micrometer')>, step=<Quantity([0.1 0.2], 'micrometer')>, step_no=array([61, 26]), direction=array([1, 1])))
You see that in the printed statement, only the fields that have been initialized are printed out. You can access each element you by calling the corresponding dataclass attribute:
>>> fni.filenumber
1
>>> fni.lasers
SourceInfo(name='Matisse', wfe=<Quantity(737.8, 'nanometer')>, wavelength_vacuum=<Quantity(369.001625, 'nanometer')>, wavelength_air=<Quantity(368.9, 'nanometer')>, frequency=<Quantity(812442.109, 'gigahertz')>, energy=<Quantity(3.35999058, 'electron_volt')>, power=<Quantity(10.0, 'nanowatt')>, order=2, medium='Air')
To access fni
as a dictionary:
>>> fni.to_dict()
{'file_number': 1, 'sample_name': 'ZnO1', 'lasers': {'name': 'Matisse', 'wfe': <Quantity(737.8, 'nanometer')>, 'wavelength_vacuum': <Quantity(369.001625, 'nanometer')>, 'wavelength_air': <Quantity(368.9, 'nanometer')>, 'frequency': <Quantity(812442.109, 'gigahertz')>, 'energy': <Quantity(3.35999058, 'electron_volt')>, 'power': <Quantity(10.0, 'nanowatt')>, 'order': 2, 'medium': 'Air', 'miscellaneous': None}, 'rf_sources': None, 'magnetic_field': <Quantity(5.0, 'tesla')>, 'temperature': <Quantity(120.0, 'millikelvin')>, 'excitation_path_optics': None, 'collection_path_optics': {'half_waveplate_angle': <Quantity(45.0, 'degree')>, 'quarter_waveplate_angle': None, 'polarizer': None, 'pinhole': <Quantity(40.0, 'micrometer')>, 'filters': ['LP380', 'BP370'], 'miscellaneous': None}, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': 'ConfocalScanPL', 'miscellaneous': None, 'spot': {'start': <Quantity([-2. 3.], 'micrometer')>, 'stop': <Quantity([4. 8.], 'micrometer')>, 'step': <Quantity([0.1 0.2], 'micrometer')>, 'step_no': array([61, 26]), 'direction': array([1, 1]), 'rate': None, 'duration': None, 'data_aggregation_method': None, 'miscellaneous': None}, 'other': None}
The dictionary includes all the attributes, initialized or not. To access the fni
as a normalized dictionary (
no embedded dictionaries):
>>> fni.to_normalized_dict()
{'file_number': 1, 'sample_name': 'ZnO1', 'lasers.name': 'Matisse', 'lasers.wfe': <Quantity(737.8, 'nanometer')>, 'lasers.wavelength_vacuum': <Quantity(369.001625, 'nanometer')>, 'lasers.wavelength_air': <Quantity(368.9, 'nanometer')>, 'lasers.frequency': <Quantity(812442.109, 'gigahertz')>, 'lasers.energy': <Quantity(3.35999058, 'electron_volt')>, 'lasers.power': <Quantity(10.0, 'nanowatt')>, 'lasers.order': 2, 'lasers.medium': 'Air', 'lasers.miscellaneous': None, 'rf_sources': None, 'magnetic_field': <Quantity(5.0, 'tesla')>, 'temperature': <Quantity(120.0, 'millikelvin')>, 'excitation_path_optics': None, 'collection_path_optics.half_waveplate_angle': <Quantity(45.0, 'degree')>, 'collection_path_optics.quarter_waveplate_angle': None, 'collection_path_optics.polarizer': None, 'collection_path_optics.pinhole': <Quantity(40.0, 'micrometer')>, 'collection_path_optics.filters': ['LP380', 'BP370'], 'collection_path_optics.miscellaneous': None, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': 'ConfocalScanPL', 'miscellaneous': None, 'spot.start': <Quantity([-2. 3.], 'micrometer')>, 'spot.stop': <Quantity([4. 8.], 'micrometer')>, 'spot.step': <Quantity([0.1 0.2], 'micrometer')>, 'spot.step_no': array([61, 26]), 'spot.direction': array([1, 1]), 'spot.rate': None, 'spot.duration': None, 'spot.data_aggregation_method': None, 'spot.miscellaneous': None, 'other': None}
In the normalized dictionary the attributes of sub-dataclasses are separated by .
.
FilenameManager#
The FilenameManager
can provide easy access to all these variables, with the added benefit of managing multiple
filenames at the same time. You can initialize the manager in three ways:
Directly, with a filename or a list of filenames:
>>> from qdl_zno_analysis.filename_utils.filename_manager import FilenameManager >>> filenames = ['001_Msc~Example1a_Tmp~1p2.csv', '001_Msc~Example1b_Tmp~1p2.txt', '002_Msc~Example2_Tmp~1p2.csv'] >>> folder = 'some_folder' >>> fnm = FilenameManager(filenames, folder)
By providing file numbers and file extensions:
>>> from qdl_zno_analysis.filename_utils.filename_manager import FilenameManager >>> folder = 'some_folder' >>> fnm = FilenameManager.from_file_numbers([1,2], ['csv'. 'txt'], folder)
This method searches the provided folder (defaults to ‘.’). It will find all files with file numbers of 1 or 2, even multiple identical files, as long as they have any of the file extensions in the filetype list. If the filetype list is empty, all files with the corresponding file number will be returned.
By providing a string to be matched with
Path.glob()
:>>> from qdl_zno_analysis.filename_utils.filename_manager import FilenameManager >>> folder = 'some_folder' >>> fnm = FilenameManager.from_matching_string('*Msc~Example*', folder)
This method searches the provided folder (defaults to ‘.’). It will find all files containing the string
'Msc~Example'
in the folder.
Of course, you can access all the filenames or just the valid filenames (once found in path - defaults to filenames if validation is not requested at initialization):
>>> fnm.filenames
[WindowsPath('001_Msc~Example1a_Tmp~1p2.csv'), WindowsPath('001_Msc~Example1b_Tmp~1p2.txt'), WindowsPath('002_Msc~Example2_Tmp~1p2.csv')]
>>> fnm.valid_paths
[WindowsPath('001_Msc~Example1a_Tmp~1p2.csv'), WindowsPath('001_Msc~Example1b_Tmp~1p2.txt'), WindowsPath('002_Msc~Example2_Tmp~1p2.csv')]
From here, you can access the FilenameInfo
objects directly if needed:
>>> fnm.filename_info_list
[FilenameInfo(file_number=1, temperature=<Quantity(1.2, 'kelvin')>, miscellaneous='Example1a'), FilenameInfo(file_number=1, temperature=<Quantity(1.2, 'kelvin')>, miscellaneous='Example1b'), FilenameInfo(file_number=2, temperature=<Quantity(1.2, 'kelvin')>, miscellaneous='Example2')]
Or can access the aforementioned dictionaries:
>>> fnm.filename_info_dicts
[{'file_number': 1, 'sample_name': None, 'lasers': None, 'rf_sources': None, 'magnetic_field': None, 'temperature': <Quantity(1.2, 'kelvin')>, 'excitation_path_optics': None, 'collection_path_optics': None, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': None, 'miscellaneous': 'Example1a', 'spot': None, 'other': None}, {'file_number': 1, 'sample_name': None, 'lasers': None, 'rf_sources': None, 'magnetic_field': None, 'temperature': <Quantity(1.2, 'kelvin')>, 'excitation_path_optics': None, 'collection_path_optics': None, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': None, 'miscellaneous': 'Example1b', 'spot': None, 'other': None}, {'file_number': 2, 'sample_name': None, 'lasers': None, 'rf_sources': None, 'magnetic_field': None, 'temperature': <Quantity(1.2, 'kelvin')>, 'excitation_path_optics': None, 'collection_path_optics': None, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': None, 'miscellaneous': 'Example2', 'spot': None, 'other': None}]
>>> fnm.filename_info_norm_dicts # in this case they are the same, since there are no embedded `Info` objects
[{'file_number': 1, 'sample_name': None, 'lasers': None, 'rf_sources': None, 'magnetic_field': None, 'temperature': <Quantity(1.2, 'kelvin')>, 'excitation_path_optics': None, 'collection_path_optics': None, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': None, 'miscellaneous': 'Example1a', 'spot': None, 'other': None}, {'file_number': 1, 'sample_name': None, 'lasers': None, 'rf_sources': None, 'magnetic_field': None, 'temperature': <Quantity(1.2, 'kelvin')>, 'excitation_path_optics': None, 'collection_path_optics': None, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': None, 'miscellaneous': 'Example1b', 'spot': None, 'other': None}, {'file_number': 2, 'sample_name': None, 'lasers': None, 'rf_sources': None, 'magnetic_field': None, 'temperature': <Quantity(1.2, 'kelvin')>, 'excitation_path_optics': None, 'collection_path_optics': None, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': None, 'miscellaneous': 'Example2', 'spot': None, 'other': None}]
You can also find all the (non)-repeating values between filenames:
>>> fnm.changing_filename_info_dict
{'file_number': [1, 1, 2], 'miscellaneous': ['Example1a', 'Example1b', 'Example2']}
>>> fnm.non_changing_filename_info_dict
{'sample_name': None, 'lasers': None, 'rf_sources': None, 'magnetic_field': None, 'temperature': <Quantity(1.2, 'kelvin')>, 'excitation_path_optics': None, 'collection_path_optics': None, 'exc_and_col_path_optics': None, 'rf_lines': None, 'measurement_type': None, 'spot': None, 'other': None}
Finally, you can find available filetypes, filenames by filetypes and available file numbers:
>>> fnm.available_filetypes
['csv', 'txt']
>>> fnm.filenames_by_filetype
{'csv': [WindowsPath('001_Msc~Example1a_Tmp~1p2.csv'), WindowsPath('002_Msc~Example2_Tmp~1p2.csv')], 'txt': [WindowsPath('001_Msc~Example1b_Tmp~1p2.txt')]}
>>> fnm.available_file_numbers
[1, 2]
With all of the above, I hope that handling filename metadata will become simple and easy!
The FilenameManager
class will be thoroughly utilized internally to ease the initialization of data-holding classes
in the rest of this package.