16. Intention and Remarks
Genesis
This package was written because of my personal need to fit multiple datasets together that differ in attributes defined by the measurements. This is a very common task that is not covered by numpy/scipy or most other fit programs. What I wanted was a numpy ndarray with its matrix-like functionality for evaluating my data, but including attributes related to the data, e.g. from a measurement. For multiple measurements I need a list of these with variable length. ==> dataArray and dataList.
As the same models are used repeatedly, a module with physical models grew over time. Many of these models are frequently used in small-angle scattering programs like SASview or SASfit. For my purposes, dynamic models such as diffusion, ZIMM, ROUSE and others, mainly for protein dynamics, were missing.
Some programs (under open license) are difficult to extend because the models are hidden in classes, or because access and reuse require a specially designed interface to get parameters instead of simple function calls. Simple Python functions are easier to use for non-programmers, as most PhD students are. Models are just Python functions (or one-line lambda functions) with the arguments accessed by their name (keyword arguments). Scripting in Python with numpy/scipy is easy to learn even without extended programming skills.
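As a minimal sketch of what "models are just Python functions" can look like, the following uses only numpy. The names stretched_exp and gaussian and their parameters are illustrative assumptions and are not taken from Jscatter.

```python
import numpy as np

# Hypothetical example model (not a Jscatter function): a stretched exponential decay.
def stretched_exp(t, tau=1.0, beta=1.0, amplitude=1.0):
    """Return amplitude * exp(-(t/tau)**beta) for scalar or array t."""
    return amplitude * np.exp(-(t / tau) ** beta)

# A simple model written as a one-line lambda with keyword arguments.
gaussian = lambda x, mean=0.0, sigma=1.0: np.exp(-0.5 * ((x - mean) / sigma) ** 2)

t = np.linspace(0.0, 5.0, 6)                 # 0, 1, 2, 3, 4, 5
decay = stretched_exp(t, beta=0.8, tau=2.0)  # arguments are passed by name, in any order
peak = gaussian(x=t, mean=2.0, sigma=0.5)
```

Because every parameter is addressed by name, a fit routine can decide which arguments to vary and which to keep fixed without any special interface.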
The main difficulty, besides finding the right model for your problem, is proper multidimensional fitting including errors. This is included in dataArray/dataList, which use scipy.optimize to allow fitting of the models in a simple and easy way. The user can concentrate on reading data, fitting models and presenting results.
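The idea of fitting several datasets at once, each with its own attribute and errors, can be sketched directly with scipy.optimize. This is not the dataArray/dataList interface: the dict-based datasets, the diffusion model and the synthetic data are assumptions made only for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical model: simple diffusion, amplitude * exp(-D * q**2 * t).
def diffusion(t, q, D, amplitude):
    return amplitude * np.exp(-D * q**2 * t)

# Synthetic "measurements": one dataset per q value (the attribute), with known errors.
rng = np.random.default_rng(0)
t = np.linspace(0.1, 10.0, 50)
true_D, true_amplitude, sigma = 0.5, 1.0, 0.01
datasets = []
for q in (0.5, 1.0, 1.5):
    y = diffusion(t, q, true_D, true_amplitude) + rng.normal(0.0, sigma, t.size)
    datasets.append({"q": q, "t": t, "y": y, "err": np.full(t.size, sigma)})

def residuals(params):
    """Error-weighted residuals of all datasets, concatenated into one vector."""
    D, amplitude = params
    return np.concatenate([
        (d["y"] - diffusion(d["t"], d["q"], D, amplitude)) / d["err"]
        for d in datasets
    ])

# D and amplitude are shared by all datasets; q is taken from each dataset.
result = least_squares(residuals, x0=[0.1, 0.8])
D_fit, amplitude_fit = result.x
```

The shared parameters are constrained by all datasets together, which is the point of a simultaneous fit compared with fitting each curve separately.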
Scripting over GUI
Documenting the evaluation of scientific data is difficult in GUI-based programs (as a sequence of button clicks?). Script-oriented evaluation (MATLAB, Python, Jupyter, ...) allows easy repetition with stepwise improvement and at the same time documents what was done.
Complex models have multiple contributions, a background contribution, ... which can easily be defined in a short script including documentation. I cannot guess whether the background in a measurement is constant, linear, parabolic or whatever, and each choice is also a limitation. Therefore the intention is to supply non-obvious and complex models (with a scientific reference) and allow users to adapt them to their needs, e.g. add background and amplitude or resolution convolution. Simple models are quickly implemented in one line as lambda functions, and more complex things in scripts. The mathematical basis, such as integration or linear algebra, can be used from scipy/numpy.
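Adapting a supplied model to a specific measurement might look like the sketch below. The Guinier approximation exp(-q²Rg²/3) stands in for a library model here; the function and parameter names are assumptions for illustration, not Jscatter functions.

```python
import numpy as np

# Stand-in for a supplied model (hypothetical name): Guinier approximation, normalized to 1 at q=0.
def guinier(q, Rg):
    return np.exp(-(q * Rg) ** 2 / 3.0)

# The user chooses amplitude and background in one line each.
with_const_bgr = lambda q, Rg, A=1.0, bgr=0.0: A * guinier(q, Rg) + bgr
with_linear_bgr = lambda q, Rg, A=1.0, b0=0.0, b1=0.0: A * guinier(q, Rg) + b0 + b1 * q

q = np.linspace(0.0, 0.5, 6)
I_const = with_const_bgr(q, Rg=3.0, A=10.0, bgr=0.1)
I_linear = with_linear_bgr(q, Rg=3.0, A=10.0, b0=0.1, b1=0.2)
```

Keeping the core model free of amplitude and background leaves the choice of background shape to the script that knows the measurement.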
Plotting
Matplotlib seems to be the standard for numpy/scipy users. You can use it if you want. Fast, live (interactive) plotting with it is complicated and slow, and 3D plotting has strong limitations. Another good plotting tool is gr.
Frequently I run scripts that show results of different datasets, and I want to keep these open for comparison and be able to modify the plots. Some of this is possible in matplotlib, but not by default. As I want to think about physics and not plotting, I prefer XmGrace, with a GUI available after plotting. A simple one-line command should result in a 90% finished plot; the final 10% of fine adjustment can be done in the GUI if needed or with additional commands. I adapted the original Graceplot module (a Python interface to XmGrace) to my needs and added dataArray functionality. For the errorPlot of a fit a simple matplotlib interface is included. Meanwhile, the module mpl is a rudimentary interface to matplotlib to make plotting easier.
The nice thing about XmGrace is that it stores the plot as ASCII text instead of JPG or PDF. So it is easy to reopen the plot and change it later if your supervisor/boss/reviewer asks for log-log or other colors or whatever. For data inspection, zooming, hiding data, simple fitting of trends and more are possible on a WYSIWYG/GUI basis. If you want to retrieve the data (or forgot to save your results separately) they are accessible in the ASCII file. Export in scientific paper quality is possible. A simple interface for annotations, lines, ... is included. Unfortunately it is only 2D, but this is 99% of my work.
Speed/Libraries
The most common libraries for scientific computing in Python are NumPy and SciPy, and these are the only obligatory dependencies for Jscatter (matplotlib and Pillow for image reading were added later). Python in combination with numpy can be quite fast if ndarray methods are used consistently instead of explicit for loops. E.g. the numpy.einsum function immediately uses compiled C to do the computation. (See this and look for “Why are NumPy arrays efficient”). SciPy offers all the math needed and optimized algorithms, also from BLAS/LAPACK. To speed up computation on a multiprocessor machine, if needed, the module parallel offers an easy interface to the standard Python module multiprocessing within a single command. If your model still needs long computing time, the common methods such as Cython, Numba or f2py (Fortran) should be used in your model. As these are more difficult, they are left to the advanced user.
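The difference between an explicit loop and ndarray methods can be illustrated with a row-wise dot product. The arrays below are random test data chosen only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((1000, 3))
b = rng.random((1000, 3))

# Explicit Python loops: every multiplication is an interpreted step.
loop_result = np.array(
    [sum(a[i, k] * b[i, k] for k in range(3)) for i in range(len(a))]
)

# The same row-wise dot product in a single call; the loops run in compiled code.
einsum_result = np.einsum("ij,ij->i", a, b)

# Plain ndarray arithmetic is an equivalent alternative.
array_result = (a * b).sum(axis=1)
```

All three variants give the same numbers; timing them (for example with timeit) on your own machine shows how much the interpreted loop costs.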
A nice blog post about possible speedups is found at Julia vs Python. Nevertheless, the critical point in these cases is the model and not the small overhead in dataArray/dataList or fitting.
As some models depend on f2py and Fortran code, an example is provided that shows how to use f2py and how to contribute a function to Jscatter: Extending/Contributing/Fortran
Some resources :
Development environment/ Testing
The development platform is mainly current Linux (Linux Mint/CentOS). I regularly use Jscatter on macOS and on 12-core Linux machines on our cluster. I tested the main functionality (e.g. all examples) on Python 3.6 and try to write 2.7/3.x compatible code. I never use Windows (only if a manufacturer of an instrument forces me...). Jscatter works under Windows, except for things that rely on pipes or gfortran, such as the connection to XmGrace and the DLS module, which calls CONTIN through a pipe. As matplotlib is slow, fits give no intermediate output.