It’s been an interesting week as far as reality goes. Societies the world over continue to split at the seams, companies are downsizing in the wake of negative outlooks and uncertainty, and it’s never been a better time to go hiking a really long way away from other people. To top it off, we’re all finding out just how obtuse yeast cultures can be.
In a misguided attempt to get away from the stress of my other in-job training, I took the opportunity to join the PyTorch: From Zero to GANs course on the Jovian data science forums. It’s free, headed by the experienced Aakash, the company’s founder.
I was somewhat cajoled into joining by my enthusiastic colleagues, and this was my first contact with machine learning, and with data science as an overall field.
As can be imagined from the course title, the work focuses on the PyTorch machine learning framework for Python. It’s open source, originally developed by Facebook’s AI research team, and an accessible jumping-off point for people looking to learn a bit about the sector.
So we jumped in.
Initially, those familiar with NumPy’s arrays will feel right at home. The module boasts clear interoperability, and it also supports C++, though perhaps in a diminished capacity.
As can be seen in the embed, any NumPy arrays (imported here as np) can be directly ported into the module, after which they can be edited or inserted into using either native commands (as shown with list slices) or tensor methods and functions in torch itself.
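For readers without the embed to hand, here is a minimal sketch of that round trip. The array values and variable names are placeholders of my own, not the ones from the course notebook, and I am assuming torch.from_numpy as the porting call.

```python
import numpy as np
import torch

# Placeholder data (hypothetical values, not the notebook's own).
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# from_numpy wraps the array without copying, so the tensor and the
# array share the same underlying memory.
t = torch.from_numpy(arr)

# Native slice assignment, exactly as with a NumPy array.
t[0, 1:] = 0.0

# The same kind of edit through a tensor method; the trailing
# underscore marks an in-place operation.
t[1].mul_(2.0)

# Because the memory is shared, the original array sees both edits.
shared = arr.tolist()

# Converting back to NumPy is a single method call.
back = t.numpy()

print(t)
```

The shared memory is worth keeping in mind: an edit made on either side shows up on the other, which is convenient but can surprise. Copying first (for example with torch.tensor(arr)) avoids that.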
The flexibility this brings to the language is of immense use given the range of data which might be brought to a data science project for modelling.
Visualisation of multi-dimensional datasets can become a pain, so Torch’s inbuilt formatting was a welcome relief.
Taking a gander across the generation options, poisson and logspace caught my eye, and not at all because they’re about all I can remember from A-levels. The former draws samples from Poisson distributions whose rates are given by the input tensor, and the latter generates a tensor of logarithmically spaced steps between two exponents. Together they go some way to showing the broad nature of the platform.
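A small sketch of the two, assuming the functions in question are torch.poisson and torch.logspace. The rates, the seed, and the range are arbitrary choices for illustration.

```python
import torch

# Fix the seed so the random draw is repeatable (arbitrary choice).
torch.manual_seed(0)

# Each element of the input is treated as the rate of a Poisson
# distribution, and one sample is drawn per element.
rates = torch.tensor([0.5, 2.0, 10.0])
samples = torch.poisson(rates)

# logspace takes no input tensor: it builds `steps` points spaced
# evenly on a log scale from base**start to base**end.
spaced = torch.logspace(start=0, end=3, steps=4, base=10)

print(samples)
print(spaced)
```

The samples come back as whole numbers stored in a float tensor of the same shape as the rates, while the logspace call here runs from 1 to 1000 in powers of ten.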
The sheer range of generative and iterative options available through the documentation is far beyond my current understanding, and very slightly intimidating.
It is clear a refresher on statistical analysis and data manipulation will be in order, for which the depths of YouTube, GitHub, and StackExchange will no doubt be plumbed.
On the manipulation side, the by-dimension ability to pull ordered top values caught my eye for the presentation value alone.
As shown in the example, the function is broadly applicable to complex data sets, and can be very specific in its output.
The ability to skim top data of given sets, as well as the flexibility offered by the output kwargs, will definitely be of use for running specific modelling on edge cases, or pulling sets for display.
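The function described here is presumably torch.topk, so the sketch below assumes that. The scores tensor is made up for illustration, and dim and largest are two of the keyword arguments that shape the output.

```python
import torch

# Hypothetical scores: two rows of four values each.
scores = torch.tensor([[3.0, 9.0, 1.0, 7.0],
                       [8.0, 2.0, 6.0, 4.0]])

# Top two values along each row (dim=1), returned in descending
# order together with the positions they were found at.
values, indices = torch.topk(scores, k=2, dim=1)

# Flipping `largest` pulls the smallest entries instead, here the
# single lowest value in each column (dim=0).
low_values, low_indices = torch.topk(scores, k=1, dim=0, largest=False)

print(values, indices)
print(low_values, low_indices)
```

Getting the indices back alongside the values is what makes this handy for display: the positions can be used to look up labels or the matching rows in another tensor.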
Whilst it’s received a lot of attention recently, with as wide an appeal as major subreddits, data presentation and formatting have traditionally been sorely overlooked.
Rounding off my highlight of the basic functions with a brief look at linear interpolation, I found my first week of data science to be an overall enjoyable affair, and a welcome break from revising AWS content.
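For the linear interpolation mentioned above, a minimal sketch assuming torch.lerp is the function in question; the endpoints and weight are arbitrary.

```python
import torch

# Arbitrary start and end points for the interpolation.
start = torch.tensor([0.0, 10.0])
end = torch.tensor([10.0, 20.0])

# lerp computes start + weight * (end - start) element by element,
# so a weight of 0.25 lands a quarter of the way from start to end.
quarter = torch.lerp(start, end, 0.25)

print(quarter)
```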
If you wish to explore the content, take a peek at the first Jovian-hosted Jupyter Notebook.
This promises to be at least a six-week dive into the industry, and I’ll be sure to keep this channel updated. Thank you to the readers (if there are any), and to the instructors who made this possible.
See you all next week.