Boxyvault - Github CI/CD

Intro

In this post I’ll talk about Boxyvault’s code migration to Github and what I learned about OpenID Connect (OIDC), which enables a build and deployment process implemented with Github actions (GHA).

Repository Migration

In my previous post I described why I’m planning to migrate Boxyvault’s code repositories from AWS CodeCommit to Github. I’ve now officially completed this step. 😁
Initially I thought this would involve some complexity, but it was straightforward, especially since I discovered that Github provides an import mechanism that does all the heavy lifting for you.

When creating a repository in Github, you typically provide details such as the repository name, description and whether to include a .gitignore. On that page I discovered an import link and decided to try it out. Here you specify the clone URL where the source repository resides, in my case AWS CodeCommit, followed by the desired repository name and whether you want a private or public repository. Once all the details are provided you can hit the Begin import button, which will prompt you for login details pertaining to your source repository. After completing these steps the import finished without issue.

Build & Deploy Migration

Up until this round of work, I’ve relied on a local build and deploy process for deploying Boxyvault’s cloud infrastructure to AWS, typically by running a bash script that uses my local AWS profile to execute CloudFormation templates.

Going forward, I’d like to improve on this process in the following ways:

  • Reduce manual errors related to the human element
  • Catch mistakes in my infrastructure code automatically (linting)
  • Orchestrate complex dependency chains without having to remember sequence or caveats
  • Deploy infrastructure changes automatically on code changes

The driving force behind migrating to Github was being able to use Github actions to accomplish all of the above goals.

Is it safe to create AWS resources from GHA?

Short answer, it can be! It is up to us to follow Github and AWS best practices to ensure things are safe and don’t compromise our credentials or cloud accounts.

In order to communicate with AWS from GHA, we need to authorize the GHA workflow sessions and thus allow them to use AWS services. Github describes two methods that can be used to securely configure AWS access:

  • Store an AWS access key id and secret access key in Github secrets
  • Use an AWS OIDC connection to provide short lived access tokens

Most online guidance recommends the latter option due to these benefits:

  • Tokens are short lived
  • No need to rotate long lived security keys

I opted to follow this option and documented what I did to get my first GHA workflow interacting with AWS.

How do we establish an OIDC connection between GHA and AWS?

When the GHA workflow runs, it executes an assume role step. This step passes information to an AWS OIDC identity provider (IdP) for validation. If validation passes, the GHA workflow session gains the permissions of the IAM role associated with that OIDC connection, and the workflow can proceed with running various AWS commands. Once the workflow completes, the connection and associated authorization tokens are discarded due to the ephemeral nature of the GHA runners.
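Under the hood, the assume role step exchanges the workflow’s OIDC token for temporary AWS credentials via STS. Here’s a minimal boto3 sketch of that exchange; the role ARN is a placeholder, and the token itself is issued to the workflow by Github’s OIDC endpoint (in a real workflow an action such as aws-actions/configure-aws-credentials handles all of this for you):

import boto3

# The OIDC token is issued to the workflow by Github; shown here as a placeholder.
github_oidc_token = "<token issued by token.actions.githubusercontent.com>"

sts = boto3.client("sts")

# Exchange the web identity token for short lived AWS credentials.
response = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::AWS_ACCOUNT_PLACEHOLDER:role/boxyvault-gha-deploy",  # illustrative role name
    RoleSessionName="boxyvault-gha-session",
    WebIdentityToken=github_oidc_token,
)

credentials = response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration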

Here is a high-level list of the required tasks:

  • Create an OIDC IdP
  • Create a role policy
  • Create a role and OIDC association
  • Attach the role policy to the role

Create an IAM OIDC IdP

In AWS we create an OIDC identity provider (IdP) that is configured to allow connections from Github’s public OIDC URL domain. i.e. https://token.actions.githubusercontent.com.

Github’s OIDC provider also has a certificate thumbprint that we need to provide. As of writing it is: 6938fd4d98bab03faadb97b34396831e3780aea1
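Here’s a boto3 sketch of registering that provider (I originally did this via the console and later a bash script; the URL and thumbprint are the values above):

import boto3

iam = boto3.client("iam")

# Register Github's OIDC endpoint as an identity provider in the AWS account.
response = iam.create_open_id_connect_provider(
    Url="https://token.actions.githubusercontent.com",
    ClientIDList=["sts.amazonaws.com"],
    ThumbprintList=["6938fd4d98bab03faadb97b34396831e3780aea1"],
)

print(response["OpenIDConnectProviderArn"])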

Create an IAM role policy

We create a policy that outlines a set of execution permissions, all the things we want our GHA workflow to be authorized to do. e.g. Creating or listing a CloudFormation stack. For now this policy is not associated with anything, but we’ll get to that soon.
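For illustration, here’s a boto3 sketch of creating such a policy. The statement only covers the CloudFormation examples mentioned above and the policy name is a placeholder; the real policy file grants everything the workflows need:

import json
import boto3

iam = boto3.client("iam")

# Example permissions only; the real policy document covers more services.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudformation:CreateStack",
                "cloudformation:DescribeStacks",
                "cloudformation:ListStacks"
            ],
            "Resource": "*"
        }
    ]
}

response = iam.create_policy(
    PolicyName="boxyvault-infra-deployment-policy",  # hypothetical name
    PolicyDocument=json.dumps(policy_document),
)

policy_arn = response["Policy"]["Arn"]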

Create an IAM role and OIDC association

Now we create an IAM role that will be assumed by the GHA workflow. We associate this role with the OIDC IdP we created in the previous step. This is done by declaring trusted entities for the role.

The trusted entities policy allows the sts:AssumeRoleWithWebIdentity action and scopes access to a specific Github organization, repository and branch.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_PLACEHOLDER:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:sub": "repo:GITHUB_ORG_PLACEHOLDER/boxyvault-infra:ref:refs/heads/main",
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
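Creating the role itself could look something like this in boto3. The role name is illustrative and I’m assuming the resolved trust policy above has been written to a local file, here called trust-policy.json:

import boto3

iam = boto3.client("iam")

# The trust policy generated from the template (placeholders already substituted).
with open("trust-policy.json") as trust_file:
    trust_policy = trust_file.read()

response = iam.create_role(
    RoleName="boxyvault-gha-deploy",  # illustrative name
    AssumeRolePolicyDocument=trust_policy,
    Description="Role assumed by Boxyvault GHA workflows via OIDC",
)

role_arn = response["Role"]["Arn"]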

Attach the role policy to the role

Lastly we attach the role policy to the new role, after which the role can be 1) assumed via OIDC in a GHA workflow and 2) used to perform various actions against AWS services.
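In boto3 terms this final step is a single call (the role name and policy ARN carry over from the illustrative snippets above):

import boto3

iam = boto3.client("iam")

# Attach the managed permission policy to the OIDC-trusted role.
iam.attach_role_policy(
    RoleName="boxyvault-gha-deploy",  # illustrative name
    PolicyArn="arn:aws:iam::AWS_ACCOUNT_PLACEHOLDER:policy/boxyvault-infra-deployment-policy",
)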

Automation

Initially I did all of these steps manually while following a guide (linked below in the references). But seeing that my goal for Boxyvault is to make it easy to set up, I knew this needed some automation.

I imagine this step to be one of the first things a Boxyvault admin/installer would have to do, and thus some of it might remain manual, e.g. running a bash script. However, I wanted to make sure it was far less complicated than following a detailed configuration guide while performing steps in the AWS console. Therefore I wrote a bash script that provisions the role, permission policy and OIDC resources.

New section - oidc

I created a new directory in the boxyvault-infra repository. This directory will contain both scripts and policy templates.

Policy Templates

Initially I had a simple policy file that declared permissions, however I realized that other users would have to manually update this to reflect their AWS region, account id and preferred resource names. Instead I’d like to capture all these details when the script is run and then automatically update the various policy files.

To do this, I created a template policy file that has specific placeholder values. With some shell magic I copy the template file’s content, replace the placeholder values with actual values and write out the updated content to an “actual” policy file. The script then uses this newly created policy file while provisioning the required resources.
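The actual substitution in configure.sh is done with shell tools, but the idea is simple enough to sketch in Python. The placeholder names match the trust policy shown earlier; the account id and Github org values are examples that would be captured when the script runs:

account_id = "123456789012"      # captured from the user / AWS profile
github_org = "my-github-org"     # captured from the user

with open("oidc-policies-template.json") as template_file:
    content = template_file.read()

# Replace the placeholders with the caller's actual values.
content = content.replace("AWS_ACCOUNT_PLACEHOLDER", account_id)
content = content.replace("GITHUB_ORG_PLACEHOLDER", github_org)

# Write the resolved policy that the provisioning steps will use (git ignored).
with open("oidc-policies.json", "w") as policy_file:
    policy_file.write(content)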

Structure

  • oidc
    • configure.sh
    • infra-deployment-policy.json
    • oidc-policies-template.json
    • oidc-policies.json (git ignored)

Documentation

Because this is a very important step, so important that Boxyvault will not be provisionable if it’s not done first, I have to document the setup for future me and other users. I wrote up some draft details in a readme.md but soon realized that this is one of many similarly complex scenarios and that a single readme file would not suffice. To address this need, I’ve added a new milestone for Boxyvault’s development in my task tracking process, aka Notion. This entails creating a space or system where I can capture the gotchas and how-tos as well as the details of how some of the complex things work.

I’ve seen people use separate Github repositories to act as a central documentation area that can link multiple components and repositories together in one space, and I think this might be a simple solution. That said, I will check out some other options when I get there.

References

https://aws.amazon.com/blogs/security/use-iam-roles-to-connect-github-actions-to-actions-in-aws/
https://docs.github.com/en/rest/actions/oidc?apiVersion=2022-11-28
https://scalesec.com/blog/oidc-for-github-actions-on-aws/

Conclusion

Boxyvault is now operating out of Github, both for repository management and its CI/CD processes. As of now, the existing manual deployment steps I used previously still need to be migrated to new Github action workflows. I have a working POC that uses OIDC to grant access to AWS and will use this example to build out all the build and deploy steps.

I’m excited to have these new capabilities and believe this will serve Boxyvault and its users very well.

Previous - Boxyvault Github migration | Stay tuned for next post

Boxyvault - Github migration

Intro

This week I will be focusing on an unexpected task: migrating Boxyvault’s code repositories from AWS CodeCommit to a private Github repository. This was unexpected as I had initially planned to proceed with connecting my first AWS Lambda with either real data in DynamoDB or wiring it up to an API Gateway endpoint.

This log will describe why I’ve made this decision and talk about the benefits I expect to gain from it. Additionally I’m noting down all interesting discoveries and know-hows while I perform the actual technical bits, so others can learn from them and I have something to refer to in the future when I forget.

Background

Boxyvault is slowly growing in both front-end and infrastructure code, and both repositories are hosted in AWS CodeCommit at the moment. Originally I placed the code with AWS due to a lack of knowledge. You see, I’m a big fan of Github and started my software engineering career using it frequently, but I never put anything valuable or sensitive in there. At the time free tier accounts only allowed public repositories, thus for projects I wanted to keep private I used AWS CodeCommit, which allowed private repositories on the free tier.

Many years later, I’ve changed my professional role from software engineer to infrastructure engineer and landed in a team that owns Github as a product for the company I work at. This means the team is skilled at doing complicated things with Github and Github actions (GHA). Quite recently in this team I was made aware that Github has changed their free tier accounts to allow private repositories.
Hooray!

This is great news, but it’s a lot of effort to migrate your repositories: changing local git credentials, ensuring commit messages are retained, updating documentation and other ingrained habits that require alteration. So there really needs to be a big enough payoff for me to actually do this migration.

Current State

All of Boxyvault’s infrastructure is represented by AWS CloudFormation template files. Each template file has its own deployment script that looks up an AWS profile from my local computer and deploys the respective resources. To run these deployment scripts I execute them in a bash terminal and wait for a status output. Additionally I look at the AWS console to verify the script and resource template deployed as expected.

Another consideration is interdependencies between resources and the sequence in which they are provisioned. e.g. S3 buckets need to exist before a Lambda function’s code can be uploaded there and referenced in its own resource template. At the moment I am mindful of the resource deployment order, but as Boxyvault grows I will inevitably become unable to remember all the specific hierarchies and sequential steps.

Desired State

My vision for Boxyvault is to get it up and running from scratch in a few minutes, meaning that an engineer who clones the code can launch an instance of Boxyvault and start making changes soon after. I believe this will be made possible by configuring all infrastructure as code (IaC) and developing resources in a lightweight and decoupled manner.

The current structure of Boxyvault’s infrastructure already facilitates this goal to some degree, but it can be made more robust and less prone to human error. This brings us to the WHY: why I want to go through the effort of migrating my repositories from AWS CodeCommit to Github. Github provides a free tier CI/CD mechanism named Github actions (GHA) that integrates directly with your code repository. Workflows are defined that orchestrate complex DevOps processes and can be actioned from a web browser. The workflow, infrastructure and application code are kept secure as Github utilizes best practices such as multi-factor authentication.

Migration Phases

The migration will have two phases responsible for specific goals:

Lift and shift

This is the actual transfer of code from AWS CodeCommit to Github. In this phase I need to ensure that all git information such as commit history is preserved and that my local credentials are configured correctly. Initially I will keep the code repositories in AWS CodeCommit as a fallback, but will consider archiving them to prevent accidental commits to the wrong origin.

Rework existing deployments

With the Github repositories in place I will create GHA workflows to represent the existing deployments. e.g. Identity, Lambda and S3. The current deploy scripts utilize my AWS profiles in order to provision AWS resources. During this phase I’ll likely create dedicated AWS IAM roles for GHA deployments while following best security practices.

Conclusion

I realized the need to do some foundational work that will start Boxyvault’s CI/CD journey on good footing. An excellent learning experience is also in the cards for me, with both Github repository management and GHA being domains I can sink my teeth into. Ultimately, getting to a point where I can reliably build and deploy my infrastructure (for free 😁) is the benefit I’m after, as I believe this will help fast track Boxyvault’s development.

Previous - Boxyvault’s first Lambda function | Next - Boxyvault Github CI/CD

Boxyvault - The first Lambda function

Overview

This entry delves into what I discovered while implementing the first Lambda function for Boxyvault. A couple of interesting and unexpected things appeared that are worth sharing, such as needing to provision AWS S3 buckets before being able to deploy Lambda functions.

The first Lambda function

What will the first Lambda function do? The objective is to retrieve file records that correspond to actual files stored in AWS S3.
However, we are not able to do this at the moment as many of the required components, such as data storage, do not exist yet. In order to make progress without these parts I’ll develop some of the foundational pieces such as the Lambda function code, infrastructure, permissions and mock return data. Other parts like the AWS DynamoDb table and relevant data schema will be implemented at a later stage.

The function will eventually return data from DynamoDb and optionally reduce the result set if a filter query is provided. Three potential fields I foresee being used for filtering are tag, name and upload-date. But for the sake of progress I will not implement filtering of the mock data.

Tag: one of many text strings associated with a file, e.g. unprocessed, holiday rotorua 2023

Name: the name of the file being uploaded.

UploadDate: the date when the file was uploaded. Useful for showing recent default content on a dashboard.

How do we handle filter arguments?

I will be invoking these Lambda functions via an API Gateway RESTful request.
This means we’ll utilize typical mechanisms such as query parameters to deliver the filter arguments to the Lambda function. e.g.

GET /files?limit={limit}&start-key={start-key}&tag={tag}

All Lambda functions have an entry point usually referred to as the handler function. Within this handler we are able to extract details about the calling operation from the incoming event, which in this case contains the query parameters provided by the invoking endpoint URL.

In this way each Lambda function invocation will have the necessary argument values to filter which file records are to be returned.
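With API Gateway’s Lambda proxy integration the query parameters arrive on the event object under queryStringParameters. Here is a small sketch of how the handler could pull them out (the default limit is an illustrative value):

def extract_filters(event):
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}

    return {
        "limit": int(params.get("limit", 20)),   # illustrative default page size
        "start_key": params.get("start-key"),
        "tag": params.get("tag"),
    }


# Inside the handler: filters = extract_filters(event)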

Paging DynamoDb results in Lambda

I anticipate that every Boxyvault user will upload many more files than one web page can display. I want to address this with data paging instead of infinite scrolling due to the simplicity of its implementation. Additionally the amount of data on the web page will remain low, which should contribute to fast loading and an overall snappy UI.

I will utilize two query parameters to orchestrate result paging.

  • limit is the maximum number of records to return.
  • startKey is the key of the item where the query should start. This is used for pagination.
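Once DynamoDb exists, this paging should map onto boto3’s Limit and ExclusiveStartKey arguments along these lines. The table name and the use of scan are assumptions, since the table and data schema don’t exist yet:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("boxyvault-files")  # hypothetical table name


def get_file_page(limit, start_key=None):
    # Build the scan arguments; ExclusiveStartKey is only included when paging.
    scan_args = {"Limit": limit}
    if start_key:
        scan_args["ExclusiveStartKey"] = start_key

    response = table.scan(**scan_args)

    return {
        "items": response["Items"],
        # LastEvaluatedKey is returned when there are more records to fetch;
        # hand it back to the client as the next start-key.
        "next_start_key": response.get("LastEvaluatedKey"),
    }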

Infrastructure requirements

My chosen approach for deploying Lambda functions is to use awscli and CloudFormation. Given I’ve deployed other AWS resources like this previously, I assumed that I’d need a simple CloudFormation template with my Lambda function declaration. But soon after looking into the Lambda service’s required properties, I realized there was a bit more work to do.

Lambda functions consist of three parts:

  • IAM role that grants the function invocation access (can be invoked)
  • Resource declaration that creates the function in AWS
  • Code that the function runs when invoked

The first two are simple and can be declared in the CloudFormation template; the code section, however, is more involved. To link code to the Lambda function resource in the CloudFormation template, we first need to zip up the code, which is done in a deployment script. Then we need to upload the zip file to an AWS S3 bucket. The file can then be referenced in the Lambda function’s properties within the CloudFormation template.

But wait, there’s more. We need to provision an S3 bucket before we can deploy a file.
Thus I’ve added a new s3 section to the Boxyvault-Infra repository which contains both a CloudFormation template declaring the S3 bucket details and a deploy script that initiates the provisioning process. This deployment will typically be run before the Lambda function deployment process as a prerequisite.
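The deploy script does this in bash, but a rough Python sketch of the same zip-and-upload idea looks like this (the bucket name, object key and file names are illustrative):

import zipfile
import boto3

BUCKET = "boxyvault-lambda-artifacts"   # illustrative bucket name
KEY = "get-files/lambda.zip"            # illustrative object key

# Package the handler code into a zip archive.
with zipfile.ZipFile("lambda.zip", "w", zipfile.ZIP_DEFLATED) as archive:
    archive.write("lambda_function.py")  # handler file at the root of the archive

# Upload the archive so the CloudFormation template can reference it
# via the function's Code: S3Bucket / S3Key properties.
boto3.client("s3").upload_file("lambda.zip", BUCKET, KEY)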

Lessons Learned

Expected Lambda handler format in CloudFormation

When developing the CloudFormation template for a Lambda function you have to provide a value for the Handler property. This value consists of two parts separated by a dot (.):

  • The handler file name without its extension. e.g. my_handler (from my_handler.py)
  • The handler function name. e.g. my_handler_function

Handler = my_handler.my_handler_function

Directory structure required to invoke Lambda functions

The Lambda function’s code file containing its handler function needs to be located in the root directory of the deployment package (the zip file).

IAM Role and Policy circular dependency

When declaring the lambda:InvokeFunction permission as an inline policy on the Lambda function’s execution role you may encounter a circular dependency error. The solution is to create a standalone policy and attach it to the Lambda role instead.

Conclusion

In our journey to add the first Boxyvault Lambda function, we’ve uncovered some insights. The need to provision AWS S3 buckets as a Lambda prerequisite has resulted in a new infrastructure section in the Boxyvault-Infra repository.

Prioritizing the foundation pieces (Lambda code, infrastructure and mock data) has set the stage for a phased implementation. Filter arguments, handled via API Gateway RESTful requests, give us flexibility within the handler function when extracting request details.

We’ve also started thinking about data overload, adopting a paging strategy using the limit and startKey parameters to ensure a responsive and smooth user experience.

As I continue building Boxyvault, I hope these foundational insights will lead to a scalable and efficient system.

Previous - Boxyvault Lambda function infrastructure | Next - Boxyvault Github migration

Code Preview 😁

I’m adding this basic Lambda function code here for those interested in a simple starting point.

import json
import logging

logging.getLogger().setLevel(logging.INFO)

data = [
    {
        "name": 'file One.pdf',
        "uploadDate": '2023-09-22T00:00:00Z',
        "tags": ['Holiday', 'Moving', 'Durban']
    },
    {
        "name": 'file Two.jpg',
        "uploadDate": '2023-09-21T00:00:00Z',
        "tags": ['Umhlanga', 'Airport', 'Beach']
    }
]


def get_all_files():
    return data


def handle_get_request(path):
    if path == '/files':
        result = json.dumps(get_all_files())
        logging.info(f'files returned: {result}')
        return get_valid_http_response(result)

    return get_invalid_http_response(404, 'Not Found')


def get_valid_http_response(response_data):
    return {
        'statusCode': 200,
        'body': response_data,
        'headers': {
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Headers': '*',
            'Access-Control-Allow-Methods': '*',
            'Content-Type': 'application/json'
        }
    }


def get_invalid_http_response(code, message):
    return {
        'statusCode': code,
        'body': json.dumps(message)
    }


def lambda_handler(event, context):
    http_method = event['httpMethod']
    path = event['path']

    if http_method == 'GET':
        return handle_get_request(path)
    else:
        return get_invalid_http_response(405, 'Method Not Allowed')

Boxyvault - Lambda function origins

Intro

This week I started planning how I will structure and deploy Lambda functions. The concern for me is that it should be easy to modify and maintain both the Lambda’s code and infrastructure.

Thus far I’ve been using bash scripts to invoke the AWS CLI in order to create my infrastructure, but I’m quite curious to see if I can also get into some CI/CD tools, e.g. Github Actions or AWS’s code build tools. My two requirements are:

  • The CI/CD processes or pipelines are also declared as code, nothing manually configured
  • Secrets or tokens are secure and have a well documented setup approach

This is important to me because I don’t know if Boxyvault will be a SaaS or an open source project, but both would benefit from having everything defined as code due to the flexibility and speed it will provide.

What follows is the loose structure I’m following for laying out the Lambda’s code and infrastructure:

Code Structure

# The source code for each function resides in a function specific directory
boxyvault-infra/lambda/getfiles
boxyvault-infra/lambda/uploadFile
boxyvault-infra/lambda/createTag

# The deploy script and infra template reside in the lambda root directory
# The template will describe both Lambda invocation roles and function resources
boxyvault-infra/lambda/deploy.sh
boxyvault-infra/lambda/template.yml

Deploy

Deploying a Lambda function requires the following steps:

  • Zip up the function code
  • Push the zip file to AWS S3 and capture the necessary resource IDs that link to the S3 object
  • Link the S3 Id to the infrastructure template
  • Deploy the Cloudformation resource via the AWS CLI

CI/CD Tools

I’ve not yet decided on what tool I’d like to use but have two preferences:

  • Github Actions
  • AWS Code Build/Deploy

Why would I need this, could I not just run the individual deploy scripts?
My main reasons for considering CI/CD tools are:

  • Reliable deployments (pressing a button is less error prone than running a script with arguments)
  • Provides a way to orchestrate multiple deploy scripts
  • Allows me a unique upskilling opportunity that aligns with my day job

So what is next?

The next task I want to work on is to get a simple Lambda function working and returning dummy data. Once I have this, I can build out the Lambda resource and IAM roles template file. Finally I need to make a deploy script that uses awscli to provision the Lambda functions and their roles. Additionally, all the steps listed in Deploy need to be implemented.

Conclusion

  • Lambda function code is separated by function name; deployment and infrastructure files reside in the root lambda directory.
  • I’m considering the introduction of CI/CD tooling into the project but need to look at pros and cons of my options.
  • My next post will detail progress I’ve made on a dummy data Lambda function.

Previous - Boxyvault Project Planning | Next - The first lambda function

Boxyvault - Project Planning and Milestones

Intro

In this post I’ll talk about how I manage my personal projects and the initial milestones I’d like to reach while building Boxyvault.

Why is project planning needed?

Well, it’s not strictly needed, however as a software engineer working in a corporate environment I’ve seen the impact of not planning software projects appropriately.

There are two things I’ve seen happen when project planning is not executed well.

Scope creep

Without proper boundaries or minimum viable product (MVP) definitions, a project can quickly gain new features.
These new features make sense during the project’s initial phases but increase the likelihood of the project missing deadlines, disappointing users and failing due to withdrawal of stakeholder support.

Project planning should cap the features that are intended to be developed and clearly describe what will and won’t be considered for Beta versions.

Inefficient Development

When either the project planning tooling is lacking or the use of proper task management is not enforced, chaos can ensue. In scenarios like this engineers use their own approaches to tracking progress and reporting back to stakeholders. Without a form of standardization, information gets lost, duplicated or corrupted, ultimately resulting in poor delivery performance.

Project planning should define how progress is tracked and reported on. This provides visibility of successes and potential blockers. This visibility is instrumental in allowing agile behavior during the project’s development.

My personal planning

Since I’m working on a personal project, some of the typical project planning requirements are less important as I report to myself. However, being organized and having visibility of what to work on when is still highly valuable.

How

I use the Notion app to plan my projects. A simple page per project, with embedded Milestones and Tasks databases, helps create actionable task management.

Notion is great for this as it allows rich text editing, embedded URL links and uploading of files. The embedded database system is flexible and allows you to manage any type of data.

Milestones

So what milestones do I have for Boxyvault?

  • User Authentication (federated login to a Boxyvault front-end)
  • CRUD Tags (tags are custom metadata you attach to a file)
  • Upload And Tag Files
  • View And Delete Uploaded Files
  • Preview File Content (images and videos to show in the browser)

Tasks

How do Tasks tie into Milestones?

Milestones are generally complex functionality that require many individual parts to be built, integrated and tested. Tasks represent the individual steps needed to reach the Milestone.

Tasks are linked to their respective Milestones making it simple to see how the task is contributing to the bigger picture. In Notion it’s also easy to utilize “Backlinks” to have tasks refer to one another when there is a need for making a dependency visible.

In my Tasks database I have a column called “Status” and this provides a mechanism for seeing what is pending, in-progress or completed.

Another great feature of Notion and the database system is that each record has an underlying Page representation. This means that if I need to add more context to a task, I can expand its page and populate it with any amount of text, links, images and syntax-highlighted code snippets.

Previous - Boxyvault Intro | Next - Boxyvault Lambda Infra

Boxyvault - Personal cloud based file management system

Intro

In this post I’m going to share an idea that I’m working on and why it’s important to me.

Goals

  • Establish consistent development progress on the Boxyvault system
  • Practice technical writing
  • Start a habit of capturing dev logs

Boxyvault

Boxyvault is the codename I’ve chosen for this project.
The name is intended to reflect the concept of a container that keeps valuable content safe.

So what is this Idea?

I want to use AWS’s S3 service to store and manage my family photos and videos. In order to interact with these stored files, I’m developing a front-end in React. Boxyvault will enable some custom functionality such as tagging, previewing and archiving of content.

Why not use Google Drive?

I am very cheap and enjoy free stuff, and with this comes use of the default 15 GB of storage Google Drive offers. Unfortunately I maxed out this free storage very quickly as the size of my family content far exceeds it.
So I looked into the pricing tiers offered to upgrade my Google Drive storage and, despite it being somewhat reasonable at lower sizes (100-200 GB), the jump to 2 TB (the max) was too expensive. Additionally I foresee my 4K videos needing more space in the future.

Surely you can find some alternatives?

What are my options here? I’ll list them as I saw it:

    1. Buy physical hard drives, enough to make backups. Remember this content is precious
    2. Google Drive with the most expensive tier and perhaps create more paid accounts.
    3. Look for alternative cloud based solutions and accept the cost
    4. Build my own system and rely on AWS S3 and Glacier to make it cost effective

I decided to develop a custom system (4) for the following reasons:

  • As a fullstack software engineer this is great practice for me
  • I’m interested to see if this can become an open source project or income source
  • A large project like this will give me many opportunities to practice technical writing

Next - Boxyvault Project Planning

Python Mastery - General: Introduction to PyCharm

Overview

We’ll quickly set up and test the PyCharm integrated development environment (IDE) made by JetBrains. PyCharm is one of the most popular code editors for Python.

Prerequisite: Installing Python

To use PyCharm, you need to have Python installed. If you haven’t installed Python yet you can find info on the official Getting Started page.

This post does not explain the details of installing software on different operating systems.

Install PyCharm

  1. Head over to the PyCharm page and click download.
  2. Select the operating system you’d like to install PyCharm on.
  3. Two editions are available for download.
    • Professional which is commercial
    • Community which is free and open source.
  4. Download and install the Community edition on your computer.

Basic usage

Once you launch the IDE, accept any default configurations. You should be presented with an option to create or open a project.

Choose New Project.
Specify an output path and project name, e.g. C:\Hello-World, and create the project.

You will see some basic files and folders created automatically. Below your Hello-World folder add a new Python file. e.g. app.py

Now add some sample code to app.py:

print("Hello World")

In PyCharm you will see a green play button. This is the Run button and it allows us to execute our Python code.
After you click this, you should see the Hello World message output in the terminal window below.

Resources

Mosh does a great job of taking you through this process:
Python for Beginners - Learn Python in 1 Hour

Python Power-Up: Your Ultimate Reference Guide and Library Lookup

General Python Libraries

  • NumPy: Numerical computing library that provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
  • Pandas: Data manipulation and analysis library that provides data structures and functions for efficiently handling structured data, such as data frames.
  • Matplotlib: Comprehensive plotting library for creating static, animated, and interactive visualizations in Python.
  • SciPy: Scientific computing library that builds on top of NumPy, providing additional functionality for tasks such as numerical integration, optimization, interpolation, linear algebra, and more.
  • Requests: Elegant and simple HTTP library for sending HTTP requests in Python.
  • BeautifulSoup: Library for parsing HTML and XML documents, extracting data, and navigating the parsed tree structure.

Web Development

  • Flask: Micro web framework for building web applications in Python with a simple and lightweight design.
  • Django: High-level web framework that follows the model-view-controller architectural pattern and includes a robust set of tools and features for web development.
  • SQLAlchemy: SQL toolkit and Object-Relational Mapping (ORM) library for Python that provides a set of high-level APIs for interacting with databases.
  • Tornado: Asynchronous web framework and networking library that emphasizes speed, scalability, and non-blocking operations.
  • Green: Python library for running tests and managing test environments with a focus on simplicity and speed.
  • Requestium: Library built on top of Requests that enhances the capabilities of web scraping and automation with features like browser-like behavior and automatic session management.

Data Science and Machine Learning

  • Scikit-learn: Machine learning library that provides a wide range of supervised and unsupervised learning algorithms, along with tools for model selection and evaluation.
  • TensorFlow: Open-source machine learning framework that enables the construction and deployment of large-scale neural networks for various tasks, including deep learning.
  • Keras: High-level neural networks API that runs on top of TensorFlow, providing a user-friendly interface for building and training deep learning models.
  • PyTorch: Open-source machine learning framework that supports dynamic computation graphs and is widely used for tasks like deep learning, natural language processing, and computer vision.
  • NLTK: Natural Language Toolkit library that provides tools and resources for working with human language data, such as tokenization, stemming, tagging, parsing, and more.
  • OpenCV: Computer vision library that offers a comprehensive set of functions and algorithms for tasks like image and video processing, object detection and recognition, and camera calibration.

GUI Development

  • Tkinter: Standard Python interface to the Tk GUI toolkit, allowing developers to create graphical user interfaces with widgets and windows.
  • PyQt: Python bindings for the Qt application framework, enabling the development of cross-platform desktop applications with rich graphical interfaces.
  • PySide: Python bindings for the Qt framework that provide an alternative to PyQt, allowing developers to create Python applications with a Qt-based GUI.

Data Visualization

  • Matplotlib: Comprehensive plotting library for creating static, animated, and interactive visualizations in Python.
  • Seaborn: Statistical data visualization library that provides a high-level interface for creating informative and visually appealing statistical graphics.
  • Plotly: Interactive plotting library that allows for the creation of interactive, web-based visualizations with features like zooming, panning, and hover interactions.
  • Bokeh: Python library for creating interactive visualizations and dashboards in web browsers, with a focus on providing high-performance, scalable graphics.

Game Development

  • Pygame: Library for building games and multimedia applications with Python, providing functionality for handling graphics, sounds, input devices, and more.
  • Arcade: Easy-to-use game development library that simplifies the process of creating 2D games in Python, with built-in support for graphics, physics, and user input.
  • Panda3D: 3D game engine and framework that allows developers to create immersive games and simulations with Python, providing a wide range of features and tools.

Automation and Scripting

  • Click: Command-line interface (CLI) creation kit that simplifies the process of building command-line applications with Python, providing options, arguments, and other CLI elements.
  • PyAutoGUI: Library for GUI automation and keyboard/mouse control, enabling developers to write scripts that automate tasks involving graphical user interfaces.
  • Selenium: Web browser automation tool that allows developers to control web browsers programmatically, enabling tasks like automated testing, web scraping, and web application interaction.
  • schedule: Library for scheduling Python functions to run at specific times, providing a simple and intuitive interface for managing recurring tasks and timed events.

Python Mastery - General: Introduction to Python IDLE

Overview

We’ll explore Python IDLE, an integrated development environment (IDE) that comes bundled with Python. Python IDLE provides a convenient way to write, edit, and execute Python code.

Prerequisite: Installing Python

To use Python IDLE on your Windows machine, you need to have Python installed. If you haven’t installed Python yet you can find info on the official Getting Started page.

Finding Python IDLE

  1. Press the Windows key on your keyboard or click the Start button.
  2. Type “IDLE” in the search bar.
  3. Select “IDLE (Python X.Y)” from the search results, where “X.Y” represents the version number of Python installed on your machine.

Creating and Running the Code Snippet

  1. Open a new file in Python IDLE by selecting “File” -> “New File” from the menu or pressing Ctrl+N.
  2. Copy and paste the following code snippet into the new file:
    import random

    # Generate a random number between 1 and 20
    secret_number = random.randint(1, 20)

    # Initialize the number of attempts
    attempts = 0

    print("Welcome to Guess the Number!")

    while True:
        # Prompt the user to guess the number
        guess = int(input("Guess a number between 1 and 20: "))

        # Increment the number of attempts
        attempts += 1

        # Compare the guess with the secret number
        if guess < secret_number:
            print("Too low!")
        elif guess > secret_number:
            print("Too high!")
        else:
            print(f"Congratulations! You guessed the number in {attempts} attempts.")
            break

  3. Save the file with a .py extension (e.g., guess_the_number.py) by selecting “File” -> “Save” from the menu or pressing Ctrl+S.
  4. Run the code by selecting “Run” -> “Run Module” from the menu or pressing F5.

The Python shell in IDLE will then display the output of the program and prompt you for input as necessary.

Microsoft's Responsible AI Principles for Building Ethical AI Systems

Introduction

The responsible development and deployment of AI systems is crucial to ensure fairness, reliability, security, privacy, inclusiveness, understandability, and accountability. In this article, we will summarize key principles for building ethical AI systems.

For a more comprehensive understanding of responsible AI, refer to the full overview here.

Fairness

  • AI systems should treat all people fairly.
  • Bias should be avoided, such as gender or ethnicity-based bias.
  • Unfair advantages or disadvantages to specific groups should be prevented.

Azure Machine Learning and fairness:

  • Azure Machine Learning has the capability to interpret models and quantify the influence of each data feature on predictions.
  • This helps identify and mitigate bias in models.

Another example is Microsoft’s implementation of Responsible AI with the Face service, which retires facial recognition capabilities that can be used to try to infer emotional states and identity attributes. These capabilities, if misused, can subject people to stereotyping, discrimination or unfair denial of services.

Reliability and safety

  • AI systems should perform reliably and safely.
  • Examples include AI-based software for autonomous vehicles and machine learning models for medical diagnosis.
  • Unreliable systems pose significant risks to human life.

Testing and deployment management for AI-based software:

  • Rigorous testing and deployment management processes are necessary for AI-based software development.
  • These processes ensure that the systems function as expected before release.

Privacy and security

  • AI systems should prioritize security and respect privacy.
  • Machine learning models rely on large volumes of data, including personal details that must be kept private.
  • Privacy and security considerations should continue even after the models are trained and the system is in production.
  • Both the data used for predictions and the decisions made from that data may be subject to privacy or security concerns.

Inclusiveness

  • AI systems should empower and engage everyone.
  • AI should bring benefits to all parts of society.
  • Factors such as physical ability, gender, sexual orientation, ethnicity, and others should not hinder access to AI benefits.

Transparency

  • AI systems should be designed to be easily understood by users.
  • Users should be fully informed about the purpose of the system.
  • Users should have knowledge of how the system works.
  • Users should be aware of the limitations of the system.

Accountability

  • People involved in designing and developing AI systems should be accountable.
  • Governance and organizational principles should guide the creation of AI-based solutions.
  • Ethical and legal standards should be clearly defined and upheld.
  • Designers and developers should work within this framework to ensure compliance.

Resources

https://www.microsoft.com/en-us/ai/responsible-ai-resources?rtc=1
https://blogs.microsoft.com/on-the-issues/2022/06/21/microsofts-framework-for-building-ai-systems-responsibly/
https://learn.microsoft.com/en-za/training/paths/get-started-with-artificial-intelligence-on-azure/