Monday, December 11, 2023

Load balancing strategies

Load balancing is crucial for distributing incoming network traffic across multiple servers or resources: it ensures efficient resource usage and prevents overload on any single server. Several load balancing strategies exist, each suited to specific scenarios:

  • Round Robin: Requests are distributed sequentially among servers in a circular order. It's simple and ensures an equal distribution of requests, but it doesn't consider each server's current load or capacity (see the sketch after this list).
  • Least Connections: Traffic is directed to the server with the fewest active connections. This strategy ensures that the load is distributed to the least loaded servers, promoting better resource utilization.
  • Weighted Round Robin: Servers are assigned weights, specifying their capacity or processing power. Requests are then distributed based on these weights, allowing more traffic to higher-capacity servers.
  • IP Hashing: The client's IP address determines which server receives the request. This ensures that requests from the same client are consistently sent to the same server, aiding session persistence.
  • Least Response Time: Requests are directed to the server that currently has the shortest response time or the fastest processing capability. This strategy optimizes performance for end users.
  • Resource-based Load Balancing: Takes into account server resource utilization metrics (CPU, memory, etc.) and directs traffic to servers with available resources, preventing overload and maximizing performance.
  • Dynamic Load Balancing Algorithms: These algorithms adapt in real-time to changing server conditions. They can factor in various metrics like server health, latency, and throughput to dynamically adjust traffic distribution.
  • Content-based or Application-aware Load Balancing: Analyzes the content or context of requests to intelligently route traffic. For instance, it can direct video streaming requests to servers optimized for video processing.
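
To make a couple of these concrete, here is a minimal TypeScript sketch of Round Robin and Least Connections. The Server shape and the helper names are made up for illustration, not taken from any particular load balancer:

type Server = { id: string; activeConnections: number };

// Round Robin: hand out servers in a fixed circular order.
function roundRobin(servers: Server[]) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

// Least Connections: pick the server with the fewest active connections.
function leastConnections(servers: Server[]): Server {
  return servers.reduce((least, s) =>
    s.activeConnections < least.activeConnections ? s : least
  );
}

const servers: Server[] = [
  { id: "a", activeConnections: 2 },
  { id: "b", activeConnections: 0 },
  { id: "c", activeConnections: 5 },
];

const next = roundRobin(servers);
next(); // "a"
next(); // "b"
leastConnections(servers); // "b" (fewest active connections)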

GCP services with examples

Similarly to the previous post, this one collects popular GCP services with examples.

  • Compute Engine:
    • Example: Similar to Amazon EC2, Compute Engine allows you to create and run virtual machines. You might use it to deploy and manage instances for various purposes like web hosting, application development, or machine learning tasks.
  • Cloud Storage:
    • Example: Storing and serving multimedia content for a content management system. Cloud Storage offers scalable object storage, ideal for hosting images, videos, backups, and large datasets used by applications.
  • Cloud SQL:
    • Example: Running a managed MySQL or PostgreSQL database for a retail application. Cloud SQL provides a fully managed relational database service, handling backups, replication, and maintenance tasks.
  • Cloud Functions:
    • Example: Implementing event-driven serverless functions for real-time data processing. You might use Cloud Functions to trigger actions in response to events like file uploads, database changes, or HTTP requests (see the sketch after this list).
  • Cloud Firestore / Cloud Bigtable:
    • Example: Building a scalable database for a real-time chat application. Firestore offers a flexible, scalable NoSQL database for storing and syncing data across devices, while Bigtable is suitable for high-throughput, low-latency workloads like time-series data or machine learning.
  • Cloud Pub/Sub:
    • Example: Creating a message queuing system for handling data processing tasks. Pub/Sub provides reliable, scalable messaging between independent applications or microservices.
  • Cloud CDN (Content Delivery Network):
    • Example: Accelerating content delivery for a global news website. Cloud CDN caches content at Google's globally distributed edge points of presence, reducing latency for users accessing articles, images, and videos.
  • Cloud Dataflow:
    • Example: Processing and analyzing large datasets in real-time. Dataflow helps to build and execute data processing pipelines for tasks like ETL (Extract, Transform, Load), analytics, and batch processing.
  • Google Kubernetes Engine (GKE):
    • Example: Managing and orchestrating containerized applications at scale. GKE automates the deployment, scaling, and management of containerized applications using Kubernetes.
  • Virtual Private Cloud (VPC):
    • Example: Creating isolated networks for different projects or departments within a company. VPC allows you to define and control a virtual network, including IP ranges, subnets, and firewall rules.
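
As a small example for Cloud Functions, here is a minimal HTTP function sketch using the Functions Framework for Node.js - the function name and the deploy command in the comment are illustrative, not prescriptive:

import * as functions from '@google-cloud/functions-framework';

// Registers an HTTP-triggered function; deployable with something like:
//   gcloud functions deploy helloHttp --runtime=nodejs20 --trigger-http
functions.http('helloHttp', (req, res) => {
  res.send(`Hello, ${req.query.name ?? 'world'}!`);
});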

AWS services with examples

It's always hard for me to remember all the abbreviations for all the AWS services, so I tried to collect the most popular ones in this blog post.

  • Amazon EC2 (Elastic Compute Cloud):
    • Example: Imagine building a scalable web application. You can use EC2 to deploy virtual servers (instances) to run your application. You might use different instance types for web servers, application servers, and databases, scaling them based on demand.
  • Amazon S3 (Simple Storage Service):
    • Example: Storing and serving user-uploaded files for a social media platform. S3 provides durable object storage. You might store user profile pictures, videos, and other media files and serve them directly to users (a small SDK sketch follows this list).
  • Amazon RDS (Relational Database Service):
    • Example: Hosting a relational database like MySQL, PostgreSQL, or SQL Server for an e-commerce site. RDS manages the database operations, allowing you to focus on your application without worrying about infrastructure management.
  • AWS Lambda:
    • Example: Building a serverless backend for a mobile app. Lambda enables running code without provisioning or managing servers. You might use it to handle user authentication, process data, or trigger actions based on events.
  • Amazon DynamoDB:
    • Example: Implementing a highly scalable NoSQL database for a gaming application. DynamoDB offers low-latency data access and can handle massive amounts of traffic, making it suitable for gaming leaderboards or storing player data.
  • Amazon SQS (Simple Queue Service) and Amazon SNS (Simple Notification Service):
    • Example: Building a decoupled system for an e-commerce platform. SQS allows asynchronous communication between different components of the system, while SNS can be used to send notifications about orders or updates to interested parties.
  • Amazon CloudFront:
    • Example: Accelerating content delivery for a global video streaming service. CloudFront is a content delivery network (CDN) that caches content in edge locations worldwide, reducing latency for users accessing the video content.
  • Amazon Kinesis:
    • Example: Processing and analyzing streaming data from IoT devices. Kinesis allows you to collect, process, and analyze real-time data streams at scale, making it ideal for IoT applications, log processing, or real-time analytics.
  • Amazon ECS (Elastic Container Service) and Amazon EKS (Elastic Kubernetes Service):
    • Example: Orchestrating containerized applications. ECS and EKS help manage Docker containers at scale. You might use these services to deploy microservices for a distributed application architecture.
  • Amazon VPC (Virtual Private Cloud):
    • Example: Creating a private network within AWS. VPC enables you to launch AWS resources into a virtual network, providing control over the network configuration, including IP address ranges, subnets, and routing.
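
To give one of these a concrete shape, here is a minimal sketch of uploading an object to S3 with the AWS SDK for JavaScript v3. The region, bucket name, and key are placeholders:

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Upload a user's profile picture; the caller supplies the file bytes.
async function uploadProfilePicture(body: Uint8Array): Promise<void> {
  await s3.send(
    new PutObjectCommand({
      Bucket: "my-example-bucket",
      Key: "uploads/profile-picture.png",
      Body: body,
    })
  );
}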

Message queues vs publish/subscribe

Message queues and publish/subscribe are both messaging patterns used in distributed systems to facilitate communication between different components or services. While they serve similar purposes, they have distinct characteristics.

Message Queue:

A message queue is a communication mechanism where messages are stored in a queue until they are consumed by a receiving component. It follows a point-to-point communication model, where a sender pushes a message into a queue, and a single receiver retrieves and processes it. Once a message is consumed, it's typically removed from the queue. Message queues often prioritize reliable delivery, ensuring that messages are not lost even if the receiver is temporarily unavailable.

Publish/Subscribe (Pub/Sub):

Pub/Sub is a messaging pattern where senders (publishers) distribute messages to multiple receivers (subscribers) without the senders specifically targeting any subscriber. Publishers categorize messages into topics or channels, and subscribers express interest in receiving messages from particular topics. When a publisher sends a message to a topic, all subscribers interested in that topic receive a copy of the message. Pub/Sub allows for scalable and flexible communication between components and enables a one-to-many or many-to-many messaging model.

Key Differences:

  • Communication Model:
    • Message Queue: Point-to-point communication between a single sender and a single receiver.
    • Pub/Sub: Many-to-many or one-to-many communication, where multiple subscribers receive messages from publishers.
  • Message Handling:
    • Message Queue: Messages are stored in a queue until consumed by a single receiver.
    • Pub/Sub: Messages are broadcast to all subscribers interested in a specific topic, typically without being stored in queues.
  • Relationships:
    • Message Queue: Direct relationship between sender and receiver.
    • Pub/Sub: Decoupled relationship; publishers and subscribers are independent of each other.
  • Message Retention:
    • Message Queue: Emphasizes reliable delivery, ensuring that messages are not lost even if the receiver is temporarily unavailable.
    • Pub/Sub: Subscribers might miss messages if they are not actively subscribed when the message is published.
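
To make the contrast concrete, here's a toy in-memory sketch of both patterns in TypeScript. It's deliberately simplified - real brokers add persistence, acknowledgements, and delivery guarantees:

// Message queue: point-to-point. Each message is consumed by exactly
// one receiver and removed from the queue afterwards.
class MessageQueue<T> {
  private messages: T[] = [];
  send(msg: T) { this.messages.push(msg); }
  receive(): T | undefined { return this.messages.shift(); }
}

// Pub/Sub: one-to-many. Every subscriber of a topic gets its own copy;
// nothing is retained for subscribers that weren't listening.
class PubSub<T> {
  private subscribers = new Map<string, Array<(msg: T) => void>>();
  subscribe(topic: string, handler: (msg: T) => void) {
    const handlers = this.subscribers.get(topic) ?? [];
    handlers.push(handler);
    this.subscribers.set(topic, handlers);
  }
  publish(topic: string, msg: T) {
    for (const handler of this.subscribers.get(topic) ?? []) handler(msg);
  }
}

const queue = new MessageQueue<string>();
queue.send("order-1");
queue.receive(); // "order-1" - and it's gone from the queue now

const bus = new PubSub<string>();
bus.subscribe("orders", m => console.log("billing saw", m));
bus.subscribe("orders", m => console.log("shipping saw", m));
bus.publish("orders", "order-2"); // both subscribers receive a copy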

Wednesday, November 22, 2023

My #1 productivity hack - Google Calendar default email reminders

I'd like to share a super simple trick that may or may not work for you, but which is essential in my life for organizing personal and work events. All hail the Google Calendar default email reminder.

If you're juggling a busy schedule like I am, Google Calendar's default email reminders are a game-changer. Seriously, this feature saves me so much hassle. You can customize reminders for all your events, ensuring nothing slips through the cracks.

What I love most is how easy it is to set up. Just head to settings, tweak your preferences, and voila! You can get an email nudge whenever you need it, whether it's a day before or just an hour prior to your event.

Trust me, relying on these default reminders has made my life a whole lot easier. No more frantic manual setting of reminders for each event—I just set it and forget it. It's like having a personal assistant keeping track of everything for me.

Honestly, it's not just a notification feature; it's a productivity hack. It frees up mental space, letting me focus on what I need to do without constantly worrying about missing important stuff. Give it a shot; you'll thank yourself later!

Setting default email reminders is immensely beneficial for managing a busy schedule. It eliminates the need to manually set reminders for each event, saving time and ensuring no event goes unnoticed. The simplicity of configuring these reminders streamlines the whole organizational process, fostering a more efficient workflow.

This feature fosters productivity by reducing the mental load of remembering every event. Users can rely on the system to prompt them at designated times, allowing them to focus on the tasks at hand without worrying about missing appointments or deadlines.

The only downside of this approach is that your email client can get pretty chatty. But that also means you're living a busy life! So all in all, this is my best approach to managing all of these events (e.g. birthdays are in there too, haha), but if you know a better way to do it, let me know!

Learn more about it here!

Wednesday, October 4, 2023

My VSCode TypeScript prototyping setup

.vscode/launch.json

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "launch",
      "name": "Debug TS",
      "program": "${workspaceFolder}/index.ts",
      "preLaunchTask": "tsc: build - tsconfig.json",
      "outFiles": ["${workspaceFolder}/out/**/*.js"]
    }
  ]
}

tsconfig.json

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "outDir": "out",
    "sourceMap": true
  }
}
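
With these two files in place, pressing F5 compiles the project into out/ via the preLaunchTask and attaches the debugger through source maps. A hypothetical index.ts scratch file to try it with:

// index.ts - a scratch file for prototyping
const greet = (name: string): string => `Hello, ${name}!`;

console.log(greet("VSCode")); // set a breakpoint here and press F5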

Sunday, March 5, 2023

JavaScript infinite streams

This post was heavily inspired by Infinite Data structures in Javascript (author: Dimitris Papadimitriou).

After reading the article, I also wanted to have the bare minimum, most simple implementation of a Stream - and surprisingly, it is very easy in terms of code, but also kinda complicated conceptually. Kind of the sweet spot I really like to learn about!

In this implementation, we add a filter method to the stream object, which takes a predicate function pred and returns a new stream object with only the elements that pass the predicate. If the current value passes the predicate, then the resulting stream will include the current value and the result of calling filter on the next property. Otherwise, the resulting stream will only include the result of calling filter on the next property. If there is no next property, then the emptyStream object is returned.
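
Here is a minimal sketch of the idea - a stream is a value plus a lazy next thunk, with map and filter methods; the helper names (makeStream, nats) are my own:

type Stream<T> = {
  value: T;
  next?: () => Stream<T>;
  map<U>(f: (x: T) => U): Stream<U>;
  filter(pred: (x: T) => boolean): Stream<T>;
};

// Marker object returned when filtering exhausts the stream.
const emptyStream: Stream<never> = {
  value: undefined as never,
  map: () => emptyStream,
  filter: () => emptyStream,
};

function makeStream<T>(value: T, next?: () => Stream<T>): Stream<T> {
  return {
    value,
    next,
    // Transform every element; the tail is only mapped when it's forced.
    map<U>(f: (x: T) => U): Stream<U> {
      const tail = next;
      return makeStream(f(value), tail && (() => tail().map(f)));
    },
    // Keep the current value if it passes, otherwise recurse into the tail.
    filter(pred: (x: T) => boolean): Stream<T> {
      const tail = next;
      if (pred(value)) {
        return makeStream(value, tail && (() => tail().filter(pred)));
      }
      return tail ? tail().filter(pred) : emptyStream;
    },
  };
}

// An infinite stream of natural numbers - safe, because next is lazy.
const nats = (n: number): Stream<number> => makeStream(n, () => nats(n + 1));

const s = makeStream(1, () => makeStream(2, () => makeStream(3)));
const mappedStream = s.map(x => String(x)); // "1" -> "2" -> "3"
const filteredStream = s.filter(x => x > 1); // 2 -> 3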

The resulting mappedStream is a stream of strings with the same structure as the original stream, but with each element converted to a string. The resulting filteredStream is a stream with only the elements that are greater than 1.


What's wrong with eager evaluation? (And why you should use fp-ts/Fluture for that)

Eager evaluation in JavaScript Promises can lead to several problems. Eager evaluation means that a Promise's executor function is executed immediately when the Promise is created, rather than when it is needed later on. Here are some potential issues with eager evaluation:

  • Increased resource usage: If the Promise's executor function performs a resource-intensive operation, eager evaluation can cause unnecessary resource usage. For example, if the Promise is used to fetch data from a server, eager evaluation means that the fetch operation is performed immediately, even if the data is not needed until later.
  • Unnecessary blocking: If the Promise's executor function performs a blocking operation (such as a long-running loop), eager evaluation can cause unnecessary blocking of the main thread. This can lead to unresponsive user interfaces and other performance issues.
  • Wasted work: If the Promise's executor function performs work that is not needed (for example, if it fetches data that is never used), eager evaluation can result in wasted work and unnecessary network traffic.
  • Race conditions: Eager evaluation can also lead to race conditions, where multiple Promises are created but only one of them is needed. This can result in unnecessary resource usage and can make code harder to reason about.

To avoid these problems, it's generally better to use lazy evaluation: wrap the computation in a function (a thunk) so that it only runs when the result is actually needed. Native Promises are always eager - once constructed, the executor has already run - so laziness has to come from a wrapper around the Promise. This approach allows for more efficient use of resources and can help prevent performance issues.

The fp-ts library provides several abstractions and functions that can help avoid the problems associated with eager evaluation in Promises. For example, the type Task is essentially a Promise wrapped in a function, allowing you to control when it's executed; this is why it's considered "lazy".

import { Task } from 'fp-ts/Task'

// A Task<A> is just () => Promise<A>: constructing it runs nothing.
const fetchTask: Task<Response> = () => fetch('https://example.com/data')

Only when you call fetchTask() will the HTTP request actually be made.

Fluture

There are also other viable options, like Fluture, which is a Future implementation in JavaScript. I'm not aiming to discuss Futures in depth here, but it's worth mentioning that a Future is a monadic interface that is lazy by its nature - in a way it's similar to a Promise, but more functional, with its own advantages.
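
A minimal sketch with Fluture (using its curried, v12-style API; the one-second delay is just for illustration):

import { Future, fork } from 'fluture';

// Constructing the Future runs nothing; it's only a description of the work.
const delayedAnswer = Future<Error, number>((reject, resolve) => {
  const timer = setTimeout(resolve, 1000, 42);
  return () => clearTimeout(timer); // cancellation hook
});

// Only fork() actually executes it.
fork(console.error)(console.log)(delayedAnswer);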

OpenAPI discriminators

OpenAPI discriminators are a feature of the OpenAPI Specification (formerly known as Swagger) that allows you to define alternative schemas for different values of a property.

In other words, you can use a discriminator to specify different schema definitions based on the value of a property. For example, you may have an "animal" schema with a discriminator property called "type". The "type" property could have possible values of "cat" or "dog". You could then define different schema definitions for the "cat" and "dog" types.

Discriminators are especially useful for modeling inheritance in your API schema. By using discriminators, you can define common properties and behaviors for a group of related schemas, while still allowing for differences in their specific implementations.

The discriminator property is used to select the appropriate schema definition for a given value. The value of the discriminator property is typically found in the JSON payload of a request or response.

Let's say we want to define a schema for different types of animals, with common properties like "name" and "age", but with different properties based on the type of animal. We can use a discriminator to define alternative schemas based on the value of the "type" property:

components:
  schemas:
    Animal:
      type: object
      properties:
        name:
          type: string
        age:
          type: integer
        type:
          type: string
      discriminator:
        propertyName: type
        # map the lowercase payload values to the schema names below
        mapping:
          dog: '#/components/schemas/Dog'
          cat: '#/components/schemas/Cat'
      required:
        - type

    Dog:
      allOf:
        - $ref: '#/components/schemas/Animal'
        - type: object
          properties:
            breed:
              type: string

    Cat:
      allOf:
        - $ref: '#/components/schemas/Animal'
        - type: object
          properties:
            color:
              type: string

In this example, we define a base schema called "Animal" with properties for "name" and "age", and a discriminator on the "type" property that selects the appropriate schema for each animal. We also define two alternative schemas, "Dog" and "Cat", which extend the "Animal" schema with additional properties specific to dogs and cats.

If a request or response payload includes an animal with a "type" property of "dog", the discriminator mapping resolves it to the "Dog" schema. Similarly, if the "type" property is "cat", the "Cat" schema will be used.

Here's an example payload for a dog:
{
  "type": "dog",
  "name": "Fido",
  "age": 3,
  "breed": "Golden Retriever"
}
And here's an example payload for a cat:
{
  "type": "cat",
  "name": "Whiskers",
  "age": 2,
  "color": "tabby"
}

In this way, discriminators allow you to define flexible, extensible API schemas that can adapt to a variety of use cases.

AsyncGenerator

In TypeScript, an AsyncGenerator is a special type of generator function that can be used to asynchronously generate a sequence of values.

Like a regular generator function, an AsyncGenerator is defined using the function* syntax, but with the addition of the async keyword before the function keyword:

async function* myAsyncGenerator() {
  // ...
}

An AsyncGenerator function can use the yield keyword to return values one at a time, just like a regular generator. However, because it is an asynchronous function, it can also use the await keyword to pause execution until an asynchronous operation completes before continuing to the next yield statement.

Here is an example of an AsyncGenerator that asynchronously generates a sequence of random numbers:

async function* randomNumbers(count: number): AsyncGenerator<number> {
  for (let i = 0; i < count; i++) {
    // Wait a second to simulate an asynchronous source...
    await new Promise(resolve => setTimeout(resolve, 1000));
    // ...then yield the next random number.
    yield Math.random();
  }
}

To use an AsyncGenerator, you can call it like a regular generator and iterate over the values it generates using a for-await-of loop:

async function printRandomNumbers(count: number) {
  for await (const number of randomNumbers(count)) {
    console.log(number);
  }
}
This will asynchronously generate and print count random numbers, one per second.

Employee Stock Options

Stock option: the opportunity to buy a stock at a set price up until some date in the future.

So what does this actually mean for an employee? Let's say you are granted 100 options with a strike price of 15 USD, exercisable over 5 years. The 100 underlying shares would be worth 1,500 USD today (if you could sell them). Now say that over those five years the company's stock goes up 5 USD, to 20 USD. Because your options let you buy at 15 USD while the shares trade at 20 USD, each option is effectively worth 5 USD (20 - 15), so you have 100 * 5 = 500 USD of profit!
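
The same arithmetic as a tiny, hypothetical helper:

// Intrinsic value of exercising options at the current market price;
// never negative, since you simply wouldn't exercise at a loss.
const optionProfit = (numOptions: number, strike: number, marketPrice: number): number =>
  numOptions * Math.max(marketPrice - strike, 0);

optionProfit(100, 15, 20); // 500 USD, matching the example above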

Vesting refers to the process by which an employee earns ownership of an asset, such as stock options, over time. In the context of employee stock options, vesting typically means that an employee gains the right to exercise (i.e., purchase) a certain number of shares of their company's stock at a set price (known as the "strike price") over a predetermined period of time, known as the "vesting period."

The vesting period is usually several years long and is intended to incentivize employees to stay with the company and contribute to its success over the long term. As the employee meets certain milestones or stays with the company for a certain length of time, they earn the right to exercise more of their stock options. Once the options have vested, the employee has the choice to exercise them, usually by paying the strike price, and can then sell the shares on the open market or hold onto them.

Employee stock options are a common form of equity compensation used by companies to attract and retain talented employees. They are typically offered to employees at all levels of the organization, from executives to rank-and-file workers. The number of options granted and the terms of the vesting schedule vary from company to company and can depend on factors such as the employee's role, tenure, and performance. The hope is that by providing employees with a financial stake in the company's success, they will be more motivated to work hard and help the company achieve its goals.

Saturday, March 4, 2023

PromQL (Prometheus Query Language) notes

PromQL (Prometheus Query Language) is a query language used to retrieve and manipulate time series data stored in Prometheus. The most important aspects of PromQL:

  • Selecting metrics: You can select metrics by name, or match names with a regular expression. For example, up selects all time series of the metric up, while {__name__=~"node_cpu.*"} selects all metrics whose names start with node_cpu.
  • Filtering metrics: You can filter metrics by their labels using curly braces {}. For example, up{job="prometheus"} selects all metrics with the name up and the label job equal to prometheus.
  • Aggregating data: PromQL provides a variety of functions to aggregate time series data. For example, sum() calculates the sum of values across multiple time series, while avg() calculates the average value across multiple time series.
  • Grouping data: You can group data by one or more labels using the by keyword. For example, sum(rate(http_requests_total{method="GET"}[5m])) by (status_code) groups the data by the status_code label.
  • Working with time: PromQL supports a variety of time-related features, such as time() to get the current timestamp, the offset modifier to shift a query back by a given interval, and rate() to calculate the per-second rate of change over a time window.

PromQL provides a powerful and flexible way to analyze and query time series data stored in Prometheus.

Consistent hashing via hash ring

Consistent hashing via hash ring is a technique used in distributed computing systems to evenly distribute data and workload across a group of servers, while also minimizing the impact of server failures and additions on the system.

In this technique, servers are arranged in a circular ring, with each server represented by a point on the ring. The ring is typically implemented using a hash function that maps keys to points on the ring. The keys in this context refer to the data or workload that needs to be distributed across the servers.

To assign a key to a server, the hash function is applied to the key to obtain its corresponding point on the ring. The server whose point is the next one encountered moving clockwise from the key's point (wrapping around at the end of the ring) is then responsible for handling that key. If a server fails or is added, only the keys in the affected segment of the ring need to be reassigned; all other keys remain with their previously assigned server.
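
A minimal sketch of such a ring in TypeScript - the virtual-node count, the hashing choice (MD5 here), and the method names are all illustrative:

import { createHash } from "crypto";

class HashRing {
  private ring = new Map<number, string>(); // ring position -> server
  private sorted: number[] = [];

  // Several virtual nodes per server smooth out the key distribution.
  constructor(private vnodes = 100) {}

  private hash(key: string): number {
    return createHash("md5").update(key).digest().readUInt32BE(0);
  }

  addServer(server: string): void {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.set(this.hash(`${server}#${i}`), server);
    }
    this.sorted = [...this.ring.keys()].sort((a, b) => a - b);
  }

  removeServer(server: string): void {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.delete(this.hash(`${server}#${i}`));
    }
    this.sorted = [...this.ring.keys()].sort((a, b) => a - b);
  }

  // Walk clockwise: the first position >= the key's hash owns the key
  // (wrapping around to the start of the ring if necessary).
  getServer(key: string): string | undefined {
    if (this.sorted.length === 0) return undefined;
    const h = this.hash(key);
    const pos = this.sorted.find(p => p >= h) ?? this.sorted[0];
    return this.ring.get(pos);
  }
}

const ring = new HashRing();
["server-a", "server-b", "server-c"].forEach(s => ring.addServer(s));
ring.getServer("user:42"); // e.g. "server-b"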

By using this technique, the system can achieve good load balancing, as each server is responsible for handling an approximately equal portion of the keys on the ring. Additionally, the impact of server failures and additions is minimized, as only a small portion of the keys need to be reassigned when a server is added or removed.

Overall, consistent hashing via hash ring is a powerful technique for distributing workload and data in distributed systems, and has been widely used in many large-scale systems, such as content delivery networks and distributed databases.

Backend system design interview notes