👋 I’m Cedric Chee. I’ve been a software engineer, AI engineer, writer, and entrepreneur.

I code and write about it sometimes. I create system software and apps in Go/JS.

I do product engineering and web development at startups/consulting. I enjoy backend development.

At night, I tinker with systems programming in Rust.

Rethinking "Clean Code"

May 26, 2021 • 4 min read

I’ve written about this topic before.

Today, I’m seeing anti-“clean code” stuff topping social media again. This time, it’s about Robert C. Martin’s book “Clean Code”, specifically the blog post “It’s probably time to stop recommending Clean Code”.

I have actually read Clean Code. It’s not a perfect book. It’s not going to make anyone into a great programmer.

What I Discovered

I’m going to quote some good points from an old (2020) /r/programming thread.

I’ve more or less given up on lists of rules for “clean code”. Every time I’ve proposed a list, someone creates some working code that assiduously follows every rule, and yet is a complete pile of crap. And yes, the someone is doing this in good faith.
Probably the only rule that really matters is: “use good judgement”.

Personally, I think the principles in Clean Code are very important. However, the book itself isn’t the best thing I’ve ever read, and attaching Uncle Bob’s name to it isn’t necessarily doing the subject matter a service.
In my opinion, Sandi Metz’ blog and books (i.e. POODR) present the same principles as Clean Code but in a much more concise, clear fashion. If I had to pick two “required reading” books for every software developer, I absolutely think POODR and Code Complete (by Steve McConnell) would be on the top of the list.
I’ll be honest, reading POODR a few years ago felt like a wake-up call for me in terms of realizing just how much of a junior developer I am. There really is an art to designing abstractions, and if I ever end up doing imperative programming again, I’m going to try to do OO “the right way” this time.

I would personally recommend another book by Sandi Metz, 99 Bottles of OOP - 2nd Edition. I have read it and completed its exercises. I liked the Flocking Rules taught throughout the book to uncover abstractions: not premature or forced abstraction, and not abusing OOP, but continuous refactoring under tests to improve the code. Tests in this context don’t necessarily mean strict TDD, which is good.

The author of that blog post suggested “A Philosophy of Software Design” (2018) by John Ousterhout. If you’re interested, I found two blog posts with good reviews of that book:

Book Review by Johnz - Johnz explains why he recommends the book to other software engineers and developers. What caught my attention is his point on “Teaching Principles Over Rules”.
My Take (and a Book Review) by Gergely Orosz

Aside:

I’ve also seen the Semantic Compression idea from Casey Muratori, mainly this part:

Like a good compressor, I don’t reuse anything until I have at least two instances of it occurring. Many programmers don’t understand how important this is, and try to write “reusable” code right off the bat, but that is probably one of the biggest mistakes you can make. My mantra is, “make your code usable before you try to make it reusable”.
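A tiny Go sketch of that rule, with hypothetical names (`formatPrice`, `invoiceLine`, `receiptLine`): the price formatting is only pulled into a shared helper once a second call site needs it, rather than being designed as a reusable utility up front.

```go
package main

import (
	"fmt"
	"strings"
)

// formatPrice is a hypothetical helper, extracted only after the same
// formatting logic appeared at two call sites (the "compression" step).
// It assumes non-negative amounts in cents.
func formatPrice(cents int) string {
	return fmt.Sprintf("$%d.%02d", cents/100, cents%100)
}

// invoiceLine and receiptLine previously duplicated the formatting inline.
func invoiceLine(item string, cents int) string {
	return item + ": " + formatPrice(cents)
}

func receiptLine(items []string, total int) string {
	return strings.Join(items, ", ") + " | total " + formatPrice(total)
}

func main() {
	fmt.Println(invoiceLine("coffee", 450))                 // coffee: $4.50
	fmt.Println(receiptLine([]string{"tea", "cake"}, 1205)) // tea, cake | total $12.05
}
```

Until the second use shows up, the inline version is usually the simpler and more honest code.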

The “Goodbye, Clean Code” post by Dan.

I sure didn’t think deeply about any of those things. I thought a lot about how the code looked — but not about how it evolved with a team of squishy humans. … Don’t be a clean code zealot. Clean code is not a goal. It’s an attempt to make some sense out of the immense complexity of systems we’re dealing with.

That’s it for now. Till next time.

System Design Cheatsheet

May 20, 2021 • 8 min read

#system design

Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

Clarify and agree on the scope of the system

Use cases (descriptions of sequences of events that, taken together, lead to a system doing something useful)
- Who is going to use it?
- How are they going to use it?
Constraints
- Mainly identify traffic and data handling constraints at scale.
- Scale of the system, such as requests per second, request types, data written per second, and data read per second.
- Special system requirements such as multi-threading, read or write oriented.

High level architecture design (Abstract design)

Sketch the important components and the connections between them, but don’t go into detail yet.
- Application service layer (serves the requests)
- List different services required.
- Data Storage layer
- e.g., a scalable system usually includes a web server (load balancer), services (service partitions), a database (master/slave database cluster), and caching systems.

Component Design

Component + specific APIs required for each of them.
Object oriented design for functionalities.
- Map features to modules: One scenario for one module.
- Consider the relationships among modules:
  - Certain functions must have a unique instance (Singletons)
  - Core object can be made up of many other objects (composition).
  - One object is another object (inheritance)
Database schema design.
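A rough Go sketch of the module relationships listed above, with hypothetical types (`Config`, `OrderService`, `AuditedService`). Go has no class inheritance, so struct embedding stands in for the "is-a" case here; that substitution is my assumption, not part of the original checklist.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical component that should exist exactly once.
type Config struct{ DBURL string }

var (
	configOnce sync.Once
	config     *Config
)

// GetConfig lazily creates the single Config instance (singleton);
// sync.Once keeps initialization safe under concurrent callers.
func GetConfig() *Config {
	configOnce.Do(func() {
		config = &Config{DBURL: "postgres://localhost/app"}
	})
	return config
}

// Logger and Store are small hypothetical parts.
type Logger struct{ prefix string }

func (l Logger) Log(msg string) string { return l.prefix + msg }

type Store struct{ cfg *Config }

// OrderService is made up of other objects (composition).
type OrderService struct {
	log   Logger
	store Store
}

func (s OrderService) Place(id string) string { return s.log.Log("placed " + id) }

// AuditedService embeds OrderService; embedded methods are promoted,
// which is Go's closest analogue to "one object is another object".
type AuditedService struct {
	OrderService
}

func main() {
	svc := AuditedService{OrderService: OrderService{
		log:   Logger{prefix: "[orders] "},
		store: Store{cfg: GetConfig()},
	}}
	fmt.Println(svc.Place("42"))            // [orders] placed 42
	fmt.Println(GetConfig() == GetConfig()) // true: same instance
}
```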

Understanding Bottlenecks

Perhaps your system needs a load balancer and many machines behind it to handle user requests. Or maybe the data is so large that you need to distribute your database across multiple machines. What downsides come with doing that?
Is the database too slow and does it need some in-memory caching?

Scaling your abstract design

Vertical scaling
- You scale by adding more power (CPU, RAM) to your existing machine.
Horizontal scaling
- You scale by adding more machines into your pool of resources.
Caching
- Load balancing helps you scale horizontally across an ever-increasing number of servers, but caching will enable you to make vastly better use of the resources you already have, as well as making otherwise unattainable product requirements feasible.
- Application caching requires explicit integration in the application code itself. Usually the code checks whether a value is in the cache and, if not, retrieves it from the database.
- Database caching tends to be “free”. When you switch your database on, you get a default configuration that provides some degree of caching and performance. Those initial settings are optimized for a generic use case, and by tuning them to your system’s access patterns you can generally squeeze out a great deal of extra performance.
- In-memory caches are the most potent in terms of raw performance, because they store their entire data set in memory, and RAM accesses are orders of magnitude faster than disk accesses. e.g., Memcached or Redis.
- e.g., precalculating results (such as the number of visits from each referring domain for the previous day).
- e.g., pre-generating expensive indexes (such as suggested stories based on a user’s click history).
- e.g., storing copies of frequently accessed data in a faster backend (such as Memcache instead of PostgreSQL).
Load balancing
- Public servers of a scalable web service are hidden behind a load balancer. This load balancer evenly distributes load (requests from your users) onto your group/cluster of application servers.
- Types: Smart client (hard to get it perfect), Hardware load balancers ($$$ but reliable), Software load balancers (hybrid - works for most systems)
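The application-caching bullet above describes the cache-aside pattern. Here is a minimal Go sketch of it; `fakeDB`, `CacheAside`, and the keys are hypothetical stand-ins, and a real setup would typically use Memcached or Redis and add expiry and eviction, which this sketch omits.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeDB stands in for a slow database and counts reads,
// so we can see which lookups were served from the cache.
type fakeDB struct {
	rows  map[string]string
	reads int
}

func (db *fakeDB) Get(key string) (string, bool) {
	db.reads++
	v, ok := db.rows[key]
	return v, ok
}

// CacheAside checks an in-memory map first, falls back to the
// database on a miss, and stores the result for next time.
type CacheAside struct {
	mu    sync.Mutex
	cache map[string]string
	db    *fakeDB
}

func (c *CacheAside) Get(key string) (string, bool) {
	// Holding the lock during the DB read keeps the sketch simple
	// but serializes misses; real caches avoid that.
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.cache[key]; ok {
		return v, true // cache hit
	}
	v, ok := c.db.Get(key) // cache miss: read from the database
	if ok {
		c.cache[key] = v
	}
	return v, ok
}

func main() {
	db := &fakeDB{rows: map[string]string{"user:1": "Ada"}}
	c := &CacheAside{cache: map[string]string{}, db: db}
	c.Get("user:1")
	c.Get("user:1")
	fmt.Println(db.reads) // 1: the second Get was a cache hit
}
```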

Database replication
- Database replication is the frequent copying of data from a database on one computer or server to a database on another, so that all users share the same level of information. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.
Database partitioning
- Partitioning of relational data usually refers to decomposing your tables either row-wise (horizontally) or column-wise (vertically).
Map-Reduce
- For sufficiently small systems you can often get away with ad hoc queries on a SQL database, but that approach may not scale once the quantity of stored data or the write load requires sharding your database, and it will usually require dedicated slaves for running these queries (at which point you may prefer a system designed for analyzing large quantities of data over fighting your database).
- Adding a map-reduce layer makes it possible to perform data- and/or processing-intensive operations in a reasonable amount of time. You might use it for calculating suggested users in a social graph, or for generating analytics reports. e.g., Hadoop, and maybe Hive or HBase.
Platform Layer (Services)
- Separating the platform and the web application allows you to scale the pieces independently. If you add a new API, you can add platform servers without adding unnecessary capacity to your web application tier.
- Adding a platform layer can be a way to reuse your infrastructure for multiple products or interfaces (a web application, an API, an iPhone app, etc.) without writing too much redundant boilerplate code for dealing with caches, databases, etc.
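Row-wise (horizontal) partitioning needs a rule for which partition each row lives in. A minimal sketch, assuming hash-based routing on a hypothetical `User.ID` key with a fixed shard count; FNV-1a is used only because it ships in Go's standard library.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// User is a hypothetical row type.
type User struct {
	ID   string
	Name string
}

// shardFor maps a row key to a partition index in [0, numShards).
func shardFor(key string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	const numShards = 4
	shards := make([][]User, numShards)
	for _, u := range []User{{"u1", "Ada"}, {"u2", "Linus"}, {"u3", "Grace"}} {
		i := shardFor(u.ID, numShards)
		shards[i] = append(shards[i], u)
	}
	for i, rows := range shards {
		fmt.Println("shard", i, rows)
	}
}
```

Plain modulo hashing moves most keys when the shard count changes; consistent hashing is a common way to limit that reshuffling.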

Key topics for designing a system

Concurrency

Do you understand threads, deadlock, and starvation? Do you know how to parallelize algorithms? Do you understand consistency and coherence?

Networking

Do you roughly understand IPC and TCP/IP? Do you know the difference between throughput and latency, and when each is the relevant factor?

Abstraction

You should understand the systems you’re building upon. Do you know roughly how an OS, file system, and database work? Do you know about the various levels of caching in a modern OS?

Real-World Performance

You should be familiar with the speed of everything your computer can do, including the relative performance of RAM, disk, SSD and your network.

Estimation

Estimation, especially in the form of a back-of-the-envelope calculation, is important because it helps you narrow down the list of possible solutions to only the ones that are feasible. Then you have only a few prototypes or micro-benchmarks to write.
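For example, a back-of-the-envelope estimate fits in a few lines of Go. Every input below (10 million daily users, 20 requests per user per day, 10% of requests writing 1 KB, a 3x peak factor) is a made-up assumption for illustration.

```go
package main

import "fmt"

// Estimate holds rough capacity numbers.
type Estimate struct {
	AvgQPS, PeakQPS           float64
	StorageGBDay, StorageTBYr float64
}

// estimate turns traffic assumptions into average/peak QPS and storage growth.
func estimate(dau, reqPerUser int, writeFraction float64, bytesPerWrite int, peakFactor float64) Estimate {
	const secondsPerDay = 86_400
	reqPerDay := float64(dau * reqPerUser)
	avg := reqPerDay / secondsPerDay
	gbPerDay := reqPerDay * writeFraction * float64(bytesPerWrite) / 1e9
	return Estimate{
		AvgQPS:       avg,
		PeakQPS:      avg * peakFactor,
		StorageGBDay: gbPerDay,
		StorageTBYr:  gbPerDay * 365 / 1e3,
	}
}

func main() {
	// Hypothetical inputs, not measurements.
	e := estimate(10_000_000, 20, 0.10, 1_000, 3)
	fmt.Printf("avg %.0f QPS, peak %.0f QPS\n", e.AvgQPS, e.PeakQPS)         // avg 2315 QPS, peak 6944 QPS
	fmt.Printf("%.0f GB/day, %.1f TB/year\n", e.StorageGBDay, e.StorageTBYr) // 20 GB/day, 7.3 TB/year
}
```

Numbers like these quickly tell you whether one database node could plausibly cope or whether partitioning belongs in the design from the start.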

Availability & Reliability

Are you thinking about how things can fail, especially in a distributed environment? Do you know how to design a system to cope with network failures? Do you understand durability?

Web App System design considerations:

Security (CORS)
Using CDN
- A content delivery network (CDN) is a system of distributed servers (network) that deliver webpages and other Web content to a user based on the geographic locations of the user, the origin of the webpage and a content delivery server.
- This service is effective in speeding the delivery of content of websites with high traffic and websites that have global reach. The closer the CDN server is to the user geographically, the faster the content will be delivered to the user.
- CDNs also provide protection from large surges in traffic.
Full Text Search
- Use Sphinx/Lucene/Solr, which achieve fast search responses by searching an index instead of the text directly.
Offline support/Progressive enhancement
- Service Workers
Web Workers
Server Side rendering
Asynchronous loading of assets (Lazy load items)
Minimizing network requests (HTTP/2 + bundling/sprites etc.)
Developer productivity/Tooling
Accessibility
Internationalization
Responsive design
Browser compatibility
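Full-text engines such as Lucene are built around an inverted index. The Go sketch below is a toy version of that idea only, assuming whitespace tokenization, lowercase matching, and AND queries; real engines also handle analysis, stemming, ranking, and much more.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// InvertedIndex maps each lowercase term to the set of document IDs
// containing it, so a query looks up terms instead of scanning texts.
type InvertedIndex map[string]map[int]bool

func (idx InvertedIndex) Add(docID int, text string) {
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if idx[term] == nil {
			idx[term] = map[int]bool{}
		}
		idx[term][docID] = true
	}
}

// Search returns sorted IDs of documents containing every query term.
func (idx InvertedIndex) Search(query string) []int {
	terms := strings.Fields(strings.ToLower(query))
	if len(terms) == 0 {
		return nil
	}
	var result []int
	for id := range idx[terms[0]] {
		match := true
		for _, t := range terms[1:] {
			if !idx[t][id] { // a nil inner map simply yields false
				match = false
				break
			}
		}
		if match {
			result = append(result, id)
		}
	}
	sort.Ints(result)
	return result
}

func main() {
	idx := InvertedIndex{}
	idx.Add(1, "Scalable web architecture")
	idx.Add(2, "Web caching with Redis")
	idx.Add(3, "Scalable caching")
	fmt.Println(idx.Search("scalable caching")) // [3]
	fmt.Println(idx.Search("web"))              // [1 2]
}
```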

Working Components of Front-end Architecture

Code
- HTML5/WAI-ARIA
- CSS/Sass Code standards and organization
- Object-Oriented approach (how do objects break down and get put together)
- JS frameworks/organization/performance optimization techniques
- Asset Delivery - Front-end Ops
Documentation
- Onboarding Docs
- Styleguide/Pattern Library
- Architecture Diagrams (code flow, tool chain)
Testing
- Performance Testing
- Visual Regression
- Unit Testing
- End-to-End Testing
Process
- Git Workflow
- Dependency Management (npm, Bundler, Bower)
- Build Systems (Grunt/Gulp)
- Deploy Process
- Continuous Integration (Travis CI, Jenkins)

Links

How to rock a systems design interview

System Design Interviewing

Scalability for Dummies

Introduction to Architecting Systems for Scale

Scalable System Design Patterns

Scalable Web Architecture and Distributed Systems

What is the best way to design a web site to be highly scalable?

How web works?

Adapted from vasanthk/System Design.md.

All credit goes to the rightful owner.

Hot Topics in Operating Systems

May 20, 2021 • 2 min read

#systems research #operating system

The HotOS XVIII program looks great! We will get to see and hear new ideas in operating systems research on June 1, 2021. It’s been a while for me. I think it will be a good time to pause, catch up, and learn how tech advances and new applications in OS research are shaping our computing infrastructure. I don’t remember where I heard this quip: “Always bet on Linux”. lol.

I think this would get me to enjoy reading papers again (PDF published by SIGOPS):

My Year In Review report since 2017

Jan 7, 2021 • 1 min read

#life #retro #productivity

All my “Year in Review” RescueTime reports from 2017 to 2020.

2020

2020 full report

2019

2019 full report

2018

2018 full report

I have been using RescueTime since 2016 and have been a “premium” customer since 2017. However, the 2017 report is missing because I can no longer generate it from the RescueTime website.

The Why

These reports are useful data for my monthly self-retrospective sessions. By sharing this data publicly, I hope you can learn from it and get a sense of my productivity in my primary role as a software engineer.

Oh, I hope it’s not too late to wish everyone a happy and productive 2021!

That’s all I have for today. Taa…

To Rust or Not

Jan 4, 2021 • 2 min read

#rust

A quick opinion.

Use Rust when:

Correctness is important – it provides more tools to help you write correct code and express invariants in a machine-checkable way.
Performance is important – either for single-threaded programs or for programs that benefit from concurrency or parallelism. Rust is a good option for programs with a clear hot loop, and it also fits workloads with relatively flat CPU profiles where the bottlenecks are memory allocations or similar.
Backward compatibility is important – the commitment to backward compatibility from the Rust authors means that you don’t get regular breakage simply from updating to a newer version of the language.

That doesn’t mean you’d always choose Rust.

Rust is not a good fit when:

You’d be giving up too much: the benefit of not requiring a compile step, and the ubiquity of an interpreter.
It’s difficult to justify using Rust for a typical web backend that’s mostly composing together various well-tested libraries to provide an API on top of a database.

Rust can make hard things easy and easy things hard.

If you’re here and are interested in learning Rust, check out my Awesome Rust gist. I created it while I was learning the Rust language in 2019. If you’re in a hurry or need a refresher, there’s a good post for that, “Learn Rust in half hour”.

First Taste of Generics in Go

Jul 19, 2020 • 1 min read

#go #programming

This friendly, down-to-earth tutorial explains what generic functions and types are, why we need them, how they work in Go, and where we can use them.

Generic functions in Go

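package main

import "fmt"

// Go 1.18+ syntax; the 2020 draft design wrote this as PrintAnything(type T)(thing T).
func PrintAnything[T any](thing T) {
	fmt.Println(thing)
}

func main() {
	PrintAnything[int](99) // draft syntax: PrintAnything(int)(99); PrintAnything(99) also works via inference
}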
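Generic types work the same way as generic functions: the type parameter goes on the type declaration. A small sketch using the Go 1.18+ syntax rather than the 2020 draft; `Stack` is a hypothetical example type, not from the tutorial.

```go
package main

import "fmt"

// Stack is a generic type: one definition works for any element type T.
type Stack[T any] struct {
	items []T
}

func (s *Stack[T]) Push(v T) { s.items = append(s.items, v) }

// Pop removes and returns the top element; ok is false when empty.
func (s *Stack[T]) Pop() (v T, ok bool) {
	if len(s.items) == 0 {
		return v, false // v is T's zero value
	}
	v = s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v, true
}

func main() {
	var ints Stack[int]
	ints.Push(1)
	ints.Push(2)
	fmt.Println(ints.Pop()) // 2 true

	var words Stack[string]
	words.Push("hello")
	fmt.Println(words.Pop()) // hello true
}
```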

GPT-3 Application Ideas

Jul 19, 2020 • 2 min read

#nlp #transformer #gpt-3

As part of a side project, I’ve been researching and curating a list of NLP resources focused on BERT, GPT, Transformer networks, and more for over two years.

GPT-3 (Generative Pre-trained Transformer 3) comes from the Transformer family.

This year OpenAI is back with a new language model, GPT-3, which is currently making waves around the Internet. It’s interesting to see what creative app ideas are possible with a bigger model like GPT-3. Below is a random selection of such apps:

Code Generator

A web UI layout generator: You just describe any layout you want, and it generates the React JSX code for you.

Generative Text

Learn From Anyone: question-answering agent (teacher).
Emoji storytelling: Understand emojis and use them to describe movies.
Test prompts and all the AI-generated text results for those prompts.
Turing test
Generated tweets

More about the new kinds of tools and applications that people are building on GPT-3 API.

Effective Software Testing Practices

Jun 25, 2020 • 2 min read

#software testing

Writing down what you learn is key to your retention. Today I learned a bit about the wisdom of software testing and took some notes that I thought were interesting enough to share.

I will not get into testing techniques this time. I will try to get more specific next time.

Find Important Bugs

Test:

core functions before supporting functions. Core functions are critical and the top N things that the product does. It’s the functions that make the product what it is.
capability before reliability. Test whether each function can work at all before going deep into the examination of how any one function performs under many different conditions.
high-impact problems. Test the parts of the product that would do a lot of damage in case of failure.
common situations before niche situations.
the most wanted areas before areas not requested. This means any areas, and any problems, that are of special interest to someone else.
things that are changed before things that are the same. Fixes and updates mean fresh risk.
common threats before rare threats. Test with the most likely stress and error situations.

Mindset

Like to dispel the illusion that things work.
Critical thinking — critical examination of belief.
If you want to be a good tester, learn to think like one, not look like one.
Anticipate risks that the programmer missed — The more you learn about a product, and the more ways in which you know it, the better you will be able to test it.
Learn about systems thinking.
Intuition is often strongly biased.
Be an explorer.
What you think “it works” means might not match someone else’s definition.
Don’t confuse the test with the testing.
Manage bias.
Convince yourself that you are easy to fool.
When you know a product well, you make more assumptions about it, and you check those assumptions less often.
Don’t restrict yourself to being a steward of received wisdom; be the author of your own wisdom.

Ideas

Use heuristics to generate ideas for tests. Examples:

Test at the boundaries.
Test every error message.
Test configurations that are different from the programmer’s.
Run tests that are annoying to set up.
Avoid redundant tests.
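As an example of the first heuristic, a table of boundary cases checks each edge plus its neighbours. The function `eligible` and its 18 to 65 range are hypothetical; in a real project these cases would live in a `_test.go` file using the `testing` package, but a plain `main` keeps the sketch runnable on its own.

```go
package main

import "fmt"

// eligible is a hypothetical function under test: ages 18 through 65 inclusive.
func eligible(age int) bool {
	return age >= 18 && age <= 65
}

func main() {
	// Each boundary plus the values just inside and just outside it.
	cases := []struct {
		age  int
		want bool
	}{
		{17, false}, {18, true}, {19, true},
		{64, true}, {65, true}, {66, false},
	}
	failures := 0
	for _, c := range cases {
		if got := eligible(c.age); got != c.want {
			failures++
			fmt.Printf("FAIL eligible(%d) = %v, want %v\n", c.age, got, c.want)
		}
	}
	fmt.Println("failures:", failures) // failures: 0
}
```

Off-by-one mistakes such as writing `age > 18` tend to show up only at these edges, which is why they are worth testing first.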

When Might Microservices Be a Bad Idea?

Jun 24, 2020 • 3 min read

#systems design #software design

When might microservices be a bad idea?

Well, it’s mid-2020. If you are in the software development field, you have probably bumped into posts and discussions saying microservices are an anti-pattern: more services, more pain. Since the topic is confusing, today I’m stealing some time from my usual day to try to dissect it.

So, I’ve watched this GOTO 2019 talk by Sam Newman on monolith decomposition patterns. It’s one of the best talks on the topic I’ve seen.

Monolith Decomposition Patterns

Isolate the data
Release train
Horror, pain and suffering
- Microservices are not the goal — you don’t win by doing microservices.
- It’s so silly when people start comparing how many microservices you got.
Strangler fig applications (“wraps around” existing system)
- Incremental migration of functionality from one system to another.
Branch by abstraction
- Create an abstraction for the functionality to be replaced.
- You can also learn more by reading “Working Effectively with Legacy Code” book by Michael Feathers.
Parallel run
- Rather than calling either the old or the new implementation, instead we call both.
Decompose the database
- You can also learn more by reading “Refactoring Databases” book by Scott Ambler and Pramod Sadalage.
Partitions
- Split table
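Branch by abstraction and parallel run combine naturally: put an interface in front of the functionality, then route calls through a wrapper that runs both implementations. A minimal Go sketch; `PriceCalculator`, `legacyCalc`, `newCalc`, and `parallelRun` are hypothetical names, and a production version would usually call the candidate asynchronously and record mismatches as metrics.

```go
package main

import (
	"fmt"
	"log"
)

// PriceCalculator is the abstraction introduced in front of the
// functionality being replaced (branch by abstraction).
type PriceCalculator interface {
	Price(qty int) int
}

// legacyCalc is the existing implementation inside the monolith.
type legacyCalc struct{}

func (legacyCalc) Price(qty int) int { return qty * 100 }

// newCalc is the hypothetical replacement, e.g. backed by a new service.
type newCalc struct{}

func (newCalc) Price(qty int) int { return qty * 100 }

// parallelRun calls both implementations, returns the trusted legacy
// result, and logs any mismatch so the new one can be verified.
type parallelRun struct {
	old, candidate PriceCalculator
	mismatches     int
}

func (p *parallelRun) Price(qty int) int {
	want := p.old.Price(qty)
	if got := p.candidate.Price(qty); got != want {
		p.mismatches++
		log.Printf("mismatch for qty=%d: old=%d new=%d", qty, want, got)
	}
	return want
}

func main() {
	var calc PriceCalculator = &parallelRun{old: legacyCalc{}, candidate: newCalc{}}
	fmt.Println(calc.Price(3)) // 300, always served by the legacy path
}
```

Once the mismatch count stays at zero for long enough, the wrapper can be swapped for `newCalc` alone and the legacy code deleted.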

I’ve also read a wide range of posts on this topic to get a better understanding.

The following key takeaways are taken from InfoQ Podcast:

Fundamentally, microservices are distributed systems. Distributed systems have baggage (complexity) that comes along with them. The best way to deal with this complexity is to avoid taking it on in the first place. Try to solve the problem in other ways before moving an organization to microservices.
A common issue that large enterprises run into that might be a strong indicator for implementing microservices occurs when lots of developers are working on a given problem and they’re getting in each other’s way.
A useful structure to follow with microservices is to make sure each service is owned by exactly one team. One team can own more than one service, but clear ownership helps with some of the operational challenges of microservices.
A release train should be a stop in the journey towards continuous delivery. It’s not the destination. If you find that you can only release in a release train, you are likely building a distributed monolith.
There are challenges of operating microservices when the end customer has to operate and manage it. These challenges are part of why we’re seeing projects move from microservices to process monoliths.

I think these takeaways act as a good summary of the videos, talks, and articles I’ve seen.

References

Monolith To Microservices book by Sam Newman.
How to break a Monolith into Microservices article by ThoughtWorks.

Summary of "Clean Code" by Robert C. Martin

May 30, 2020 • 5 min read

#programming #coding style #guide

A summary of the main ideas from the “Clean Code: A Handbook of Agile Software Craftsmanship” book by Robert C. Martin (aka. Uncle Bob).

Code is clean if it can be understood easily – by everyone on the team. Clean code can be read and enhanced by a developer other than its original author. With understandability comes readability, changeability, extensibility and maintainability.

General rules

Follow standard conventions.
Keep it simple, stupid (KISS). Simpler is always better. Reduce complexity as much as possible.
Boy scout rule. Leave the campground cleaner than you found it.
Always find root cause. Always look for the root cause of a problem.
Follow the Principle of Least Surprise.
Don’t repeat yourself (DRY).
Do not override safeties.

Design rules

Keep configurable data (e.g.: constants) at high levels. They should be easy to change.
Prefer polymorphism to if/else or switch/case.
Separate multi-threading code.
Prevent over-configurability.
Use dependency injection.
Follow Law of Demeter. A class should know only its direct dependencies.

Understandability tips

Be consistent. If you do something a certain way, do all similar things in the same way.
Use explanatory variables.
Encapsulate boundary conditions. Boundary conditions are hard to keep track of. Put the processing for them in one place.
Prefer dedicated value objects to primitive types.
Avoid logical dependency. Don’t write methods that work correctly only because of something else in the same class.
Avoid negative conditionals.
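A short Go sketch of "prefer dedicated value objects to primitive types" combined with "encapsulate boundary conditions": a hypothetical `Email` type that can only be built through a validating constructor. The validation rule (exactly one "@" with text on both sides) is a simplifying assumption, not a real address check.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Email is a dedicated value type instead of a bare string. Its field is
// unexported, so code in other packages can only obtain one via NewEmail.
type Email struct{ addr string }

// NewEmail keeps the boundary checks in one place. The rule here is a
// deliberately simple assumption, not full RFC 5322 validation.
func NewEmail(s string) (Email, error) {
	s = strings.TrimSpace(s)
	at := strings.Index(s, "@")
	if strings.Count(s, "@") != 1 || at == 0 || at == len(s)-1 {
		return Email{}, errors.New("invalid email: " + s)
	}
	return Email{addr: s}, nil
}

func (e Email) String() string { return e.addr }

// sendWelcome cannot receive an unvalidated string by mistake.
func sendWelcome(to Email) string { return "welcome sent to " + to.String() }

func main() {
	if e, err := NewEmail(" ada@example.com "); err == nil {
		fmt.Println(sendWelcome(e)) // welcome sent to ada@example.com
	}
	_, err := NewEmail("not-an-email")
	fmt.Println(err) // invalid email: not-an-email
}
```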

Names rules

Choose descriptive and unambiguous names.
Make meaningful distinctions.
Use pronounceable names.
Use searchable names.
Replace magic numbers with named constants.
Avoid encodings. Don’t append prefixes or type information.

Functions rules

Small.
Do one thing, and do it well.
Use descriptive names.
Prefer fewer arguments. No more than 3 if possible.
Have no side effects.
Don’t use flag arguments. Split method into several independent methods that can be called from the client without the flag.
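A before/after sketch of the flag-argument rule in Go. `render`, `renderPlain`, and `renderHTML` are hypothetical names; the "before" version is shown only as a comment.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Before (flag argument): func render(text string, asHTML bool) string
// Call sites read as render(s, true), hiding which behaviour is chosen.

// After: two functions, each doing one thing, named for what they do.
func renderPlain(text string) string {
	return strings.TrimSpace(text)
}

func renderHTML(text string) string {
	// Escape user text so it is safe inside an HTML element.
	return "<p>" + html.EscapeString(strings.TrimSpace(text)) + "</p>"
}

func main() {
	fmt.Println(renderPlain("  hello ")) // hello
	fmt.Println(renderHTML("  hello "))  // <p>hello</p>
}
```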

Comments rules

Always try to explain yourself in code. If it’s not possible, take your time to write a good comment.
Don’t be redundant (e.g.: i++; // increment i).
Don’t add obvious noise.
Don’t use closing brace comments (e.g.: } // end of function).
Don’t comment out code. Just remove.
Use as explanation of intent.
Use as clarification of code.
Use as warning of consequences.

Source code structure

Separate concepts vertically.
Related code should appear vertically dense.
Declare variables close to their usage.
Dependent functions should be close.
Similar functions should be close.
Place functions in the downward direction.
Keep lines short.
Don’t use horizontal alignment.
Use white space to associate related things and disassociate weakly related.
Don’t break indentation.

Objects and data structures

Hide internal structure.
Prefer data structures.
Avoid hybrid structures (half object and half data).
Should be small.
Do one thing.
Small number of instance variables. If your class has too many instance variables, it is probably doing more than one thing.
Base classes should know nothing about their derivatives.
Better to have many functions than to pass some code into a function to select a behavior.
Prefer non-static methods to static methods.

Tests

One assert per test.
Fast.
Independent.
Repeatable.
Self-validating.
Timely.
Readable.
Easy to run.
Use a coverage tool.

Code smells

Rigidity. The software is difficult to change. A small change causes a cascade of subsequent changes.
Fragility. The software breaks in many places due to a single change.
Immobility. You cannot reuse parts of the code in other projects because of involved risks and high effort.
Needless Complexity.
Needless Repetition.
Opacity. The code is hard to understand.

Error handling

Don’t mix error handling and code.
Use Exceptions instead of returning error codes.
Don’t return null, don’t pass null either.
Throw exceptions with context.

Presented as cheat sheet by CosteMaxime.
The essence of “Clean Code” [PDF] - a different summary
How to write clean code? Lessons learnt from “The Clean Code” [article]
A summary of the fundamental principles of writing great code [article]

Adapted from wojteklu/clean_code.md.