The Problem Statement
Goals of This Post-Mortem
Specifically, I would like to capture stats to use as metrics for comparison against future attempts.
I would also like to mutate the challenge for future attempts, for improvement purposes.
Further, I would like to consider Stephan's four questions:
- What did we do well, that if we don't discuss we might forget?
- What did we learn?
- What should we do differently next time?
- What still puzzles us?
I wish to answer these four questions both for this attempt and for the process I am creating by repeatedly doing these problems and then post-morteming them.
Stats of Interest
Started: Sun Apr 23 08:30:00 EDT 2017
Ended: Mon Apr 24 08:30:00 EDT 2017
Time to complete task: 24 hours
Tests created for this attempt: 4
End-to-end tests (usable in future attempts): 3
Past test results: N/A
Current test results: all passed
Future test results: N/A
Final result:
- Passed speed constraint
- Failed system test
  - Meaning unknown
  - Possibly failed memory constraint
  - Possibly failed implementation
Constraints for next attempt:
- Same constraints as in the problem-statement, except:
  - Three hours allotted time to complete the task
- Create a larger suite of tests in general
- Test every edge condition
- Test every specified constraint of the final executable
- Rewrite end-to-end tests so that they run a vatic_code_test executable
  - This will allow end-to-end tests to run against non-Python attempts
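As a rough sketch of what a language-agnostic end-to-end test could look like: shell out to the built executable, feed it the test input on stdin, and compare stdout against the expected output. The executable name comes from the list above; the helper name and the sample input/output are made up for illustration.

```python
# Sketch of a language-agnostic end-to-end test runner.
# Assumes the attempt is compiled/packaged into an executable
# (e.g. vatic_code_test); file contents here are illustrative.
import subprocess

def run_end_to_end(executable, input_text, expected_output):
    """Feed input_text to the executable on stdin, compare its stdout."""
    result = subprocess.run(
        [executable],
        input=input_text,
        capture_output=True,
        text=True,
        timeout=60,  # generous wall-clock limit for the test itself
    )
    return result.stdout == expected_output

# Hypothetical usage:
# ok = run_end_to_end("./vatic_code_test", "1 2 3\n", "6\n")
```

Because the test only touches stdin/stdout of a binary, the same suite would work whether the next attempt is Python, C, or anything else.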
What did we do well, that if we don't discuss we might forget?
Given the options, I decided to tackle the question in the language I knew best and, if I had time, to then port it to the language I didn't know as well.
I believe this was the correct decision.
I set up my environment pretty quickly, and I even used pylint throughout, which quickly caught small errors that might've wasted my time.
What did we learn?
If I am going to take using different editors and IDEs seriously, I have to:
- note whenever I want something to be different
- note bad habits
- make bad habits impossible
Otherwise, I might as well stick to vim.
Pandas is a cool library that I want to get better at.
Printing to stdout is slow! I've learned this multiple times now, so I'm going to create a list of common speed enemies.
Only solve a problem once it is proven to be a problem. Again, I've learned this multiple times now, but it's a hard habit to break. Specifically, I tried to optimize away rounding error before verifying it would be a problem, and that optimization ended up making my program too slow.
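To make the cost concrete, here is a tiny, made-up benchmark of the kind of arithmetic my program was doing, once with floats and once with decimal.Decimal. The workload is illustrative, not my actual code; the point is the relative cost.

```python
# Made-up comparison: float arithmetic vs decimal.Decimal arithmetic.
import timeit
from decimal import Decimal

def sum_floats(n):
    total = 0.0
    for i in range(n):
        total += i * 0.1
    return total

def sum_decimals(n):
    total = Decimal(0)
    step = Decimal("0.1")
    for i in range(n):
        total += i * step
    return total

float_time = timeit.timeit(lambda: sum_floats(10_000), number=20)
decimal_time = timeit.timeit(lambda: sum_decimals(10_000), number=20)
# On CPython, the Decimal version is typically several times slower.
```

Decimal buys exactness I hadn't yet shown I needed, at a speed cost I hadn't measured.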
What should we do differently next time?
Git Init
I need to start cultivating a checklist (some people call them recipes, and I like that metaphor) for general startup purposes.
I've dabbled in them in the past, and I think it's a good idea to dabble in them again.
But the specific thing to do differently next time is something I surprisingly forgot to do this time. I did not start my project with:
git init
I didn't even consider properly versioning my code until I was almost done with my first implementation.
Even if my instinct is to make quick prototypes that I delete (more on that later), there's nothing wrong with versioning those prototypes. The repo will be deleted just as easily as the code.
Break It Down, Prototype, Start-Over
The problem was well-defined. I was given input, I was given output.
The problem is not particularly complicated, but it does have layers and many steps.
I needed to define the sub-goals, prototype an answer, get something working, and then redo it so it would naturally be a bit cleaner.
The final product ended up being a lot messier than I would like, probably in part because of my failure to do this.
Super Starting Over
Six hours in, I submitted my first code to see what result I would get.
"Time Limit Exceeded"
I panicked. I reread the problem-statement, noticed that I was allowed numpy and pandas, and started over with the goal of solving the problem using those libraries.
I'm super glad I did this, but I should not do it again. Not like how I did.
Noting the third-party libraries that are explicitly allowed is ideal, and deliberately using those libraries on later attempts, not the first one, in order to get better at them is also a good idea.
Using a new library, when it was not a requirement, on a problem I hadn't properly finished was incorrect, especially since I already had a solution and I hadn't tested it in any way to see how close it was to fast enough.
The non-working solution I made in pandas ended up over 10x slower than my original attempt.
All I needed to do to make my original program fast enough was save printing to stdout until the very end, and use floats instead of decimal.Decimal.
Had I first made a dataset to test the speed of my code, and then tinkered with things to see what I could speed up, I would've found this out and had time to further debug my code.
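Next time, something like the following harness, written before any optimizing: generate a worst-case-sized dataset, then time the solver against it. `make_dataset` and `solve` are stand-ins; the real versions would match the problem's input format and logic.

```python
# Sketch of a speed harness: build a worst-case-sized dataset first,
# then measure the solver against the time limit. `solve` is a
# stand-in for the actual program logic.
import random
import time

def make_dataset(n_rows, seed=0):
    # Deterministic synthetic data sized like the worst case.
    rng = random.Random(seed)
    return [(i, rng.uniform(0.0, 100.0)) for i in range(n_rows)]

def solve(rows):
    # Stand-in workload.
    return sum(price for _, price in rows)

def time_solver(rows, limit_seconds):
    start = time.perf_counter()
    solve(rows)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed <= limit_seconds
```

With this in place, "is it fast enough yet?" becomes a measurement instead of a guess.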
What still puzzles us?
How do I make fast pandas code?
I really want to know. Pandas is a library that could be of significant use to me in future projects, I think.
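I can't answer this yet, but the standard advice I keep running into is to prefer whole-column (vectorized) operations over row-by-row Python loops. A tiny illustration with made-up data:

```python
# Vectorized pandas vs a row-by-row Python loop (toy data).
import pandas as pd

df = pd.DataFrame({"qty": [2, 5, 3], "price": [10.0, 4.0, 7.5]})

# Slow pattern: a Python-level loop over rows.
slow = [row.qty * row.price for row in df.itertuples()]

# Fast pattern: one vectorized expression over whole columns.
fast = (df["qty"] * df["price"]).tolist()
```

Whether my pandas attempt was slow because I leaned on row-wise patterns like the first one is exactly the kind of thing the speed harness above should tell me next time.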
What test did I fail?
I think it's possible I exceeded the memory limit, but rereading the problem-statement makes me think perhaps I got the liquidity calculation wrong.
Things I Should Do in Future Post-Mortems
All right, this post-mortem is taking me long enough. In fact, it's already taken me too long.
Next post-mortem: time-limit of 30 minutes.
I let myself get distracted too often and it wastes a lot of time. I will stop at the 30-minute mark next time. Even if it's in mid-sentence.
If this happens, I'll put a "pencils down" at the end of the post, I guess.
I would also like to make the code I made available.
I'll put it up on GitHub in some way, I think.
"Pencils down."