Daily Entry: April 28th, 2017

Second day of being bad about this.

I need to be more strict about doing a time block in the morning. I don't have to follow it; it's just a good way to say, "I have this much time, and if I'm doing something other than what I planned, it had better be more important, because I probably won't have time for the thing I planned."

I'm just going to give some stats and throw everything else to tomorrow.

Fri Apr 28 23:28:28 EDT 2017

Weight: 212 lbs

Exercise:

  • Ran half a mile
  • Danced 13 songs
  • Quick stretch routine
  • 3 pull ups
  • 4 hours of standing or so

Watched:

  • First episode of Better Call Saul
  • An episode of Bob's Burgers

Cooked:

  • Roasted red bell pepper hummus

Cleaned:

  • Kitty-litter (daily chore now)
  • Kitchen

Responsibilities:

  • Caught up on tickler

Fri Apr 28 23:30:45 EDT 2017

Good enough. Tomorrow morning needs to start with time blocking.

Daily Entry: April 27th, 2017

Thu Apr 27 21:58:26 EDT 2017

Today was a lazy day without proper logging for a variety of reasons.

Tomorrow I'll review both yesterday and today. Yesterday's review should be of reasonable length; today's will be super short. Today's review will not be its own post.

Find and Replace Problem-Statement

On Tuesday, April 25th, I realized that I had set the tags in each blog post incorrectly. I thought they were delimited by commas, when they actually needed to be an unordered list with the '-' character used as a bullet.

This is a common find/replace problem that can be solved in a variety of ways. It's a good, simple problem to solve regularly, and it is easily mutated into different find/replace challenges.

The Problem-Statement

A series of blog post files have their tags set incorrectly.

Each file has a list of tags set via:

---
title: Title
date: date
tags: tag1, tag2, tag3
---

When it should be:

---
title: Title
date: date
tags:
- tag1
- tag2
- tag3
---

Programmatically fix this incorrect setup.
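For comparison with the sed approach from April 25th, here is a minimal Python sketch. It assumes the front matter is the block between the first two `---` lines and that all tags sit on one `tags:` line separated by commas; files that are already fixed pass through unchanged, and a `tags:` line in the post body is left alone. `fix_tags` and `fix_file` are hypothetical names.

```python
def fix_tags(text):
    """Rewrite a 'tags: a, b, c' front-matter line as a '- ' list.

    Assumes the front matter is delimited by '---' lines and that tags are
    separated by commas on a single 'tags:' line.
    """
    out = []
    delimiters_seen = 0
    for line in text.split("\n"):
        if line == "---":
            delimiters_seen += 1
        elif delimiters_seen == 1 and line.startswith("tags:"):
            value = line[len("tags:"):].strip()
            if value:  # already-fixed files have nothing after 'tags:'
                out.append("tags:")
                out.extend("- " + tag.strip() for tag in value.split(","))
                continue
        out.append(line)
    return "\n".join(out)


def fix_file(path):
    """Apply fix_tags to one post file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(fix_tags(text))
```

Because it only touches the front matter, running it twice is harmless, which is a property the two-command sed version does not have.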

Mutations

Do It Backwards

Find/replace in the reverse order.

Switch:

---
title: Title
date: date
tags:
- tag1
- tag2
- tag3
---

to:

---
title: Title
date: date
tags: tag1, tag2, tag3
---
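The backwards mutation can be sketched in Python by collecting the `- ` lines that follow a bare `tags:` line inside the front matter and joining them with ", ". This assumes the front matter is the block between the first two `---` lines and that every tag line starts with `- `; `unfix_tags` is a hypothetical name.

```python
def unfix_tags(text):
    """Collapse a '- ' tag list back into 'tags: a, b, c' (the reverse mutation)."""
    out = []
    delimiters_seen = 0
    collecting = False
    tags = []
    for line in text.split("\n"):
        if collecting and line.startswith("- "):
            tags.append(line[2:].strip())
            continue
        if collecting:
            # The first non-'- ' line ends the list; emit the collapsed tags line.
            out.append("tags: " + ", ".join(tags))
            collecting = False
        if line == "---":
            delimiters_seen += 1
        elif delimiters_seen == 1 and line.strip() == "tags:":
            collecting = True
            tags = []
            continue
        out.append(line)
    if collecting:  # list ran to the end of the text
        out.append("tags: " + ", ".join(tags))
    return "\n".join(out)
```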

Sed One-Liner

The first attempt did this with two sed commands. It can definitely be done in one.

Attempts

[1]

Review of April 25th, 2017

Time-limit to complete post: 30 minutes.

Pencils down at 10:38:35 EDT 2017

Reading the Days Posts

Daily Entry

The formatting of this page is all weird for some reason. Oh, it's because there's a single-space indent on every line, which made each line break in the markdown file a line break in the page.

Fixed.

I used sed to solve a problem with how I incorrectly did tags in these diary posts.

This should probably be a problem statement.

I constantly distracted myself throughout the day. I'm doing it while writing this post, as well. I'll stop, walk away from the computer, and pace thinking about any given thing.

I do not have proper control over my brain right now.

Creating time-limit for this post.

I decided to separate stretching from other exercise, because I like to stretch before exercising, and when I exercise is less predictable because it starts with dancing with my wife.

This was a good decision.

I have started the habit of writing notes while watching TV. I'd like to try to start some proper analysis of what I watch.

Vatic Labs Problem-Statement

The first problem-statement of my training regimen. Hopefully the beginning of something very valuable to me.

Vatic Labs Post-Mortem

This will be of use to me next time I tackle the given problem.

Useful stuff right now from this post:

  • I want to create a project creation recipe
  • I want to create a common speed bottleneck list

Stats

Exercise:

  • Ran 2 miles in over 20 minutes
  • Danced 9 songs in Just Dance
  • Did 4 pull-ups
  • Did a full stretch routine
  • 1-minute plank
  • 1-minute horse stance

Writing:

  • Wrote three diary posts
  • Wrote an argumentative comment on YouTube
  • Wrote a short update email to head-hunter

Watched:

  • Episode of Rick and Morty
  • 2 episodes of Elementary
  • 2 episodes of Iron Fist

Read:

  • Love Joy Feminism blog post
  • Pharyngula blog post
  • Random tweets
  • YouTube comment

Overall

Distractions limited output of the day. Need to work on conquering distractions.

Daily Entry: April 26th, 2017

Wed Apr 26 10:03:34 EDT 2017

So, now that I'm actively planning multiple posts per day, I suppose it is prudent to explain how the daily entry will work.

The daily entry will be the "first" post of the day. However, it'll be updated throughout the day, and I will announce in this post when I've switched to working on other posts.

Other posts will also be updated live to blog, because they aren't serious enough to work on in draft form and then properly publish. However, they will be written in a single go and only edited later if I need to correct mistakes.

Wed Apr 26 10:07:42 EDT 2017

I will now work on yesterday's review.

Wed Apr 26 10:17:13 EDT 2017

After I finish the review, I will make a "find/replace" problem-statement.

Wed Apr 26 10:22:46 EDT 2017

After "find/replace" problem, I will resurrect my 2015 timer. I may even end up using it for my own purposes.

Wed Apr 26 10:30:34 EDT 2017

After timer, create project creation recipe and create common speed bottleneck list.

Wed Apr 26 10:51:39 EDT 2017

Review done, and find/replace problem-statement and first post-mortem done.

Now it's time to resurrect the timer.

Wed Apr 26 11:13:47 EDT 2017

The timer has returned!

Wed Apr 26 11:34:24 EDT 2017

Time to work on the sample test on HackerRank from Two Sigma.

Wed Apr 26 13:21:49 EDT 2017

I am done with sample test (no good problems to keep for training regimen) and the actual Two Sigma test (two problems total, both good short problems).

Am creating problem-statement and saving my solutions before "finishing test".

Actually, I'm not allowed to copy-paste the text, so I'll print it to PDF for now and write it in by hand later.

Wed Apr 26 13:35:14 EDT 2017

I'd like to put another hour in towards some form of training, but first I'm going to watch some stuff with family.

Vatic Labs Code Test Post-Mortem

The Problem Statement

Goals of This Post-Mortem

Specifically, I would like to capture stats to use as metrics for comparison against future attempts.

I would also like to mutate the challenge for future attempts, for improvement purposes.

Further, I would like to consider the four questions, Stephan:

  1. What did we do well, that if we don't discuss we might forget?
  2. What did we learn?
  3. What should we do differently next time?
  4. What still puzzles us?

I wish to answer these four questions for this attempt, as well as for the process I am creating by repeatedly doing these problems and then post-morteming them.

Stats of Interest

Started: Sun Apr 23 08:30:00 EDT 2017

Ended: Mon Apr 24 08:30:00 EDT 2017

Time to complete task: 24 hours

Tests created for this attempt: 4

End-to-end tests (usable in future attempts): 3

Past test results: N/A

Current test results: all passed

Future test results: N/A

Final result:

  • Passed speed constraint
  • Failed system test
    • Meaning unknown
    • Possibly failed memory constraint
    • Possibly failed implementation

Constraints for next attempt:

  • Same constraints as in problem-statement except:
    • Three hours allotted time to complete task
    • Create larger suite of tests in general
      • Test every edge condition
      • Test every specified constraint of final executable
      • Rewrite end-to-end tests such that they run a vatic_code_test executable
        • This will allow end-to-end tests to run against non-python attempts
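A sketch of such a language-agnostic end-to-end runner: it takes the command as a list (for example `["./vatic_code_test"]` or `["python", "vatic_code_test.py"]`), appends the quotes and trades paths as the judge does, and compares stdout. Since paired trades may be written in any order, it assumes the header line comes first and compares only the remaining lines after sorting. `run_end_to_end` is a hypothetical name.

```python
import subprocess


def run_end_to_end(command, quotes_path, trades_path, expected_output):
    """Run `command quotes_path trades_path` and check its stdout.

    The first (header) line must match exactly; the remaining lines are
    compared as sorted lists because paired trades may come in any order.
    """
    proc = subprocess.Popen(
        list(command) + [quotes_path, trades_path],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # return text rather than bytes
    )
    actual, _ = proc.communicate()
    if proc.returncode != 0:
        return False  # a crash counts as a failed test
    actual_lines = actual.splitlines()
    expected_lines = expected_output.splitlines()
    if actual_lines[:1] != expected_lines[:1]:
        return False
    return sorted(actual_lines[1:]) == sorted(expected_lines[1:])
```

The same expected-output files then work against the Python attempt and a compiled C++ attempt alike.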

What did we do well, that if we don't discuss we might forget?

Given the options, I decided to tackle the question in the language I knew best, and, if I had time, then port it to the language I didn't know as well.

I believe this was the correct decision.

I set up my environment pretty quickly, and was even using pylint throughout, which quickly caught small errors that might've otherwise wasted my time.

What did we learn?

If I am going to take using different editors and IDEs seriously, I have to:

  • note whenever I want something to be different
  • note bad habits
    • make bad habits impossible

Otherwise, I might as well stick to vim.

Pandas is a cool library that I want to get better at.

Printing to stdout is slow! I've learned this multiple times now and I'm now going to create a list of common speed enemies.

Only solve a problem once it is proven to be a problem. Again, learned this multiple times now, but it's a hard habit to break. Specifically, I tried to optimize away rounding error before verifying it would be a problem, and it ended up making my program too slow.

What should we do differently next time?

Git Init

I need to start cultivating a checklist (some people call them recipes and I like that metaphor) for general startup purposes.

I've dabbled in them in the past, and I think it's a good idea to dabble in them again.

More specifically, there's something I surprisingly forgot to do this time that I need to do differently next time. I did not start my project with:

git init

I didn't even consider properly versioning my code until I was almost done with my first implementation.

Even if my instinct is to make quick prototypes that I delete (more on that later), there's nothing wrong with versioning those prototypes. The repo will be deleted just as easily as the code.

Break It Down, Prototype, Start-Over

The problem was well-defined. I was given input, I was given output.

The problem is not particularly complicated, but it does have layers and many steps.

I needed to define the sub-goals, prototype an answer, get something working, and then redo it so it would naturally be a bit cleaner.

The final product ended up being a lot messier than I would like, probably in part because of my failure to do this.

Super Starting Over

Six hours in, I submitted my first code to see what result I would get.

"Time Limit Exceeded"

I panicked. I reread the problem-statement, noticed that I was allowed numpy and pandas, and so started over with the goal of solving the problem using those libraries.

I'm super glad I did this, but I should not do it again. Not like how I did.

Noting which third-party libraries are explicitly allowed is ideal, and using those libraries in attempts after the first one, specifically to get better at them, is also a good idea.

Using a new library, when it was not a requirement, on a problem I hadn't yet properly finished was a mistake, especially since I already had a solution and hadn't tested it in any way to see how close to fast enough it was.

The non-working solution I made in pandas ended up over 10x slower than my original attempt.

All I needed to do to make my original program fast enough was save printing to stdout until the very end, and use floats instead of decimal.Decimal.
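A minimal sketch of the first fix, assuming the output rows are already formatted strings; `write_all` is a hypothetical helper name. The idea is simply that per-row `print` calls are replaced by one `write` at the very end. The second fix needs no special code: parse prices with `float(...)` rather than `decimal.Decimal(...)`.

```python
import sys


def write_all(lines, stream=None):
    """Write all output lines with one write() call instead of one print per line."""
    if stream is None:
        stream = sys.stdout
    if lines:
        stream.write("\n".join(lines) + "\n")
```

The trade-off is memory: every row is held in a list until the end, which matters given the 1 GB limit and up to 5,000,000 input rows per file.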

Had I made the dataset to test the speed of my code first, and tinkered with things to see if I could speed things up, I would've found this out and had time to further debug my code.
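A sketch of such a generator, written after the fact rather than the one used during the test. It assumes three made-up symbols, keeps prices as integer cents (so every price has at most two decimals and ASK > BID always holds), and places every trade exactly at its symbol's prevailing BID or ASK so the liquidity rules always apply. `make_fake_data` and its parameters are hypothetical.

```python
import random


def make_fake_data(n_steps, symbols=("ABC", "DEF", "GHI"), trade_prob=0.5, seed=0):
    """Return (quotes_csv, trades_csv) strings in the problem's format.

    Prices are random walks in integer cents, so every price has at most two
    decimals and ASK (= BID + 0.02) is always strictly greater than BID.
    Every trade executes exactly at its symbol's prevailing BID or ASK.
    """
    rng = random.Random(seed)
    bid = {}  # symbol -> current bid in cents
    quotes = ["TIME,SYMBOL,BID,ASK"]
    trades = ["TIME,SYMBOL,SIDE,PRICE,QUANTITY"]

    def add_quote(t, sym):
        quotes.append("%d,%s,%.2f,%.2f" % (t, sym, bid[sym] / 100.0, (bid[sym] + 2) / 100.0))

    # Quote every symbol at time 1 so each later trade has a prevailing quote.
    for i, sym in enumerate(symbols):
        bid[sym] = 1000 * (i + 1)
        add_quote(1, sym)

    for t in range(2, n_steps + 2):
        # One quote update per time step; it takes effect before any trade at time t.
        sym = rng.choice(symbols)
        bid[sym] = max(1, bid[sym] + rng.randint(-2, 2))
        add_quote(t, sym)
        if rng.random() < trade_prob:
            sym = rng.choice(symbols)
            side = rng.choice("BS")
            cents = bid[sym] + rng.choice((0, 2))  # exactly at BID or at ASK
            qty = rng.randint(1, 10) * 100
            trades.append("%d,%s,%s,%.2f,%d" % (t, sym, side, cents / 100.0, qty))

    return "\n".join(quotes) + "\n", "\n".join(trades) + "\n"
```

Writing the two strings to files with a large `n_steps` gives a dataset for timing runs before submitting.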

What still puzzles us?

How do I make fast pandas code?

I really want to know. Pandas is a library that could be of significant use to me in future projects, I think.

What test did I fail?

I think it's possible I exceeded the memory limit, but rereading the problem-statement makes me think perhaps I did liquidity wrong.

Things I Should Do in Future Post-Mortems

All right, this post-mortem is taking me long enough. In fact, it's already taken me too long.

Next post-mortem: time-limit of 30 minutes.

I let myself get distracted too often and it wastes a lot of time. I will stop at the 30-minute mark next time. Even if it's in mid-sentence.

If this happens, I'll put a "pencils down" at the end of the post, I guess.

I would also like to make the code I made available.

I'll put it up on GitHub in some way, I think.

"Pencils down."

Daily Entry: April 25th, 2017

Tue Apr 25 10:45:51 EDT 2017

I have posted the problem I worked on two days ago, as well as the company that I was doing it for.

It was going to be a post-mortem, but considering this will be a problem I revisit in the future, I have decided to post the problem itself first, and tag it with training regimen.

I have also realized that I've been doing tags incorrectly so far. Which means it is time to google how to use sed for exactly some specific thing.

Woo.

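# Fix every tag but the first one: each ", " becomes a newline plus "- "
sed -i '/^tags:/s/, /\n- /g' *
# Fix the first tag: replace only the first space after "tags:" (no g flag, so multi-word tags stay intact)
sed -i '/^tags:/s/ /\n- /' *

Tue Apr 25 11:10:00 EDT 2017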

Tags fixed.

Woo.

Tue Apr 25 11:23:02 EDT 2017

Really need to start post-mortem, as I also have another code-test today (and hopefully one or more additional training regimens).

Tue Apr 25 12:23:27 EDT 2017

I really needed to stretch first, and I did a lot of pacing in between.

Tue Apr 25 12:29:36 EDT 2017

Forgot about foam rolling.

Tue Apr 25 22:12:16 EDT 2017

I did the things. That post is above this one, so you may not notice that I went and updated this post more, Stephan. I'll mention it in the review of the day. Which I think will be separate posts now.

My legs are super thanking me right now. Good decision there.

Tue Apr 25 15:30:40 EDT 2017

Did some watching of TV with family. Now I'll do post-mortem.

Tue Apr 25 15:52:24 EDT 2017

The wife has asked that I resurrect my timer for her own purposes. I will do so after the post-mortem and sample test.

Tue Apr 25 16:37:48 EDT 2017

Post-mortem "done".

Vatic Labs Code Test Problem-Statement

On Sunday, April 23rd, around 0830 EDT, I started a coding test as a part of an application to Vatic Labs.

As far as I can tell, having read everything pretty carefully and thoroughly, I have made no agreement to not discuss or publish the problem. I am repeating the problem here for the purposes of saving it to my training regimen.

This shall be the first of many, I hope.

There will be no commentary in this post, but future commentary on this problem will refer to this post.

The Problem-Statement

Objective

Your program will analyze quotes and trades and output paired trades. A quotes file describes the prices of several stocks over time, and a trades file lists the individual transactions a hypothetical trading firm makes in those stocks.

Quotes File

The quotes file contains:

TIME,SYMBOL,BID,ASK
1,ABC,10.05,10.06
1,DEF,35.95,35.99
2,GHI,76.34,76.42
3,ABC,10.06,10.07
5,DEF,35.97,36.03

Each line represents a quote update (i.e., BID and ASK prices) of the given SYMBOL, and is effective from the given TIME until there is another update for the same SYMBOL. In the above example, the BID and ASK of 10.05 and 10.06 respectively for symbol ABC is valid for times 1 and 2 but the values change to 10.06 and 10.07 beginning at time 3. TIME will always be an int presented in chronological order.
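Because both files are chronological, the prevailing quote for each trade can be found in a single streaming pass over both at once. This sketch assumes rows are already parsed into tuples and that every trade's symbol has been quoted at least once before the trade (otherwise the dict lookup raises KeyError). Applying every quote with TIME <= the trade's TIME before handling the trade also implements the rule from the Notes that a quote update at the same time as a trade takes precedence. `attach_quotes` is a hypothetical name.

```python
def attach_quotes(quotes, trades):
    """Yield (trade, bid, ask) with the quote prevailing at each trade's time.

    quotes: chronological (time, symbol, bid, ask) tuples.
    trades: chronological (time, symbol, side, price, qty) tuples.
    """
    latest = {}  # symbol -> (bid, ask) currently in effect
    quote_iter = iter(quotes)
    pending = next(quote_iter, None)
    for trade in trades:
        # Apply every quote effective at or before this trade's time.
        while pending is not None and pending[0] <= trade[0]:
            latest[pending[1]] = (pending[2], pending[3])
            pending = next(quote_iter, None)
        bid, ask = latest[trade[1]]
        yield trade, bid, ask
```

Only one quote per symbol is kept in memory, rather than the whole quotes file.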

The BID represents the highest price at which market participants are willing to buy and the ASK represents the lowest price at which market participants are willing to sell. The ASK is always strictly greater than the BID.

Trades File

The trades file contains:

TIME,SYMBOL,SIDE,PRICE,QUANTITY
2,ABC,B,10.06,500
4,DEF,S,35.99,200
4,ABC,S,10.07,200
5,ABC,S,10.07,300

Each line represents a single trade made by a hypothetical firm. A trade contains the SIDE (whether the firm B(ought) or S(old) the SYMBOL), the PRICE of the transaction, and QUANTITY of shares executed. Again, TIME in this file is chronological.

Task Description

For each trade, determine the prevailing bid and ask of the execution, as well as the LIQUIDITY tag of the execution. For a B(uy) execution, the LIQUIDITY is P(assive) if the price is at or below the BID or A(ggressive) if the price is at or above the ASK. For a S(ell) execution, the LIQUIDITY is P(assive) if the price is at or above the ASK or A(ggressive) if the price is at or below the BID. As an example, the first ABC and DEF trades above have LIQUIDITY = A and P respectively.
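The liquidity rules translate almost directly into a small function. This is a sketch; `liquidity` is a hypothetical name, and returning `None` for a price strictly inside the spread is an assumption, since the problem statement does not define that case.

```python
def liquidity(side, price, bid, ask):
    """Return 'P' (passive) or 'A' (aggressive) for one execution.

    Returns None when the price is strictly between BID and ASK,
    a case the problem statement leaves undefined.
    """
    if side == "B":
        if price <= bid:
            return "P"
        if price >= ask:
            return "A"
    elif side == "S":
        if price >= ask:
            return "P"
        if price <= bid:
            return "A"
    return None
```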

Form opening-closing trade pairs in a first-in-first-out (FIFO) manner.

  • Every symbol begins with 0 inventory. The first execution will always be an opening trade as it opens new inventory. With ABC, we see an opening trade that creates net inventory = 500 at time 2.
  • If trades occur which increase the magnitude of net inventory (e.g., if another buy were to occur for ABC after the first one) then these new trades are also opening trades and should be maintained in a FIFO structure.
  • Any trade that reduces the magnitude of net inventory (e.g., the 200 share sell at time = 4) will be a closing trade and will pair off with the first available opening trade. Whenever this occurs, a "paired trade" record should be generated in the output file. The magnitude paired off will be the minimum of the shares executed by the paired opening and closing trades. In this example, this paired quantity is 200 shares since 200 < 500.
    • If the closing trade is smaller than the paired opening trade, then the opening trade's open inventory should be reduced but the trade should maintain its FIFO position. In this example, the first opening ABC trade now holds 300 shares remaining.
    • If the closing trade is equal in size to the paired opening trade, then the opening trade is completely consumed.
    • If the closing trade is bigger than the paired opening trade, then it consumes the entire opening trade and proceeds to pair against the next opening trades. Note that this means a single closing trade can pair against multiple opening trades. If one closing trade closes 10 opening trades, then this creates 10 separate "paired trades."
    • Finally, if the closing trade is bigger than the entire open inventory, whatever is left over from the closing trade actually "flips" the inventory and creates a new opening trade on the opposite side. It will wait until a future closing trade to pair against it. For example, a single Sell trade could close five Buy trades as well as be itself an opening Sell trade.
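The FIFO rules above can be sketched with one `collections.deque` of open lots per symbol. Because a symbol's inventory is either long or short, every lot in one queue is on the same side, so a trade is a closing trade exactly when the queue is non-empty and its front lot is on the opposite side. This is a sketch, not the submitted solution; `pair_trades` and the dict keys are hypothetical.

```python
from collections import defaultdict, deque


def pair_trades(trades):
    """Pair opening and closing trades FIFO, per symbol.

    trades: chronological iterable of dicts with (hypothetical) keys
    "time", "symbol", "side" ("B" or "S"), "price", and "qty".
    Yields (opening_trade, closing_trade, paired_qty) tuples.
    """
    # symbol -> deque of [opening_trade, remaining_qty]; all lots in one
    # deque share a side because inventory is either long or short.
    open_lots = defaultdict(deque)
    for trade in trades:
        lots = open_lots[trade["symbol"]]
        remaining = trade["qty"]
        # Opposite-side inventory exists: this trade closes lots, oldest first.
        while remaining and lots and lots[0][0]["side"] != trade["side"]:
            lot = lots[0]
            qty = min(remaining, lot[1])
            yield lot[0], trade, qty
            lot[1] -= qty
            remaining -= qty
            if lot[1] == 0:
                lots.popleft()  # opening trade fully consumed
        if remaining:
            # Increases inventory, or flips it: the leftover becomes a new opening lot.
            lots.append([trade, remaining])
```

A partially consumed lot keeps its place at the front of the deque, which is the "maintain its FIFO position" rule; one closing trade can yield several pairs before any leftover flips the inventory.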

For each paired trade, compute the profit and loss (PNL), which is the product of the paired quantity and the per-share PNL (difference between sell price and buy price).
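As a worked check against the sample output below: the first ABC pair bought 200 shares at 10.06 and sold them at 10.07, so PNL = 200 × (10.07 − 10.06) = 2.00, and the second pair gives 300 × 0.01 = 3.00. The opening side decides which price is the sell price. A sketch with a hypothetical helper `pnl`; because it uses floats, the result should be formatted to two decimals on output.

```python
def pnl(open_side, open_price, close_price, qty):
    """PNL of a paired trade: qty * (sell price - buy price)."""
    if open_side == "B":
        buy, sell = open_price, close_price  # long: bought first, sold later
    else:
        buy, sell = close_price, open_price  # short: sold first, bought back later
    return qty * (sell - buy)
```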

Write all "paired trades" (in any order) to standard output.

Notes

  • The firm can sell short in any SYMBOL, i.e., take a negative position.
  • If a quote update and trade happen at the same time, the quote update takes precedence.
  • There may be some nonzero inventory at the end, i.e., not all trades will be paired. Only print paired trades.
  • You may assume that all prices for testing will have at most two decimal places.
  • Optimize the code for speed without sacrificing readability.
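Since prices have at most two decimal places, one option (not what the original attempt did, which used decimal.Decimal and then float) is to parse prices into integer cents, which makes every PNL exact integer arithmetic. A sketch assuming non-negative price strings like "10.05"; `to_cents` and `format_cents` are hypothetical names.

```python
def to_cents(price):
    """Parse a non-negative price string with at most two decimals into integer cents."""
    whole, _, frac = price.partition(".")
    # Pad the fraction so "10.5" -> 50 cents and "10" -> 0 cents.
    return int(whole) * 100 + int((frac + "00")[:2])


def format_cents(cents):
    """Format integer cents (possibly negative, e.g. a losing PNL) as "D.CC"."""
    sign = "-" if cents < 0 else ""
    cents = abs(cents)
    return "%s%d.%02d" % (sign, cents // 100, cents % 100)
```

For example, the first sample pair's PNL in cents is 200 × (1007 − 1006) = 200, which formats as "2.00".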

Output Paired Trades

The output you produce should resemble:

OPEN_TIME,CLOSE_TIME,SYMBOL,QUANTITY,PNL,OPEN_SIDE,CLOSE_SIDE,OPEN_PRICE,CLOSE_PRICE,OPEN_BID,CLOSE_BID,OPEN_ASK,CLOSE_ASK,OPEN_LIQUIDITY,CLOSE_LIQUIDITY
2,4,ABC,200,2.00,B,S,10.06,10.07,10.05,10.06,10.06,10.07,A,P
2,5,ABC,300,3.00,B,S,10.06,10.07,10.05,10.06,10.06,10.07,A,P
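A sketch of formatting one paired trade into a row with the same 15 columns as the header above, assuming each trade is a dict with hypothetical keys time, symbol, side, price, bid, ask, and liquidity. Using `%.2f` for prices and PNL reproduces the two-decimal formatting of the sample; `HEADER` and `format_pair` are hypothetical names.

```python
HEADER = ("OPEN_TIME,CLOSE_TIME,SYMBOL,QUANTITY,PNL,OPEN_SIDE,CLOSE_SIDE,"
          "OPEN_PRICE,CLOSE_PRICE,OPEN_BID,CLOSE_BID,OPEN_ASK,CLOSE_ASK,"
          "OPEN_LIQUIDITY,CLOSE_LIQUIDITY")


def format_pair(o, c, qty, pnl):
    """Format one paired trade (opening trade o, closing trade c) as a CSV row."""
    return "%d,%d,%s,%d,%.2f,%s,%s,%.2f,%.2f,%.2f,%.2f,%.2f,%.2f,%s,%s" % (
        o["time"], c["time"], o["symbol"], qty, pnl,
        o["side"], c["side"], o["price"], c["price"],
        o["bid"], c["bid"], o["ask"], c["ask"],
        o["liquidity"], c["liquidity"])
```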

Constraints

The number of entries in the quotes file is less than 5,000,000.

The number of entries in the trades file is less than 5,000,000.

Time Limit: 20 secs (C++), 60 secs (Python)

Memory Limit: 1 GB

Submission Instructions

Please submit your code in a single compressed tarball (.tgz or .tar.gz) or zip archive within 24 hours of beginning the coding exercise. Your code will be graded based on correctness, efficiency, and style. We expect the coding and implementation to take between 2-4 hours. Please take as much of the remaining time as you want to test, clean, and document your work appropriately.

Upon submitting, the online judge will compile your code and run your program against a series of large test cases. Compilation errors will be made available to you. If your program terminates unexpectedly, you will be provided with the exit code. When your submission is accepted by the online judge, you have completed the assignment and a human grader will evaluate your implementation. You are allowed an unlimited number of submissions in the 24-hour time span.

Grading

C++

We will compile your code using GCC 4.8.2 and run your binary on 64-bit Ubuntu 14.04 LTS. If you choose to develop on an environment different from ours, you are responsible for writing your solution in a portable fashion.

The online judge will extract and compile your code via

g++ -std=c++11 -O3 -o vatic_code_test [all .cpp and .cc source files]

Your binary will be run via

./vatic_code_test [quotes file] [trade file]

Python

We will run your code using Python 2.7 on 64-bit Ubuntu 14.04 LTS

python vatic_code_test.py [quotes file] [trade file]

Please ensure that your script is runnable from the archive root, not from within a subdirectory. Most packages in the Python Standard Library will be available for your use. In addition, we have installed numpy (1.11.1) and pandas (0.19.0). No other third-party packages are allowed.

Attempts

[1]

Daily Entry: April 24th, 2017

Mon Apr 24 11:12:32 EDT 2017

Yesterday did not go according to plan.

It took me 6 hours to get some semblance of working code, and that code didn't run fast enough. I then reread the problem statement, installed the available third-party libraries I could use, and redid the work whilst starting to grok a new library.

When I got that code working hours later... it was an order of magnitude slower than the original code.

I'm pretty excited to learn how to use pandas, and that I made a somewhat working solution with a library I didn't know existed the day before felt pretty good. But making it fast I do not know how to do.

So, this morning, around 0530, 3 hours before the test expired, I went about creating better fake data to test how fast my code ran.

I then tweaked a few things to my original code and suddenly it was fast enough.

Those things were:

  • Printing to stdout in one chunk at the end of the program instead of at each calculated line
    • I should really know better at this point to not do that but I keep forgetting
  • Switching from Decimal objects back to floats
    • I didn't need the precision of the Decimal objects, I was being silly using them

Upon submitting the faster code, I no longer got a time-out error, I got a "Failed system test" error. I made some tests to see what I may be doing wrong but couldn't find anything. I may have exceeded memory limitations, though I think the online judge would've told me. I couldn't get my setup to spit out wrong answers with the input I created (though the pandas version certainly did). So, I gathered all the resources I developed in solving the problem, and zipped it all up for one final submission, so that a human grader could get a better idea of everything I did to tackle the problem, if they are so inclined.

Mon Apr 24 11:23:17 EDT 2017

I am so glad I did that coding test, and that I'm starting the interviewing process. Yesterday was engaging, fun, a wake-up call, and inspiration on how to start seriously developing my craft.

Mon Apr 24 11:28:40 EDT 2017

I have, for a while now, been wanting to start developing a list of programming problems to solve regularly. I have not been cultivating any such list.

Yesterday I secured my first problem that I actively plan to solve regularly, and I'll be copying the problem here later for a variety of reasons.

Today, and tomorrow, I'll be doing a test for a different company I may be interviewing with. Rather, they have a sample test, and then the real test. The real test I will have 3 hours of time to complete. It looks like it'll be multiple questions, including some multiple choice. From here, too, I will probably gather some problems to add to my training regimen.

Also, this test will be through HackerRank, which I will devote some non-zero time to every day during my funemployment, from now on.

Mon Apr 24 11:34:28 EDT 2017

A long time ago, I lamented to a friend of mine how I felt I was too slow at progress in my profession. He thought on what I said a bit, and asked me, "You like speedrunning, right? Why don't you speedrun work?"

This is a thought that has stuck with me for a long while. And it is an idea I have tried to implement, both with success and failure.

Regularly solving the same set of problems, from scratch, is a very good application of this thought.

With regular new problems, consistently being fast is difficult. I have things I need to learn and research, even relearn and re-research. Time gets eaten up at unexpected places, and the feeling of a "speedrun" is lost. It becomes a grind again. Regularly solving problems I've tackled before, however, and tracking time to completion, gives me a reasonable metric to gauge improvement or decay, and periodically adding new problems to this system allows me to practice getting fast at various types of problems that I can expect to run into again and again.

Thus, I should get faster at random programming projects and problems that come my way.

Mon Apr 24 11:47:01 EDT 2017

This is something I've said before, but getting started was difficult. I didn't know what problems to cultivate, and had no problems already to look at for inspiration on what else I wanted. Beginning the job application process has solved this problem for me. Problems given in interviews are a perfect starting point, and actually doing interviews forces the motivation necessary to do the problem the first time. No looking for other problems because reasons, the random problem given to me is the one I had to solve.

Exactly what I needed to get started.

I think.

I hope.

Mon Apr 24 11:50:46 EDT 2017

Anyways, I want to do a review of yesterday, and in that review, I want to do a post-mortem. I think what I'll do is write a special post just devoted to yesterday's review.