Compact facial landmark sequences have been shown to characterize the complex dynamics of facial expressions effectively. We formulate driver drowsiness detection under the multivariate time series classification (MTSC) paradigm: assigning drowsiness labels to short time intervals offers enough temporal resolution to track drivers' vigilance. To meet the constrained computational budget of embedded devices, we propose DrowsyNet, a dilated LSTM-FCN that tackles MTSC by combining a dilated LSTM (Long Short-Term Memory) and a dilated FCN (Fully Convolutional Network) at low computational complexity. The dilated LSTM models long-term temporal dependencies, while the dilated FCN extracts interactive patterns across the multivariate dimensions. Our empirical evidence indicates that facial landmark sequences of length 8 are sufficient to classify driving behaviors unambiguously. Quantitative analysis demonstrates that DrowsyNet classifies facial landmark sequences effectively, achieving a classification accuracy of 86.90% and an inference speed of 15 frames per second on Firefly RK3399pro embedded boards. In addition, we empirically determine the optimal number of facial landmarks and their relative feature importance. Finally, class activation maps (CAMs) visually confirm which regions of the facial landmark sequences contribute most to driver drowsiness detection.
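To make the two-branch design concrete, the following is a minimal PyTorch sketch of a dilated LSTM-FCN for MTSC in the spirit described above. It is not the authors' published implementation: the layer widths, the dilation rates, the subsampling used to emulate temporal dilation in the LSTM branch, and the 68-landmark (136-dimensional) input are all illustrative assumptions.

```python
# Minimal dilated LSTM-FCN sketch (assumptions noted in comments).
import torch
import torch.nn as nn


class DilatedLSTMFCN(nn.Module):
    def __init__(self, n_channels: int, n_classes: int, hidden: int = 64):
        super().__init__()
        # FCN branch: stacked 1-D convolutions with increasing dilation
        # to enlarge the temporal receptive field at low cost.
        self.fcn = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=3, dilation=1, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, dilation=2, padding=2),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 64, kernel_size=3, dilation=4, padding=4),
            nn.BatchNorm1d(64), nn.ReLU(),
        )
        # LSTM branch for long-term dependencies; temporal dilation is
        # emulated here by reading every other timestep (rate 2), an
        # assumption standing in for the paper's dilated recurrence.
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(64 + hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels), e.g. length-8 landmark sequences.
        conv = self.fcn(x.transpose(1, 2)).mean(dim=2)  # global avg pool
        _, (h, _) = self.lstm(x[:, ::2, :])             # dilated-in-time LSTM
        return self.head(torch.cat([conv, h[-1]], dim=1))


# Example: batch of 4 sequences, 8 frames, assuming 68 landmarks with
# (x, y) coordinates flattened to 136 channels, 2 classes (alert/drowsy).
model = DilatedLSTMFCN(n_channels=136, n_classes=2)
logits = model(torch.randn(4, 8, 136))
print(logits.shape)  # torch.Size([4, 2])
```

The two branches play the roles named in the abstract: the dilated convolutions capture cross-channel interaction patterns at several temporal scales, while the recurrent branch summarizes longer-range dependencies; their pooled features are concatenated before the classification head.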