yennanliu / til

Today I Learned


  • A collection and record of my daily learning.
  • Tech + product + business.

PROGRESS

20240126

20240120

20240119

20240117

20240114

20240113

  • Java checked exceptions (受檢例外) vs. runtime exceptions (執行時期例外)

    • Checked exceptions

      • In some situations exceptions are predictable. For example, when using I/O, hardware or environment problems may keep the program from reading its input or writing its output. Errors like these are foreseeable; such exceptions are called "checked exceptions", and the compiler requires you to handle them.
    • Runtime exceptions

      • Exceptions like NumberFormatException are "runtime exceptions": they occur while the program is running and cannot always be anticipated, so the compiler does not force you to handle them. If a runtime exception is not handled, it keeps propagating outward until the JVM handles it; the JVM prints the exception stack trace and then terminates the program.
    • Thoughts

      • If an exception can occur in a method and you do not want to handle it there, but instead let the caller of the method handle it, declare the method with the "throws" keyword. For example, the readLine() method of java.io.BufferedReader declares that it throws java.io.IOException. "throws" is typically used on utility methods: as a called tool, the method itself should not fix the exception-handling policy, so declaring the exception with "throws" and letting the caller decide how to handle it is more appropriate. You can use "throws" as follows:
    • Ref
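A minimal sketch of the pattern described above: a utility method declares a checked exception with `throws`, and the caller handles it (the file name is hypothetical):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ThrowsDemo {

    // The utility method declares the checked exception instead of handling it.
    static String readFirstLine(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(readFirstLine("notes.txt"));
        } catch (IOException e) {
            // The caller decides how to handle the exception.
            System.out.println("Failed to read file: " + e.getMessage());
        }
    }
}
```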

20240112

20240109

20240107

20240106

20240103

  • InnoDB
    • One of the database engines for MySQL and MariaDB, originally released by MySQL AB. InnoDB was developed by Innobase Oy, which was acquired by Oracle in May 2006. Compared with the traditional ISAM and MyISAM engines, InnoDB's biggest feature is support for ACID-compliant transactions, similar to PostgreSQL.
    • wiki
    • Other MySQL engine : MyISAM

20231230

20231225

  • Three classic cache problems (Cache 三大現象)
    • Cache penetration (緩存穿透) : requests for keys missing from both the cache and the DB always fall through to the DB
    • Cache breakdown (緩存擊穿) : a hot key expires and concurrent requests all hit the DB at once
    • Cache avalanche (緩存雪崩) : many keys expire at the same time and the DB is overwhelmed

20231222

20231220

20231218

// java

@Component
public class ZKClient {

    // Runs once after the bean's dependencies are injected, before the bean is used.
    @PostConstruct
    public void init() {
    }
}

20231212

20231211

20231209

20231206

  1. How does MyBatis bind variables, and what is the difference between the options?
  • What is the difference between #{} and ${}? #{} is precompiled; ${} is plain string substitution. When MyBatis processes #{}, it replaces #{} in the SQL with a ? placeholder and calls PreparedStatement's set methods to bind the value; when it processes ${}, it simply substitutes the variable's value into the string. Using #{} effectively prevents SQL injection and improves system security. https://www.zendei.com/article/70565.html
  • Batch insert syntax?
  • MyBatis VS Hibernate? Hibernate is a fully automatic ORM tool: when querying associated objects or associated collections, it can fetch them directly from the object-relational mapping. MyBatis requires hand-written SQL to query associated objects or collections, so it is called a semi-automatic ORM tool.
  2. Implementing distributed locks with Redis and Zookeeper

  3. How WebSocket works

  4. How to detect a deadlock? Which metrics and commands to look at?

  5. Ways to implement threads? How to make a resource exclusive to one thread?

  6. Java network frameworks? How does Netty work?

  7. How to split databases and tables (sharding)?

  8. Steps to close an HTTP connection (client <-> server)?

    • TCP four-way handshake (connection teardown)
    1. the client sends a FIN to start closing
    2. the server replies with an ACK
    3. the server sends its own FIN to close its side
    4. the client replies with an ACK and closes

  9. Which data structures does Redis support?

  10. Slow queries? How to optimize them?

  11. Differences between String, StringBuilder, and StringBuffer, and their use cases? Which one is thread-safe?

  • https://www.runoob.com/w3cnote/java-different-of-string-stringbuffer-stringbuilder.html

  • https://www.readfog.com/a/1633579016528171008

  • https://c.biancheng.net/view/5822.html

  • String VS StringBuffer, main performance difference: String objects are immutable, so every change to a String creates a new String object and repoints the reference to it. Strings whose content changes often should therefore not be String: every object created costs performance, and once many unreferenced objects pile up in memory the JVM's GC starts running and performance drops.

  • StringBuffer operates on the StringBuffer object itself each time, instead of creating a new object and changing the object reference. So StringBuffer is recommended in most cases, especially when the string object changes frequently.

  • Use cases: for small amounts of data, use String; for large amounts of data in a single thread, use StringBuilder; for large amounts of data across multiple threads, use StringBuffer.

  12. How to compare two Strings for equality in Java?
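A minimal sketch of the `==` vs. `equals()` distinction for String comparison:

```java
public class StringEqualsDemo {
    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello");

        // == compares references: a (interned literal) and b are different objects.
        System.out.println(a == b);       // false
        // equals() compares character content.
        System.out.println(a.equals(b));  // true
    }
}
```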

20231205

20231129

20231128

20231127

20231124

20231123

20231122

// example

// -------------------
// 1) PathVariable
// -------------------

@GetMapping("/foos/{id}")
@ResponseBody
public String getFooById(@PathVariable String id) {
    return "ID: " + id;
}

-> endpoint
http://localhost:8080/spring-mvc-basics/foos/abc
----
ID: abc

// -------------------
// 2) RequestParam
// -------------------

@GetMapping("/foos")
@ResponseBody
public String getFooByIdUsingQueryParam(@RequestParam String id) {
    return "ID: " + id;
}


-> endpoint

http://localhost:8080/spring-mvc-basics/foos?id=abc
----
ID: abc

20231121

20231118

20231117

20231116

20231114

20231113

20231112

20231107

20231106

20231104

20231103

20231009

20231008

20231006

20231005

20231001

20230903

20230901

20230819

20230816

20230814

20230811

20230809

20230808

20230804

20230802

20230731

20230726

20230723

<!-- pom.xml -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.2</version>
        <configuration>
          <createDependencyReducedPom>false</createDependencyReducedPom>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

20230722

20230721

20230717

20230715

20230710

20230705

20230630

20230627

20230626

Flag value	Description
-p 8080:80	Map TCP port 80 in the container to port 8080 on the Docker host.

20230623

20230620

20230619

20230616

20230609

20230607

20230605

20230601

20230530

20230529

20230526

20230524

20230523

20230519

20230517

20230515

20230513

20230509

20230508

20230505

20230430

20230428

  • Python
    • How To Use the __str__() and __repr__() Methods in Python
      • https://www.digitalocean.com/community/tutorials/python-str-repr-functions
      • The str() method returns a human-readable, or informal, string representation of an object.
      • The repr() method returns a more information-rich, or official, string representation of an object. This method is called by the built-in repr() function. If possible, the string returned should be a valid Python expression that can be used to recreate the object.
      • Note that if __str__() is not implemented, str() falls back to __repr__(), so str() and repr() then return the same value.
       # python
       # implement a class with __str__() and __repr__()

       class Ocean:

           def __init__(self, sea_creature_name, sea_creature_age):
               self.name = sea_creature_name
               self.age = sea_creature_age

           def __str__(self):
               return f'The creature type is {self.name} and the age is {self.age}'

           def __repr__(self):
               return f"Ocean('{self.name}', {self.age})"

       c = Ocean('Jellyfish', 5)

       print(str(c))   # The creature type is Jellyfish and the age is 5
       print(repr(c))  # Ocean('Jellyfish', 5)

20230424

20230420

20230417

20230319

20230316

20230315

20230314

20230313

20230312

20230310

20230309

20230304

20230301

20230226

20230225

  • Spring boot
     // java
     // MyBatis-Plus annotation: mark a field that is not mapped to a DB column
     @TableField(exist = false)
     private List<CategoryEntity> children;

20230222

20230214

20230212

  • Map Reduce
    • Reduce
     // syntax:
     // array.reduce(function(total, currentValue, currentIndex, arr), initialValue)
     // or
     // array.reduce(callback[, initialValue]);
     function(total, currentValue, currentIndex, arr): the required callback run for each array element. Its four parameters are:
     - total: required; the initialValue, or the value previously returned by the callback.
     - currentValue: required; the value of the current element.
     - currentIndex: optional; the array index of the current element.
     - arr: optional; the array object the current element belongs to.
     initialValue: optional; the value passed to the callback as the initial total.
    
     // javascript
     // example:
     const data = [5, 10, 15, 20, 25];
    
     const res = data.reduce((total,currentValue) => {
       return total + currentValue;
     });
    
     console.log(res); // 75
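For comparison, the same fold can be sketched with Java streams, where `reduce` takes an identity value and an accumulator:

```java
import java.util.List;

public class ReduceDemo {
    public static void main(String[] args) {
        List<Integer> data = List.of(5, 10, 15, 20, 25);

        // identity = 0, accumulator = (total, currentValue) -> total + currentValue
        int res = data.stream().reduce(0, Integer::sum);

        System.out.println(res); // 75
    }
}
```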

20230210

20230209

20230208

20230207

20230205

20230203

20230201

20230126

20230125

**/mvnw
**/mvnw.cwd
**/.idea
**/.mvn
**/.iml
**/.cmd
**/target/
.idea

20230124

20230121

20230114

20230108

20221213

20221211

20221130

20221129

20221115

20221114

20221111

20221105

20221104

  • Spring boot
     // java
     @Target({METHOD, FIELD, ANNOTATION_TYPE, CONSTRUCTOR, PARAMETER}) // TODO : double check it
     @Retention(RUNTIME)
     @Documented
     @Constraint(validatedBy = {EnumValue.EnumValueValidator.class})
  • Java
    • Class<?>

20221102

20221029

20221028

20221024

20221023

20221022

20221019

20221013

20221004

20221003

20220929

20220927

20220919

20220915

  • Spring boot
    • this VS self

20220914

20220912

20220907

20220904

20220903

20220901

20220830

20220822

20220816

20220814

20220811

20220809

20220806

20220805

20220804

20220803

// traditional
Person person = new Person();
person.setName("wang");
person.setSex("male");
person.setEmail("123@XXX.com");
person.setDate(new Date());
person.setAddr("NY");

// with @Accessors(chain = true)
Person person = new Person();
person.setName("wang").setSex("male").setEmail("123@xxx.com").setDate(new Date()).setAddr("NY");

20220802

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.AmazonS3URI;

20220731

20220728

20220727

20220726

20220724

20220722

20220719

20220718

20220716

20220715

20220709

20220708

20220705

20220703

20220701

20220629

20220627

20220624

20220622

20220621

20220617

20220615

20220614

20220606

20220604

20220603

20220601

20220531

20220530

20220528

20220527

20220526

20220523

20220522

20220521

20220519

20220514

20220426

20220425

20220418

20220406

20220401

  • DB
    • Hbase
    • dynamoDB
    • column based VS row based storage

20220323

20220322

20220321

  • Java
    • JVM error handling
    • how to config different apps run with different conf in SAME JVM
      • different spring aps run in the same JVM for example

20220314

20220313

20220223

20220209

20220208

20220207

20220125

20220124

20220120

20220115

20220105

  • Spark
    • write to HDFS setting
      • https://spark.apache.org/docs/2.3.0/configuration.html
      • https://www.cnblogs.com/chhyan-dream/p/13492589.html
      • If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that should be included on Spark’s classpath:
        • hdfs-site.xml, which provides default behaviors for the HDFS client.
        • core-site.xml, which sets the default filesystem name.
      • The location of these configuration files varies across Hadoop versions, but a common location is inside of /etc/hadoop/conf. Some tools create configurations on-the-fly, but offer a mechanism to download copies of them. To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/conf/spark-env.sh to a location containing the configuration files.
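The setup described above can be sketched as a one-line addition to `spark-env.sh` (the path varies by Hadoop distribution):

```shell
# $SPARK_HOME/conf/spark-env.sh
# Point Spark at the directory containing hdfs-site.xml and core-site.xml
export HADOOP_CONF_DIR=/etc/hadoop/conf
```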

20211221

20211215

20211208

20211207

  • Flink
    • Rolling policy
    • file sink cycle
    • conf checks

20211203

20211110

20211109

20211108

20211026

20210923

20210912

20210908

20210722

  • Scala
    • generic type
    • upper/lower bound

20210720

  • Java
    • mini project : Employer system
  • Scala
    • flat map transform to for
    • design pattern
      • proxy
      • decorator
  • Flink
    • SQL, Table API
    • status programming
      • exactly once

20210717

  • Java

  • Flink

    • Exactly-once when sinking
      • Idempotent writes (冪等寫入)
      • Transactional write (事務寫入)
        • Either all success or all fail
        • DB ACID
  • Spark

    • make Spark connect to a remote Hive
      • put core-site.xml .... in main/resources ->
  • Scala

    • RMI in Scala
    • FP map filter and remove

20210716

  • Java
    • Hadoop filesystem for HDFS IO

20210703

  • Java
    • SPRING VS Sprting MVC VS SPRING BOOT
    • Spring IOC
    • Spring DI
      • Dependency Injection
      • ref1
    • java pojo
  • Scala
    • Design pattern : factory
    • Design pattern : abstract factory

20210702

20210629

20210624

20210612

20210531

  • Flink
    • keyedStream and its op
    • datastream -> keyedStream
    • datastream op

20210530

  • Scala
    • AKKA mini project : yellow chicken messenger
      • AKKA internet programming (via tcp/ip)
      • closure, curry review
  • Java
    • abstract class, method, examples
    • polymorphism, downcasting review
  • Spring framework
    • search twitter via controller
    • code review
  • Flink
    • DataStream API : basics
    • DataStream API : transformation
    • DataStream API : aggregation
    • user defined source
  • Hadoop
    • file IO upload (via java client)
    • file IO download (via java client)
    • check file or directory (via java client)
  • Django
    • ListView, DetailView

20210517

  • Flink
    • slot
    • parallelism
    • chain 2 operators ("missions") into one task, if :
      • one-to-one forwarding
      • the parallelism is the same
      • ref1
      • ref2
    • job DAG in taskmanager, workmanager, actual implementation step
  • Spark
    • aggregateByKey -> foldByKey -> reduceByKey
  • Java
    • block : more examples (static block, regular block)

20210516

  • Hadoop
    • java client app : more file IO demos

20210515

  • Hadoop
    • java client app : file IO, file delete, repartition
  • Spark
    • reduceByKey VS groupByKey
    • map source code
  • Scala
    • AKKA intro
    • AKKA factory
    • AKKA actor
    • async
  • Java
    • singleton use cases
    • "餓漢式" VS "懶漢式" and its demo code

20210512

  • Flink
    • Rolling policy
      • Row-encoded Formats
        • Custom RollingPolicy : Rolling policy to override the DefaultRollingPolicy
        • bucketCheckInterval (default = 1 min) : Millisecond interval for checking time based rolling policies
      • Bulk-encoded Formats
        • Bulk Formats can only have OnCheckpointRollingPolicy, which rolls (ONLY) on every checkpoint.
      • ref1
      • ref2
      • ref3
      • ref4
  • Hadoop
    • distcp command argument

20210511

  • Scala
    • build.sbt shadow dependency when assembly to jar

20210510

  • Java
    • static intro
    • static method, use example, use case
  • Spark
    • zip
  • Hadoop
    • java client install, intro

20210509

  • Django
  • Flink
    • submit jobs
    • stand alone VS yarn
    • stand alone VS yarn architecture
    • Note : only stand-alone mode has the Flink UI (otherwise the YARN UI is used)
    • flink CLI
    • core concepts : task manager, job manager, resource manager, task slot... (may differ between stand-alone and yarn mode)

20210508

  • AWS EMR
    • basics : master node, task node, worker node ..
    • how namenode, datanode installed in EMR clusters
    • minimum requirement for a working EMR clusters
    • hive : basics
    • hive 1.x over mapreduce VS hive 2.x on tez
    • beestream

20210507

  • HDFS
    • more basic commands :
      • check file size : hdfs dfs -du, hdfs dfs -du -h, hdfs dfs -du -h -s
      • file permission : -chgrp, chmod, -chown
    • HDFS RM API
  • Spark
    • union, intersect, Cartesian product

20210506

  • Flink
    • save kafka event to HDFS

20210505

  • Flink
    • process from socket
    • process from kafka
    • process from socket and save to HDFS
    • submit job command to local job manager
    • stand-alone mode VS job manager - task manager - worker mode
  • Spark
    • source code : repartition VS coalesce
    • source code : filter
    • source code : distinct
    • process streams from multiple kafka topics and save to different HDFS buckets

20210503

  • Java
    • class Encapsulation
  • Spark
    • RDD partition, map, flatMap source code go through
  • Hadoop
    • hdfs architecture
      • basic
      • HA
    • data block & size -> default block size : 128 MB
    • common hdfs issues
    • factors affect HDFS IO speed
      • partition
      • block size
      • file counts
      • hard disk speed (data transmission)
      • metastore

20210501

  • DynamoDB
    • read capacity unit (RCU)
    • write capacity unit (WCU)
    • architecture
    • index, secondary index
    • sorting key
    • partition
    • read/write consistency
    • basic commands

20210430

  • Scala
    • mini project : customer system - modify/delete customer
  • Java
    • unit-test intro
    • toString, equals re-write
  • Django
    • user permission, comment permission
    • local auth, comment auth

20210429

  • Spark
    • mapPartition - define partition explicitly
    • "nearby rules" ( mapping with anonymous func)

20210427

20210426

  • Spark
    • add watermark to structured-streaming df
    • load stream with schema
  • Scala
    • mini project : customer system - adding customer
  • Java
    • == VS equals
    • re-write equals
  • Hadoop
    • hadoop source code intro
    • compile Hadoop source code
  • Flink
    • submit task, and test

20210425

  • Java
    • == intro
    • equals intro

20210424

  • Java
    • object's finalize() method
    • java's gc (garbage collection) mechanism
  • Spark
    • spark core source code visit
    • ways create RDD
    • define RDD partitions explicitly
  • Hadoop
    • sync time within clusters

20210421

20210418

  • Hadoop
    • Things to note when launching a hadoop cluster in "distributed" mode

20210417

  • Django
    • form model (generate form from Django class)
    • login auth
  • Scala
    • DatetimeUtils
  • Java
    • polymorphism examples
  • Spark
    • stand alone VS yarn VS local
    • spark yarn mode job history config setup

20210416

  • Java
    • polymorphism intro
  • Scala
    • "control abstraction"

20210415

  • Spark
    • case class -> RDD -> df (?)
    • Array -> RDD -> df
    • df -> Parquet (append mode)

20210413

20210410

  • Django
    • form interact with views, urls and DB
  • Scala
    • Currying Function
    • closure
  • Java
    • step by step : children class instantiation
  • Spark
    • SparkYarnCluster running mode intro

20210409

  • MapReduce
    • MapReduce OOM exception (out of memory)
  • Hadoop Streaming
  • Java
    • super call attr, methods...
    • super call constructor
  • Spark
      • SparkYarnStandAlone running mode intro

20210408

20210407

20210406

  • Java
    • override details

20210405

  • Scala
    • anonymous function
  • Java
    • debug in Eclipse
    • debug in Eclipse in a project
  • Spark
    • spark stand alone architecture
    • spark stand alone env setup/build

20210404

  • Scala
    • partialFunction
  • Django
    • model
    • admin app

20210401

20210331

20210330

  • Scala
    • pattern matching "inner" expression : case first::second::rest => println(first, second, rest.length)
  • Java
    • mini project : CMutility
      • project summary
  • Hadoop
    • scp
    • sudo chown give file permission from root to user : code
  • Docker support file system

20210329

  • Scala
    • case class
  • Java
    • mini project : CMutility
      • "CustomView" delete client
  • Distcp
    • what if file already existed in the "destination path" ?
      • https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
      • By default, files already existing at the destination are skipped (i.e. not replaced by the source file). A count of skipped files is reported at the end of each job, but it may be inaccurate if a copier failed for some subset of its files, but succeeded on a later attempt.
    • atomic commit
      • https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
      • -atomic {-tmp <tmp_dir>}
      • -atomic instructs DistCp to copy the source data to a temporary target location, and then move the temporary target to the final-location atomically. Data will either be available at final target in a complete and consistent form, or not at all. Optionally, -tmp may be used to specify the location of the tmp-target. If not specified, a default is chosen. Note: tmp_dir must be on the final target cluster.
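The `-atomic` flag quoted above can be sketched as a command line (cluster addresses and paths are hypothetical):

```shell
# Copy with an atomic commit: data lands at the destination completely or not at all.
# Note: the -tmp directory must be on the destination cluster.
hadoop distcp -atomic -tmp hdfs://nn2:8020/tmp/distcp \
    hdfs://nn1:8020/data/src hdfs://nn2:8020/data/dest
```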

20210328

  • Scala
    • var match pattern
    • for loop match pattern
    • Nest class (inner, outer) review
  • Java
    • mini project : CMutility
      • "CustomView" delete/modify client
  • Django

20210326

  • Scala
    • pattern match with tuple
  • Java
    • mini project : CMutility
      • "CustomView" development
  • Flink
    • env set up (config, scripts) intro

20210325

20210324

20210323

20210322

  • Scala
    • value with pattern match
  • Spark-streaming
    • updateStateByKey more examples

20210321

  • Airflow
    • dynamic workflows in DAG
  • Scala
    • pattern match "daemon"
    • pattern match more examples
  • Java
    • import
    • MVC more understanding

20210320

  • Scala
    • GENERIC CLASSES
    • match intro (pattern match)
  • Java
    • package intro
    • MVC intro
  • Spark-streaming
    • transform
    • updateStateByKey

20210319

20210318

  • Scala
    • group op : stream, view, concurrent
  • Java
    • this example, this call constructor

20210311

  • Java
    • Encapsulation basic usage
  • Scala
    • flatMap, filter (functional programming)
  • Spark
    • executor memory
    • executor OOM
    • groupByKey
    • cache VS persist

20210310

  • Java
  • Scala
    • Map operation (functional programming)
    • high order function intro
      • ref
      • Functions that accept functions

20210309

  • Hive
    • make db, create table, load jar, load data, add partition : ref code
  • Bash
  • Scala
    • set
  • Java
    • Encapsulation implementation (getter, setter)

20210308

  • HDFS
    • filter : exclude files with pattern when copy via distcp
  • Java
    • anonymous object implementation

20210307

20210306

  • Scala

    • either : left, right

    • option : some, none

  • Spark-streaming

    • digest from kafka (low level api)
  • Hadoop

    • RM : resource manager : manage resources : ref
    • NM : node manager : manager for single node : ref

20210305

20210304

  • Scala
    • Queue : basic ops
  • Spark
    • spark read ORC data : ref
# pyspark
orc_data = spark.read.orc(orc_path)
orc_data.createOrReplaceTempView("orc_table")

20210303

  • Scala
    • List basics ops 1-3
    • tuple
    • Scala object <--> Java object
  • Java
    • recursion
    • method pass dynamic param

20210302

  • Scala
    • apply re-visit
    • case class VS case class instance
  • HDFS
    • stale datanode

20210301

  • Java
    • value transfer : basic data structure
    • value transfer : reference class/array
  • Scala
    • Java collections <--> Scala collections
    • 1-D, N-D (dimension) array
    • tuple
    • list
    • update list method : (1), (2)

20210228

  • Scala
    • dynamic array
    • 1-D (dimension) array
    • immutable, mutable relation

20210225

  • Java

    • Lambda function
    • array class in-memory
  • Scala

    • immutable and mutable
    • immutable and mutable layer

20210224

20210223

  • Scala

    • companion
    • Object VS class : ref
  • Java

    • static method/value....

20210221

  • Flink
    • flink save to HDFS
    • flink api with scala 2.12.X (to fix)
  • Luigi
    • allocate workers to jobs
  • Airflow
    • default config, init DB get DAG reloaded
  • Java
    • object, class in-memory
    • Spring RESTful
  • Scala
    • implicit value
    • implicit class
    • implicit method
    • implicit transformation
    • "class in class"
  • SBT
    • allocate more resources on scala/sbt build server : ref

20210217

20210215

  • Java
    • class in-memory - ref
    • class basics

20210214

20210210

  • Hadoop
    • Hadoop rebalancing
    • Hadoop NN active, standby (HA)
    • Hadoop config
    • Hadoop pseudo mode
    • HDFS formatting
    • Hadoop MR (map reduce) job (wordcount)
    • Hadoop check logs
  • Scala
    • trait (sth similar to java interface)
    • trait basics, trait "dynamic import"
    • trait implementation
  • Java
    • interface
  • Git

20210201

  • Scala
    • Companion, Singleton
    • Anonymity sub class
    • abstract class
  • Hadoop
    • HDFS trash
    • "small files" in Namenode
    • copy files
  • Java
    • Bit operation (>>, <<, ..)
    • logic operation (||, &&, |, &, ^, ...)

20210129

  • Hadoop
    • kerberos, core-site.xml...
  • Airflow
    • ssh to local machine (via insert setting to connections table in DB)
    • example
  • Scala
    • super method, re-write method

20210125

  • Hadoop
    • hadoop streaming concept
    • hadoop streaming arguments
    • hadoop streaming output
    • hadoop streaming avoid key as prefix :
    • ref1
  • Scala
    • super method in class
    • transform class type
    • rewrite method
  • Spark streaming
    • left, right join

20210124

  • Hadoop kerberos
  • Hadoop realms

20210123

  • Java Visibility of Variables and Methods
    • Visibility modifiers
      • default, public , protected, private
    • ref1
    • ref2
    • ref3

20210120

  • Scala

    • import packages
    • OOP design 1st part
  • Hadoop

    • namespace intro
    • connection between namenode, datanode
    • set up namespace in datanode
    • white, black list
  • Spark stream

    • join stream
  • Airflow

    • docker-compose airflow

20210114

20210112

  • Scala
    • constructor parameter, attribution
    • @BeanProperty
    • Scala class create steps
  • Java
    • basic data type revisit : char, double, float...
    • variable
    • operator

20210111

  • Spark-streaming
    • joining streaming to static source

20210110

  • System design
    • GFS (google file system)
    • big table
  • Scala
    • constructor
  • Java
    • constructor
  • Hadoop
    • hadoop 1 VS hadoop 2
    • hadoop version
    • hadoop ecosystem (in layers)
    • hadoop architecture

20210108

  • Java
    • JDK, JRE, JVM
    • JDK : java development kit (for java program development), including JRE.
    • JRE : java runtime environment (offer environment for java program running), including JVM.
    • JVM : java virtual machine (the virtual machine that runs java program).
    • summary:
      • JDK = JRE + development kit ( e.g. javac ..)
      • JRE = JVM + JAVA SE library
    • ref
  • Scala
    • default value, class, class Polymorphism, more OOP
  • Spark-streaming
    • sliding window VS tumbling window
  • Hadoop
    • NameNode (nn) : storage metadata for data, e.g. : created_time, doc name, doc structure, partition, access level ..
    • DataNode (dn) : storage data (data block) and data block information
    • Secondary NameNode (2nn) : monitor hadoop backend processing, do snapshot on hadoop data

20210107

  • Scala
    • class attribute value, class member value
    • default value must with explicit type
  • Java
    • getter, setter :
      • the way encapsulation for field in the object.
      • The public access interface for private values/field interaction
      • In OOP, we want to keep some values "private", prevent them from changed by others, so we use setter set up the values, and getter get the values
      • ref
      • ref2
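A minimal sketch of the getter/setter encapsulation described above (the `Person` class and its validation rule are illustrative):

```java
public class Person {
    // private field: not directly accessible from outside the class
    private int age;

    // public accessor: the read side of the interface
    public int getAge() {
        return age;
    }

    // public mutator with validation, so the field cannot be set to an arbitrary value
    public void setAge(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("age must be non-negative");
        }
        this.age = age;
    }
}
```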
  • Hadoop
  • Spark-streaming
    • watermarking on windows
    • watermarking on output modes (append..)

20210106

  • Spark-streaming
    • watermark
    • watermark with window
  • Scala
    • error handling
    • exception
      • try-catch-finally
    • Scala OOP
      • everything in Scala is "object"

20210105

20210104

  • Hadoop
    • klist
    • kinit
    • hadoop client connect to clusters
    • hadoop compress
  • Flink
    • value broadcast
    • cache
    • distributed cache
    • ref code

20210103

20210102

  • Scala
    • recursion
    • function
      • without return : Unit
      • type inference
      • case class
  • Spark-streaming
    • working with kafka
    • kafka stream serialization & deserialization
    • kafka AVRO sinks
    • kafka AVRO sources
    • Stateless VS Stateful
  • Hive
    • general intro
    • architecture

20201226

  • Java
    • Ternary Operator
  • Scala
    • Control logic

20201225

  • Spark
    • Streaming basic
    • Streaming config
    • Streaming from port

20201224

  • Scala

    • Implicit
      • implicit lets you avoid passing parameters explicitly to functions in Scala: once implicit values are defined, Scala can find them in the implicit scope. Implicits make a function more general and easier to adapt to different cases per pattern
      • ref
    • Scala control logic (if else..)
  • HDFS

    • compression HDFS file
  • Flink

    • java code example
    • pipeline workflow
    • clusters build doc
    • build project with maven
  • Airflow

    • work with macro, timestamp

20201223

20201222

  • Alluxio
  • HDFS
    • compression
    • file type
  • Scala
    • akka intro : ref

20201217

  • git

    • git stash
    • git stash list
    • git stash pop stash@{2}
    • git stash pop ( = git stash pop stash@{0})
    • git stash apply stash@{0} ( = git stash pop stash@{0} )
    • git stash drop stash@{0}
    • ref
  • hadoop distcp arguments

  • Scala

    • implicits
    • partial function
    • partial apply function

20201216

20201211

  • Hadoop
    • hadoop distcp
      • atomic
      • update
      • replace
  • SBT
    • sbt-docker
    • sbt publish
  • Scala
    • AppConfig
    • Configfactory

20201210

20201209

  • Kafka
    • consumer low level API
    • source code go through
  • Scala
    • Any, AnyValue, AnyRef
    • implicit transform
    • can give "low level" dtype to "high level" dtype, but not vice versa
  • Hive
    • alter table command
    • update schema
    • external VS internal table
    • ddl build, alter table
  • sbt
    • build-info
    • sbt version
    • sbt assembly, sbt plugin

20201125

  • Scala
    • comment -> auto generate API doc
    • var, val point to storage space
    • how scala use/re-write part of java lib as well as write itself one
    • sbt publish
  • Hive
    • repartition
  • Apache ORC
    • the smallest, fastest columnar storage for Hadoop workloads.
  • Jenkins
    • check .git when specific branch is merged/... then run

20201124

  • Spark
    • dataframe concat 2 / multitple columns
    • saveToTable/save partition by list of columns
    • data skew consideration (per executor)
  • Hive
    • external table
    • partition
    • create table from parquet file
  • Airflow
    • run hive distcp
    • run spark

20201116

  • Kafka
    • high level API
    • low level API
    • re-write method
    • re-load method
    • Java API consumer
    • Java API producer
  • Zookeeper
    • zookeeper file structure (storage for meta)
  • Apache Flume
  • Apache Nifi
  • Spark
    • RM (resource manager)
    • AM (application manager)

20201115

  • Kafka
    • Java API consumer source code
    • Java API producer source code

20201111

  • Hive

    • save partitioned hive table
    • insert a file / HDFS file into an existing hive table
  • Spark

    • set up metastore, warehouse path for hive IO
    • write df to hive with option

20201107

  • Redis
  • Java
    • project naming
      • "domain name inverse" + "project name" + "module" + "program type"
      • example:
        • com.yen + bigdata.spark + services + aaa.java
      • "module" : controller, service, bin...
    • Multi-threading
    • Multi-process
      • thread
      • runnable
      • callable
    • process cycle
      • NEW
      • RUNNABLE
        • READY
        • RUNNING
      • BLOCKED
      • WAITING
      • TIMED_WAITING
      • TERMINATED
    • process priority
    • process sleep
    • process yield
  • Flink
    • Watermark
      • ordering stream
      • non-ordering stream
      • multi-thread stream
    • Window

20201106

  • Luigi
    • luigi.Task
    • luigi basic concepts review
    • luigi get arg, config...
  • Hadoop
    • distcp
  • Hive
    • hive partitioned table
    • spark save to hive partitioned table
  • Docker
    • spark/hadoop physical/pseudo memory using setting
  • Spark
    • run via yarn/client...
  • Python args

20201030

20201028

  • Shell
    • run 1 shell func inside the other shell func
  • Spark
    • Spark-submit tuning : memory usage calculation
    • network traffic
      • more data size -> more traffic -> cost more time

20201027

  • Spark
    • Spark-submit config
    • Spark-submit tuning
    • Spark-submit with different env
  • Java
    • build project with maven
    • maven commands
    • pom.xml set up
    • add dependency in pom.xml
    • unit test in java

20201026

  • Kafka
    • partition mode
    • offset background concept
    • load data with load

20201020

20201018

  • Flink
    • Broadcast (in DataStream/Flink)
  • Java
    • Random access files
    • serialize/deserialize
      • transform data into binary form for
        • transmission
        • storage
    • Buffer
    • NIO
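
The serialize/deserialize idea (object -> bytes for transmission or storage, then back) can be sketched in Python with `pickle`; Java would use `Serializable` with `ObjectOutputStream`, but the round-trip is the same shape:

```python
import pickle

# Serialization: turn an in-memory object into bytes so it can be
# stored or sent over the wire; deserialization reverses it.
record = {"id": 1, "tags": ["kafka", "flink"]}
blob = pickle.dumps(record)     # object -> bytes
restored = pickle.loads(blob)   # bytes  -> object (equal to the original)
```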

20201017

  • Scala

  • Distributed system

    • Load balancer -> "registry center", e.g. Zookeeper
    • Kafka
  • Java

    • File IO
    • stream transformation
    • Spring framework
  • Flink

  • Zookeeper

    • as an "information center", can do some caching
    • python zookeeper client :
    • Load balancer -> Zookeeper

20201014

  • Python
    • multi-processing
      • tuning
      • get pid, parent pid..
      • start, join
      • example
  • HDFS
    • REST http API
    • webhdfs check
  • Hadoop
    • kerberos
    • core-default.xml
    • connection auth

20201008

  • HDFS
    • client connect to HDFS (file IO)
      • webhdfs -> simple file move/OP
      • pyarrow -> lots of files, heavy OP, or serialization
    • Java connect to HDFS

20201006

  • Java
    • Stream
    • Array List
  • Flink
    • Batch, Sink API
  • Spark
    • custom Spark shell port in config
  • Kafka
    • acks
      • acks=0 : the write is considered successful the moment the request is sent out; no response is awaited.
      • acks=1 : the leader must receive the record and respond before the write is considered successful.
      • acks=all : all online in-sync replicas must receive the write; if fewer than min.insync.replicas are online, the write won't be processed.
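
The three acks levels can be summarized as a toy decision function, a model of the broker-side rule only, not real Kafka client code:

```python
def write_accepted(acks, leader_ok, online_isr, min_insync_replicas):
    # Toy model of Kafka producer acks semantics.
    if acks == 0:
        return True              # fire-and-forget: no response awaited
    if acks == 1:
        return leader_ok         # only the leader must persist the record
    if acks == "all":
        # every online in-sync replica must receive the write, and there
        # must be at least min.insync.replicas of them online
        return leader_ok and online_isr >= min_insync_replicas
    raise ValueError(f"unknown acks setting: {acks}")
```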

20201004

  • Kafka
    • Async producer (sending messages)
    • serializer - deserializer
  • Flink
  • Java
    • Map, HashMap, TreeMap...

20200930

  • HDFS
    • hdfs copy, create directory, check file size
    • do above OP via python (subprocess, queue...)
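
Driving the `hdfs` CLI from Python via `subprocess` can be sketched as below; `hdfs_cmd` is a made-up wrapper name, and the `hdfs_bin` parameter exists only so the wrapper can be exercised without a cluster:

```python
import subprocess

def hdfs_cmd(*args, hdfs_bin="hdfs"):
    # Runs `hdfs dfs <args...>` and returns stdout split into lines;
    # raises CalledProcessError on a non-zero exit code.
    out = subprocess.check_output([hdfs_bin, "dfs", *args])
    return out.decode().splitlines()

# e.g. hdfs_cmd("-mkdir", "-p", "/tmp/foo") or hdfs_cmd("-ls", "/tmp")
```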

20200929

  • Scala
    • Try orElse getOrElse
    • Try catch exception
    • Try[Unit]
    • Any
  • HDFS
    • file op, compress, ls, mv...
  • Python
    • subprocess
      • check_output
    • persist-queue
      • persist-queue implements a file-based queue and a series of sqlite3-based queues
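
The idea behind persist-queue's sqlite3-backed queues can be sketched with the stdlib `sqlite3` module; this `SqliteQueue` class is a minimal illustration, not persist-queue's actual API:

```python
import sqlite3

class SqliteQueue:
    # Minimal sketch of an sqlite3-backed queue: items live in a table
    # (so they survive a process restart when backed by a file) until consumed.
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS q (id INTEGER PRIMARY KEY, item TEXT)")

    def put(self, item):
        self.db.execute("INSERT INTO q (item) VALUES (?)", (item,))
        self.db.commit()

    def get(self):
        # FIFO: take the row with the smallest id, then delete it.
        row = self.db.execute(
            "SELECT id, item FROM q ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        self.db.execute("DELETE FROM q WHERE id = ?", (row[0],))
        self.db.commit()
        return row[1]
```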

20200928

  • RabbitMQ
    • Scala examples
    • RPC (Remote procedure call) example
  • API
    • GRPC
    • REST VS GRPC VS GRAPHQL VS OPENAPI...
  • Java
    • for each
    • HashSet
    • MapSet
    • Map
    • sorting
  • Git

20200927

  • Flink
    • config on Hadoop, Yarn
    • run flink via Hadoop, stand alone
    • scala Flink shell
  • Hadoop
    • Build hadoop namenode-datanode
    • set up zookeeper
    • think about HA

20200926

  • Kafka
    • kafka streams
      • join
      • transformation
      • group by
    • kafka unit test

20200924

  • RabbitMQ
    • config
    • import/export queue setting
  • Elasticsearch
    • Define logging dtype
    • dtype

20200923

  • RabbitMQ
    • intro
    • simple sender, receiver model
    • working queue
    • broker (exchanger)
    • publish/subscribe

20200920

  • Flink
    • cluster mode
      • standalone
      • Flink on yarn
    • Flink works with HDFS/yarn..

20200919

  • Java
    • System & Runtime class
    • Math & Random class
    • Java primitive data types (byte, char, short, int, long, boolean, float, double)
  • Scala
    • logger (log4j)
  • Kafka
    • Partitioner

20200918

  • Scala
    • date format
    • long
    • log4j
    • joda-time
    • json4s

20200915

20200913

  • ES
  • Java
    • garbage collection (GC)

20200911

  • Java
    • Error handling
      • error
      • exception
    • try{} catch{Exception e}
    • throws
    • finally
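
The try/catch/finally/throws items above map onto Python's try/except/finally; a rough analog where Java's NumberFormatException corresponds to ValueError:

```python
# Python analog of Java error handling: the except clause plays the role of
# `catch (NumberFormatException e)`, and finally always runs, success or not.
cleanup_runs = []

def parse_or_none(s):
    try:
        return int(s)               # may raise ValueError (the "exception")
    except ValueError:              # Java: catch (NumberFormatException e)
        return None                 # handle instead of re-throwing
    finally:
        cleanup_runs.append(s)      # Java: finally { ... } always executes
```

Python has no checked exceptions or `throws` clause; simply not catching an exception lets it propagate to the caller, which is the analog of declaring `throws`.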

20200910

  • Scala
    • Some
    • Future (similar to Promise in JS?)
    • Finagle (lib for API/http call)
    • import, re-write class, trait, method...
    • Getters, Setters

20200909

20200908

  • Scala
  • RabbitMQ
  • Spark
    • regular expressions in spark

20200907

  • Scala
    • case class
    • Sealed Class
    • import from compiled jar
    • json4s
  • Spark
    • send df to ES

20200906

  • Flink
    • intro

20200905

  • Java
    • Lambda internal class
    • Lambda function
    • functional programming

20200904

  • Dev-op
    • Ansible
  • IntelliJ
    • ctrl + ctrl (in IntelliJ console) => find "main" script
  • Scala
    • Twitter-server
  • SBT
    • sbt run

20200903

  • Git

    • git fetch VS git pull
      • git pull = git fetch + git merge
      • git fetch : only downloads commits from the remote branch to the local repo, NO MERGE
      • git pull : downloads commits from the remote branch AND MERGES them into the local branch
    • git merge
    • git cherry-pick
    • git rebase
    • examples
  • Java

    • Polymorphism
  • Scala

    • implicit
  • BQ

    • AMZ Leadership Principles

20200830

  • Scala
    • Implicit
  • Dev-op
    • Ansible playbook
  • Invest
    • Stock exposure

20200829

20200828

20200827

  • Spark
    • load parquet

20200826

  • Java
    • object class
    • rewrite method
    • final
  • Scala
    • UpperCase
    • option
    • find
    • Some
    • exists
    • contains
    • isDefined

20200825

  • Git
    • git rebase
    • git rebase --continue
    • git rebase --abort
    • git pull = git fetch + git merge

20200823

20200822

About

Today I Learned