When making the following call:
PCollection<KeyValue> data1 = pipeline.read(source1);
PCollection<KeyValue> data2 = pipeline.read(source2);
PCollection<KeyValue> data3 = data1.union(data2);
According to the Apache Crunch documentation for read(), is the same pipeline used to read from both sources, and are the data then joined together?
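For anyone comparing union with a join, here is a rough plain-Java model of the difference. It is not Crunch code: the class UnionSketch and the string values are made up for illustration. It assumes union() behaves as a bag union, so every element of both inputs is kept, with no key matching and no de-duplication. In Crunch itself, both read() calls are registered on the same Pipeline object, and nothing is read until the pipeline is run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of what union() does to two collections; not Crunch code.
class UnionSketch {

    // Bag union: keeps every element of both inputs.
    // No matching on keys and no de-duplication takes place.
    static <T> List<T> union(List<T> first, List<T> second) {
        List<T> out = new ArrayList<>(first);
        out.addAll(second);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the records read from source1 and source2.
        List<String> data1 = Arrays.asList("k1=a", "k2=b");
        List<String> data2 = Arrays.asList("k2=b", "k3=c");

        // prints [k1=a, k2=b, k2=b, k3=c]
        System.out.println(union(data1, data2));
    }
}
```

The duplicate "k2=b" survives, which is what separates a union from a join on keys. The list order here is an artifact of the model; a distributed union should not be relied on for ordering.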
Hi, I am trying to do a map-side join in Crunch using the MapsideJoinStrategy class. It works fine for an inner join, but a full outer join fails with this error: "Join type FULL_OUTER_JOIN not supported by MapsideJoinStrategy"
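One likely reason for the restriction can be shown with a small plain-Java model of a map-side join. This is a sketch, not Crunch code: MapsideJoinSketch and joinSplit are hypothetical names, and the small table is simplified to one value per key. The assumption is that one table is held in memory by every mapper while the other is streamed split by split.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a map-side (broadcast) join; not Crunch code.
class MapsideJoinSketch {

    // One "mapper": streams its split of the large table against the
    // in-memory copy of the small table and emits the joined rows.
    static List<String> joinSplit(Map<String, String> small,
                                  List<String[]> split,
                                  boolean keepUnmatchedStreamed) {
        List<String> out = new ArrayList<>();
        for (String[] row : split) {              // row = {key, value}
            String match = small.get(row[0]);
            if (match != null) {
                out.add(row[0] + ":" + match + "," + row[1]);
            } else if (keepUnmatchedStreamed) {
                // Outer join on the streamed side: decidable row by row.
                out.add(row[0] + ":null," + row[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> small = new HashMap<>();
        small.put("a", "A");
        small.put("b", "B");

        // Two splits of the large table, each handled by a different mapper.
        List<String[]> split1 = new ArrayList<>();
        split1.add(new String[] {"a", "1"});
        split1.add(new String[] {"c", "3"});
        List<String[]> split2 = new ArrayList<>();
        split2.add(new String[] {"b", "2"});

        System.out.println(joinSplit(small, split1, true)); // prints [a:A,1, c:null,3]
        System.out.println(joinSplit(small, split2, true)); // prints [b:B,2]
    }
}
```

The mapper for split1 never sees key "b", so it cannot tell whether the small-table row "b" is unmatched overall. A full outer join has to emit unmatched rows from the in-memory side exactly once, which needs knowledge across all mappers, so a map-only join can only be inner or outer on the streamed side. If a full outer join is required, a reduce-side join such as Join.fullJoin or DefaultJoinStrategy is probably the way to go.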
When you download Apache Crunch from their website (it comes as source code), it comes without the related MapReduce classes it's based on. Two questions:
1- How is this possible? Apache Crunch is an abstraction on top of MapReduce. How come it isn't packaged with the MapReduce classes?
2- What do I need to do to develop using Apache Crunch? Do I need to download Crunch and MapReduce separately? If so, how can I know which MapReduce version I need to match the Crunch version?
I am new to the Crunchbase API. I am having trouble designing a schema for the dataset, which I get in the form of JSON for
It is difficult to find the relationships within the JSON. Is there an easy way to create a schema in MySQL and persist the data?
Hi, I am working on a Crunch job that uses a map-side join strategy. The job runs successfully with MemPipeline but fails with MRPipeline, and I don't know why this is happening.
I am trying to use a MapsideJoinStrategy on two PTables, passing the smaller table on the left and the bigger one on the right, but it fails to create the join. The join works with MemPipeline but fails on MRPipeline. Any input on this is appreciated.