Multiple dataframes for large data
I have a dataset that looks like the following (it has around 500 columns):
custid date store abc efg hij klm … xyz
1 1-Feb-13 a 2 0 2 1 1
1 5-Feb-13 c 0 3 3 0 0
1 9-Feb-13 a 3 3 0 0 1
1 31-Mar-13 a 3 0 0 0 0
As can be seen, abc, efg, and hij are names of products being sold; there
are about 500 such product columns. Each row holds the sales of each
product for one trip by a customer.
What I essentially need is to create 500 data frames (NOT a list), such
that each data frame contains the column for that product plus the common
columns custid, date, and store. So the data frame for product abc will
have the following columns only:
custid date store abc
1 1-Feb-13 a 2
1 5-Feb-13 c 0
1 9-Feb-13 a 3
1 31-Mar-13 a 3
On top of the above, I need to keep only the rows with a value > 0 for
that product. So the above will be transformed to:
custid date store abc
1 1-Feb-13 a 2
1 9-Feb-13 a 3
1 31-Mar-13 a 3
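One way to sketch this step (a minimal sketch, assuming the full data frame is called `df` and the product columns start at column 4; the toy data and the `df_` naming prefix below are my assumptions) is to loop over the product names and use `assign()` to create a separate, standalone data frame per product, keeping only rows where that product's sales are positive:

```r
# Toy data standing in for the real 500-column dataset (names assumed)
df <- data.frame(
  custid = c(1, 1, 1, 1),
  date   = as.Date(c("2013-02-01", "2013-02-05", "2013-02-09", "2013-03-31")),
  store  = c("a", "c", "a", "a"),
  abc    = c(2, 0, 3, 3),
  efg    = c(0, 3, 3, 0),
  stringsAsFactors = FALSE
)

products <- colnames(df)[4:ncol(df)]  # product columns start at column 4

for (p in products) {
  # Keep the common columns plus this product's column, rows with sales > 0
  d <- df[df[[p]] > 0, c("custid", "date", "store", p)]
  # Create a standalone data frame named e.g. df_abc (NOT a list element)
  assign(paste0("df_", p), d)
}
```

After the loop, `df_abc` holds only the trips where abc was bought; `assign()` is what makes each result a separate object rather than a list element.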
I thought I would put the product names abc, efg, etc. in a vector and
loop through it, creating new variables on each of the product datasets
along the way. So I will also need a lag variable and a
days-between-trips variable on each of the product-level datasets. I want
to do this in such a way that one for loop generates all the
product-level datasets. Something like below (it's not valid R, but
please help):
colnames_df <- colnames(df)[4:500]  # the product names, in a character vector called colnames_df
Then I want to loop through colnames_df so that on the first iteration
the dataset for the first product, abc, is created as above, and so on.
And when the abc dataset is created, it should also get a lag-date
variable and a days-between-trips variable, computed by store. How do I
do this? I want to leverage loops extensively here. (See the expected
final output for each product-level data frame below.)
custid date store abc lagdate daysbetweentrips
1 1-Feb-13 a 2 - -
1 5-Feb-13 a 3 1-Feb-13 4
1 31-Mar-13 a 3 5-Feb-13 54
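The lag date and days-between-trips can be computed inside the same loop, after sorting each product's rows by store and date. Below is a self-contained sketch in base R (the toy data, the `df_` prefix, and the helper logic for comparing consecutive stores are my assumptions, not an established recipe):

```r
# Toy per-product slice; the real df would have ~500 product columns
df <- data.frame(
  custid = c(1, 1, 1),
  date   = as.Date(c("2013-02-01", "2013-02-05", "2013-03-31")),
  store  = c("a", "a", "a"),
  abc    = c(2, 3, 3),
  stringsAsFactors = FALSE
)

products <- colnames(df)[4:ncol(df)]

for (p in products) {
  d <- df[df[[p]] > 0, c("custid", "date", "store", p)]
  d <- d[order(d$store, d$date), ]  # group rows by store, then by trip date

  # Date of the previous row, shifted down by one
  prev <- c(as.Date(NA), head(d$date, -1))
  # TRUE where the previous row belongs to the same store
  same_store <- c(FALSE, head(d$store, -1) == tail(d$store, -1))

  d$lagdate <- prev
  d$lagdate[!same_store] <- NA  # first trip at each store has no lag
  d$daysbetweentrips <- as.numeric(d$date - d$lagdate)

  assign(paste0("df_", p), d)
}
```

The shift-and-compare trick gives a per-store lag without any extra packages; with the dates shown, the gaps come out as 4 days (1-Feb to 5-Feb) and 54 days (5-Feb to 31-Mar).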
I have been going round and round with this question but somehow have
not been able to address it directly. Any help is appreciated.
Thanks!